If a problem only occurs randomly once in every N times on average, how many tests do I have to perform to be certain that it's now fixed?














38















I know that you can never be 100% sure, but is there a method to determine an appropriate number of tests?
























  • 7





    @whoever voted to close - would you say this is primarily opinion based? or answerable with some math and probability?

    – trashpanda
    May 28 at 9:04






  • 7





    This is certainly not opinion based but related to statistical analysis. Although the question can be, and was, asked in math related SEs it is very relevant here and the context might be a bit different.

    – Rsf
    May 28 at 9:50






  • 2





    @JoãoFarias Of course, controlling said randomness can indeed be tricky. I like to look at the example of CHESS, which goes to great lengths to mock the scheduling algorithm of the OS to find multithreading bugs. Sometimes a statistical approach, while less satisfactory, can be more valuable from a business perspective.

    – Cort Ammon
    May 28 at 16:18






  • 1





    Possible duplicate of How can I be sure that rarely reproduced issue is fixed?

    – Kevin McKenzie
    May 28 at 17:51






  • 5





    As a developer, "this fails randomly" actually means "I haven't yet been able to pinpoint the set of circumstances that makes this fail, and I cannot spend more time investigating it". Likewise, "I've solved this random error" means "The error still happens, but you won't notice anymore because I've added logic to capture it and fix any wrong data or actions it caused".

    – walen
    May 29 at 8:43






















manual-testing intermittent-failures














edited May 28 at 8:51









jonrsharpe









asked May 28 at 7:03









Sam Hall











6 Answers


















36














I'm going to take a different approach than statistics (though I think the other response answers your actual question more directly). Any time I've encountered "a problem that only happens some of the time", in either a QA or a support role, it's been an investigative exercise in narrowing down why the event happens irregularly or in which situations it occurs.



Some (but certainly not all) points of investigation may be:



  • Specific accounts or data.

  • Differences in hosts/environments the applications or services are running on.

  • Different versions of the application or service running on different hosts

  • Certain days, dates, times of day or time zones.

  • Certain users and their specific means of accessing the application (physical device, browser, network connection)

This sort of situation is where reproduction steps and other details from the person reporting the problem can be so valuable in resolving their issues. Telling a customer "your problem is fixed" when you're just making an educated guess can spiral in a negative direction if your experiment is based on incorrect assumptions. In my experience it's better to try to coach them about what information will help resolve their problem and how they can help you get it.
























  • 20





    Can I add a really obscure one that troubled one company I worked at? "Whether the logged-in user has an odd or an even number of characters in their username". (Which meant that the developer didn't have a problem, and the first test user did.)

    – Martin Bonner
    May 28 at 17:12











  • That's definitely an interesting one. It's always a good challenge when it gets to that granular of a difference.

    – Cherree
    May 28 at 18:09






  • 7





    Back before the turn of the century, we had an issue with phone calls counting double when transit calls (routed through the country but originating and terminating in other countries) for some specific pair of countries had happened on exactly 20 days of a calendar month. That one was completely deterministic but hell to track down. (Root cause was a buggy database driver fetching rows in batches of 20)

    – JollyJoker
    May 29 at 7:38






  • 1





    So much this; apart from hardware failures (and buggy RAM does exist) most problems are deterministic. Even a race condition is just a fancy name for "A occurs before B some of the time". If you have flaky behavior, it means you haven't determined the conditions in which it happens in sufficient detail.

    – Matthieu M.
    May 29 at 11:22






  • 1





    Just another anecdotal example - we had a problem once that the customer reported happened sometimes and we couldn't reproduce. One amazing QA person we had managed to narrow it down to...network latency. She played with the network throttling options in Chrome (it's under the dev tools) and found that the problem was almost exclusively happening between some network speeds. On a fast network, you wouldn't see the problem, nor would you on a very slow network. It happened in a narrow band. It was still irregular but she found the optimum speeds where it happened around 80% of the time.

    – VLAZ
    May 30 at 11:22


















19














You must perform the same test an equal number of times on both the unfixed version and the supposedly fixed version. You have to show that:



  1. The test fails once in every N times randomly, on the unfixed version.

  2. The same test passes every time, or at least fails less often, on the fixed version.

You have to show that the only difference between 1 and 2 is the "fix" itself, not any external or environmental factors.



If you only perform the test on the new, fixed version, it could very well be the case that the bug was caused by an unrelated environmental factor that simply doesn't exist in your test now.
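To put a number on "fails less often", one option is to run both versions the same number of times and ask how likely the observed difference would be if the fix had changed nothing. The sketch below does that with a one-sided Fisher's exact test. That test is my own choice for illustration, not something this answer prescribes, and the function name and the counts in the example are made up. It also assumes the runs are independent of each other.

```python
from math import comb


def chance_fix_did_nothing(fail_old: int, runs_old: int,
                           fail_new: int, runs_new: int) -> float:
    """One-sided Fisher's exact test (illustrative helper, hypothetical name).

    Returns the probability of seeing `fail_new` or fewer failures on the new
    version if both versions really had the same failure rate, given the total
    number of failures observed across both versions.
    """
    total_fail = fail_old + fail_new
    total_runs = runs_old + runs_new
    denominator = comb(total_runs, total_fail)
    # Sum the hypergeometric probabilities for 0..fail_new failures landing
    # in the new version's runs. math.comb returns 0 for impossible splits.
    numerator = sum(
        comb(runs_new, j) * comb(runs_old, total_fail - j)
        for j in range(fail_new + 1)
    )
    return numerator / denominator


# Hypothetical counts: 5 failures in 20 runs before the fix, 0 in 20 after.
print(chance_fix_did_nothing(5, 20, 0, 20))
```

A small result (roughly 2% for these made-up counts) suggests the improvement is unlikely to be luck. It still says nothing about environmental differences between the two sets of runs, which is why both versions should be tested side by side under the same conditions.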




















  • 9





    +1 for the principle of verifying that the old version still fails. It's way too easy for random bugs to randomly go away for days due to some external changes such as network speed.

    – jpa
    May 29 at 6:53











  • This is really nasty on less strict platforms like JavaScript. Depending on how fast your computer processes the scripts, race conditions between functions overriding each other can occur, and the code can randomly call a different function depending on the network latency and a particular script's load time.

    – Nelson
    May 29 at 8:58


















10














I suppose this answer could help you



You need to decide first at what probability you want to "detect" the problem.



This is a nice example of why theoretical knowledge is necessary even for testers.



The simplified version:



  • p is the probability of failure on a single try, 1/N in our case


  • then the probability of success on a single try is 1-p


  • and the probability of n successful tries in a row is (1-p)^n


  • so the probability of seeing at least one failure in n tries is 1-(1-p)^n


  • setting this equal to x, the probability with which you want to detect the problem, solving for n, and simplifying a bit assuming a big enough N gives:


  • n ≈ −log(1−x)⋅N

(*) "log" here is the natural logarithm, which is sometimes written (for example on calculators) as ln(x) or loge(x)
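As a rough check of the simplification, here is a small Python sketch comparing the exact number of tries with the large-N approximation. It assumes x is the probability with which you want to detect the problem (the symbol used in the comments below), that n is the number of tries, and that the tries are independent. The function names are mine.

```python
import math


def tries_exact(N: int, x: float) -> int:
    """Smallest whole n with 1 - (1 - 1/N)**n >= x."""
    p = 1.0 / N
    return math.ceil(math.log(1.0 - x) / math.log(1.0 - p))


def tries_approx(N: int, x: float) -> float:
    """Large-N approximation: n is about -N * ln(1 - x)."""
    return -N * math.log(1.0 - x)


# Compare the two for a 95% detection probability.
for N in (10, 100, 1000):
    print(N, tries_exact(N, 0.95), round(tries_approx(N, 0.95), 1))
```

For x = 0.95 this works out to roughly 3N tries, and the approximation gets closer to the exact value as N grows.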


























  • 5





    I would replace "probability" with "confidence": how confident do you want to be? Note that to be 100% confident you would need to execute an infinite number of tests, because -log(1-x) goes to infinity as x goes to 1.

    – dzieciou
    May 28 at 18:20






  • 1





    And all this is based on the assumption that test executions are independent: for instance, that the error does not accumulate and only manifest as a result of n accumulated executions.

    – dzieciou
    May 28 at 18:21











  • This is actually a valid point @Makyen, I edited the answer

    – Rsf
    May 29 at 8:07











  • This answer would be greatly improved by defining x, which just shows up in the last bullet point out of nowhere.

    – Gregor
    2 days ago


















9














While I agree with the other answers saying "dig deeper", to answer the actual math question in the title:



If the issue occurs completely at random with probability p, then the chance of it occurring at least once in n trials is 1-(1-p)^n. Setting this to x (your confidence that the issue has been fixed) and solving for n gives you



n = log(1-x)/log(1-p)



So for example, if your issue occurs 1 out of 4 times, and you want to be 95% sure it's fixed (meaning you'll incorrectly identify it as fixed 1 out of 20 times!!), then



p = 0.25
x = 0.95
n = log(0.05)/log(0.75) ≈ 10.4


so, rounding up, you'd need to run 11 trials.
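If you'd rather not do this by hand, here is a minimal Python sketch of the same formula, plus a quick simulation of a system that is still broken, to see how often n clean runs in a row happen by luck. The function names, the seed, and the number of simulated experiments are my own choices, and the simulation assumes independent trials.

```python
import math
import random


def trials_needed(p: float, x: float) -> int:
    """Smallest whole n with 1 - (1 - p)**n >= x."""
    if not (0.0 < p < 1.0 and 0.0 < x < 1.0):
        raise ValueError("p and x must both be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - x) / math.log(1.0 - p))


def false_all_clear_rate(p: float, n: int, experiments: int, seed: int = 0) -> float:
    """Simulate a still-broken system and count how often n trials all pass."""
    rng = random.Random(seed)
    all_clear = 0
    for _ in range(experiments):
        # A single trial passes when the random draw lands outside the
        # failure probability p.
        if all(rng.random() >= p for _ in range(n)):
            all_clear += 1
    return all_clear / experiments


n = trials_needed(0.25, 0.95)
print(n)                                       # 11
print(false_all_clear_rate(0.25, n, 100_000))  # close to 0.75**11, about 0.042
```

The simulated rate sits a little under 5% because n was rounded up from 10.4 to 11.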




The difference between my answer and @emrys57's is that mine assumes you know the probability, while theirs assumes you know some initial sequence of results. Presumably they should both give the same answer with a large enough initial sequence.
































    4














    A problem has been observed that sometimes, in testing, produces errors. We don’t actually know the probability that it will produce an error on any given test run, because we can only do a finite number of tests on the broken system. If 1 represents a test pass, and 0 represents a test fail, we might have a sequence of results from multiple tests that looks like this:



    011001010011


    Representing 6 passes and 6 fails in 12 tests. This gives us a guess Pf of the probability of a failure of a test for the broken system. We assume that this probability isn’t changing with time. However, there will be a large uncertainty in the actual value of Pf since we cannot do very many tests.



    Following these tests, we make a repair. We hope that this fixes the problem, but we’re not sure. We make more tests of the repaired system. If we have fixed the problem, the value of Pf measured after the repair should be 0. If we have not fixed the system, Pf should be unchanged from its value before the repair.



    If we run some tests after the repair and one fails, we immediately know the repair failed. The question is, if no tests fail after repair, does that mean the repair worked?



    Consider the case where we have results
    a zeroes (test fails)
    b ones (test passes)
    Including after repair: c ones (test passes)



    The number of ways of arranging the a + b initial results is



    Ntot = (a + b)! /(b! * a!)


    In the case where Pf does not change yet all the last c results are ones, we need to arrange b - c ones in the first a + b - c results. The number of different ways of arranging these first a + b - c results is



    Nsuc = (a + b - c)! / ( (b - c)! * a! )


    These patterns are those from all the Ntot possible patterns where the last c results are all ones.



    If the change didn’t actually repair the system, but really left its state the same, the results we observe are a random collection of zeroes and ones produced by chance, determined by the probability Pf that a single experiment returns a zero rather than a one. Given this, the chance that we will observe a pattern of a zeroes and b ones with the last c results all ones is



    Cran = Nsuc/Ntot


    Or



    Cran = (a + b - c)! * b! * a! / ((a +b)! * (b - c)! * a!)


    Or



    Cran = (a + b - c)! * b! / ((a + b)! * (b - c)!)


    Cran is the chance that we observe c test passes after we make a repair to the system, out of a total of a fails and b passes, but in fact we have changed nothing, and we see a sequence of passes after repair by happenstance.



    As an example, with the pattern above before repair, and 6 ones (passes) after repair, Cran is slightly less than 5%. To reach a confidence of 99% that the repair has succeeded, you need 11 passes after repair.
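For anyone who prefers code to a spreadsheet, here is a minimal Python sketch of the Cran formula. It uses the fact that (a + b)! / (b! * a!) is the binomial coefficient "a + b choose a", so Ntot and Nsuc can both be computed with math.comb. The function names are mine, and the loop simply adds passes after repair until Cran drops below 1 minus the confidence you want.

```python
from math import comb


def chance_of_lucky_passes(a: int, b: int, c: int) -> float:
    """Cran = Nsuc / Ntot for a fails, b passes in total, last c results all passes."""
    if c > b:
        raise ValueError("c counts passes after repair, so it cannot exceed b")
    n_tot = comb(a + b, a)      # (a + b)! / (b! * a!)
    n_suc = comb(a + b - c, a)  # (a + b - c)! / ((b - c)! * a!)
    return n_suc / n_tot


def passes_needed(fails_before: int, passes_before: int, confidence: float) -> int:
    """Fewest passes after repair for which Cran is at most 1 - confidence."""
    if fails_before == 0:
        raise ValueError("with no observed failures, Cran is always 1")
    c = 0
    while chance_of_lucky_passes(fails_before, passes_before + c, c) > 1 - confidence:
        c += 1
    return c


# The pattern 011001010011 has 6 fails and 6 passes before the repair.
print(chance_of_lucky_passes(6, 12, 6))  # 6 passes after repair
print(passes_needed(6, 6, 0.99))         # 11
```

The first printed value is 924/18564, which matches the "slightly less than 5%" figure above.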



    There's a spreadsheet that can be copied to make this computation. I hope I have it all right, it is 15 years since I last worked this out. And, back then, it took me 15 years to first find the answer.
































      0














      Let B mean "broken", F mean "fixed", and E_k mean "error observed in k trials" (so ~E_k means no error was observed in k trials). You are saying that P(E_1|B)=1/N: the probability of seeing an error in a single observation, given that it's broken, is 1/N. Now, that itself is likely going to have some uncertainty, since your only way of measuring it will probably be to see how often it fails and make an empirical estimate. However, if we take that as given, then applying Bayes' rule tells us that if our prior probability of it being broken is P(B), then the posterior probability after k error-free trials is P(B|~E_k) = P(~E_k|B)P(B)/P(~E_k).



      For P(~E_k|B), we have P(~E_k|B)= P(~E_1|B)^k = (1-P(E_1|B))^k = (1-1/N)^k = ((N-1)/N)^k. For large N and k, we can approximate that as e^(-k/N).



      For P(~E_k), we have P(~E_k) = P(~E_k|B)P(B)+P(~E_k|F)P(F). P(~E_k|F) =1 (If we've fixed it, we're guaranteed to see no errors). And P(F) is just 1-P(B).



      Plugging that all in, we have



      P(B|~E_k) ~= e^(-k/N)P(B)/(e^(-k/N)P(B)+1-P(B))



      This can be made easier to read by setting X = e^(-k/N), Y = P(B). Then we have



      XY/(XY+1-Y)



      or



      1-(1-Y)/(XY+1-Y)
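A minimal Python sketch of the result, in case it helps to plug in numbers. It uses the exact X = (1-1/N)^k rather than the e^(-k/N) approximation, and the function name and the 50/50 prior in the example are my own assumptions.

```python
def posterior_broken(k: int, N: float, prior_broken: float) -> float:
    """P(B|~E_k): probability it is still broken after k error-free trials.

    Uses X = (1 - 1/N)**k and Y = P(B), giving XY / (XY + 1 - Y).
    """
    if N <= 1:
        raise ValueError("N must be greater than 1")
    if not 0.0 <= prior_broken < 1.0:
        raise ValueError("prior_broken must be in [0, 1)")
    x = (1.0 - 1.0 / N) ** k
    y = prior_broken
    return x * y / (x * y + 1.0 - y)


# Hypothetical example: fails 1 in 4 times, 11 clean runs, 50/50 prior.
print(posterior_broken(11, 4, 0.5))
```

With a 50/50 prior this comes out at about 4%, close to the figure from the simpler calculation in the other answers. A more pessimistic prior (a larger P(B)) needs more clean runs to reach the same posterior.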















        36














        I'm going to take a different approach than statistics (though I think the other response answers your actual question more directly). Any time I've encountered "a problem that only happens some of the time" as either QA or a support role it's been an investigative exercise about narrowing down why the event happens irregularly or in what situations it occurs.



        Some (but certainly not all) points of investigation may be:



        • Specific accounts or data.

        • Differences in hosts/environments the applications or services are running on.

        • Different versions of the application or service running on different hosts

        • Certain days, dates, times of day or time zones.

        • Certain users and their specific means of accessing the application (physical device, browser, network connection)

        This sort of situation is where reproduction steps and other details from the person reporting the problem can be so valuable in resolving their issues. Telling a customer "your problem is fixed" when you're just making an educated guess can spiral in a negative direction if your experiment is based on incorrect assumptions. In my experience it's better to try to coach them about what information will help resolve their problem and how they can help you get it.






        share|improve this answer


















        • 20





          Can I add a really obscure one that troubled one company I worked at? "Whether the logged-in user has an odd or an even number of characters in their username". (Which meant that the developer didn't have a problem, and the first test user did.)

          – Martin Bonner
          May 28 at 17:12











        • That's definitely an interesting one. It's always a good challenge when it gets to that granular of a difference.

          – Cherree
          May 28 at 18:09






        • 7





          Back before the turn of the century, we had an issue with phone calls counting double when transit calls (routed through the country but originating and terminating in other countries) for some specific pair of countries had happened on exactly 20 days of a calendar month. That one was completely deterministic but hell to track down. (Root cause was a buggy database driver fetching rows in batches of 20)

          – JollyJoker
          May 29 at 7:38






        • 1





          So much this; apart from hardware failures (and buggy RAM does exist) most problems are deterministic. Even a race-condition is just a fancy name for A occurs before B some of the time. If you have a flaky behavior, it means you have determined the conditions in which it happens in sufficient detail.

          – Matthieu M.
          May 29 at 11:22






        • 1





          Just another anecdotal example - we had a problem once that the customer reported happened sometimes and we couldn't reproduce. One amazing QA person we had managed to narrow it down to...network latency. She played with the network throttling options in Chrome (it's under the dev tools) and found that the problem was almost exclusively happening between some network speeds. On a fast network, you wouldn't see the problem, nor would you on a very slow network. It happened in a narrow band. It was still irregular but she found the optimum speeds where it happened around 80% of the time.

          – VLAZ
          May 30 at 11:22















        36














        I'm going to take a different approach than statistics (though I think the other response answers your actual question more directly). Any time I've encountered "a problem that only happens some of the time" as either QA or a support role it's been an investigative exercise about narrowing down why the event happens irregularly or in what situations it occurs.



        Some (but certainly not all) points of investigation may be:



        • Specific accounts or data.

        • Differences in hosts/environments the applications or services are running on.

        • Different versions of the application or service running on different hosts

        • Certain days, dates, times of day or time zones.

        • Certain users and their specific means of accessing the application (physical device, browser, network connection)

        This sort of situation is where reproduction steps and other details from the person reporting the problem can be so valuable in resolving their issues. Telling a customer "your problem is fixed" when you're just making an educated guess can spiral in a negative direction if your experiment is based on incorrect assumptions. In my experience it's better to try to coach them about what information will help resolve their problem and how they can help you get it.
























        • 20





          Can I add a really obscure one that troubled one company I worked at? "Whether the logged-in user has an odd or an even number of characters in their username". (Which meant that the developer didn't have a problem, and the first test user did.)

          – Martin Bonner
          May 28 at 17:12











        • That's definitely an interesting one. It's always a good challenge when it gets to that granular of a difference.

          – Cherree
          May 28 at 18:09






        • 7





          Back before the turn of the century, we had an issue with phone calls counting double when transit calls (routed through the country but originating and terminating in other countries) for some specific pair of countries had happened on exactly 20 days of a calendar month. That one was completely deterministic but hell to track down. (Root cause was a buggy database driver fetching rows in batches of 20)

          – JollyJoker
          May 29 at 7:38






        • 1





          So much this; apart from hardware failures (and buggy RAM does exist) most problems are deterministic. Even a race condition is just a fancy name for "A occurs before B some of the time". If you have flaky behavior, it means you have not yet determined the conditions in which it happens in sufficient detail.

          – Matthieu M.
          May 29 at 11:22






        • 1





          Just another anecdotal example - we had a problem once that the customer reported happened sometimes and we couldn't reproduce. One amazing QA person we had managed to narrow it down to...network latency. She played with the network throttling options in Chrome (it's under the dev tools) and found that the problem was almost exclusively happening between some network speeds. On a fast network, you wouldn't see the problem, nor would you on a very slow network. It happened in a narrow band. It was still irregular but she found the optimum speeds where it happened around 80% of the time.

          – VLAZ
          May 30 at 11:22























        answered May 28 at 13:32









        Cherree


















        19














        You must run the same test an equal number of times on both the unfixed version and the supposedly fixed version. You have to show that:



        1. The test fails once in every N times randomly, on the unfixed version.

        2. The same test passes every time, or at least fails less often, on the fixed version.

        You have to show that the only difference between 1 and 2 is the "fix" itself, not any external or environmental factors.



        If you only perform the test on the new, fixed version, it could very well be the case that the bug was caused by an unrelated environmental factor that simply doesn't exist in your test now.
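To judge whether "fails less often" on the fixed version is more than luck, the two sets of runs can be compared with a significance test. The answer does not name one; the sketch below swaps in a one-sided Fisher's exact test as one reasonable choice, assuming the runs are independent. The function name and the example counts are hypothetical.

```python
from math import comb

def fisher_one_sided(old_fail, old_runs, new_fail, new_runs):
    """Probability of seeing new_fail or fewer failures on the new version
    if both versions actually had the same failure rate
    (one-sided Fisher's exact test on the 2x2 table of outcomes)."""
    total_runs = old_runs + new_runs
    total_fail = old_fail + new_fail
    denom = comb(total_runs, total_fail)
    p_value = 0.0
    # Sum the hypergeometric probabilities of every split of the failures
    # that is at least as favourable to the new version as the observed one.
    for k in range(new_fail + 1):
        p_value += comb(new_runs, k) * comb(old_runs, total_fail - k) / denom
    return p_value

# Hypothetical counts: 10 failures in 40 runs before the fix, 0 in 40 after.
print(fisher_one_sided(10, 40, 0, 40))

# Far too few runs: 1 failure in 4 before, 0 in 4 after.
print(fisher_one_sided(1, 4, 0, 4))
```

A small value in the first case means the improvement is unlikely to be chance. The second case comes out at 0.5, so four clean runs against a 1-in-4 failure rate say almost nothing. Neither result rules out an environmental change between the two batches, which is why both versions should be run under the same conditions.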




















        • 9





          +1 for the principle of verifying that the old version still fails. It's way too easy for random bugs to randomly go away for days due to some external changes such as network speed.

          – jpa
          May 29 at 6:53











        • This is really nasty on less strict platforms like JavaScript. Depending on how fast your computer processes the scripts, race conditions between scripts that override the same function can occur, and the code can randomly call a different function depending on network latency and each script's load time.

          – Nelson
          May 29 at 8:58























        answered May 29 at 1:12









        Double Vision Stout Fat Heavy
























        10














        I suppose this answer could help you



        You need to decide first at what probability you want to "detect" the problem.



        This is a nice example of why theoretical knowledge is necessary even for testers.



        The simplified version:



        • p is the probability of failure on a single run, 1/N in our case


        • then the probability of success on a single run is 1-p


        • and the probability of n successful runs in a row is (1-p)^n


        • so the probability of seeing at least one failure in n runs is 1-(1-p)^n


        • setting this equal to x, the probability with which you want to detect the problem, and solving for n gives n = log(1-x)/log(1-p)


        • for big enough N, log(1-p) ≈ -1/N, which simplifies this to n ≈ −log(1−x)⋅N

        (*) "log" here is the natural logarithm, which some calculators label ln(x) or loge(x) rather than log(x)
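The relationship between the failure rate, the detection probability, and the number of runs can be checked numerically. The sketch below assumes independent runs, writes x for the probability with which you want to detect the problem, and compares the exact expression n = log(1-x)/log(1-p) with the large-N approximation n ≈ -log(1-x)·N. The function names are hypothetical.

```python
import math

def runs_needed(p, x):
    """Smallest number of independent runs n with 1 - (1 - p)**n >= x."""
    return math.ceil(math.log(1 - x) / math.log(1 - p))

def runs_needed_approx(big_n, x):
    """Large-N approximation for p = 1/N: n is roughly -log(1 - x) * N."""
    return -math.log(1 - x) * big_n

# Compare the exact count with the approximation for a few failure rates 1/N.
for big_n in (4, 100, 1000):
    exact = runs_needed(1 / big_n, 0.95)
    approx = round(runs_needed_approx(big_n, 0.95), 1)
    print(big_n, exact, approx)
```

The approximation is rough for small N and tightens as N grows, and both values blow up as x approaches 1, so no finite number of runs gives certainty.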


























        • 5





          I would replace "probability" with "confidence". How confident do you want to be? Note that to be 100% confident, you would need to execute an infinite number of tests, because -log(1-x) goes to infinity as x goes to 1.

          – dzieciou
          May 28 at 18:20






        • 1





          And all this is based on the assumption that test executions are independent: for instance, that the error does not accumulate and manifest only as a result of n executions piling up.

          – dzieciou
          May 28 at 18:21











        • This is actually a valid point @Makyen, I edited the answer

          – Rsf
          May 29 at 8:07











        • This answer would be greatly improved by defining x, which just shows up in the last bullet point out of nowhere.

          – Gregor
          2 days ago























        edited 2 days ago

























        answered May 28 at 8:27









        Rsf


















        9














        While I agree with the other answers saying "dig deeper", to answer the actual math question in the title:



        If the issue occurs completely at random with probability p, then the chance of it occurring at least once in n trials is 1-(1-p)^n. Setting this to x (your confidence that the issue has been fixed) and solving for n gives you



        n = log(1-x)/log(1-p)



        So for example, if your issue occurs 1 out of 4 times, and you want to be 95% sure it's fixed (meaning a fix that did nothing would still pass every trial, and be wrongly accepted, about 1 time in 20), then



        p = 0.25
        x = 0.95
        n = log(0.05)/log(0.75) ≈ 10.4


        so you'd need to run 11 trials
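The worked example can be sanity-checked by simulation. The sketch below assumes a build that is still broken and fails each run independently with probability p, then estimates how often it passes n runs in a row anyway. The `miss_rate` function, the seed, and the experiment count are arbitrary illustrative choices.

```python
import random

def miss_rate(p, n, experiments=100_000, seed=12345):
    """Estimate how often a still-broken build passes n runs in a row."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    misses = 0
    for _ in range(experiments):
        # A run passes when the random draw lands outside the failure band.
        if all(rng.random() >= p for _ in range(n)):
            misses += 1
    return misses / experiments

print(miss_rate(0.25, 10))  # estimate of 0.75**10
print(miss_rate(0.25, 11))  # estimate of 0.75**11
```

The estimates should land close to 0.75^10 and 0.75^11. The first is above 5% and the second below it, which is why 10.4 is rounded up to 11 trials.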




        The difference between my answer and @emrys57's is that mine assumes you know the probability, while theirs assumes you know some initial sequence of results. Presumably they should both give the same answer with a large enough initial sequence.
































            If the issue occurs completely at random with probability p, then the chance if it occurring at least once in n trials is 1-(1-p)^n. Setting this to x (your confidence that the issue has been fixed) and solving for n gives you



            n = log(1-x)/log(1-p)



            So for example, if your issue occurs 1 out of 4 times, and you want to be 95% sure it's fixed (meaning you'll incorrectly identify it as fixed 1 out of 20 times!!), then



            p = 0.25
            x = 0.95
            n = log(0.05)/log(0.75) ≈ 10.4


            so you'd need to run 11 trials




            The difference between my answer and @emrys57's is that mine assumes you know the probability, while theirs assumes you know some initial sequence of results. Presumably they should both give the same answer with a large enough initial sequence.







            share|improve this answer










            New contributor



            BlueRaja - Danny Pflughoeft is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
            Check out our Code of Conduct.








            share|improve this answer



            share|improve this answer








            edited May 28 at 20:41





















            answered May 28 at 20:34









            BlueRaja - Danny Pflughoeft
























                4














                A problem has been observed that sometimes, in testing, produces errors. We don’t actually know the probability that it will produce an error on any given test run, because we can only do a finite number of tests on the broken system. If 1 represents a test pass, and 0 represents a test fail, we might have a sequence of results from multiple tests that looks like this:



                011001010011


                This represents 6 passes and 6 fails in 12 tests. It gives us an estimate, Pf, of the probability that a test of the broken system fails. We assume that this probability isn’t changing with time. However, there will be a large uncertainty in the actual value of Pf since we cannot do very many tests.



                Following these tests, we make a repair. We hope that this fixes the problem, but we’re not sure. We run more tests on the repaired system. If we have fixed the problem, the value of Pf measured after the repair should be 0. If we have not fixed it, Pf should be unchanged from its value before the repair.



                If we run some tests after the repair and one fails, we immediately know the repair failed. The question is, if no tests fail after repair, does that mean the repair worked?



                Consider the case where, over all tests (before and after the repair), we have
                a zeroes (test fails)
                b ones (test passes)
                of which the last c results, those after the repair, are all ones (test passes)



                The number of ways of arranging all a + b results is



                Ntot = (a + b)! /(b! * a!)


                In the case where Pf does not change yet the last c results are all ones, we need to arrange the remaining b - c ones among the first a + b - c results. The number of different ways of arranging these first a + b - c results is



                Nsuc = (a + b - c)! / ( (b - c)! * a! )


                These patterns are those from all the Ntot possible patterns where the last c results are all ones.



                If the change didn’t actually repair the system, but really left its state the same, the results we observe are a random collection of zeroes and ones produced by chance, governed by the probability Pf that a single test returns a zero. Every arrangement of a zeroes and b ones is then equally likely, so the chance that we observe a pattern with the last c results all ones is



                Cran = Nsuc/Ntot


                Or



                Cran = (a + b - c)! * b! * a! / ((a + b)! * (b - c)! * a!)


                Or



                Cran = (a + b - c)! * b! / ((a + b)! * (b - c)!)


                Cran is the chance that we observe c test passes after we make a repair to the system, out of a total of a fails and b passes, but in fact we have changed nothing, and we see a sequence of passes after repair by happenstance.



                As an example, with the pattern above before repair, and 6 ones (passes) after repair, we have a = 6, b = 12 and c = 6, and Cran is slightly less than 5%. To reach a confidence of 99% that the repair has succeeded, you need 11 passes after repair.
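The Cran formula and the two figures quoted above can be checked with a short Python sketch. The function names `chance_by_luck` and `passes_needed` are invented for this illustration. It takes Cran = Nsuc/Ntot as derived above, written with binomial coefficients, and assumes Python 3.8 or later for `math.comb`.

```python
from math import comb


def chance_by_luck(a: int, b: int, c: int) -> float:
    """Cran: chance that the last c results are all passes by luck alone.

    a: total fails, b: total passes (including the c passes after repair).
    """
    if a < 0 or c < 0 or c > b:
        raise ValueError("need a >= 0 and 0 <= c <= b")
    # Nsuc / Ntot = C(a + b - c, a) / C(a + b, a)
    return comb(a + b - c, a) / comb(a + b, a)


def passes_needed(fails_before: int, passes_before: int, confidence: float) -> int:
    """Fewest consecutive post-repair passes so that Cran <= 1 - confidence."""
    if fails_before < 1:
        raise ValueError("at least one failure before the repair is required")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    c = 0
    while chance_by_luck(fails_before, passes_before + c, c) > 1.0 - confidence:
        c += 1
    return c


if __name__ == "__main__":
    print(chance_by_luck(6, 12, 6))    # 6 fails, 6 passes, then 6 more passes
    print(passes_needed(6, 6, 0.99))   # passes needed for 99% confidence
```

With 6 fails and 6 passes before the repair, this gives 924/18564 (about 4.98%) for 6 passes afterwards, and 11 passes for the 99% level, in line with the figures in the answer.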



                There's a spreadsheet that can be copied to make this computation. I hope I have it all right; it is 15 years since I last worked this out. And, back then, it took me 15 years to first find the answer.






                    edited May 28 at 17:44





















                    answered May 28 at 17:05









                    emrys57
























                        0














                        Let B mean "broken", F mean "fixed", and E_k mean "at least one error observed in k trials" (so ~E_k means k error-free trials). You are saying that P(E_1|B)=1/N; the probability of seeing an error in a single observation, given that it's broken, is 1/N. Now, that itself is likely going to have some uncertainty, since your only way of measuring it will probably be to see how often it fails and make an empirical estimate. However, if we take that as given, then applying Bayes' rule tells us that if our prior probability of it being broken is P(B), then the posterior probability after k error-free trials is P(B|~E_k) = P(~E_k|B)P(B)/P(~E_k).



                        For P(~E_k|B), we have P(~E_k|B) = P(~E_1|B)^k = (1-P(E_1|B))^k = (1-1/N)^k = ((N-1)/N)^k. For large N, we can approximate that as e^(-k/N).



                        For P(~E_k), we have P(~E_k) = P(~E_k|B)P(B)+P(~E_k|F)P(F). P(~E_k|F) =1 (If we've fixed it, we're guaranteed to see no errors). And P(F) is just 1-P(B).



                        Plugging that all in, we have



                        P(B|~E_k) ~= e^(-k/N)P(B)/(e^(-k/N)P(B)+1-P(B))



                        This can be made easier to read by setting X = e^(-k/N), Y = P(B). Then we have



                        XY/(XY+1-Y)



                        or



                        1-(Y-1)/(XY+1-Y)
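As a rough illustration, the posterior XY/(XY+1-Y) can be evaluated in Python. The function name `posterior_broken` and the example prior of 0.5 are assumptions made for this sketch, not part of the answer. It uses the exact X = ((N-1)/N)^k rather than the e^(-k/N) approximation.

```python
def posterior_broken(n: float, k: int, prior_broken: float) -> float:
    """P(B | ~E_k): probability the issue is still broken after k clean trials.

    n: the issue shows up once in every n trials when broken (P(E_1|B) = 1/n).
    k: number of consecutive error-free trials observed.
    prior_broken: P(B), the prior belief that the fix did not work.
    """
    if n <= 1 or k < 0 or not 0.0 <= prior_broken <= 1.0:
        raise ValueError("need n > 1, k >= 0 and 0 <= prior_broken <= 1")
    if prior_broken == 1.0:
        return 1.0  # certainty cannot be revised; also avoids 0/0 on underflow
    x = ((n - 1) / n) ** k      # X = P(~E_k | B)
    y = prior_broken            # Y = P(B)
    return x * y / (x * y + 1.0 - y)


if __name__ == "__main__":
    # Hypothetical inputs: 1-in-4 failure rate, 11 clean runs, 50/50 prior
    print(posterior_broken(4, 11, 0.5))
```

With these hypothetical inputs the posterior comes out at roughly 4%. A different prior shifts the result, which is the main difference from the prior-free formula n = log(1-x)/log(1-p).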






                            answered May 28 at 20:56









                            Acccumulation
