


What does Fisher mean by this quote?










13












I keep seeing this famous quote everywhere, but I fail to understand the emphasized part every single time.

A man who ‘rejects’ a hypothesis provisionally, as a matter of
habitual practice, when the significance is at the 1% level or higher,
will certainly be mistaken in not more than 1% of such decisions. For
when the hypothesis is correct he will be mistaken in just 1% of these
cases, and when it is incorrect he will never be mistaken in
rejection. [...] However, the calculation is absurdly academic, for in
fact no scientific worker has a fixed level of significance at which
from year to year, and in all circumstances, he rejects hypotheses; he
rather gives his mind to each particular case in the light of his
evidence and his ideas. It should not be forgotten that the cases
chosen for applying a test are manifestly a highly selected set, and
that the conditions of selection cannot be specified even for a single
worker; nor that in the argument used it would clearly be illegitimate
for one to choose the actual level of significance indicated by a
particular trial as though it were his lifelong habit to use just this
level.

(Statistical Methods and Scientific Inference, 1956, pp. 42-45)

More specifically, I don't understand:

  1. Why are the cases chosen for applying a test "highly selected"? Say you wonder whether the average height of people within an area is less than 165 cm, and decide to conduct a test. The standard procedure, as far as I know, is to draw a random sample from the area and measure the heights. How can this be highly selected?

  2. Suppose the cases are highly selected; how is this related to the choice of the significance level? Consider the example above again: if your sampling method (which I suppose is what Fisher means by "conditions of selection") is skewed and somehow favors tall people, then the whole study is ruined, and a subjective choice of significance level cannot save it.

  3. Actually, I don't even know what "the actual level of significance indicated by a particular trial" refers to. Is it the $p$-value of that experiment, some preset value like the (in)famous 0.05, or something else?


























      hypothesis-testing statistical-significance references experiment-design philosophical














      edited Aug 8 at 6:07







      nalzok

















      asked Aug 8 at 6:01









      nalzok























          3 Answers








































            9












The cases to which Fisher is referring are not observations but tests. That is, we select hypotheses to test. We don't just test random hypotheses; we base them on observation, the literature, scientific theories and so on.

If you did test random hypotheses, then the proportion of times you are mistaken (in the first sentence of your quote) would be 1% (or whatever level is chosen). E.g. if we tested hypotheses like

• The parity of a person's social security number is related to their IQ

• Blond-haired people throw Frisbees better than dark-haired people

• The time to get an answer on Cross Validated is related to the number of syllables in your first name

and tested a whole bunch of them at 1%, we would reject the null about 1% of the time, and do so incorrectly. (Unless, of course, I am on to something with the above nonsense.)

I did once see an article about hair color and Frisbee throwing, and it found a difference! So I call this sort of thing "Frisbee research".

But the part I like the best from the quote is this:

for in fact no scientific worker has a fixed level of significance at
which from year to year, and in all circumstances, he rejects
hypotheses; he rather gives his mind to each particular case in the
light of his evidence and his ideas.

He must be spinning in his grave.
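The 1% figure for "random" hypotheses can be checked with a small simulation. This is only a sketch under assumptions that are not in the answer: every null hypothesis is exactly true, the data are standard normal, and the test is a two-sided z-test with known variance. The function names `z_test_p_value` and `rejection_rate` are made up for illustration.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, assuming a known sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(n_tests=20_000, n_obs=30, alpha=0.01, seed=0):
    """Fraction of true null hypotheses rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        # Every sample is drawn under H0, so every rejection is a mistake.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if z_test_p_value(sample) <= alpha:
            rejections += 1
    return rejections / n_tests


if __name__ == "__main__":
    print(f"share of true nulls rejected at 1%: {rejection_rate():.4f}")
```

With all nulls true, the printed share should land close to 0.01. If some of the tested hypotheses were actually false, only the rejections of true nulls would count as mistakes, so the overall share of mistaken decisions would be smaller.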










• This is a good answer, but I hesitate to view "Frisbee research" as a bad thing. As long as the methodology is applied properly (taking into account the effect size, etc.), I would consider the result plausible. I mean, it is believed that hair color has nothing to do with Frisbee throwing, but it was also accepted that Earth is at the center of the universe until a few hundred years ago! We can criticize people for doing things wrong, but we shouldn't blame anyone for asking questions. That being said, I agree that some hypotheses are less useful than others, but still, they can be correct.
  – nalzok
  Aug 9 at 0:09

• And they can also be type I errors.
  – Peter Flom
  Aug 9 at 10:54


















            2












Trying to find the background of the quote, I came across an edition of the book (I am not sure which edition it is) in which the passage reads slightly differently:

https://archive.org/details/in.ernet.dli.2015.134555/page/n47

The attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to hypothetical frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who "rejects" a hypothesis provisionally, as a matter of habitual practice, when the significance is at the 1% level or higher, will certainly be mistaken in not more than 1% of such decisions. For when the hypothesis is correct he will be mistaken in just 1% of these cases, and when it is incorrect he will never be mistaken in rejection. This inequality statement can therefore be made. However, the calculation is absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all, so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance. To a practical man, also, who rejects a hypothesis, it is, of course, a matter of indifference with what probability he might be led to accept the hypothesis falsely, for in his case he is not accepting it.

This seems to me a criticism of using the mathematical expression of rejection probabilities (type I error rates) as if it were a rigorous argument. Those expressions often fail to capture what is relevant, and they are not rigorous either.

1. Why are the cases chosen for applying a test "highly selected"?

  This seems to relate to the sentence

  Further, the calculation is based solely on a hypothesis, which, in
  the light of the evidence, is often not believed to be true at all

  We are not indifferent towards the hypothesis that is being tested, and often the hypothesis being tested is not believed to be true.

2. How is this related to the choice of the significance level?

  This relates to

  so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance

  The significance level is just the frequency of making a mistake when the null hypothesis is true. The actual frequency of making a mistake will be different (lower), because rejecting a null hypothesis that is in fact false is not a mistake.

3. What is "the actual level of significance indicated by a particular trial" referring to?

  I believe that this part refers to a form of p-value hacking: changing the significance level, alpha, after the observations have been made so that it matches the observed p-value, and then pretending that this was the cut-off value all along.
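The "inequality statement" in the quote can be written out in one line. This is a sketch in notation that is not Fisher's: assume $\alpha$ is the fixed significance level, $\pi_0$ is the (unknown) fraction of tested hypotheses that are actually true, and the test rejects a true hypothesis with probability exactly $\alpha$.

$$
P(\text{mistaken rejection})
= \underbrace{P(\text{reject} \mid H_0 \text{ true})}_{=\,\alpha}\,\pi_0
+ \underbrace{P(\text{mistake} \mid H_0 \text{ false})}_{=\,0}\,(1-\pi_0)
= \alpha\,\pi_0 \le \alpha .
$$

Because the tested hypotheses are a highly selected set, $\pi_0$ cannot be specified, and when the tested hypothesis is "often not believed to be true at all" it may be small, so the actual frequency $\alpha\,\pi_0$ may be much less than $\alpha$.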


















































                  15





                  Here is my paraphrase of what Fisher says in your bolded quote: it should not be forgotten that quite a lot goes into choosing which hypothesis to test, so much so that even for a single person's decision you could not specify it all. It also should not be forgotten that, for the reasons stated above, you cannot decide on a particular trial's significance level in the same way every time, as a lifelong habit.

                  1. A scientific hypothesis is selected as worth testing, out of many competing hypotheses, because of the biases of the researcher and their current state of knowledge. It is the hypotheses that are "highly selected", not the samples; the hypotheses are the cases to which we apply tests.

                  2. The selection process for the hypotheses affects our significance level. If we are very sure of a hypothesis, we can be satisfied with a less stringent significance level. If we are unsure, there is a higher burden of proof. Other factors come into play as well, such as a Type I error being worse than a Type II error in drug trials.

                  3. I think when he says "indicated by" he simply means "chosen for". Yes, it is a preset value: we reject the hypothesis if the p-value is more extreme.










                  answered Aug 8 at 7:44









                  Drew N




































                      • 3




                        $begingroup$
                        This is a good answer, but I'm hesitated to view "Frisbee research" as bad things. As long as the methodologies are employed properly (taking into account the effect size, etc), I would consider the result plausible. I mean, it is believed that hair color has nothing to do with Frisbee throwing, but it was accepted that Earth is at the center of the universe until hundreds of years ago! We can criticize people for doing things wrong, but we shouldn't blame anyone for asking questions. That being said, I agree that some hypotheses are less useful than others, but still, they can be correct.
                        $endgroup$
                        – nalzok
                        Aug 9 at 0:09











                      • $begingroup$
                        And they can also be type I errors.
                        $endgroup$
                        – Peter Flom
                        Aug 9 at 10:54













                      9












                      9








                      9





                      $begingroup$

                      The cases to which Fisher is referring are not observations but tests. That is, we select hypotheses to test. We don't just test random hypotheses - we base them on observation, the literature, scientific theories and so on.



                      If you did test random hypotheses, then the number of times you are mistaken (in the first sentence of your quote) would be 1% (or whatever value is chosen). E.g. if we tested hypotheses like



                      • The parity of a person's social security number is related to his IQ


                      • Blond haired people throw Frisbees better than dark haired people


                      • The time to getting an answer on Cross Validated is related to the number of syllables in your first name.


                      And tested a whole bunch of them at 1%, we would reject the null about 1% of the time, and do so incorrectly. (Unless, of course, I am on to something with the above nonsense).



                      I did once see an article about hair color and Frisbee throwing - and it found a difference! So, I call this sort of thing "Frisbee research".



                      But the part I like the best from the quote is this:




                      for in fact no scientific worker has a fixed level of significance at
                      which from year to year, and in all circumstances, he rejects
                      hypotheses; he rather gives his mind to each particular case in the
                      light of his evidence and his ideas.




                      He must be spinning in his grave.






                      share|cite|improve this answer









                      $endgroup$



                      The cases to which Fisher is referring are not observations but tests. That is, we select hypotheses to test. We don't just test random hypotheses - we base them on observation, the literature, scientific theories and so on.



                      If you did test random hypotheses, then the number of times you are mistaken (in the first sentence of your quote) would be 1% (or whatever value is chosen). E.g. if we tested hypotheses like



                      • The parity of a person's social security number is related to his IQ


                      • Blond haired people throw Frisbees better than dark haired people


                      • The time to getting an answer on Cross Validated is related to the number of syllables in your first name.


                      And tested a whole bunch of them at 1%, we would reject the null about 1% of the time, and do so incorrectly. (Unless, of course, I am on to something with the above nonsense).



                      I did once see an article about hair color and Frisbee throwing - and it found a difference! So, I call this sort of thing "Frisbee research".



                      But the part I like the best from the quote is this:




                      for in fact no scientific worker has a fixed level of significance at
                      which from year to year, and in all circumstances, he rejects
                      hypotheses; he rather gives his mind to each particular case in the
                      light of his evidence and his ideas.




                      He must be spinning in his grave.







                      share|cite|improve this answer












                      share|cite|improve this answer



                      share|cite|improve this answer










                      answered Aug 8 at 13:10









                      Peter Flom











                      • 3




                        This is a good answer, but I hesitate to view "Frisbee research" as a bad thing. As long as the methodology is applied properly (taking the effect size into account, etc.), I would consider the result plausible. It is believed that hair color has nothing to do with Frisbee throwing, but it was also accepted that the Earth is at the center of the universe until a few hundred years ago! We can criticize people for doing things wrong, but we shouldn't blame anyone for asking questions. That being said, I agree that some hypotheses are less useful than others, but they can still be correct.
                        – nalzok
                        Aug 9 at 0:09











                      • And they can also be type I errors.
                        – Peter Flom
                        Aug 9 at 10:54























                      2












                      $begingroup$

                      Looking for the background of the quote, I found a version of the book (I am not sure which edition it is) in which the wording is slightly different:



                      https://archive.org/details/in.ernet.dli.2015.134555/page/n47




                      The attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to hypothetical frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who "rejects" a hypothesis provisionally, as a matter of habitual practice, when the significance is at the 1% level or higher, will certainly be mistaken in not
                      more than 1% of such decisions. For when the hypothesis is correct he will be mistaken in just 1% of these cases, and when it is incorrect he will never be mistaken in rejection. This inequality statement can therefore be made. However, the calculation is
                      absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all, so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance. To a practical man, also, who rejects a hypothesis, it is, of course, a matter of indifference with what probability he might be led to accept the hypothesis falsely, for in his case he is not accepting it.




                      This reads to me as a criticism of treating the mathematical expression of rejection probabilities (type I error rates) as a rigorous argument. Such expressions often do not capture what is relevant, and they are not rigorous either.




                      1. Why are the cases chosen for applying a test "highly selected"?



                        This seems to relate to the sentence




                        Further, the calculation is based solely on a hypothesis, which, in
                        the light of the evidence, is often not believed to be true at all




                        We are not indifferent towards the hypothesis that is being tested, and often a hypothesis that is being tested is not believed to be true.




                      2. How is this related to the choice of the significance level?



                        This relates to




                        so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance




                        The significance level is just the frequency of mistaken rejection when the null hypothesis is true. Because the tested null hypothesis is often not true, the actual frequency of mistaken rejection will be different (lower).




                      3. What is "the actual level of significance indicated by a particular trial" referring to?



                        I believe this part refers to a kind of p-hacking: changing the significance level, alpha, after the observations have been made so that it matches the observed p-value, and then pretending that this was the cut-off value all along.
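
As a rough illustration of point 2 (this is not from Fisher's text), the gap between the nominal level and the actual frequency of mistaken rejections can be simulated. It is only a sketch under assumptions made up for the example: each test yields a z-statistic that is N(0, 1) when the null hypothesis is true and N(effect, 1) otherwise, and the names `false_rejection_rate`, `frac_null_true` and `effect` are hypothetical.

```python
import random
from statistics import NormalDist


def false_rejection_rate(n_tests, frac_null_true, alpha=0.01, effect=3.0, seed=0):
    """Fraction of all tests in which a true null hypothesis gets rejected."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    false_rejections = 0
    for _ in range(n_tests):
        # Assumption: the null is true for a fixed share of the tested hypotheses.
        null_is_true = rng.random() < frac_null_true
        # Assumed model: z ~ N(0, 1) under the null, N(effect, 1) otherwise.
        z = rng.gauss(0.0 if null_is_true else effect, 1.0)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p_value <= alpha and null_is_true:
            false_rejections += 1
    return false_rejections / n_tests


# "Random" hypotheses: every null is true, so all rejections are mistakes.
all_null = false_rejection_rate(200_000, frac_null_true=1.0)
# "Highly selected" hypotheses: only a fifth of the nulls are actually true.
selected = false_rejection_rate(200_000, frac_null_true=0.2)

print(f"every null true:   {all_null:.4f}")
print(f"20% of nulls true: {selected:.4f}")
```

With every null true, the rate should land close to alpha. With only a fifth of the nulls true, it should land close to 0.2 × alpha, which is the sense in which the habitual rejecter is mistaken in "not more than 1%" of decisions.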


















                      $endgroup$



























                          edited Aug 8 at 11:18

























                          answered Aug 8 at 11:10









                          Martijn Weterings






























