What does Fisher mean by this quote?
I keep seeing this famous quote everywhere, but fail to understand the emphasized part (the final sentence, beginning "It should not be forgotten") every single time.
A man who ‘rejects’ a hypothesis provisionally, as a matter of
habitual practice, when the significance is at the 1% level or higher,
will certainly be mistaken in not more than 1% of such decisions. For
when the hypothesis is correct he will be mistaken in just 1% of these
cases, and when it is incorrect he will never be mistaken in
rejection. [...] However, the calculation is absurdly academic, for in
fact no scientific worker has a fixed level of significance at which
from year to year, and in all circumstances, he rejects hypotheses; he
rather gives his mind to each particular case in the light of his
evidence and his ideas. It should not be forgotten that the cases
chosen for applying a test are manifestly a highly selected set, and
that the conditions of selection cannot be specified even for a single
worker; nor that in the argument used it would clearly be illegitimate
for one to choose the actual level of significance indicated by a
particular trial as though it were his lifelong habit to use just this
level.
(Statistical Methods and Scientific Inference, 1956, p. 42-45)
More specifically, I don't understand:
- Why are the cases chosen for applying a test "highly selected"? Say you wonder whether the average height of people in an area is less than 165 cm, and decide to conduct a test. The standard procedure, as far as I know, is to draw a random sample of people from the area and measure their heights. How can this be highly selected?
- Suppose the cases are highly selected. How is this related to the choice of the significance level? Consider the example above again: if your sampling method (which I suppose is what Fisher means by "conditions of selection") is skewed and somehow favors tall people, then the whole study is ruined, and a subjective choice of the significance level cannot save it.
- Actually, I don't even know what "the actual level of significance indicated by a particular trial" refers to. Is it the $p$-value of that experiment, some preset value like the (in)famous 0.05, or something else?
hypothesis-testing statistical-significance references experiment-design philosophical
– nalzok, asked Aug 8 at 6:01 (edited Aug 8 at 6:07)
3 Answers
Looking for the background of the quote, I found a version of the book (I am not sure which edition) that has a slightly different passage:
https://archive.org/details/in.ernet.dli.2015.134555/page/n47
The attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to hypothetical frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who "rejects" a hypothesis provisionally, as a matter of habitual practice, when the significance is at the 1% level or higher, will certainly be mistaken in not more than 1% of such decisions. For when the hypothesis is correct he will be mistaken in just 1% of these cases, and when it is incorrect he will never be mistaken in rejection. This inequality statement can therefore be made. However, the calculation is absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all, so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance. To a practical man, also, who rejects a hypothesis, it is, of course, a matter of indifference with what probability he might be led to accept the hypothesis falsely, for in his case he is not accepting it.
This seems to me a criticism of using the mathematical expression of rejection probabilities (type I error rates) as if it were a rigorous argument. Those expressions often do not capture what is relevant, nor are they rigorous.
Why are the cases chosen for applying a test "highly selected"?
This seems to relate to the sentence
Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all
We are not indifferent towards the hypothesis that is being tested, and often a hypothesis that is being tested is not believed to be true.
how is this related to the choice of the significance level?
This relates to
so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance
The significance level is just the frequency of making a mistake (a false rejection) when the null hypothesis is true. But the actual frequency of making a mistake will be different (lower), because many of the hypotheses that get tested are false, and rejecting a false hypothesis is never a mistake.
what is "the actual level of significance indicated by a particular trial" referring to
I believe that this part refers to a form of p-hacking: changing the significance level, alpha, after the observations have been made so that it matches the observed p-value, and then pretending that this was the cut-off value all along.
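The claim that the actual frequency of mistaken rejections falls below the nominal level when many tested nulls are false can be checked with a small simulation. This is only a sketch under assumptions that are not in the answer: a two-sided z-test with known standard deviation 1, a hypothetical effect size of 0.5 for the false nulls, and made-up function names.

```python
import random
from statistics import NormalDist, fmean


def p_value(sample):
    """Two-sided z-test p-value for H0: mean = 0, assuming known sd = 1."""
    z = fmean(sample) * len(sample) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def mistaken_rejection_rate(share_true_nulls, n_tests=10_000, n_obs=30,
                            alpha=0.01, effect=0.5, seed=1):
    """Fraction of ALL tests that end in rejecting a null that was true."""
    rng = random.Random(seed)
    mistaken = 0
    for _ in range(n_tests):
        # Decide whether this tested hypothesis happens to be true.
        null_is_true = rng.random() < share_true_nulls
        true_mean = 0.0 if null_is_true else effect
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
        # Rejecting a false null is never a mistake, so only true nulls count.
        if null_is_true and p_value(sample) < alpha:
            mistaken += 1
    return mistaken / n_tests


for share in (1.0, 0.5, 0.1):
    rate = mistaken_rejection_rate(share)
    print(f"share of true nulls {share:.0%}: mistaken rejections {rate:.2%}")
```

Only when every tested null is true does the rate sit near the 1% level; as the share of true nulls shrinks, so does the rate, which is the "not more than 1%" inequality in Fisher's passage.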

Here is my paraphrase of what Fisher says in your bolded quote: It should not be forgotten that quite a lot goes into choosing which hypothesis to test, so much so that even for a single person's decision you could not specify it all. It also should not be forgotten that, for the reasons stated above, you cannot always decide on a particular trial's significance level in the same way, as a lifelong habit.
A scientific hypothesis is selected as worth testing against many other competing hypotheses because of the biases of the researcher and their current state of knowledge. The hypotheses are "highly selected", not the samples; the hypotheses are the cases where we apply tests.
The selection process of the hypotheses affects our significance level. If we are already quite sure of a hypothesis, a less stringent significance level is enough to satisfy us. If we are unsure, there is a higher burden of proof. Other factors come into play as well, such as Type I error being worse than Type II in drug trials.
I think when he says "indicated by" he simply means "chosen for". Yes, it is a preset value: we reject the hypothesis if the p-value is smaller (more extreme) than it.
– Drew N, answered Aug 8 at 7:44
The cases to which Fisher is referring are not observations but tests. That is, we select hypotheses to test. We don't just test random hypotheses; we base them on observation, the literature, scientific theories and so on.
If you did test random hypotheses, then the proportion of times you are mistaken (in the first sentence of your quote) would be 1% (or whatever level is chosen). E.g., if we tested hypotheses like
- The parity of a person's social security number is related to his IQ
- Blond-haired people throw Frisbees better than dark-haired people
- The time to get an answer on Cross Validated is related to the number of syllables in your first name
and tested a whole bunch of them at the 1% level, we would reject the null about 1% of the time, and do so incorrectly (unless, of course, I am on to something with the above nonsense).
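A quick way to see the "about 1% of the time" figure is to simulate many tests in which the null is true by construction. The sketch below is an illustration only: it assumes a two-sided z-test with known standard deviation 1, and the function names and sample sizes are invented for the example.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p_value(sample):
    """Two-sided z-test p-value for H0: mean = 0, assuming known sd = 1."""
    z = fmean(sample) * len(sample) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate_under_true_nulls(n_tests=20_000, n_obs=30, alpha=0.01, seed=0):
    """Fraction of tests rejected at level alpha when every null is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        # The data really do have mean 0, so any rejection is a type I error.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if two_sided_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_tests


rate = rejection_rate_under_true_nulls()
print(f"rejected {rate:.2%} of true nulls at the 1% level")
```

Every rejection counted here is a mistake, which is why the long-run rate matches the chosen level only in this "random hypotheses" setting.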
I did once see an article about hair color and Frisbee throwing - and it found a difference! So, I call this sort of thing "Frisbee research".
But the part I like the best from the quote is this:
for in fact no scientific worker has a fixed level of significance at
which from year to year, and in all circumstances, he rejects
hypotheses; he rather gives his mind to each particular case in the
light of his evidence and his ideas.
He must be spinning in his grave.
answered Aug 8 at 13:10
Peter Flom♦
$begingroup$
This is a good answer, but I hesitate to view "Frisbee research" as a bad thing. As long as the methodology is applied properly (taking the effect size into account, etc.), I would consider the result plausible. I mean, it is believed that hair color has nothing to do with Frisbee throwing, but it was also accepted that the Earth is at the center of the universe until a few hundred years ago! We can criticize people for doing things wrong, but we shouldn't blame anyone for asking questions. That being said, I agree that some hypotheses are less useful than others, but they can still be correct.
$endgroup$
– nalzok
Aug 9 at 0:09
$begingroup$
And they can also be type I errors.
$endgroup$
– Peter Flom♦
Aug 9 at 10:54
$begingroup$
While looking for the background of the quote, I found a version of the book (I am not sure which edition) that has a slightly different wording:
https://archive.org/details/in.ernet.dli.2015.134555/page/n47
The attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to hypothetical frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who "rejects" a hypothesis provisionally, as a matter of habitual practice, when the significance is at the 1% level or higher, will certainly be mistaken in not more than 1% of such decisions. For when the hypothesis is correct he will be mistaken in just 1% of these cases, and when it is incorrect he will never be mistaken in rejection. This inequality statement can therefore be made. However, the calculation is absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all, so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance. To a practical man, also, who rejects a hypothesis, it is, of course, a matter of indifference with what probability he might be led to accept the hypothesis falsely, for in his case he is not accepting it.
This seems to me a criticism of using the mathematical expression of rejection probabilities (type I error rates) as if it were a rigorous argument. Such expressions often fail to capture what is relevant, and they are not rigorous either.
Why are the cases chosen for applying a test "highly selected"?
This seems to relate to the sentence
Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all
We are not indifferent towards the hypothesis being tested; often the hypothesis under test is one we do not believe to be true.
how is this related to the choice of the significance level?
This relates to
so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance
The significance level is just the frequency of making a mistake (a false rejection) when the null hypothesis is true. But because the tested null hypothesis is often not true, the actual frequency of mistaken rejections will be different (lower).
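Fisher's inequality can be written out explicitly. As an illustration only, assume (my notation, not Fisher's) that a fraction $\pi_0$ of the hypotheses a researcher tests are actually true, and that each test is carried out at level $\alpha$. A rejection is erroneous only when the hypothesis is true, so

$$
P(\text{erroneous rejection}) = \pi_0 \cdot \alpha + (1 - \pi_0) \cdot 0 = \pi_0\,\alpha \le \alpha .
$$

For example, with $\alpha = 0.01$ and $\pi_0 = 0.5$ the frequency of erroneous rejections is $0.005$, half the nominal level. Equality holds only when every tested hypothesis is true ($\pi_0 = 1$).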
what is "the actual level of significance indicated by a particular trial" referring to?
I believe this part refers to a kind of p-value hacking: changing the significance level, alpha, after the observations have been made so that it matches the observed p-value, and then pretending that this was the cut-off value from the beginning.
$endgroup$
edited Aug 8 at 11:18
answered Aug 8 at 11:10
Martijn Weterings