Dropping outliers based on “2.5 times the RMSE”




In Kahneman and Deaton (2010)$^\dagger$, the authors write the following:




This regression explains 37% of the variance, with a root mean square
error (RMSE) of 0.67852. To eliminate outliers and implausible income
reports, we dropped observations in which the absolute value of the
difference between log income and its prediction exceeded 2.5 times
the RMSE.




Is this common practice? What is the intuition behind it? It seems strange to define an outlier using a model that may not be well specified in the first place. Shouldn't outliers be identified on theoretical grounds about what constitutes a plausible value, rather than on how well the model predicts the observed values?




$\dagger$: Daniel Kahneman and Angus Deaton (2010). High income improves evaluation of life but not emotional well-being. Proceedings of the National Academy of Sciences, Sep 2010, 107(38), 16489-16493. DOI: 10.1073/pnas.1011492107
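For concreteness, the quoted rule can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a single predictor and an ordinary least-squares fit (the paper's regression has several covariates), uses made-up data, and the helper names `ols_fit` and `rmse_trim` are hypothetical.

```python
import math
import random


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x with one predictor; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmse_trim(x, y, k=2.5):
    """Drop observations whose absolute residual exceeds k * RMSE.

    Returns (kept, rmse): the retained (x, y) pairs and the RMSE of the
    full-sample fit that defined the cutoff.
    """
    a, b = ols_fit(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Assumption: RMSE = sqrt(SSR / (n - p)) with p = 2 fitted parameters,
    # as many regression packages report it; dividing by n barely differs for large n.
    rmse = math.sqrt(sum(r * r for r in resid) / (len(resid) - 2))
    kept = [(xi, yi) for xi, yi, r in zip(x, y, resid) if abs(r) <= k * rmse]
    return kept, rmse


# Hypothetical data: "log income" linear in one covariate plus normal noise,
# with one grossly wrong value planted at index 0.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(200)]
y = [1.0 + 0.3 * xi + random.gauss(0, 0.5) for xi in x]
y[0] += 10.0

kept, rmse = rmse_trim(x, y)
print(f"kept {len(kept)} of {len(x)}; RMSE = {rmse:.3f}")
```

Note that the planted value also inflates the RMSE, and hence the cutoff, for every other observation: the rule is defined relative to the very fit it is meant to protect.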

















  • When you give a quote from a paper, always give a reference that includes the page number. – Ben, Jul 12 at 1:26







  • I can't say whether this is 'common practice', but I hope not. Automated removal of 'outliers' is fundamentally a bad idea. Maybe your model or removal criterion is not good, or maybe there's something new going on (a downturn beginning, fresh possibilities awakening) that you shouldn't ignore. It's different if you can trace a suspicious value to a data-entry error or equipment failure, or if the value is simply off-the-charts absurd (a 16'2" tall man, a guy with 61 billable hours last Tuesday, a 25-minute flight SFO-ORD). But not because it doesn't fit a model. I know a startup that went broke that way. – BruceET, Jul 12 at 1:38







  • The statistical validity of this approach is reflected by the absurd number of decimals they report for the RMSE. – Frans Rodenburg, Jul 12 at 3:55










  • This feels like a crude / heroic assumption solution to a question I asked a few months ago: stats.stackexchange.com/questions/390051/… – Adrian, Jul 13 at 0:37

















Tags: regression, outliers






edited Jul 12 at 3:54 by Frans Rodenburg










asked Jul 11 at 23:14 by Parseltongue







1 Answer

The reason for dropping this data is stated right there in the quote: namely, to "eliminate outliers and implausible income reports". Because they list these two things together, they are conceding that at least some of their outliers are not implausible values; in any case, they give no argument for why values with a high residual should be considered "implausible" income values. In effect, they are removing data points because their residuals are larger than their regression model expects. As I have stated in other answers here, this is tantamount to requiring reality to conform to your model assumptions, and ignoring the parts of reality that do not comply with those assumptions.
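The model's own benchmark for "higher than expected" can be made explicit. Assuming, purely for illustration, that the errors are normal, the model is correctly specified, and the RMSE can be treated as the error standard deviation, the share of observations expected to exceed the cutoff is P(|Z| > 2.5), roughly 1.24%. Those points would be legitimate draws from the model itself, yet the rule discards them anyway. A minimal sketch (the function name is hypothetical):

```python
import math


def normal_two_sided_tail(k):
    """P(|Z| > k) for standard normal Z, using P(|Z| > k) = erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2))


frac = normal_two_sided_tail(2.5)
print(f"expected share beyond 2.5 RMSE under normal errors: {frac:.4f}")  # 0.0124
```

If noticeably more than this share is flagged, that is itself evidence that the errors are heavier-tailed than a normal model allows, which argues for modelling the tails rather than deleting them.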



Whether or not this is common practice, it is a terrible practice. It occurs because the outlying data points are hard to deal with and the analyst is unwilling to model them properly (e.g., by using a model that allows higher kurtosis in the error terms), so they just remove the parts of reality that don't conform to their ability to undertake statistical modelling. This practice is statistically undesirable, and it leads to inferences that systematically underestimate the variance and kurtosis of the error terms. The authors of this paper report that they dropped 3.22% of their data as outliers (p. 16490). Since most of these data points would have been very high incomes, this casts substantial doubt on their ability to draw robust conclusions about the effect of high incomes (which is the goal of their paper).
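The claim that trimming makes the errors look better behaved than they are can be checked with a small simulation. This is a hypothetical sketch, not a reanalysis of the paper: it assumes the model is otherwise correct, so the residuals are just the errors, and it draws those errors from a Student-t distribution with 3 degrees of freedom (an arbitrary heavy-tailed choice). It then applies the same 2.5 × RMSE rule and compares the sample variance and excess kurtosis before and after trimming.

```python
import math
import random


def student_t(df, rng):
    """One draw from a Student-t distribution with `df` degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square(df) variate is Gamma(shape=df/2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


def variance(xs):
    """Sample variance with the n - 1 denominator."""
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)


def excess_kurtosis(xs):
    """Simple moment estimator m4 / m2**2 - 3 (zero for normal data in the limit)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((v - m) ** 2 for v in xs) / n
    m4 = sum((v - m) ** 4 for v in xs) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(42)
# Hypothetical heavy-tailed errors: Student-t with 3 degrees of freedom.
errors = [student_t(3, rng) for _ in range(5000)]

# Same rule as in the quote: drop |residual| > 2.5 * RMSE.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
trimmed = [e for e in errors if abs(e) <= 2.5 * rmse]

print(f"dropped {1 - len(trimmed) / len(errors):.2%}")
print(f"variance: {variance(errors):.3f} -> {variance(trimmed):.3f}")
print(f"excess kurtosis: {excess_kurtosis(errors):.2f} -> {excess_kurtosis(trimmed):.2f}")
```

The exact figures depend on the seed and on the assumed error distribution, but for heavy-tailed errors like these the direction should not: the trimmed residuals show smaller variance and much smaller excess kurtosis than the process that actually generated them.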






  • How dare you criticize the Daniel Kahneman! Jokes aside, those are very good points +1. – Tim, Jul 12 at 6:57






  • Kahneman is a very fine psychologist, whose books I have generally enjoyed and found helpful. They could each have fifty Nobel prizes; it wouldn't change the fact that mass removal of "outliers" is a terrible statistical practice. – Ben, Jul 12 at 8:19







  • Naturally I agree with you. I didn't think that needed saying. – Nick Cox, Jul 12 at 8:24






  • @NickCox You mean the so-called "Nobel Memorial Prize": as I'm sure you know, it wasn't established by Nobel and has nothing to do with him really. The official name is apparently "The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel". – amoeba, Jul 12 at 9:27






  • You're sure I know that, and you are indeed correct. The always authoritative EJMR once carried this posting about me: "No, he will never win the Nobel", meaning that prize. – Nick Cox, Jul 12 at 10:20













answered Jul 12 at 1:41 by Ben; edited Jul 13 at 0:42











  • $begingroup$
    How dare you criticize the Daniel Kahneman! Jokes aside, those are very good points +1.
    $endgroup$
    – Tim
    Jul 12 at 6:57






  • 11




    $begingroup$
    Kahneman is a very fine psychologist, whose books I have generally enjoyed and found helpful. They could each have fifty Nobel prizes --- it wouldn't change the fact that mass removal of "outliers" is a terrible statistical practice.
    $endgroup$
    – Ben
    Jul 12 at 8:19







  • 3




    $begingroup$
    Naturally I agree with you. I didn't think that needed saying.
    $endgroup$
    – Nick Cox
    Jul 12 at 8:24






  • 1




    $begingroup$
    @NickCox You mean the so called "Nobel Memorial Prize": as I'm sure you know it wasn't established by Nobel and has nothing to do with him really. The official name is apparently "The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel".
    $endgroup$
    – amoeba
    Jul 12 at 9:27






  • 1




    $begingroup$
    You're sure I know that and you are indeed correct. The always authoritative EJMR once carried this posting about me "No, he will never win the Nobel", meaning that prize.
    $endgroup$
    – Nick Cox
    Jul 12 at 10:20















