Why does linear regression use “vertical” distance to the best-fit-line, instead of actual distance? [duplicate]









3










This question already has an answer here:



  • Why does linear regression use a cost function based on the vertical distance between the hypothesis and the input data point?

    5 answers



Linear regression uses the "vertical" (in two dimensions) distance of (y - ŷ). But this is not the real distance between any point and the best fit line.



That is, in the image here:

[Image: enter image description here]

you use the green lines (the vertical distances) instead of the purple ones (the perpendicular distances).



Is this done because the math is simpler? Because the effect of using the real distance is negligible, or equivalent? Because it's actually better to use a "vertical" distance?












regression linear-model














edited Jul 15 at 4:55
Peter Mortensen

asked Jul 14 at 17:09
David Refaeli




marked as duplicate by Scortchi Jul 15 at 9:17


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


















  • 7
    There is such a thing as minimizing perpendicular distance. It is called Deming regression. Ordinary linear regression assumes the x values are known and the only error is in y. That is often a reasonable assumption.
    – Michael Chernick
    Jul 14 at 17:14

  • 2
    Sometimes the ultimate purpose of finding the regression line is to make predictions of $\hat Y_i$'s based on future $x_i$'s. (There is a 'prediction interval' formula for that.) Then it is vertical distance that matters.
    – BruceET
    Jul 14 at 17:29

  • 1
    @MichaelChernick I think your one-liner explained it best; maybe you can elaborate on it a bit and post it as an answer?
    – David Refaeli
    Jul 14 at 17:36

  • I think Gung's answer is what I would say elaborating on my comment.
    – Michael Chernick
    Jul 14 at 18:54

  • Related: stats.stackexchange.com/questions/63966/…
    – Sycorax
    Jul 14 at 19:48






















2 Answers
        11








Vertical distance is a "real distance". The distance from a given point to any point on the line is a "real distance". The question for how to fit the best regression line is which of the infinitely many possible distances makes the most sense for how we are thinking about our model. That is, any number of possible loss functions could be right; it depends on our situation, our data, and our goals (it may help you to read my answer to: What is the difference between linear regression on y with x and x with y?).
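To make "which distance" concrete, here is one way to write three of the candidate losses for a line $y = a + bx$. The symbols $a$, $b$ and the names $L_{\text{vert}}$, $L_{\text{horiz}}$, $L_{\text{perp}}$ are notation introduced for this illustration only:

$$
\begin{aligned}
L_{\text{vert}}(a,b) &= \sum_i (y_i - a - b x_i)^2 \\
L_{\text{horiz}}(a,b) &= \sum_i \Bigl(x_i - \frac{y_i - a}{b}\Bigr)^2 = \frac{1}{b^2}\sum_i (y_i - a - b x_i)^2 \\
L_{\text{perp}}(a,b) &= \frac{1}{1+b^2}\sum_i (y_i - a - b x_i)^2
\end{aligned}
$$

All three are built from the same residual $y_i - a - b x_i$; they differ only by a slope-dependent factor, which is why they generally select different lines.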



It is often the case that vertical distances make the most sense, though. This would be the case when we are thinking of $Y$ as a function of $X$, which would make sense in a true experiment where $X$ is randomly assigned and the values are independently manipulated, and $Y$ is measured as a response to that intervention. It can also make sense in a predictive setting, where we want to be able to predict values of $Y$ based on knowledge of $X$ and the predictive relationship that we establish. Then, when we want to make predictions about unknown $Y$ values in the future, we will know and be using $X$. In each of these cases, we are treating $X$ as fixed and known, and $Y$ is understood to be a function of $X$ in some sense. However, that mental model may not fit your situation, in which case you would need to use a different loss function. There is no absolute 'correct' distance irrespective of the situation.
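As a rough illustration of the predictive case, the sketch below simulates data in which $X$ is known exactly and all the noise is in $Y$, then fits one line by vertical distances (ordinary least squares) and one by perpendicular distances (orthogonal regression). The data, the seed, and all function names are made up for this example; only the Python standard library is used.

```python
import random


def mean(values):
    return sum(values) / len(values)


def centered_sums(x, y):
    """Return (s_xx, s_yy, s_xy): centered sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    s_xx = sum((xi - mx) ** 2 for xi in x)
    s_yy = sum((yi - my) ** 2 for yi in y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return s_xx, s_yy, s_xy


def fit_vertical(x, y):
    """Ordinary least squares of y on x: minimizes squared vertical distances."""
    s_xx, _, s_xy = centered_sums(x, y)
    slope = s_xy / s_xx
    return mean(y) - slope * mean(x), slope


def fit_perpendicular(x, y):
    """Orthogonal regression: minimizes squared perpendicular distances.

    Assumes s_xy != 0 so that the closed-form slope is defined.
    """
    s_xx, s_yy, s_xy = centered_sums(x, y)
    diff = s_yy - s_xx
    slope = (diff + (diff ** 2 + 4 * s_xy ** 2) ** 0.5) / (2 * s_xy)
    return mean(y) - slope * mean(x), slope


def vertical_sse(x, y, intercept, slope):
    """Sum of squared vertical residuals."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def perpendicular_sse(x, y, intercept, slope):
    """Sum of squared point-to-line (perpendicular) distances."""
    return vertical_sse(x, y, intercept, slope) / (1 + slope ** 2)


random.seed(0)
x = [random.uniform(0, 10) for _ in range(200)]    # X is known exactly
y = [1 + 2 * xi + random.gauss(0, 3) for xi in x]  # all noise is in Y

ols = fit_vertical(x, y)
orth = fit_perpendicular(x, y)

print("vertical fit:      intercept=%.3f slope=%.3f" % ols)
print("perpendicular fit: intercept=%.3f slope=%.3f" % orth)
print("vertical SSE:      ols=%.1f orth=%.1f"
      % (vertical_sse(x, y, *ols), vertical_sse(x, y, *orth)))
print("perpendicular SSE: ols=%.1f orth=%.1f"
      % (perpendicular_sse(x, y, *ols), perpendicular_sse(x, y, *orth)))
```

By construction, the vertical fit has the smaller sum of squared vertical residuals and the perpendicular fit has the smaller sum of squared perpendicular residuals. Neither is "wrong"; they answer different questions, and predicting $Y$ from a known $X$ is the question that vertical residuals measure.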

















answered Jul 14 at 17:24
gung























            0








Summing up Michael Chernick's comment and gung's answer:

Both vertical and point distances are "real" - it all depends on the situation.

Ordinary linear regression assumes the $X$ values are known and the only error is in the $Y$'s. That is often a reasonable assumption.

If you assume error in the $X$'s as well, you get what is called a Deming regression; when the errors in $X$ and $Y$ are assumed to have equal variance, it minimizes the perpendicular (point-to-line) distance.
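A minimal sketch of Deming regression, assuming the standard closed-form slope for a known error-variance ratio `delta` (variance of the error in $Y$ divided by variance of the error in $X$). The function names and the simulated data are hypothetical, chosen only for illustration. With `delta = 1` the fit minimizes perpendicular distances; as `delta` grows (no error in $X$) it approaches ordinary least squares.

```python
import math
import random


def centered_sums(x, y):
    """Return (mx, my, s_xx, s_yy, s_xy) for paired samples x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mx) ** 2 for xi in x)
    s_yy = sum((yi - my) ** 2 for yi in y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return mx, my, s_xx, s_yy, s_xy


def ols_fit(x, y):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    mx, my, s_xx, _, s_xy = centered_sums(x, y)
    slope = s_xy / s_xx
    return my - slope * mx, slope


def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x; returns (intercept, slope).

    delta is the assumed ratio var(error in y) / var(error in x).
    Assumes s_xy != 0 so that the closed-form slope is defined.
    """
    mx, my, s_xx, s_yy, s_xy = centered_sums(x, y)
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    return my - slope * mx, slope


# Simulated data with measurement error of equal variance in both X and Y.
random.seed(1)
latent = [random.uniform(0, 10) for _ in range(500)]
x = [t + random.gauss(0, 2) for t in latent]
y = [1 + 2 * t + random.gauss(0, 2) for t in latent]

_, ols_slope = ols_fit(x, y)
_, deming_slope = deming_fit(x, y, delta=1.0)

print("OLS slope:    %.3f" % ols_slope)
print("Deming slope: %.3f" % deming_slope)
```

In a setup like this one, the least-squares slope is typically pulled toward zero by the noise in $X$, while the Deming slope with the correct `delta` stays closer to the true value of 2.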

















answered Jul 14 at 19:10
David Refaeli







  • 2
    I don't see that this answer needs to be downvoted.
    – gung
    Jul 15 at 11:20











