


Are the errors in this formulation of the simple linear regression model random variables?




On page 21 of Applied Linear Regression, fourth edition, by Sanford Weisberg, the error $e_i$ for case $i$ under the simple linear regression model is defined to be $y_i - E(Y \mid X = x_i)$, where $E(Y \mid X = x_i)$ is assumed to equal $\beta_0 + \beta_1 x_i$ for some unknown $\beta_0, \beta_1 \in \mathbb{R}$. The book says that




The errors $e_i$ depend on unknown parameters in the mean function and so are not observable quantities. They are random variables and correspond to the vertical distance between the point $y_i$ and the mean function $E(Y | X = x_i)$.




It doesn't seem to me like $e_i$ is a random variable, because it's a function of $y_i$ and $x_i$, which are non-random, observed values. Why can $e_i$ be considered a random variable?










      regression random-variable assumptions






asked Jul 15 at 15:23 by VKV




















2 Answers







I looked up your citation (4th edition, page 21) because I found it very alarming, and was relieved to find it is actually given as:



$$ \hat{e}_i = y_i - \widehat{E}(Y \mid X = x_i) = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \tag{2.3} $$



That is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by MLE in the context of linear regression, and there is a crucial distinction between the "true errors", which are denoted $\epsilon_i$ and are normally distributed and i.i.d., and the "residuals", which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal, which is not the case.



          On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"



If you believe the pairs $(x_i, y_i)$ are known and non-random, that is, if you believe that $\forall\, 1 \leq i \leq n,\ (x_i, y_i) \in \mathbb{R} \times \mathbb{R}$, then the residuals $e_i$ are also known and non-random: $\forall\, 1 \leq i \leq n,\ e_i \in \mathbb{R}$. This is because there is a deterministic function giving the "best" parameters $\hat{\beta}_0$ and $\hat{\beta}_1$ from those observations, and then a deterministic function giving the residuals in terms of those parameters. This point of view is useful and allows us to derive the MLE estimators of $\beta$, for example. It is also the most intuitive view to take when you're sitting in front of a concrete, real-world dataset.
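To make that fixed-data view concrete, here is a minimal numpy sketch (the dataset and all numbers are made up purely for illustration) that computes $\hat{\beta}_0$, $\hat{\beta}_1$, and the residuals from the usual closed-form least-squares expressions; with the $(x_i, y_i)$ treated as plain known numbers, everything computed from them is just a number too:

```python
import numpy as np

# A fixed, made-up dataset: with the (x_i, y_i) viewed as known constants,
# everything computed from them is a constant as well.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates for simple linear regression.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Residuals: a deterministic function of the data and the estimates.
residuals = y - (beta0_hat + beta1_hat * x)
print(beta0_hat, beta1_hat, residuals)
```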



However, it kind of puts the cart before the horse and basically shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $\hat{\beta}_1$, because it is not a random variable and therefore has no distribution! How can we then talk about something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?



The way this is done is by treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little bit pedantic and is often omitted, but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_{X,Y}(\boldsymbol{\beta}, \sigma^2)$ with parameters $\boldsymbol{\beta} = [\beta_0, \beta_1]^T$ and $\sigma$. $F_{X,Y}$ is specified by the model $Y = \beta_0 + \beta_1 X + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $F_{X,Y}$ that we combine into one big joint probability function $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$.



Now we can imagine the dataset $(x_i, y_i)$ for $i=1,\ldots,n$ not merely as some known set of numbers, but as a realization sampled from $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $\hat{\beta}$ get new estimates, and we then calculate new residuals $e_i$, right?
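As a rough numerical illustration of that repeated-sampling picture (a sketch only, with made-up "true" parameter values), each pass of the loop below draws a brand-new dataset from the model and refits the line; the estimates and residuals come out different on every draw, which is exactly the sense in which they are random variables. The printed residual sum also previews the point made further down that the residuals always sum to (essentially) zero:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma, n = 1.0, 2.0, 0.5, 30   # assumed "true" values, for simulation only

for draw in range(3):
    # A brand-new dataset (x_i, y_i), i = 1..n, sampled from the model.
    x = rng.uniform(0, 10, size=n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

    # Refit: the estimates are functions of the random sample, hence random themselves.
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)

    # Different numbers on every draw; the residuals also sum to ~0 each time.
    print(draw, round(b0, 3), round(b1, 3), round(float(resid.sum()), 12))
```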



Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. It can be expressed as two $n$-dimensional random vectors $\vec{X}$ and $\vec{Y}$ drawn from $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$. Now $\hat{\beta}_0$ and $\hat{\beta}_1$ are random variables because they are functions of $(\vec{X}, \vec{Y})$. Likewise, all the $e_i$ are random variables because they are functions of $(\vec{X}, \vec{Y})$.



This state of affairs is much better, because now we can make statements like "The set of residuals $e_i$ cannot be independent because they always sum exactly to zero" or "$(\hat{\beta}_1 - \beta_1)/\widehat{\operatorname{se}}(\hat{\beta}_1)$ follows a $t$-distribution" without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)



          In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. But doing it conceptually allows us to think clearly about how randomness during sampling would affect our regression.
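Since the bootstrap is mentioned here as the practical stand-in for resampling the whole dataset, the following is a hedged sketch of one common variant, the case-resampling (pairs) bootstrap for $\hat{\beta}_1$; the "observed" dataset is itself simulated only so the example is self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_slope(x, y):
    # Ordinary least-squares slope for simple linear regression.
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# One "observed" dataset (simulated here only to keep the example runnable).
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=n)

# Pairs bootstrap: resample cases (x_i, y_i) with replacement and refit each time.
boot_slopes = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot_slopes.append(fit_slope(x[idx], y[idx]))
boot_slopes = np.array(boot_slopes)

# The spread of the bootstrap slopes approximates the sampling variability of the slope.
print(fit_slope(x, y), boot_slopes.mean(), boot_slopes.std(ddof=1))
```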



You'll note that I did not introduce new notation for $e_i$ and $\hat{\beta}$ but simply said, "now these things, which we previously thought of as concrete realizations, will now be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting - the same kind you found in your textbook - to indicate whether symbols are referring to random or non-random variables, because while there are conventions (such as using uppercase roman letters for random variables) they are not consistently applied. If the author tells you $e_i$ is a random variable, he is telling you he is also viewing $x_i$ and $y_i$ as random variables.






answered Jul 15 at 17:12 by olooney, edited Jul 15 at 17:26 by Tim

In simple linear regression, we assume that the observations are randomly perturbed from the conditional expected value, i.e. $E[Y \mid X=x_i]$; so, each of your observations is assumed to be generated from a model of the form: $$Y=\beta_0+\beta_1 X+\epsilon, \quad \epsilon \sim N(0,\sigma^2)$$



This makes each $\epsilon_i$ a random variable by definition. Think of a box where you put in $x_i$ and get back $y_i$, and you never know what's inside or how much error the box introduces. Even if we know that the relation is of the form given above, we don't know the true $\beta_0, \beta_1$. If we knew those quantities, we could easily recover $\epsilon_i$. Instead, we estimate them, and get residuals.
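To make the error/residual distinction concrete, here is a small simulation sketch (all parameter values made up): because the data are generated in code, the "true" $\beta_0, \beta_1$ are known and the errors $\epsilon_i$ can be written down exactly, whereas with real data we can only compute residuals from the estimated coefficients, and the two are close but not identical:

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, sigma, n = 1.0, 2.0, 0.5, 20   # known here only because we simulate

x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma, size=n)           # the true errors (unobservable in practice)
y = beta0 + beta1 * x + eps

# What we can actually compute from the data alone: estimates and residuals.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Close to the true errors, but not equal: residuals only estimate them.
print(np.max(np.abs(resid - eps)))
```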






answered Jul 15 at 17:05 by gunes














