Big sample size, small coefficients, significant results. What should I do?


I did some quantitative research and fitted a rank-ordered logistic regression in Stata. The independent variables have p-values of almost 0, which suggests they have a significant effect on the dependent variable. But the sample size is large (35,000 records) and the coefficients are very small (e.g., 0.0001), so I worry there is no real relationship: when the sample size is this big, almost anything becomes significant.

I also tested the model with only 5,000 records and again got significant results.

What do you recommend? Should I use a smaller sample so that the reviewers of my paper won't point to the problem of the big sample size, or is there another way to report my results and show that the variables do in fact have a significant effect?

I will appreciate any help. Thanks.

statistical-significance regression-coefficients sample-size ordered-logit

asked Jul 25 at 14:39 by PSS
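To see the concern in action, here is a minimal simulation sketch (not from the thread: a plain binary logistic regression stands in for the rank-ordered model, the tiny true coefficient of 0.0001 is assumed, and the covariate spread is hypothetical). The same fixed effect only reaches significance once the sample is large.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
beta = 0.0001                      # assumed tiny true coefficient
for n in (500, 5_000, 35_000):
    x = rng.normal(0, 500, n)      # hypothetical covariate spread
    p = 1 / (1 + np.exp(-(0.1 + beta * x)))
    y = rng.binomial(1, p)
    fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    print(f"n={n:>6}  coef={fit.params[1]:+.6f}  p={fit.pvalues[1]:.4f}")
```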










  • One question: with the coefficients that are estimated in the range of 0.0001 and are statistically significant, what is the distribution of the associated covariate? An estimate of 0.0001 is huge if the standard deviation of the associated covariate is 100,000. – Cliff AB, Jul 25 at 23:37
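A two-line illustration of the comment's point, using its own hypothetical numbers: a raw coefficient is only interpretable relative to the covariate's spread.

```python
import numpy as np

sd_x = 100_000      # hypothetical covariate standard deviation, from the comment
beta_raw = 0.0001   # raw coefficient per unit of x

beta_per_sd = beta_raw * sd_x    # effect of a one-SD increase in x
print(beta_per_sd)               # 10.0 on the log-odds scale
print(np.exp(beta_per_sd))       # odds ratio of roughly 22026 per SD: enormous
```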

















4 Answers






I think this has been asked before. It's useful to realize that, without a prespecified sample size and alpha level, the $p$-value is largely a measure of the sample size you ultimately wind up with, which is not appealing. An approach I use is this: ask at what sample size a 0.05 level would be appropriate, and scale accordingly. For instance, I feel the 0.05 level is often suited to problems with about 100 observations; that is, I would say "wow, that is an interesting finding" if it had a 1/20 chance of being a false positive. So if you have a sample of 5,000, that's 50 times larger than 100: divide your 0.05 level by 50 and use 0.001 as the significance level. This is in line with what Fisher advocated: don't do significance testing with a fixed p-value cut-off; weigh p-values against the power of the study. The sample size is the simplest, rawest measure of a study's power, and an overpowered study with a conventional 0.05 cut-off makes no sense.

That said, it is usually not advisable to choose a significance cut-off after viewing the data and results. One might believe it is kosher to choose a more stringent significance criterion post hoc; in fact, it only deceives readers into thinking you ran a better-controlled trial than you did. Think of it this way: if you had observed p = 0.04, you wouldn't be asking this question; the analysis would be a tidy inferential package.

Another way to look at it: just report the CI and note that the analysis was statistically significant. For instance, you might have a 95% CI for a hazard ratio of (0.01, 0.16), where the null is 1. It suffices to say that the p-value is extremely small, so you don't need to clutter the page with p = 0.0000000023. (Only show p to a sensible precision: with 3 decimal places, report p < 0.001, and never round to 0.000, which suggests you don't know what a p-value means.)

answered Jul 25 at 14:50, edited Jul 26 at 15:53, by AdamO
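The scaling heuristic above fits in a few lines. This sketch is ours, not the answerer's (the reference size of 100 is the answer's own benchmark; the function name is made up).

```python
# Keep 0.05 for a reference study of ~100 observations and shrink the
# level in proportion to the actual sample size.
def scaled_alpha(n: int, base_alpha: float = 0.05, ref_n: int = 100) -> float:
    return base_alpha * ref_n / n

for n in (100, 5_000, 35_000):
    print(f"n={n:>6}  alpha={scaled_alpha(n):.2e}")
# n=   100  alpha=5.00e-02
# n=  5000  alpha=1.00e-03   (the 0.05 / 50 example from the answer)
# n= 35000  alpha=1.43e-04
```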






  • Thanks so much for your response :) – PSS, Jul 25 at 15:01


















You have encountered the gulf between "statistically significant" and "meaningful". As you point out, with a sufficient sample size you can assign statistical significance to arbitrarily small differences; there is no difference so small that it can't be called "significant" with a large enough N. You need domain knowledge to determine what counts as a "meaningful" difference. You might find, for example, that a new drug increases a person's lifespan by 10 seconds: even though you can be very confident that the increase is not due to random variation in your data, it's hardly a meaningful increase in lifespan.

Some of this will come from knowing your problem and what people in the field consider meaningful. You could also think about future studies that might replicate your findings and the typical N they might use. If future studies will likely have a much smaller N, you could calculate the effect size needed to replicate your findings in data of that size, and only report results that are significant, meaningful, and feasibly reproducible.

answered Jul 25 at 18:28 by Nuclear Wang
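One way to run that replication check is a standard power calculation. The sketch below is not from the answer: it uses a two-sample t-test as a generic stand-in, and the candidate sample sizes, alpha, and target power are assumptions.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# For each sample size a future study might plausibly use, solve for the
# smallest standardized effect (Cohen's d) detectable at alpha = 0.05
# with 80% power.
for n_future in (50, 200, 1_000):
    d = analysis.solve_power(nobs1=n_future, alpha=0.05, power=0.8)
    print(f"n per group={n_future:>5}  minimum detectable d={d:.3f}")
```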






When you have many samples and the observed effect is very small (small for the application at hand), you can safely conclude that the independent variables do not have an important effect on the dependent variable. An effect can be "statistically significant" and unimportant at the same time.

Using a small sample and ignoring the results from the large sample would be inappropriate. You owe that much to the people who will read your paper and design new experiments based on your observations.

answered Jul 25 at 18:16, edited Jul 25 at 18:32, by Ali






I think you should decide on an "expected minimal effect size", i.e. the smallest coefficients you care to include in your model. Say, do you care about coefficients below 0.0001, or 1, or 100? To clarify: the effect size is the degree to which the null hypothesis is false, i.e. how large the coefficient actually is; it is a parameter of the population. The expected minimal effect size, on the other hand, is the smallest departure from the null that you care to detect; it is a parameter of the test.

Given the sample size $N = 35000$ and some expected minimal effect size, a power analysis will reveal the relationship between $\alpha$ and $\beta$ under these parameters. Next, decide how to balance significance level and power by choosing a pair of $\alpha$ and $\beta$. (Technically, all these parameters must be decided before looking at the data, but at this point I guess you can pretend you didn't see them.) Then carry out your test, compare $p$ with $\alpha$, and draw a conclusion accordingly.

By the way, I believe there is no reason to exclude any records, unless you are doing cross-validation, for example. More data generally leads to more accurate inference, and discarding sample points in a selective manner may introduce bias.

answered Jul 26 at 7:21 by nalzok
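To make the relationship between $\alpha$ and $\beta$ concrete, here is a minimal sketch under stated assumptions: a two-sided z-test on a single coefficient, a placeholder standard error of $1/\sqrt{N}$ (in practice, take the standard error from your fitted model), and a hypothetical minimal effect of 0.02.

```python
import numpy as np
from scipy import stats

N = 35_000
min_effect = 0.02        # hypothetical smallest coefficient worth detecting
se = 1 / np.sqrt(N)      # placeholder standard error of the estimate

# For each alpha, power = P(reject | effect = min_effect) and beta = 1 - power.
# Two-sided normal approximation; the negligible far-tail term is ignored.
for alpha in (0.05, 0.01, 0.001):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    power = stats.norm.sf(z_crit - min_effect / se)
    print(f"alpha={alpha:<6}  power={power:.3f}  beta={1 - power:.3f}")
```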





