Check if data (N datapoints) originate from known distributionComparing two discrete distributions (with small cell counts)Testing randomly generated data against its intended distributionnon-parametric two-sample equivalence tests with unequal sample sizesMaximum Likelihood questionHow to draw these t-statistics in RAren't normality tests backwards?Proper test for comparing two means from different distributions? (with limited data)p-value for complete spatial randomness testsHelp with preference testWhat should my critical region look like in this LR test for shifted exponential distribution with pdf $e^-(x-theta)mathbf1

Check if data (N datapoints) originate from known distributionComparing two discrete distributions (with small cell counts)Testing randomly generated data against its intended distributionnon-parametric two-sample equivalence tests with unequal sample sizesMaximum Likelihood questionHow to draw these t-statistics in RAren't normality tests backwards?Proper test for comparing two means from different distributions? (with limited data)p-value for complete spatial randomness testsHelp with preference testWhat should my critical region look like in this LR test for shifted exponential distribution with pdf $e^-(x-theta)mathbf1_x>theta$?

How to structure presentation to avoid getting questions that will be answered later in the presentation?

Word for giving preference to the oldest child

Why is Searing Smite not listed in the Roll20 Spell books?

Using Python in a Bash Script

Why do we need a voltage divider when we get the same voltage at the output as the input?

What do the novel titles of The Expanse series refer to?

Move arrows along a contour

Why are prop blades not shaped like household fan blades?

How to get Planck length in meters to 6 decimal places

Create two random teams from a list of players

A conjectural trigonometric identity

If I buy and download a game through second Nintendo account do I own it on my main account too?

Patio gate not at right angle to the house

Password management for kids - what's a good way to start?

GDPR Compliance - notification of data breach

What does 「ちんちんかいかい」 mean?

"Fewer errors means better products" or fewer errors mean better products."

How to litter train a cat if both my husband and I work away from home all day?

Academic progression in Germany, what happens after a postdoc? What is the next step?

How do I respond appropriately to an overseas company that obtained a visa for me without hiring me?

How do discovery writers hibernate?

Can living where Rare Earth magnetic ore is abundant provide any protection?

Planting Trees in Outer Space

Best Ergonomic Design for a handheld ranged weapon

Check if data (N datapoints) originate from known distribution

Comparing two discrete distributions (with small cell counts)Testing randomly generated data against its intended distributionnon-parametric two-sample equivalence tests with unequal sample sizesMaximum Likelihood questionHow to draw these t-statistics in RAren't normality tests backwards?Proper test for comparing two means from different distributions? (with limited data)p-value for complete spatial randomness testsHelp with preference testWhat should my critical region look like in this LR test for shifted exponential distribution with pdf $e^-(x-theta)mathbf1_x>theta$?

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;

Assuming that we have N data points $x_1, ldots, x_N$ and a known distribution $F_0(x)$ (not a Normal Distribution), i.e. we want to check if the system under study behaves 'typically'.

I would like to quantify the probability that those $N$ datapoints originate from this distribution. For large sample sizes, there are various statistical hypothesis tests that could be applied (e.g. chi-squared). However, if the sample size is small, the chi-square test will fail.

Intuitively, it should be possible to detect if those data points are clearly deviating, even with a small sample size. However, is there a mathematical test designed for these conditions?

edited Jul 22 at 8:14

kjetil b halvorsen

35.9k9 gold badges90 silver badges278 bronze badges

asked Jul 22 at 5:23

SZafon

211 bronze badge

$begingroup$
"Intuitively, it should be possible to detect if those data points are clearly deviating, even with a small sample size." Identifying the population from which a small sample was taken can be problematic.
$endgroup$
– BruceET
Jul 22 at 17:30

add a comment |

Assuming that we have N data points $x_1, ldots, x_N$ and a known distribution $F_0(x)$ (not a Normal Distribution), i.e. we want to check if the system under study behaves 'typically'.

Intuitively, it should be possible to detect if those data points are clearly deviating, even with a small sample size. However, is there a mathematical test designed for these conditions?

edited Jul 22 at 8:14

kjetil b halvorsen

35.9k9 gold badges90 silver badges278 bronze badges

asked Jul 22 at 5:23

SZafon

211 bronze badge

$begingroup$
"Intuitively, it should be possible to detect if those data points are clearly deviating, even with a small sample size." Identifying the population from which a small sample was taken can be problematic.
$endgroup$
– BruceET
Jul 22 at 17:30

add a comment |

Assuming that we have N data points $x_1, ldots, x_N$ and a known distribution $F_0(x)$ (not a Normal Distribution), i.e. we want to check if the system under study behaves 'typically'.

Intuitively, it should be possible to detect if those data points are clearly deviating, even with a small sample size. However, is there a mathematical test designed for these conditions?

edited Jul 22 at 8:14

kjetil b halvorsen

35.9k9 gold badges90 silver badges278 bronze badges

asked Jul 22 at 5:23

SZafon

211 bronze badge

Assuming that we have N data points $x_1, ldots, x_N$ and a known distribution $F_0(x)$ (not a Normal Distribution), i.e. we want to check if the system under study behaves 'typically'.

Intuitively, it should be possible to detect if those data points are clearly deviating, even with a small sample size. However, is there a mathematical test designed for these conditions?

hypothesis-testing mathematical-statistics

edited Jul 22 at 8:14

kjetil b halvorsen

35.9k9 gold badges90 silver badges278 bronze badges

asked Jul 22 at 5:23

SZafon

211 bronze badge

edited Jul 22 at 8:14

kjetil b halvorsen

35.9k9 gold badges90 silver badges278 bronze badges

asked Jul 22 at 5:23

SZafon

211 bronze badge

edited Jul 22 at 8:14

kjetil b halvorsen

35.9k9 gold badges90 silver badges278 bronze badges

edited Jul 22 at 8:14

kjetil b halvorsen

35.9k9 gold badges90 silver badges278 bronze badges

edited Jul 22 at 8:14

kjetil b halvorsen

35.9k9 gold badges90 silver badges278 bronze badges

asked Jul 22 at 5:23

SZafon

211 bronze badge

asked Jul 22 at 5:23

SZafon

211 bronze badge

asked Jul 22 at 5:23

SZafon

211 bronze badge

$begingroup$
"Intuitively, it should be possible to detect if those data points are clearly deviating, even with a small sample size." Identifying the population from which a small sample was taken can be problematic.
$endgroup$
– BruceET
Jul 22 at 17:30

add a comment |

$begingroup$
"Intuitively, it should be possible to detect if those data points are clearly deviating, even with a small sample size." Identifying the population from which a small sample was taken can be problematic.
$endgroup$
– BruceET
Jul 22 at 17:30

"Intuitively, it should be possible to detect if those data points are clearly deviating, even with a small sample size." Identifying the population from which a small sample was taken can be problematic.

– BruceET
Jul 22 at 17:30

add a comment |

2 Answers
2

active

oldest

votes

Maybe Kolmogorov-Smirnov test with correction for small samples provided by Jan Vrbik in: Vrbik, Jan (2018). "Small-Sample Corrections to Kolmogorov–Smirnov Test Statistic". Pioneer Journal of Theoretical and Applied Statistics. 15 (1–2): 15–23.

Correction itself is also described on Wikipedia site for Kolmogorov-Smirnov test: replace $D_N$ with

$$ D_N+frac 16sqrt N+frac D_N-14N$$

where $D_N$ is standard Kolmogorov-Smirnov statistic.

edited Jul 22 at 8:29

answered Jul 22 at 8:05

Łukasz Deryło

2,7591 gold badge6 silver badges17 bronze badges

add a comment |

Use R to generate 10 observations from a standard uniform
distribution:

set seed(722) # for reproducibility
x = runif(10)
summary(x); sd(x)
 Min. 1st Qu. Median Mean 3rd Qu. Max. 
 0.1270 0.4940 0.7454 0.6627 0.9070 0.9477 
[1] 0.293335 # SD

Use the Kolmogorov-Smirnov test to see if the sample is consistent with standard uniform. Appropriately, the answer is Yes because data were sampled from a standard uniform distribution: Large P-value, no rejection.

ks.test(x, punif)

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.31507, p-value = 0.2217
alternative hypothesis: two-sided

Is the sample also consistent with $mathsfNorm(.5, sqrt1/12)?$ The mean and variance match, but
shapes differ. Notice that the parameters mean and standard deviation are specified. Again consistent, but we know the normal distribution is not correct.

 ks.test(x, pnorm, .5, sqrt(1/12))

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.36246, p-value = 0.1104
alternative hypothesis: two-sided

However, the K-S test easily rejects that this sample is from $mathsfExp(rate=2),$ which has mean $1/2 = 0.5,$ but the wrong SD. This exponential distribution has almost 14% of its probability above $1,$ but our sample has no observation above 0.948.

ks.test(x, dexp, 2)

 One-sample Kolmogorov-Smirnov test

data: x
D = 1.5513, p-value < 2.2e-16
alternative hypothesis: two-sided

Notes: (1) See other pages on this site and the Internet, including
the relevant Wikipedia page, which has a brief explanation of the test and some remarks about cases in which parameters must be estimated from data.

(2) Several well-known statistical software programs have
procedures that check a sample against a list of often used
distributions to estimate parameters and see if any distribution is a fit. Often these are called 'distribution ID' procedures
and sometimes they are restricted to non-negative data.

For example, when the distribution ID procedure in Minitab
is asked to compare the small sample above to normal, lognormal, Weibull, and gamma families, here are the parameter estimates:

ML Estimates of Distribution Parameters

Distribution Location Shape Scale 
Normal* 0.66265 0.29334
Lognormal* -0.55937 0.66158
Weibull 2.62094 0.74268
Gamma 3.53947 0.18722

* Scale: Adjusted ML estimate

And here are appropriate probability plots with P-values of Anderson-Darling goodness-of-fit tests in
legends. The data are clearly inconsistent with distributions in the lognormal family.

enter image description here

(2) For very large sample sizes, Kolmogorov-Smirnov, Anderson-Darling and other goodness-of-fit tests can reject some
distributions as not fitting---even when the fit might be good
enough for some practical applications.

edited Jul 22 at 8:08

answered Jul 22 at 7:56

BruceET

12k1 gold badge8 silver badges26 bronze badges

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f418547%2fcheck-if-data-n-datapoints-originate-from-known-distribution%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Correction itself is also described on Wikipedia site for Kolmogorov-Smirnov test: replace $D_N$ with

$$ D_N+frac 16sqrt N+frac D_N-14N$$

where $D_N$ is standard Kolmogorov-Smirnov statistic.

edited Jul 22 at 8:29

answered Jul 22 at 8:05

Łukasz Deryło

2,7591 gold badge6 silver badges17 bronze badges

add a comment |

Correction itself is also described on Wikipedia site for Kolmogorov-Smirnov test: replace $D_N$ with

$$ D_N+frac 16sqrt N+frac D_N-14N$$

where $D_N$ is standard Kolmogorov-Smirnov statistic.

edited Jul 22 at 8:29

answered Jul 22 at 8:05

Łukasz Deryło

2,7591 gold badge6 silver badges17 bronze badges

add a comment |

Correction itself is also described on Wikipedia site for Kolmogorov-Smirnov test: replace $D_N$ with

$$ D_N+frac 16sqrt N+frac D_N-14N$$

where $D_N$ is standard Kolmogorov-Smirnov statistic.

edited Jul 22 at 8:29

answered Jul 22 at 8:05

Łukasz Deryło

2,7591 gold badge6 silver badges17 bronze badges

Correction itself is also described on Wikipedia site for Kolmogorov-Smirnov test: replace $D_N$ with

$$ D_N+frac 16sqrt N+frac D_N-14N$$

where $D_N$ is standard Kolmogorov-Smirnov statistic.

edited Jul 22 at 8:29

answered Jul 22 at 8:05

Łukasz Deryło

2,7591 gold badge6 silver badges17 bronze badges

edited Jul 22 at 8:29

answered Jul 22 at 8:05

Łukasz Deryło

2,7591 gold badge6 silver badges17 bronze badges

answered Jul 22 at 8:05

Łukasz Deryło

2,7591 gold badge6 silver badges17 bronze badges

answered Jul 22 at 8:05

Łukasz Deryło

2,7591 gold badge6 silver badges17 bronze badges

add a comment |

Use R to generate 10 observations from a standard uniform
distribution:

set seed(722) # for reproducibility
x = runif(10)
summary(x); sd(x)
 Min. 1st Qu. Median Mean 3rd Qu. Max. 
 0.1270 0.4940 0.7454 0.6627 0.9070 0.9477 
[1] 0.293335 # SD

ks.test(x, punif)

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.31507, p-value = 0.2217
alternative hypothesis: two-sided

 ks.test(x, pnorm, .5, sqrt(1/12))

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.36246, p-value = 0.1104
alternative hypothesis: two-sided

ks.test(x, dexp, 2)

 One-sample Kolmogorov-Smirnov test

data: x
D = 1.5513, p-value < 2.2e-16
alternative hypothesis: two-sided

For example, when the distribution ID procedure in Minitab
is asked to compare the small sample above to normal, lognormal, Weibull, and gamma families, here are the parameter estimates:

ML Estimates of Distribution Parameters

Distribution Location Shape Scale 
Normal* 0.66265 0.29334
Lognormal* -0.55937 0.66158
Weibull 2.62094 0.74268
Gamma 3.53947 0.18722

* Scale: Adjusted ML estimate

And here are appropriate probability plots with P-values of Anderson-Darling goodness-of-fit tests in
legends. The data are clearly inconsistent with distributions in the lognormal family.

enter image description here

edited Jul 22 at 8:08

answered Jul 22 at 7:56

BruceET

12k1 gold badge8 silver badges26 bronze badges

add a comment |

Use R to generate 10 observations from a standard uniform
distribution:

set seed(722) # for reproducibility
x = runif(10)
summary(x); sd(x)
 Min. 1st Qu. Median Mean 3rd Qu. Max. 
 0.1270 0.4940 0.7454 0.6627 0.9070 0.9477 
[1] 0.293335 # SD

ks.test(x, punif)

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.31507, p-value = 0.2217
alternative hypothesis: two-sided

 ks.test(x, pnorm, .5, sqrt(1/12))

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.36246, p-value = 0.1104
alternative hypothesis: two-sided

ks.test(x, dexp, 2)

 One-sample Kolmogorov-Smirnov test

data: x
D = 1.5513, p-value < 2.2e-16
alternative hypothesis: two-sided

For example, when the distribution ID procedure in Minitab
is asked to compare the small sample above to normal, lognormal, Weibull, and gamma families, here are the parameter estimates:

ML Estimates of Distribution Parameters

Distribution Location Shape Scale 
Normal* 0.66265 0.29334
Lognormal* -0.55937 0.66158
Weibull 2.62094 0.74268
Gamma 3.53947 0.18722

* Scale: Adjusted ML estimate

And here are appropriate probability plots with P-values of Anderson-Darling goodness-of-fit tests in
legends. The data are clearly inconsistent with distributions in the lognormal family.

enter image description here

edited Jul 22 at 8:08

answered Jul 22 at 7:56

BruceET

12k1 gold badge8 silver badges26 bronze badges

add a comment |

Use R to generate 10 observations from a standard uniform
distribution:

set seed(722) # for reproducibility
x = runif(10)
summary(x); sd(x)
 Min. 1st Qu. Median Mean 3rd Qu. Max. 
 0.1270 0.4940 0.7454 0.6627 0.9070 0.9477 
[1] 0.293335 # SD

ks.test(x, punif)

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.31507, p-value = 0.2217
alternative hypothesis: two-sided

 ks.test(x, pnorm, .5, sqrt(1/12))

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.36246, p-value = 0.1104
alternative hypothesis: two-sided

ks.test(x, dexp, 2)

 One-sample Kolmogorov-Smirnov test

data: x
D = 1.5513, p-value < 2.2e-16
alternative hypothesis: two-sided

For example, when the distribution ID procedure in Minitab
is asked to compare the small sample above to normal, lognormal, Weibull, and gamma families, here are the parameter estimates:

ML Estimates of Distribution Parameters

Distribution Location Shape Scale 
Normal* 0.66265 0.29334
Lognormal* -0.55937 0.66158
Weibull 2.62094 0.74268
Gamma 3.53947 0.18722

* Scale: Adjusted ML estimate

And here are appropriate probability plots with P-values of Anderson-Darling goodness-of-fit tests in
legends. The data are clearly inconsistent with distributions in the lognormal family.

enter image description here

edited Jul 22 at 8:08

answered Jul 22 at 7:56

BruceET

12k1 gold badge8 silver badges26 bronze badges

Use R to generate 10 observations from a standard uniform
distribution:

set seed(722) # for reproducibility
x = runif(10)
summary(x); sd(x)
 Min. 1st Qu. Median Mean 3rd Qu. Max. 
 0.1270 0.4940 0.7454 0.6627 0.9070 0.9477 
[1] 0.293335 # SD

ks.test(x, punif)

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.31507, p-value = 0.2217
alternative hypothesis: two-sided

 ks.test(x, pnorm, .5, sqrt(1/12))

 One-sample Kolmogorov-Smirnov test

data: x
D = 0.36246, p-value = 0.1104
alternative hypothesis: two-sided

ks.test(x, dexp, 2)

 One-sample Kolmogorov-Smirnov test

data: x
D = 1.5513, p-value < 2.2e-16
alternative hypothesis: two-sided

For example, when the distribution ID procedure in Minitab
is asked to compare the small sample above to normal, lognormal, Weibull, and gamma families, here are the parameter estimates:

ML Estimates of Distribution Parameters

Distribution Location Shape Scale 
Normal* 0.66265 0.29334
Lognormal* -0.55937 0.66158
Weibull 2.62094 0.74268
Gamma 3.53947 0.18722

* Scale: Adjusted ML estimate

And here are appropriate probability plots with P-values of Anderson-Darling goodness-of-fit tests in
legends. The data are clearly inconsistent with distributions in the lognormal family.

enter image description here

edited Jul 22 at 8:08

answered Jul 22 at 7:56

BruceET

12k1 gold badge8 silver badges26 bronze badges

edited Jul 22 at 8:08

answered Jul 22 at 7:56

BruceET

12k1 gold badge8 silver badges26 bronze badges

answered Jul 22 at 7:56

BruceET

12k1 gold badge8 silver badges26 bronze badges

answered Jul 22 at 7:56

BruceET

12k1 gold badge8 silver badges26 bronze badges

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Cross Validated!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ttdfjt

2 Answers
2

Your Answer

Post as a guest

2 Answers
2

2 Answers
2

Post as a guest

Popular posts from this blog

Grendel Contents Story Scholarship Depictions Notes References Navigation menu10.1093/notesj/gjn112Berserkeree

Log in Navigation menu

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Grendel Contents Story Scholarship Depictions Notes References Navigation menu10.1093/notesj/gjn112Berserkeree

Log in Navigation menu

2 Answers
2

2 Answers
2

2 Answers
2