How do I generate a distribution of positive numbers only, with given min, max and mean?
I am trying to generate a sample of 2000 rows, given the following values:
min = 80
max = 12000
mean = 500
I want to generate only positive numbers. I tried using a triangular distribution and the range rule (sd = (max - min)/4), but I got negative values.
Is there any way I can generate only positive numbers?
distributions simulation
edited Jul 28 at 2:59 by Aaron Hall
asked Jul 26 at 17:22 by user3437212
What kind of distribution of values do you want?
– Dave, Jul 26 at 17:35
As long as they are positive and have mean = 500, min = 80 and max = 12000.
– user3437212, Jul 26 at 17:36
Min and max are fine, but the mean is way off with the triangular distribution. Do you have any other suggestions?
– user3437212, Jul 26 at 17:37
Strongly related: stats.stackexchange.com/q/236449/35989
– Tim♦, Jul 26 at 18:44
3 Answers
Use, for example, a beta distribution, shifted and rescaled to your min and max.
The beta is easy to use here since it is bounded to the interval [0;1], and its mean can be placed anywhere in that interval through the parameterization.
You have mean=alpha/(alpha+beta) and hence beta=alpha/mean - alpha, or, in the rescaled version, beta=alpha*(max-min)/(mean-min) - alpha. With the parameter alpha you can control the shape, i.e., whether you want more values in the extremes or not.
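The rescaled-beta recipe can be sketched in base R. This is a minimal illustration that assumes a shape parameter alpha = 2, which is an arbitrary choice not made in the answer, and an arbitrary seed. beta is then set from the rescaled formula, so the theoretical mean lands exactly on 500.

```r
set.seed(1)
lo <- 80; hi <- 12000; target_mean <- 500
alpha <- 2  # hypothetical shape choice; larger values concentrate mass around the mean
# rescaled formula: beta = alpha*(max-min)/(mean-min) - alpha
beta <- alpha * (hi - lo) / (target_mean - lo) - alpha
# draw on [0, 1], then shift and rescale to [lo, hi]
x <- lo + (hi - lo) * rbeta(2000, shape1 = alpha, shape2 = beta)
summary(x)
```

Changing alpha while keeping the same formula for beta changes the spread and skewness. It does not change the mean.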
You can also consider a truncated normal distribution. This works quite similarly: again you have to decide on a shape, this time by choosing the standard deviation. It is straightforward to use: fix min, max, mean, and sigma, then compute the mu of the underlying (untruncated) normal that yields the desired mean, and you have your data distribution. But the shape of this distribution will look truncated, and not as elegant as a beta distribution.
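Here is a minimal base-R sketch of the truncated-normal route. It assumes sigma = 1000 (a hypothetical choice) and a hand-picked bracket (-3000, 500) for uniroot(), where the mean function changes sign for these numbers. trunc_mean is an illustrative helper name that uses the standard formula for the mean of a doubly truncated normal. Sampling is done by inverting the cdf between the two bounds.

```r
lo <- 80; hi <- 12000; target_mean <- 500
sigma <- 1000  # hypothetical sd of the untruncated normal

# mean of Normal(mu, sigma) truncated to [lo, hi]
trunc_mean <- function(mu) {
  a <- (lo - mu) / sigma
  b <- (hi - mu) / sigma
  mu + sigma * (dnorm(a) - dnorm(b)) / (pnorm(b) - pnorm(a))
}

# solve trunc_mean(mu) == target_mean for mu
mu <- uniroot(function(m) trunc_mean(m) - target_mean,
              interval = c(-3000, target_mean), tol = 1e-10)$root

# inverse-cdf sampling restricted to [lo, hi]
set.seed(1)
p_lo <- pnorm(lo, mu, sigma)
p_hi <- pnorm(hi, mu, sigma)
x <- qnorm(p_lo + runif(2000) * (p_hi - p_lo), mu, sigma)
```

The resulting mu is well below zero. Only the upper tail of that normal survives the truncation, which is why the shape looks cut off at 80.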
Beta distributions are smooth. If you want something simpler, consider simply using two uniform distributions. Without loss of generality, assume min=0 and max=1 by rescaling and shifting.
Split the interval at the (rescaled) mean. Sampling uniformly from [0;mean] with probability p gives E[X]=mean/2, and sampling from [mean;1] with probability 1-p gives E[X]=(mean+1)/2. Combining these two with the desired outcome yields p*mean/2+(1-p)(mean+1)/2 = mean, and solving for p yields p=1-mean.
Hence a simple strategy is to sample uniformly from [min;mean] with probability 1-(mean-min)/(max-min) and from [mean;max] otherwise. The drawback is the non-smooth CDF (piecewise linear with a kink at the mean; the density is a step function).
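The two-uniform mixture takes only a few lines of base R. The sketch below is minimal and uses an arbitrary seed. With these numbers, the probability of the lower piece, 1-(500-80)/(12000-80), is the same as the weight on 80 in the two-point answer.

```r
lo <- 80; hi <- 12000; target_mean <- 500
n <- 2000
# probability of sampling from the lower piece [lo, target_mean]
p <- 1 - (target_mean - lo) / (hi - lo)

set.seed(1)
from_lower <- runif(n) < p  # TRUE: draw from [lo, mean]; FALSE: draw from [mean, hi]
x <- ifelse(from_lower,
            runif(n, lo, target_mean),
            runif(n, target_mean, hi))

# theoretical mean of the mixture: p*(lo+mean)/2 + (1-p)*(mean+hi)/2
theo <- p * (lo + target_mean) / 2 + (1 - p) * (target_mean + hi) / 2
```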
Ultimately, you could also design the CDF directly. This would be easy if you had fixed the median, but with the mean you need to take the values themselves into account. The idea is that you might enforce a piecewise linear or polynomial CDF and choose the function parameters such that the resulting mean is as desired. I leave the math for this to you.
Last but not least: you are probably asking for a skewed distribution. I would rather fix the median, not the mean. This makes the above constructions a lot easier and more meaningful, since the mean of a skewed distribution is not a very robust summary: it is pulled toward the long tail.
While the problem is very much ill-posed, since there is an infinite range of distributions satisfying these constraints, a possible solution is to find the maximum entropy distribution under the constraints of a support of $(80,12000)$ [thus using the uniform measure on that interval as the reference measure] and a mean of $\mathbb E[X]=500$. This distribution is of the form
$$p(x)=\exp\{\alpha+\beta x\}\,\mathbb I_{(80,12000)}(x)$$
with
$$\int_{80}^{12000} \exp\{\alpha+\beta x\}\,\text dx=1\qquad\text{and}\qquad
\int_{80}^{12000} x\exp\{\alpha+\beta x\}\,\text dx=500$$
which leads to
$$\exp\{-\alpha\}=\beta^{-1}[\exp\{12000\beta\}-\exp\{80\beta\}]$$
and$$\beta^{-1}\exp\{\alpha\}[12000\exp\{12000\beta\}-80\exp\{80\beta\}]-\beta^{-1}=500$$which can be solved numerically in $\beta$, leading to
$$\beta^*=-.00238\quad\text{and}\quad\alpha^*=-5.850$$which can easily be simulated as a truncated exponential distribution, by inversion of the cdf, e.g., using qexp()
in R. For instance,
rtexp <- function(n=1){
  # rtexp is an illustrative name; inverse-cdf draws from Exp(rate=.00238) truncated to (80,12000)
  return(qexp(pexp(80,.00238)+runif(n)*
    (pexp(12000,.00238)-pexp(80,.00238)),.00238))
}
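The numerical values of β* and α* can be reproduced with base R's uniroot() and then plugged into the same inverse-cdf sampler. This is a minimal sketch. It assumes a bracket (-0.01, -1e-4) for β, picked by hand so that the mean changes sign there, and an arbitrary seed. tilted_mean is an illustrative helper name. It is the answer's second equation after substituting exp{α} from the first.

```r
lo <- 80; hi <- 12000; target_mean <- 500

# mean of exp(alpha + beta*x) on (lo, hi) once alpha normalises the density
tilted_mean <- function(beta) {
  (hi * exp(hi * beta) - lo * exp(lo * beta)) /
    (exp(hi * beta) - exp(lo * beta)) - 1 / beta
}

# solve tilted_mean(beta) == 500 on a bracket of negative values
beta_star <- uniroot(function(b) tilted_mean(b) - target_mean,
                     interval = c(-0.01, -1e-4), tol = 1e-12)$root
# normalising constant: exp(-alpha) = (exp(hi*beta) - exp(lo*beta)) / beta
alpha_star <- -log((exp(hi * beta_star) - exp(lo * beta_star)) / beta_star)

# inverse-cdf draws from Exp(rate = -beta_star) truncated to (lo, hi)
rate <- -beta_star
set.seed(1)
x <- qexp(pexp(lo, rate) + runif(2000) * (pexp(hi, rate) - pexp(lo, rate)), rate)
```

Because exp{12000β*} is negligible, the mean is essentially 80 + 1/|β*|. That is why β* is close to -1/420.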
If the question is instead about simulating a sample $X_{1:2000}$ such that $$\min(X_{1:2000})=80,\quad\max(X_{1:2000})=12000,\quad\bar X_{1:2000}=500$$
there is again an infinite range of solutions, the simplest being a uniform Multinomial distribution constrained by its minimum $X_{(1)}$ being 80 and its maximum $X_{(2000)}$ being 12000, since
$$\underbrace{X_{(1)}}_{80}+\cdots+\underbrace{X_{(2000)}}_{12000} = 80 + 987920 + 12000= \underbrace{2000}_{p}\times 500=\underbrace{10^6}_{n}$$
namely proportional to
$${n\choose 80,n_2,\cdots,n_{p-1},12000}\mathbb I_{80\le n_2\le\ldots\le n_{p-1}\le 12000}$$
This is equivalent to simulating a Multinomial
$$\mathcal M_{1998}(987920,1/1998,\ldots,1/1998)$$
constrained to $(80,12000)^{1998}$, i.e.,
x=rmultinom(1,987920,rep(1,1998))
while (min(x)<80||max(x)>12000){
  # reject and redraw until all 1998 counts lie within [80,12000]
  x=rmultinom(1,987920,rep(1,1998))
}
As an additional remark, let me add that observing a range of (80,12000) for a Multinomial $\mathcal M(10^6;2000)$ is extremely unlikely (in the above simulation, the first attempt is always successful). A more satisfactory approach would be to first infer the probability vector of a Multinomial $\mathcal M(10^6;2000;p)$ before predicting the remaining 1998 categories.
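The rejection sampler above returns only the 1998 interior counts. The minimal sketch below assembles and checks the full sample of 2000. As in the derivation, it assumes the minimum 80 and the maximum 12000 are appended as fixed values. The seed is arbitrary and the name s is illustrative.

```r
set.seed(1)
# 1998 interior values: a uniform multinomial split of the remaining total 987920
x <- rmultinom(1, size = 987920, prob = rep(1, 1998))
while (min(x) < 80 || max(x) > 12000) {
  x <- rmultinom(1, size = 987920, prob = rep(1, 1998))
}
# full sample: fixed minimum, 1998 multinomial counts, fixed maximum
s <- c(80, as.vector(x), 12000)
```

Every interior value ends up close to 987920/1998 ≈ 494. Apart from the two fixed extremes, the sample is therefore nearly constant.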
edited Jul 29 at 10:16
answered Jul 26 at 18:10 by Xi'an
Is your comment about $X_{1:2000}$ about ensuring that the sample (not the population) has a mean of 500? Also, two clarifications would be helpful. 1) Why use something of the form $\exp\{\alpha + \beta X\}$? 2) Once we solve for $\alpha$ and $\beta$, how do we simulate draws from that PDF?
– Dave, Jul 26 at 18:31
@Xi'an Thank you, this helps!
– user3437212, Jul 26 at 19:38
If you don't care about the distribution aside from min, max, and mean, then there is a simple answer.
Take 96.476510067114100 percent of draws as 80 and 3.523489932885910 percent of draws as 12000. On average, you get 500, and you have your min and max. I calculated the percentages by solving the system of equations
$$a + b =1$$ $$80a + 12000b = 500$$
The first equation ensures that the values sum to one, making sure that we are dealing with probabilities. The second equation gets us our average of 500. Substituting $a=1-b$ into the second equation gives $b=(500-80)/(12000-80)=420/11920$ and hence $a=11500/11920$.
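The 2x2 system can be checked with base R's solve(). The sketch below is minimal, and w is an illustrative name.

```r
# rows: a + b = 1 and 80a + 12000b = 500
A <- matrix(c(1, 1,
              80, 12000), nrow = 2, byrow = TRUE)
w <- solve(A, c(1, 500))
names(w) <- c("a", "b")
w
```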
D <- rep(NA, 2000) # define a vector of NAs to hold your sampled values
for (i in 1:2000) {
  X <- rbinom(1, 1, 0.96476510067114100) # determine which value you'll take, 80 or 12000
  if (X == 0) D[i] <- 12000 # declare observation i as 12000
  if (X == 1) D[i] <- 80 # declare observation i as 80
}
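The loop above can also be written as a single vectorized call to sample(). Each of the 2000 values is still drawn independently with the same two probabilities. This is a minimal sketch with an arbitrary seed, and p80 is an illustrative name.

```r
set.seed(1)
p80 <- (12000 - 500) / (12000 - 80)  # weight on 80 from the system above
D <- sample(c(80, 12000), size = 2000, replace = TRUE, prob = c(p80, 1 - p80))
```

As with the loop, only the expected mean is 500. The realized sample mean varies from run to run by roughly ±50 (one standard error).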
edited Jul 26 at 21:04
answered Jul 26 at 18:00 by Dave
$begingroup$
Thanks Dave! But I don't want only 80s and 12000s in my distribution. I would like a range of values but all positive
$endgroup$
– user3437212
Jul 26 at 18:03
$begingroup$
I'm sorry I was not clear earlier
$endgroup$
– user3437212
Jul 26 at 18:04
2
$begingroup$
We can construct a different distribution, but what requirements do you have?
$endgroup$
– Dave
Jul 26 at 18:05
$begingroup$
normal or a poisson but no negative values
$endgroup$
– user3437212
Jul 26 at 18:07
8
$begingroup$
Normal is out, since it can take all real values. More helpful, though, would be to know what you're considering either of those distributions. So...why normal or Poisson?
$endgroup$
– Dave
Jul 26 at 18:11
$begingroup$
Use, for example, a beta distribution shifted and rescaled to your min and max.
The beta distribution is convenient here because it is bounded to the interval [0;1], yet its mean can be placed anywhere inside that interval by choosing the parameters.
You have mean = alpha/(alpha+beta) and hence beta = alpha/mean - alpha or, in the rescaled version, beta = alpha*(max-min)/(mean-min) - alpha. The remaining parameter alpha controls the shape, e.g. whether you want more values near the extremes or not.
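Here is a minimal base-R sketch of the rescaled beta approach. It assumes the question's values (min = 80, max = 12000, mean = 500) and an arbitrarily chosen shape parameter alpha = 2. The helper name `rbeta_scaled` is hypothetical. Note that min and max are the bounds of the support, so the sampled values lie strictly between them rather than hitting them exactly.

```r
# Hypothetical helper: draw n values from a beta distribution shifted and
# rescaled to [lo, hi], with the second shape chosen so the mean is target_mean
rbeta_scaled <- function(n, lo, hi, target_mean, alpha = 2) {
  m <- (target_mean - lo) / (hi - lo)  # target mean on the [0, 1] scale
  b <- alpha / m - alpha               # from m = alpha / (alpha + b)
  lo + (hi - lo) * rbeta(n, shape1 = alpha, shape2 = b)
}

set.seed(1)
x <- rbeta_scaled(1e5, lo = 80, hi = 12000, target_mean = 500, alpha = 2)
print(summary(x))  # values lie in (80, 12000); the mean should be near 500
```

With the mean held fixed, an alpha below 1 makes the density pile up against the minimum, while a larger alpha concentrates the values more tightly around the mean.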
You can also consider a truncated normal distribution. This works in a similar way: again you have to choose a shape, here via the standard deviation. It is straightforward to use: fix min, max, mean, and sigma, solve (numerically) for the mu that gives that mean after truncation, and you have your data distribution. However, the density will look visibly cut off at the bounds and is not as elegant as a beta distribution.
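The truncated-normal variant needs one numerical step, because the mean of a normal truncated to [lo, hi], mu + sigma*(phi(a) - phi(b))/(Phi(b) - Phi(a)) with a = (lo - mu)/sigma and b = (hi - mu)/sigma, cannot be inverted for mu in closed form. The sketch below assumes sigma = 500 (an arbitrary shape choice). It solves for mu with `uniroot()` and samples by inverting the CDF. The helper names are hypothetical.

```r
# Mean of a normal(mu, sigma) truncated to [lo, hi]
tn_mean <- function(mu, sigma, lo, hi) {
  a <- (lo - mu) / sigma
  b <- (hi - mu) / sigma
  mu + sigma * (dnorm(a) - dnorm(b)) / (pnorm(b) - pnorm(a))
}

# Solve for the location mu that yields the desired truncated mean.
# The bracket [lo - 3 * sigma, hi] works for these inputs; other inputs may
# need a different bracket (uniroot stops with an error if it fails).
solve_mu <- function(target_mean, sigma, lo, hi) {
  f <- function(mu) tn_mean(mu, sigma, lo, hi) - target_mean
  uniroot(f, lower = lo - 3 * sigma, upper = hi, tol = 1e-10)$root
}

# Inverse-CDF sampling restricted to [lo, hi]
rtnorm_simple <- function(n, mu, sigma, lo, hi) {
  u <- runif(n, pnorm(lo, mu, sigma), pnorm(hi, mu, sigma))
  qnorm(u, mu, sigma)
}

lo <- 80; hi <- 12000; sigma <- 500
mu <- solve_mu(500, sigma, lo, hi)
set.seed(1)
y <- rtnorm_simple(1e5, mu, sigma, lo, hi)
print(c(mu = mu, mean = mean(y), min = min(y), max = max(y)))
```

Plain inverse-CDF sampling loses precision when the interval lies far out in one tail. In that case, a dedicated truncated-normal sampler is the safer choice.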
Beta distributions are smooth. If you want something simpler, consider a mixture of two uniform distributions. Without loss of generality, assume min=0 and max=1 by shifting and rescaling.
Split the interval at the (rescaled) mean. Sampling uniformly from [0;mean] (with probability p) has E[X]=mean/2, and sampling from [mean;1] (with probability 1-p) has E[X]=(mean+1)/2. Requiring the mixture to have the desired mean gives p*mean/2 + (1-p)(mean+1)/2 = mean; multiplying by 2 and expanding gives mean + 1 - p = 2*mean, so p = 1-mean.
Hence a simple strategy is to sample uniformly from [min;mean] with probability 1-(mean-min)/(max-min) and from [mean;max] otherwise. The drawback is that the density is a step function, so the CDF is only piecewise linear, with a kink at the mean.
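Here is a base-R sketch of this two-uniform mixture. The helper name `runif_mix` is hypothetical. It picks the left interval with probability p = 1 - (mean - min)/(max - min) and then draws uniformly within the chosen interval.

```r
# Hypothetical helper: Uniform(lo, target_mean) with weight p and
# Uniform(target_mean, hi) with weight 1 - p, p = 1 - (mean - lo)/(hi - lo)
runif_mix <- function(n, lo, hi, target_mean) {
  p <- 1 - (target_mean - lo) / (hi - lo)
  left <- runif(n) < p               # TRUE -> take the left piece
  ifelse(left,
         runif(n, lo, target_mean),  # left piece  [lo, mean]
         runif(n, target_mean, hi))  # right piece [mean, hi]
}

set.seed(1)
z <- runif_mix(1e5, lo = 80, hi = 12000, target_mean = 500)
print(c(mean = mean(z), min = min(z), max = max(z)))
```

`ifelse()` draws both candidate vectors and keeps one element from each position. This wastes some random numbers but keeps the code short.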
Finally, you could also design the CDF directly. This would be easy if you had fixed the median, but with the mean you need to take the actual values into account, not just the probabilities. The idea is to impose, for example, a piecewise linear or polynomial CDF and choose its parameters so that the resulting mean is as desired. Please work out the math for this yourself.
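One way to carry this out, sketched under purely hypothetical choices: put the knots of a piecewise linear CDF at 80, 200, 1000, and 12000, fix F(200) = 0.5 (so the median is 200), and leave q = F(1000) free. Each linear segment of the CDF is a uniform piece, so the mean is the sum over segments of (F_{k+1} - F_k)(x_k + x_{k+1})/2. That sum is linear in q, and for these knots the solution is q = 5770/5900 ≈ 0.978. Sampling is then inverse-CDF interpolation with `approx()`.

```r
# Mean of a distribution whose CDF is piecewise linear through
# (knots[k], probs[k]); each segment contributes
# (probs[k+1] - probs[k]) * (knots[k] + knots[k+1]) / 2
pl_mean <- function(knots, probs) {
  sum(diff(probs) * (head(knots, -1) + tail(knots, -1)) / 2)
}

knots <- c(80, 200, 1000, 12000)  # hypothetical knot positions
# Free parameter q = F(1000); F(80) = 0, F(200) = 0.5 (median), F(12000) = 1
f <- function(q) pl_mean(knots, c(0, 0.5, q, 1)) - 500
q <- uniroot(f, lower = 0.5, upper = 1, tol = 1e-12)$root
probs <- c(0, 0.5, q, 1)

# Inverse-CDF sampling: linearly interpolate the quantile function
set.seed(1)
v <- approx(x = probs, y = knots, xout = runif(1e5))$y
print(c(q = q, mean = mean(v), median = median(v)))
```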
Last but not least: you are probably asking for a skewed distribution. I would rather fix the median than the mean. This makes the above constructions much easier and more meaningful, because the mean of a skewed distribution is pulled strongly by the long tail and is therefore not a very reliable summary.
$endgroup$
edited Jul 27 at 5:52
answered Jul 27 at 5:23
Anony-Mousse
32.4k · 6 gold badges · 44 silver badges · 85 bronze badges
$begingroup$
What kind of distribution of values do you want?
$endgroup$
– Dave
Jul 26 at 17:35
$begingroup$
As long as they are positive and have mean = 500, min = 80, and max = 12000.
$endgroup$
– user3437212
Jul 26 at 17:36
$begingroup$
With a triangular distribution the min and max are fine, but the mean is way off. Do you have any other suggestions?
$endgroup$
– user3437212
Jul 26 at 17:37
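You can check directly why the triangular distribution cannot work here. Its mean is (min + mode + max)/3, which is smallest when the mode equals the min. Here is a short base-R check with the question's values (the function name `tri_mean` is hypothetical):

```r
# Mean of a triangular distribution with lower limit lo, mode md, upper limit hi
tri_mean <- function(lo, md, hi) (lo + md + hi) / 3

lo <- 80; hi <- 12000
# The mean increases with the mode, so it is smallest when md = lo
smallest <- tri_mean(lo, lo, hi)
print(smallest)
print(smallest > 500)  # a mean of 500 is out of reach
```

That smallest possible mean is 12160/3, about 4053, far above the requested 500, so no choice of mode can reach it.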
$begingroup$
Strongly related: stats.stackexchange.com/q/236449/35989
$endgroup$
– Tim♦
Jul 26 at 18:44