Why am I getting unevenly-spread results when using $RANDOM?Why does dd from /dev/random give different file sizes?How can I populate a file with random data?RSA 2048 keypair generation: via openssl 0.5s via gpg 30s, why the difference?Why writing to /dev/random does not make parallel reading from /dev/random faster?Using /dev/random, /dev/urandom to generate random datausing random in tar to decompress filesWhen to use /dev/random vs /dev/urandomUsing the NeuG TRNG with /dev/random?Bash: How to generate random float number using $RANDOMMy script produces the same output when using $RANDOM
About the number of real roots
Is purchasing foreign currency before going abroad a losing proposition?
Why hasn't the U.S. government paid war reparations to any country it attacked?
Do native speakers use ZVE or CPU?
Why doesn't the Lars family (and thus Luke) speak Huttese as their first language?
How do a planet's moons and a planet's rings interact?
Why do they not say "The Baby"
Why does java.time.Period#normalized() not normalize days?
Was adding milk to tea started to reduce employee tea break time?
Would letting a multiclass character rebuild their character to be single-classed be game-breaking?
QGIS Linestring rendering curves between vertex
TikZ Can I draw an arrow by specifying the initial point, direction, and length?
How can an advanced civilization forget how to manufacture its technology?
How might the United Kingdom become a republic?
Hacker Rank : Electronics Shop
Can I intentionally omit previous work experience or pretend it doesn't exist when applying for jobs?
Create dashed intersections with labels using pgfplots and tikz
Bob's unnecessary trip to the shops
Why is dry soil hydrophobic? Bad gardener paradox
Back to the nineties!
Is killing off one of my queer characters homophobic?
Cubic programming and beyond?
Rearranging the formula
Why does the autopilot disengage even when it does not receive pilot input?
Why am I getting unevenly-spread results when using $RANDOM?
Why does dd from /dev/random give different file sizes?How can I populate a file with random data?RSA 2048 keypair generation: via openssl 0.5s via gpg 30s, why the difference?Why writing to /dev/random does not make parallel reading from /dev/random faster?Using /dev/random, /dev/urandom to generate random datausing random in tar to decompress filesWhen to use /dev/random vs /dev/urandomUsing the NeuG TRNG with /dev/random?Bash: How to generate random float number using $RANDOMMy script produces the same output when using $RANDOM
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
I read about RNGs on Wikipedia and $RANDOM
function on TLDP but it doesn't really explain this result:
$ max=$((6*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
21787 0
22114 1
21933 2
12157 3
10938 4
11071 5
Why are the values above about 2x more inclined to be 0, 1, 2 than 3, 4, 5 but when I change the max modulo they're almost equally spread over all 10 values?
$ max=$((9*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
11940 0
11199 1
10898 2
10945 3
11239 4
10928 5
10875 6
10759 7
11217 8
random
|
show 1 more comment
I read about RNGs on Wikipedia and $RANDOM
function on TLDP but it doesn't really explain this result:
$ max=$((6*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
21787 0
22114 1
21933 2
12157 3
10938 4
11071 5
Why are the values above about 2x more inclined to be 0, 1, 2 than 3, 4, 5 but when I change the max modulo they're almost equally spread over all 10 values?
$ max=$((9*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
11940 0
11199 1
10898 2
10945 3
11239 4
10928 5
10875 6
10759 7
11217 8
random
9
The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.
– Charles Duffy
Jul 4 at 21:12
7
See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…
– Eric Lippert
Jul 5 at 17:41
1
"The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers
– Jesse_b
Jul 5 at 19:54
@Eric Lippert thank you, I'll read it gladly!
– cprn
Jul 6 at 10:29
1
Note that, even though you are seeing issues due to modulo bias, the$RANDOM
variable does not use a good PRNG internally.
– forest
Jul 7 at 1:19
|
show 1 more comment
I read about RNGs on Wikipedia and $RANDOM
function on TLDP but it doesn't really explain this result:
$ max=$((6*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
21787 0
22114 1
21933 2
12157 3
10938 4
11071 5
Why are the values above about 2x more inclined to be 0, 1, 2 than 3, 4, 5 but when I change the max modulo they're almost equally spread over all 10 values?
$ max=$((9*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
11940 0
11199 1
10898 2
10945 3
11239 4
10928 5
10875 6
10759 7
11217 8
random
I read about RNGs on Wikipedia and $RANDOM
function on TLDP but it doesn't really explain this result:
$ max=$((6*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
21787 0
22114 1
21933 2
12157 3
10938 4
11071 5
Why are the values above about 2x more inclined to be 0, 1, 2 than 3, 4, 5 but when I change the max modulo they're almost equally spread over all 10 values?
$ max=$((9*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
11940 0
11199 1
10898 2
10945 3
11239 4
10928 5
10875 6
10759 7
11217 8
random
random
edited Jul 5 at 15:26
Stephen Kitt
195k26 gold badges463 silver badges535 bronze badges
195k26 gold badges463 silver badges535 bronze badges
asked Jul 4 at 9:10
cprncprn
5161 gold badge9 silver badges20 bronze badges
5161 gold badge9 silver badges20 bronze badges
9
The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.
– Charles Duffy
Jul 4 at 21:12
7
See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…
– Eric Lippert
Jul 5 at 17:41
1
"The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers
– Jesse_b
Jul 5 at 19:54
@Eric Lippert thank you, I'll read it gladly!
– cprn
Jul 6 at 10:29
1
Note that, even though you are seeing issues due to modulo bias, the$RANDOM
variable does not use a good PRNG internally.
– forest
Jul 7 at 1:19
|
show 1 more comment
9
The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.
– Charles Duffy
Jul 4 at 21:12
7
See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…
– Eric Lippert
Jul 5 at 17:41
1
"The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers
– Jesse_b
Jul 5 at 19:54
@Eric Lippert thank you, I'll read it gladly!
– cprn
Jul 6 at 10:29
1
Note that, even though you are seeing issues due to modulo bias, the$RANDOM
variable does not use a good PRNG internally.
– forest
Jul 7 at 1:19
9
9
The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.
– Charles Duffy
Jul 4 at 21:12
The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.
– Charles Duffy
Jul 4 at 21:12
7
7
See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…
– Eric Lippert
Jul 5 at 17:41
See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…
– Eric Lippert
Jul 5 at 17:41
1
1
"The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers
– Jesse_b
Jul 5 at 19:54
"The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers
– Jesse_b
Jul 5 at 19:54
@Eric Lippert thank you, I'll read it gladly!
– cprn
Jul 6 at 10:29
@Eric Lippert thank you, I'll read it gladly!
– cprn
Jul 6 at 10:29
1
1
Note that, even though you are seeing issues due to modulo bias, the
$RANDOM
variable does not use a good PRNG internally.– forest
Jul 7 at 1:19
Note that, even though you are seeing issues due to modulo bias, the
$RANDOM
variable does not use a good PRNG internally.– forest
Jul 7 at 1:19
|
show 1 more comment
2 Answers
2
active
oldest
votes
To expand on the topic of modulo bias, your formula is:
max=$((6*3600))
$(($RANDOM%max/3600))
And in this formula, $RANDOM
is a random value in the range 0-32767.
RANDOM Each time this parameter is referenced, a random integer between
0 and 32767 is generated.
It helps to visualize how this maps to possible values:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
0 = 21600-25199
1 = 25200-28799
2 = 28800-32399
3 = 32400-32767
So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.
When changing to 9*3600
, it turns out as:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
6 = 21600-25199
7 = 25200-28799
8 = 28800-32399
0 = 32400-32767
1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.
To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).
To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM
is lower than 32768 % 6
(eliminate the states that do not map perfectly to available random range).
max=6
for f in 1..100000
do
r=$RANDOM
while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
echo $(($r%max))
done | sort | uniq -c | sort -n
Test result:
16425 5
16515 1
16720 0
16769 2
16776 4
16795 3
The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).
Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.
– Nayuki
Jul 6 at 13:48
@Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.
– frostschutz
Jul 6 at 14:47
Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.
– Nayuki
Jul 6 at 15:05
@Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.
– frostschutz
Jul 6 at 15:09
1
Never mind, figured it out. Sorry about the false alarm.
– Nayuki
Jul 6 at 15:17
add a comment |
This is modulo bias. If RANDOM
is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.
In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.
To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32
in variables.c
.
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f528343%2fwhy-am-i-getting-unevenly-spread-results-when-using-random%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
To expand on the topic of modulo bias, your formula is:
max=$((6*3600))
$(($RANDOM%max/3600))
And in this formula, $RANDOM
is a random value in the range 0-32767.
RANDOM Each time this parameter is referenced, a random integer between
0 and 32767 is generated.
It helps to visualize how this maps to possible values:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
0 = 21600-25199
1 = 25200-28799
2 = 28800-32399
3 = 32400-32767
So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.
When changing to 9*3600
, it turns out as:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
6 = 21600-25199
7 = 25200-28799
8 = 28800-32399
0 = 32400-32767
1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.
To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).
To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM
is lower than 32768 % 6
(eliminate the states that do not map perfectly to available random range).
max=6
for f in 1..100000
do
r=$RANDOM
while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
echo $(($r%max))
done | sort | uniq -c | sort -n
Test result:
16425 5
16515 1
16720 0
16769 2
16776 4
16795 3
The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).
Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.
– Nayuki
Jul 6 at 13:48
@Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.
– frostschutz
Jul 6 at 14:47
Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.
– Nayuki
Jul 6 at 15:05
@Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.
– frostschutz
Jul 6 at 15:09
1
Never mind, figured it out. Sorry about the false alarm.
– Nayuki
Jul 6 at 15:17
add a comment |
To expand on the topic of modulo bias, your formula is:
max=$((6*3600))
$(($RANDOM%max/3600))
And in this formula, $RANDOM
is a random value in the range 0-32767.
RANDOM Each time this parameter is referenced, a random integer between
0 and 32767 is generated.
It helps to visualize how this maps to possible values:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
0 = 21600-25199
1 = 25200-28799
2 = 28800-32399
3 = 32400-32767
So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.
When changing to 9*3600
, it turns out as:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
6 = 21600-25199
7 = 25200-28799
8 = 28800-32399
0 = 32400-32767
1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.
To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).
To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM
is lower than 32768 % 6
(eliminate the states that do not map perfectly to available random range).
max=6
for f in 1..100000
do
r=$RANDOM
while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
echo $(($r%max))
done | sort | uniq -c | sort -n
Test result:
16425 5
16515 1
16720 0
16769 2
16776 4
16795 3
The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).
Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.
– Nayuki
Jul 6 at 13:48
@Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.
– frostschutz
Jul 6 at 14:47
Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.
– Nayuki
Jul 6 at 15:05
@Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.
– frostschutz
Jul 6 at 15:09
1
Never mind, figured it out. Sorry about the false alarm.
– Nayuki
Jul 6 at 15:17
add a comment |
To expand on the topic of modulo bias, your formula is:
max=$((6*3600))
$(($RANDOM%max/3600))
And in this formula, $RANDOM
is a random value in the range 0-32767.
RANDOM Each time this parameter is referenced, a random integer between
0 and 32767 is generated.
It helps to visualize how this maps to possible values:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
0 = 21600-25199
1 = 25200-28799
2 = 28800-32399
3 = 32400-32767
So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.
When changing to 9*3600
, it turns out as:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
6 = 21600-25199
7 = 25200-28799
8 = 28800-32399
0 = 32400-32767
1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.
To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).
To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM
is lower than 32768 % 6
(eliminate the states that do not map perfectly to available random range).
max=6
for f in 1..100000
do
r=$RANDOM
while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
echo $(($r%max))
done | sort | uniq -c | sort -n
Test result:
16425 5
16515 1
16720 0
16769 2
16776 4
16795 3
The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).
To expand on the topic of modulo bias, your formula is:
max=$((6*3600))
$(($RANDOM%max/3600))
And in this formula, $RANDOM
is a random value in the range 0-32767.
RANDOM Each time this parameter is referenced, a random integer between
0 and 32767 is generated.
It helps to visualize how this maps to possible values:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
0 = 21600-25199
1 = 25200-28799
2 = 28800-32399
3 = 32400-32767
So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.
When changing to 9*3600
, it turns out as:
0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
6 = 21600-25199
7 = 25200-28799
8 = 28800-32399
0 = 32400-32767
1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.
To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).
To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM
is lower than 32768 % 6
(eliminate the states that do not map perfectly to available random range).
max=6
for f in 1..100000
do
r=$RANDOM
while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
echo $(($r%max))
done | sort | uniq -c | sort -n
Test result:
16425 5
16515 1
16720 0
16769 2
16776 4
16795 3
The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).
answered Jul 4 at 10:02
frostschutzfrostschutz
29.8k2 gold badges66 silver badges98 bronze badges
29.8k2 gold badges66 silver badges98 bronze badges
Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.
– Nayuki
Jul 6 at 13:48
@Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.
– frostschutz
Jul 6 at 14:47
Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.
– Nayuki
Jul 6 at 15:05
@Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.
– frostschutz
Jul 6 at 15:09
1
Never mind, figured it out. Sorry about the false alarm.
– Nayuki
Jul 6 at 15:17
add a comment |
Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.
– Nayuki
Jul 6 at 13:48
@Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.
– frostschutz
Jul 6 at 14:47
Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.
– Nayuki
Jul 6 at 15:05
@Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.
– frostschutz
Jul 6 at 15:09
1
Never mind, figured it out. Sorry about the false alarm.
– Nayuki
Jul 6 at 15:17
Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.
– Nayuki
Jul 6 at 13:48
Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.
– Nayuki
Jul 6 at 13:48
@Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.
– frostschutz
Jul 6 at 14:47
@Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.
– frostschutz
Jul 6 at 14:47
Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.
– Nayuki
Jul 6 at 15:05
Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.
– Nayuki
Jul 6 at 15:05
@Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.
– frostschutz
Jul 6 at 15:09
@Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.
– frostschutz
Jul 6 at 15:09
1
1
Never mind, figured it out. Sorry about the false alarm.
– Nayuki
Jul 6 at 15:17
Never mind, figured it out. Sorry about the false alarm.
– Nayuki
Jul 6 at 15:17
add a comment |
This is modulo bias. If RANDOM
is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.
In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.
To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32
in variables.c
.
add a comment |
This is modulo bias. If RANDOM
is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.
In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.
To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32
in variables.c
.
add a comment |
This is modulo bias. If RANDOM
is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.
In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.
To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32
in variables.c
.
This is modulo bias. If RANDOM
is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.
In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.
To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32
in variables.c
.
edited Jul 4 at 9:29
answered Jul 4 at 9:17
Stephen KittStephen Kitt
195k26 gold badges463 silver badges535 bronze badges
195k26 gold badges463 silver badges535 bronze badges
add a comment |
add a comment |
Thanks for contributing an answer to Unix & Linux Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f528343%2fwhy-am-i-getting-unevenly-spread-results-when-using-random%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
9
The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.
– Charles Duffy
Jul 4 at 21:12
7
See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…
– Eric Lippert
Jul 5 at 17:41
1
"The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers
– Jesse_b
Jul 5 at 19:54
@Eric Lippert thank you, I'll read it gladly!
– cprn
Jul 6 at 10:29
1
Note that, even though you are seeing issues due to modulo bias, the
$RANDOM
variable does not use a good PRNG internally.– forest
Jul 7 at 1:19