Why am I getting unevenly-spread results when using $RANDOM?Why does dd from /dev/random give different file sizes?How can I populate a file with random data?RSA 2048 keypair generation: via openssl 0.5s via gpg 30s, why the difference?Why writing to /dev/random does not make parallel reading from /dev/random faster?Using /dev/random, /dev/urandom to generate random datausing random in tar to decompress filesWhen to use /dev/random vs /dev/urandomUsing the NeuG TRNG with /dev/random?Bash: How to generate random float number using $RANDOMMy script produces the same output when using $RANDOM

About the number of real roots

Is purchasing foreign currency before going abroad a losing proposition?

Why hasn't the U.S. government paid war reparations to any country it attacked?

Do native speakers use ZVE or CPU?

Why doesn't the Lars family (and thus Luke) speak Huttese as their first language?

How do a planet's moons and a planet's rings interact?

Why do they not say "The Baby"

Why does java.time.Period#normalized() not normalize days?

Was adding milk to tea started to reduce employee tea break time?

Would letting a multiclass character rebuild their character to be single-classed be game-breaking?

QGIS Linestring rendering curves between vertex

TikZ Can I draw an arrow by specifying the initial point, direction, and length?

How can an advanced civilization forget how to manufacture its technology?

How might the United Kingdom become a republic?

Hacker Rank : Electronics Shop

Can I intentionally omit previous work experience or pretend it doesn't exist when applying for jobs?

Create dashed intersections with labels using pgfplots and tikz

Bob's unnecessary trip to the shops

Why is dry soil hydrophobic? Bad gardener paradox

Back to the nineties!

Is killing off one of my queer characters homophobic?

Cubic programming and beyond?

Rearranging the formula

Why does the autopilot disengage even when it does not receive pilot input?



Why am I getting unevenly-spread results when using $RANDOM?


Why does dd from /dev/random give different file sizes?How can I populate a file with random data?RSA 2048 keypair generation: via openssl 0.5s via gpg 30s, why the difference?Why writing to /dev/random does not make parallel reading from /dev/random faster?Using /dev/random, /dev/urandom to generate random datausing random in tar to decompress filesWhen to use /dev/random vs /dev/urandomUsing the NeuG TRNG with /dev/random?Bash: How to generate random float number using $RANDOMMy script produces the same output when using $RANDOM






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








14















I read about RNGs on Wikipedia and $RANDOM function on TLDP but it doesn't really explain this result:



$ max=$((6*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
21787 0
22114 1
21933 2
12157 3
10938 4
11071 5


Why are the values above about 2x more inclined to be 0, 1, 2 than 3, 4, 5 but when I change the max modulo they're almost equally spread over all 10 values?



$ max=$((9*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
11940 0
11199 1
10898 2
10945 3
11239 4
10928 5
10875 6
10759 7
11217 8









share|improve this question



















  • 9





    The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.

    – Charles Duffy
    Jul 4 at 21:12







  • 7





    See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…

    – Eric Lippert
    Jul 5 at 17:41







  • 1





    "The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers

    – Jesse_b
    Jul 5 at 19:54












  • @Eric Lippert thank you, I'll read it gladly!

    – cprn
    Jul 6 at 10:29






  • 1





    Note that, even though you are seeing issues due to modulo bias, the $RANDOM variable does not use a good PRNG internally.

    – forest
    Jul 7 at 1:19

















14















I read about RNGs on Wikipedia and $RANDOM function on TLDP but it doesn't really explain this result:



$ max=$((6*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
21787 0
22114 1
21933 2
12157 3
10938 4
11071 5


Why are the values above about 2x more inclined to be 0, 1, 2 than 3, 4, 5 but when I change the max modulo they're almost equally spread over all 10 values?



$ max=$((9*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
11940 0
11199 1
10898 2
10945 3
11239 4
10928 5
10875 6
10759 7
11217 8









share|improve this question



















  • 9





    The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.

    – Charles Duffy
    Jul 4 at 21:12







  • 7





    See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…

    – Eric Lippert
    Jul 5 at 17:41







  • 1





    "The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers

    – Jesse_b
    Jul 5 at 19:54












  • @Eric Lippert thank you, I'll read it gladly!

    – cprn
    Jul 6 at 10:29






  • 1





    Note that, even though you are seeing issues due to modulo bias, the $RANDOM variable does not use a good PRNG internally.

    – forest
    Jul 7 at 1:19













14












14








14


3






I read about RNGs on Wikipedia and $RANDOM function on TLDP but it doesn't really explain this result:



$ max=$((6*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
21787 0
22114 1
21933 2
12157 3
10938 4
11071 5


Why are the values above about 2x more inclined to be 0, 1, 2 than 3, 4, 5 but when I change the max modulo they're almost equally spread over all 10 values?



$ max=$((9*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
11940 0
11199 1
10898 2
10945 3
11239 4
10928 5
10875 6
10759 7
11217 8









share|improve this question
















I read about RNGs on Wikipedia and $RANDOM function on TLDP but it doesn't really explain this result:



$ max=$((6*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
21787 0
22114 1
21933 2
12157 3
10938 4
11071 5


Why are the values above about 2x more inclined to be 0, 1, 2 than 3, 4, 5 but when I change the max modulo they're almost equally spread over all 10 values?



$ max=$((9*3600))
$ for f in 1..100000; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
11940 0
11199 1
10898 2
10945 3
11239 4
10928 5
10875 6
10759 7
11217 8






random






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jul 5 at 15:26









Stephen Kitt

195k26 gold badges463 silver badges535 bronze badges




195k26 gold badges463 silver badges535 bronze badges










asked Jul 4 at 9:10









cprncprn

5161 gold badge9 silver badges20 bronze badges




5161 gold badge9 silver badges20 bronze badges







  • 9





    The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.

    – Charles Duffy
    Jul 4 at 21:12







  • 7





    See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…

    – Eric Lippert
    Jul 5 at 17:41







  • 1





    "The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers

    – Jesse_b
    Jul 5 at 19:54












  • @Eric Lippert thank you, I'll read it gladly!

    – cprn
    Jul 6 at 10:29






  • 1





    Note that, even though you are seeing issues due to modulo bias, the $RANDOM variable does not use a good PRNG internally.

    – forest
    Jul 7 at 1:19












  • 9





    The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.

    – Charles Duffy
    Jul 4 at 21:12







  • 7





    See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…

    – Eric Lippert
    Jul 5 at 17:41







  • 1





    "The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers

    – Jesse_b
    Jul 5 at 19:54












  • @Eric Lippert thank you, I'll read it gladly!

    – cprn
    Jul 6 at 10:29






  • 1





    Note that, even though you are seeing issues due to modulo bias, the $RANDOM variable does not use a good PRNG internally.

    – forest
    Jul 7 at 1:19







9




9





The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.

– Charles Duffy
Jul 4 at 21:12






The usual answer to this is to reroll (discard the number you received and pick another) if you're between the maximum value for RANDOM and the highest possible value that can divide evenly into your modulo. That's not usual-to-RANDOM, that's usual-to-using-modulo-to-restrict-RNG-domain across all languages/tools/etc. implementing RNGs of that type.

– Charles Duffy
Jul 4 at 21:12





7




7





See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…

– Eric Lippert
Jul 5 at 17:41






See my 2013 article on the source of this bias if you want some nice graphs of how bad it gets: ericlippert.com/2013/12/16/…

– Eric Lippert
Jul 5 at 17:41





1




1





"The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers

– Jesse_b
Jul 5 at 19:54






"The generation of random numbers is too important to be left to chance." - Robert Coveyou. FYI though: most programs are unable to generate truly random numbers

– Jesse_b
Jul 5 at 19:54














@Eric Lippert thank you, I'll read it gladly!

– cprn
Jul 6 at 10:29





@Eric Lippert thank you, I'll read it gladly!

– cprn
Jul 6 at 10:29




1




1





Note that, even though you are seeing issues due to modulo bias, the $RANDOM variable does not use a good PRNG internally.

– forest
Jul 7 at 1:19





Note that, even though you are seeing issues due to modulo bias, the $RANDOM variable does not use a good PRNG internally.

– forest
Jul 7 at 1:19










2 Answers
2






active

oldest

votes


















36














To expand on the topic of modulo bias, your formula is:



max=$((6*3600))
$(($RANDOM%max/3600))


And in this formula, $RANDOM is a random value in the range 0-32767.



 RANDOM Each time this parameter is referenced, a random integer between
0 and 32767 is generated.


It helps to visualize how this maps to possible values:



0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
0 = 21600-25199
1 = 25200-28799
2 = 28800-32399
3 = 32400-32767


So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.



When changing to 9*3600, it turns out as:



0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
6 = 21600-25199
7 = 25200-28799
8 = 28800-32399
0 = 32400-32767


1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.



To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).



To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM is lower than 32768 % 6 (eliminate the states that do not map perfectly to available random range).



max=6
for f in 1..100000
do
r=$RANDOM
while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
echo $(($r%max))
done | sort | uniq -c | sort -n


Test result:



 16425 5
16515 1
16720 0
16769 2
16776 4
16795 3


The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).






share|improve this answer























  • Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.

    – Nayuki
    Jul 6 at 13:48











  • @Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.

    – frostschutz
    Jul 6 at 14:47












  • Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.

    – Nayuki
    Jul 6 at 15:05











  • @Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.

    – frostschutz
    Jul 6 at 15:09






  • 1





    Never mind, figured it out. Sorry about the false alarm.

    – Nayuki
    Jul 6 at 15:17


















23














This is modulo bias. If RANDOM is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.



In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.



To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32 in variables.c.






share|improve this answer



























    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "106"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f528343%2fwhy-am-i-getting-unevenly-spread-results-when-using-random%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    36














    To expand on the topic of modulo bias, your formula is:



    max=$((6*3600))
    $(($RANDOM%max/3600))


    And in this formula, $RANDOM is a random value in the range 0-32767.



     RANDOM Each time this parameter is referenced, a random integer between
    0 and 32767 is generated.


    It helps to visualize how this maps to possible values:



    0 = 0-3599
    1 = 3600-7199
    2 = 7200-10799
    3 = 10800-14399
    4 = 14400-17999
    5 = 18000-21599
    0 = 21600-25199
    1 = 25200-28799
    2 = 28800-32399
    3 = 32400-32767


    So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.



    When changing to 9*3600, it turns out as:



    0 = 0-3599
    1 = 3600-7199
    2 = 7200-10799
    3 = 10800-14399
    4 = 14400-17999
    5 = 18000-21599
    6 = 21600-25199
    7 = 25200-28799
    8 = 28800-32399
    0 = 32400-32767


    1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.



    To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).



    To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM is lower than 32768 % 6 (eliminate the states that do not map perfectly to available random range).



    max=6
    for f in 1..100000
    do
    r=$RANDOM
    while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
    echo $(($r%max))
    done | sort | uniq -c | sort -n


    Test result:



     16425 5
    16515 1
    16720 0
    16769 2
    16776 4
    16795 3


    The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).






    share|improve this answer























    • Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.

      – Nayuki
      Jul 6 at 13:48











    • @Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.

      – frostschutz
      Jul 6 at 14:47












    • Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.

      – Nayuki
      Jul 6 at 15:05











    • @Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.

      – frostschutz
      Jul 6 at 15:09






    • 1





      Never mind, figured it out. Sorry about the false alarm.

      – Nayuki
      Jul 6 at 15:17















    36














    To expand on the topic of modulo bias, your formula is:



    max=$((6*3600))
    $(($RANDOM%max/3600))


    And in this formula, $RANDOM is a random value in the range 0-32767.



     RANDOM Each time this parameter is referenced, a random integer between
    0 and 32767 is generated.


    It helps to visualize how this maps to possible values:



    0 = 0-3599
    1 = 3600-7199
    2 = 7200-10799
    3 = 10800-14399
    4 = 14400-17999
    5 = 18000-21599
    0 = 21600-25199
    1 = 25200-28799
    2 = 28800-32399
    3 = 32400-32767


    So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.



    When changing to 9*3600, it turns out as:



    0 = 0-3599
    1 = 3600-7199
    2 = 7200-10799
    3 = 10800-14399
    4 = 14400-17999
    5 = 18000-21599
    6 = 21600-25199
    7 = 25200-28799
    8 = 28800-32399
    0 = 32400-32767


    1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.



    To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).



    To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM is lower than 32768 % 6 (eliminate the states that do not map perfectly to available random range).



    max=6
    for f in 1..100000
    do
    r=$RANDOM
    while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
    echo $(($r%max))
    done | sort | uniq -c | sort -n


    Test result:



     16425 5
    16515 1
    16720 0
    16769 2
    16776 4
    16795 3


    The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).






    share|improve this answer























    • Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.

      – Nayuki
      Jul 6 at 13:48











    • @Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.

      – frostschutz
      Jul 6 at 14:47












    • Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.

      – Nayuki
      Jul 6 at 15:05











    • @Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.

      – frostschutz
      Jul 6 at 15:09






    • 1





      Never mind, figured it out. Sorry about the false alarm.

      – Nayuki
      Jul 6 at 15:17













    36












    36








    36







    To expand on the topic of modulo bias, your formula is:



    max=$((6*3600))
    $(($RANDOM%max/3600))


    And in this formula, $RANDOM is a random value in the range 0-32767.



     RANDOM Each time this parameter is referenced, a random integer between
    0 and 32767 is generated.


    It helps to visualize how this maps to possible values:



    0 = 0-3599
    1 = 3600-7199
    2 = 7200-10799
    3 = 10800-14399
    4 = 14400-17999
    5 = 18000-21599
    0 = 21600-25199
    1 = 25200-28799
    2 = 28800-32399
    3 = 32400-32767


    So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.



    When changing to 9*3600, it turns out as:



    0 = 0-3599
    1 = 3600-7199
    2 = 7200-10799
    3 = 10800-14399
    4 = 14400-17999
    5 = 18000-21599
    6 = 21600-25199
    7 = 25200-28799
    8 = 28800-32399
    0 = 32400-32767


    1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.



    To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).



    To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM is lower than 32768 % 6 (eliminate the states that do not map perfectly to available random range).



    max=6
    for f in 1..100000
    do
    r=$RANDOM
    while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
    echo $(($r%max))
    done | sort | uniq -c | sort -n


    Test result:



     16425 5
    16515 1
    16720 0
    16769 2
    16776 4
    16795 3


    The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).






    share|improve this answer













    To expand on the topic of modulo bias, your formula is:



    max=$((6*3600))
    $(($RANDOM%max/3600))


    And in this formula, $RANDOM is a random value in the range 0-32767.



     RANDOM Each time this parameter is referenced, a random integer between
    0 and 32767 is generated.


    It helps to visualize how this maps to possible values:



    0 = 0-3599
    1 = 3600-7199
    2 = 7200-10799
    3 = 10800-14399
    4 = 14400-17999
    5 = 18000-21599
    0 = 21600-25199
    1 = 25200-28799
    2 = 28800-32399
    3 = 32400-32767


    So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.



    When changing to 9*3600, it turns out as:



    0 = 0-3599
    1 = 3600-7199
    2 = 7200-10799
    3 = 10800-14399
    4 = 14400-17999
    5 = 18000-21599
    6 = 21600-25199
    7 = 25200-28799
    8 = 28800-32399
    0 = 32400-32767


    1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.



    To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).



    To get rid of bias altogether, you need to re-roll, (for example) when $RANDOM is lower than 32768 % 6 (eliminate the states that do not map perfectly to available random range).



    max=6
    for f in 1..100000
    do
    r=$RANDOM
    while [ $r -lt $((32768 % $max)) ]; do r=$RANDOM; done
    echo $(($r%max))
    done | sort | uniq -c | sort -n


    Test result:



     16425 5
    16515 1
    16720 0
    16769 2
    16776 4
    16795 3


    The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Jul 4 at 10:02









    frostschutzfrostschutz

    29.8k2 gold badges66 silver badges98 bronze badges




    29.8k2 gold badges66 silver badges98 bronze badges












    • Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.

      – Nayuki
      Jul 6 at 13:48











    • @Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.

      – frostschutz
      Jul 6 at 14:47












    • Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.

      – Nayuki
      Jul 6 at 15:05











    • @Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.

      – frostschutz
      Jul 6 at 15:09






    • 1





      Never mind, figured it out. Sorry about the false alarm.

      – Nayuki
      Jul 6 at 15:17

















    • Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.

      – Nayuki
      Jul 6 at 13:48











    • @Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.

      – frostschutz
      Jul 6 at 14:47












    • Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.

      – Nayuki
      Jul 6 at 15:05











    • @Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.

      – frostschutz
      Jul 6 at 15:09






    • 1





      Never mind, figured it out. Sorry about the false alarm.

      – Nayuki
      Jul 6 at 15:17
















    Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.

    – Nayuki
    Jul 6 at 13:48





    Your answer is largely correct, except: "you need to re-roll when $RANDOM is lower than 32768 % 6" should actually be "equal to or greater than floor((RANDMAX+1)/6)*6" (i.e. 32766), and fix the associated shell code below that.

    – Nayuki
    Jul 6 at 13:48













    @Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.

    – frostschutz
    Jul 6 at 14:47






    @Nayuki if you can point out a specific error (that applies within the given context) I'll be happy to correct it. My solution is just an example, there is different ways to do it. You can remove bias from start range, or end range, or somewhere in the middle, it makes no difference. You can calculate it better (and not do a modulo in every iteration). You can handle special cases such as arbitrary modulos and randmax values, also handle RANDMAX=INTMAX where RANDMAX+1 doesn't exist, but that was not the focus here.

    – frostschutz
    Jul 6 at 14:47














    Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.

    – Nayuki
    Jul 6 at 15:05





    Your reply is significantly worse than your post. First of all, I pointed out specifically which phrase of yours is factually wrong. Note that "32768 % 6" == 2, so you want to reroll every time $RANDOM < 2? Regarding bias at start/end/midde of range, your entire post is about removing bias at the end of range, and my response caters to exactly that too. Third, you talk about handling RANDMAX=INTMAX, but in your answer you mentioned the value 32768 (= 32767+1) numerous times, which implies you are comfortable with computing RANDMAX+1.

    – Nayuki
    Jul 6 at 15:05













    @Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.

    – frostschutz
    Jul 6 at 15:09





    @Nayuki my code removes 0 and 1, yours removes 32766 and 32767 and I'd like you to elaborate: what difference does it make? I'm only human, I make mistakes, but all you said so far is "it's wrong" without explaining or showing why. Thank you.

    – frostschutz
    Jul 6 at 15:09




    1




    1





    Never mind, figured it out. Sorry about the false alarm.

    – Nayuki
    Jul 6 at 15:17





    Never mind, figured it out. Sorry about the false alarm.

    – Nayuki
    Jul 6 at 15:17













    23














    This is modulo bias. If RANDOM is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.



    In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.



    To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32 in variables.c.






    share|improve this answer





























      23














      This is modulo bias. If RANDOM is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.



      In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.



      To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32 in variables.c.






      share|improve this answer



























        23












        23








        23







        This is modulo bias. If RANDOM is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.



        In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.



        To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32 in variables.c.






        share|improve this answer















        This is modulo bias. If RANDOM is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values above the modulo are added to the values they map to.



        In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.



        To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32 in variables.c.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Jul 4 at 9:29

























        answered Jul 4 at 9:17









        Stephen KittStephen Kitt

        195k26 gold badges463 silver badges535 bronze badges




        195k26 gold badges463 silver badges535 bronze badges



























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Unix & Linux Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f528343%2fwhy-am-i-getting-unevenly-spread-results-when-using-random%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Category:9 (number) SubcategoriesMedia in category "9 (number)"Navigation menuUpload mediaGND ID: 4485639-8Library of Congress authority ID: sh85091979ReasonatorScholiaStatistics

            Circuit construction for execution of conditional statements using least significant bitHow are two different registers being used as “control”?How exactly is the stated composite state of the two registers being produced using the $R_zz$ controlled rotations?Efficiently performing controlled rotations in HHLWould this quantum algorithm implementation work?How to prepare a superposed states of odd integers from $1$ to $sqrtN$?Why is this implementation of the order finding algorithm not working?Circuit construction for Hamiltonian simulationHow can I invert the least significant bit of a certain term of a superposed state?Implementing an oracleImplementing a controlled sum operation

            Magento 2 “No Payment Methods” in Admin New OrderHow to integrate Paypal Express Checkout with the Magento APIMagento 1.5 - Sales > Order > edit order and shipping methods disappearAuto Invoice Check/Money Order Payment methodAdd more simple payment methods?Shipping methods not showingWhat should I do to change payment methods if changing the configuration has no effects?1.9 - No Payment Methods showing upMy Payment Methods not Showing for downloadable/virtual product when checkout?Magento2 API to access internal payment methodHow to call an existing payment methods in the registration form?