Question

I've been using $RANDOM to generate a random number between 1-15, in order to generate a little jitter between two systems. For example:

sleep $(( RANDOM %= 15 ))

If I run echo $(( RANDOM %= 15 )) every few minutes, it seems the random numbers are fairly random. But if I start running a script with this call every minute via cron, or even just echo the random number every few seconds, the randomness is gone—on my Mac, I end up with not-so-random values like 11 and 6, alternating, or 8, 4, and 2, in sequence. Not very random.

On one of my Linux servers (CentOS 6.5 x64), I added the following bash script, which, after the first couple of loops, just output 13 over and over again:

#!/bin/bash
for ((n = 0; n < 100; n++))
do
  echo $(( RANDOM %= 15 ))
done

My questions:

  1. Why is this happening? I know $RANDOM is unsuitable for encryption, but why is it so bad at generating random numbers in general?
  2. Is there any other easy way to get a more random number (even if needed in rapid succession) via bash script?

Solution

Assigning to RANDOM Sets the Seed Value

RANDOM is a special variable that provides a pseudo-random number. By using a modulo operator, you are vastly restricting the possible values, and by assigning to RANDOM you are changing the seed value to some member of your restricted set, which eventually seems to settle on 13.

The following gives me a reasonable distribution:

for i in {1..100}; do echo $(( RANDOM % 15 )); done

By using %= instead of just %, you are setting the seed value. Don't do that.
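
A minimal sketch of the seeding behaviour, assuming bash. Assigning the same value to RANDOM twice replays the same sequence, which is why %= (an assignment) keeps resetting the generator to one of at most 15 seeds and soon falls into a short cycle. The variable names first, second and sleep_for are only illustrative.

```shell
#!/bin/bash
# Assigning any value to RANDOM reseeds bash's generator,
# so the same seed replays the same sequence.
RANDOM=42
first="$RANDOM $RANDOM $RANDOM"
RANDOM=42
second="$RANDOM $RANDOM $RANDOM"
echo "run 1: $first"
echo "run 2: $second"   # same three numbers as run 1

# Reading RANDOM with plain % leaves the seed alone.
# Adding 1 shifts the range from 0..14 to 1..15.
sleep_for=$(( RANDOM % 15 + 1 ))
echo "would sleep for $sleep_for seconds"
```

For the jitter use case this would become sleep $(( RANDOM % 15 + 1 )).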

OTHER TIPS

You can generate a high-quality random integer between 1 and 15 using the following command:

echo "$(od -An -N4 -tu4 /dev/urandom) % 15 + 1" | bc    

or, reading from /dev/random instead (which can block on some systems until enough entropy is available):

echo "$(od -An -N4 -tu4 /dev/random) % 15 + 1" | bc    

Moreover, you should not assign a new seed using %=, as it throws away the generator's existing state.
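
The od approach can also be written without bc, since bash arithmetic handles 32-bit values itself. This is a sketch assuming bash and a readable /dev/urandom; raw and n are illustrative names. The modulo introduces a bias of one part in roughly 286 million toward the lowest value, which is negligible for jitter.

```shell
#!/bin/bash
# Read 4 bytes from /dev/urandom as one unsigned 32-bit integer.
# od pads its output with spaces; arithmetic expansion ignores them.
raw=$(od -An -N4 -tu4 /dev/urandom)
n=$(( raw % 15 + 1 ))   # maps 0..4294967295 onto 1..15
echo "$n"
```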

Lastly, 100 generations is really not enough to assess the quality of a random number generator. You should generate at least one million values.
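
As a rough sketch of what a larger sample might look like in bash, the hypothetical helper histogram below draws N values of RANDOM % 15 and prints how often each one came up. It is called with 100000 here to keep the run short; pass 1000000 for the sample size suggested above.

```shell
#!/bin/bash
# histogram N: draw N values of RANDOM % 15 and print "value count" lines.
# (histogram is a hypothetical helper name, not a standard command.)
histogram() {
  local n=$1 i r
  local -a counts
  for ((i = 0; i < 15; i++)); do counts[i]=0; done
  for ((i = 0; i < n; i++)); do
    r=$(( RANDOM % 15 ))
    counts[r]=$(( counts[r] + 1 ))
  done
  for ((i = 0; i < 15; i++)); do
    echo "$i ${counts[i]}"
  done
}

histogram 100000
```

With a reasonable generator, each of the 15 counts should land close to N/15.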

A few handy techniques for testing the quality of a random number generator:

Optimum compression -- 0 %

Data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, through use of specific encoding schemes. If the same data structure is repeated multiple times, a short binary representation can stand in for the long data structures and thus reduce the size of the compressed file. If our random data is truly random, then we should NOT see any compression at all.
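
One quick way to try the compression check, assuming gzip, head and wc are available, is to compare the compressed size of random bytes with that of highly repetitive bytes. The variable names are illustrative.

```shell
#!/bin/bash
# Compress 100000 random bytes and 100000 zero bytes, then compare sizes.
raw_size=100000
random_packed=$(head -c "$raw_size" /dev/urandom | gzip -9 -c | wc -c)
zero_packed=$(head -c "$raw_size" /dev/zero | gzip -9 -c | wc -c)

# Some wc versions pad their output; arithmetic expansion strips the padding.
random_packed=$(( random_packed ))
zero_packed=$(( zero_packed ))

echo "raw size:            $raw_size bytes"
echo "random data gzipped: $random_packed bytes"
echo "zero data gzipped:   $zero_packed bytes"
```

The random input should come out no smaller than it went in, while the zeros shrink to a tiny fraction of their size.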

Chi square distribution -- between 10% and 90%

The chi-square test is the most commonly used test for the randomness of data, and is extremely sensitive to errors in pseudorandom sequence generators. The chi-square distribution is calculated for the stream of bytes in the file and expressed as an absolute number and a percentage which indicates how frequently a truly random sequence would exceed the value calculated. We interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 90% or less than 10%, the sequence is suspect; beyond 99% or below 1%, it is almost certainly not random.
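
For the 15 values in the question, the statistic itself is easy to compute. The sketch below assumes bash and awk; chi_square is a hypothetical helper that reads one integer (0 to 14) per line and prints the chi-square statistic against a uniform distribution. It prints only the absolute number, not the percentage. With 15 buckets there are 14 degrees of freedom, and standard chi-square tables put the 90% and 10% points at roughly 7.79 and 21.06, so a statistic inside that band corresponds to the "between 10% and 90%" rule above.

```shell
#!/bin/bash
# chi_square: read one integer per line (0..14) and print the chi-square
# statistic against a uniform distribution over 15 buckets.
# (chi_square is a hypothetical helper name.)
chi_square() {
  awk '
    { obs[$1]++; n++ }
    END {
      expected = n / 15
      for (v = 0; v < 15; v++) {
        d = obs[v] - expected
        chi += d * d / expected
      }
      printf "%.3f\n", chi
    }'
}

for ((i = 0; i < 15000; i++)); do echo $(( RANDOM % 15 )); done | chi_square
```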

Arithmetic mean -- 127.5 for bytes; 7 in your case (values 0 to 14)

This is simply the result of summing all the bytes in the file and dividing by the file length. If the data is close to random, this should be about 127.5. If the mean departs from this value, then the values are consistently high or low.
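
The same check for the values in the question could be sketched as follows, assuming bash and awk; mean is a hypothetical helper that averages one number per line. For RANDOM % 15 the result should sit near 7.

```shell
#!/bin/bash
# mean: read one number per line and print the arithmetic mean.
# (mean is a hypothetical helper name.)
mean() {
  awk '{ sum += $1 } END { if (NR > 0) printf "%.3f\n", sum / NR }'
}

for ((i = 0; i < 15000; i++)); do echo $(( RANDOM % 15 )); done | mean
```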

Monte Carlo value for Pi -- 3.14159265

Each successive sequence of six bytes is used as 24-bit X and Y coordinates within a square. If the distance of the randomly generated point from the center is less than the radius of a circle inscribed within the square, the six-byte sequence is considered a hit. The percentage of hits can be used to calculate the value of Pi. For very large streams, the value will approach the correct value of Pi if the sequence is close to random.
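
A simplified variant of this test can be run directly on $RANDOM. The sketch below is not the six-byte procedure described above: it assumes bash, uses two 15-bit RANDOM values as coordinates, and counts points inside a quarter circle, which gives the same hit ratio of Pi/4. estimate_pi is a hypothetical helper name.

```shell
#!/bin/bash
# estimate_pi N: draw N points with 15-bit coordinates from RANDOM and
# count those that fall inside the quarter circle of radius 32767.
estimate_pi() {
  local n=$1 hits=0 i x y
  for ((i = 0; i < n; i++)); do
    x=$RANDOM
    y=$RANDOM
    if (( x * x + y * y <= 32767 * 32767 )); then
      hits=$(( hits + 1 ))
    fi
  done
  # hits / n approximates Pi / 4
  awk -v h="$hits" -v n="$n" 'BEGIN { printf "%.4f\n", 4 * h / n }'
}

estimate_pi 20000
```

Larger values of N should bring the estimate closer to 3.14159 if the generator is behaving.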

Serial correlation coefficient -- 0.0

This quantity measures the extent to which each byte in the file depends upon the previous byte. For random sequences, this value (which can be positive or negative) will, of course, be close to zero.
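
One way to approximate this for a stream of numbers is the ordinary correlation coefficient between each value and the one before it. The sketch below assumes bash and awk and is not necessarily the exact formula a dedicated test tool uses; serial_correlation is a hypothetical helper that reads one number per line.

```shell
#!/bin/bash
# serial_correlation: read one number per line and print the correlation
# between each value and its predecessor (lag-1 correlation).
# (serial_correlation is a hypothetical helper name.)
serial_correlation() {
  awk '
    NR > 1 {
      sxy += prev * $1
      sx  += prev;  sxx += prev * prev
      sy  += $1;    syy += $1 * $1
      m++
    }
    { prev = $1 }
    END {
      num = m * sxy - sx * sy
      den = sqrt((m * sxx - sx * sx) * (m * syy - sy * sy))
      if (den > 0) printf "%.4f\n", num / den
    }'
}

for ((i = 0; i < 15000; i++)); do echo $(( RANDOM % 15 )); done | serial_correlation
```

A steadily increasing input such as seq 1 10 gives 1.0000, while a well-behaved random stream should give a value near zero.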

Source: https://calomel.org/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow