Why is $RANDOM not very random?

Question 1

Assigning to RANDOM Sets the Seed Value

RANDOM is a special variable that provides a pseudo-random number. By using a modulo operator, you are vastly restricting the possible values, and by assigning to RANDOM you are changing the seed value to some member of your restricted set, which eventually seems to settle on 13.

The following gives me a reasonable distribution:

for i in {1..100}; do echo $(( RANDOM % 15 )); done

By using %= instead of just %, you are setting the seed value. Don't do that.
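
To see the seeding behaviour directly, here is a minimal sketch, assuming bash. The seed 42 and the variable names are arbitrary choices for illustration. Assigning the same value to RANDOM twice replays the same sequence:

```shell
#!/usr/bin/env bash
# Assigning to RANDOM reseeds the generator, so the same seed
# replays the same sequence. The seed 42 is an arbitrary example.
RANDOM=42
first="$RANDOM $RANDOM $RANDOM"

RANDOM=42
second="$RANDOM $RANDOM $RANDOM"

if [ "$first" = "$second" ]; then
    echo "same seed, same sequence: $first"
fi
```

This is why RANDOM %= 15 is harmful: on every pass it feeds one of only 15 possible seeds (0 to 14) back into the generator.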

Question 2

You can generate a high-quality random integer between 1 and 15 using the following command:

echo "$(od -An -N4 -tu4 /dev/urandom) % 15 + 1" | bc

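or, reading from /dev/random instead (on older Linux kernels this can block until enough entropy has been gathered):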

echo "$(od -An -N4 -tu4 /dev/random) % 15 + 1" | bc
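
Because 2^32 is not a multiple of 15, taking a 32-bit value modulo 15 leaves a very slight bias. If that matters, one option is rejection sampling. The sketch below assumes bash with 64-bit arithmetic; rand_range is a hypothetical helper name, and it uses shell arithmetic instead of bc:

```shell
#!/usr/bin/env bash
# rand_range is a hypothetical helper name, not a standard command.
# Prints a uniform integer in [1, n] for 1 <= n <= 2^32.
rand_range() {
    local n=$1 limit r
    # Largest multiple of n not exceeding 2^32; draws at or above it
    # are rejected so that every residue is equally likely.
    limit=$(( 4294967296 / n * n ))
    while :; do
        # Read 4 bytes as one unsigned 32-bit integer and strip od's padding.
        r=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
        [ "$r" -lt "$limit" ] && break
    done
    echo $(( r % n + 1 ))
}

rand_range 15
```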

Moreover, you should not assign a new seed using %=, as it destroys the entropy gathered.

Lastly, 100 generations is really not enough to assess the quality of a random number generator. You should generate at least one million values.
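
A larger sample can be tallied with a short loop. This is a rough sketch assuming bash; tally is a hypothetical helper name, and the sample size of 100000 is only there to keep the run short (raise it to 1000000 for a real check):

```shell
#!/usr/bin/env bash
# tally is a hypothetical helper name.
# Draws n values of RANDOM % 15 and prints how often each value occurred.
tally() {
    local n=$1 i r
    local -a counts
    for (( i = 0; i < 15; i++ )); do counts[i]=0; done
    for (( i = 0; i < n; i++ )); do
        r=$(( RANDOM % 15 ))            # read RANDOM, never assign to it
        counts[r]=$(( counts[r] + 1 ))
    done
    for (( i = 0; i < 15; i++ )); do
        printf '%2d %d\n' "$i" "${counts[i]}"
    done
}

tally 100000
```

With a healthy generator each of the 15 counts should land near n/15.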

A few handy techniques for testing the quality of a random number generator:

Optimum compression -- 0 %

Data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, through use of specific encoding schemes. If the same data structure is repeated multiple times, a short binary representation can stand in for long data structures and thus reduce the size of the compressed file. If our random data is truly random then we should NOT see any compression at all.
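
A quick way to try this is to compress a sample and compare sizes. The sketch below assumes gzip is installed and uses it as a stand-in for whatever compressor you prefer; compression_percent is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# compression_percent is a hypothetical helper name.
# Prints how much gzip shrinks a file, as a whole percentage of its size.
compression_percent() {
    local orig comp
    orig=$(wc -c < "$1")
    comp=$(gzip -9 -c "$1" | wc -c)
    echo $(( (orig - comp) * 100 / orig ))
}

sample=$(mktemp)
head -c 1000000 /dev/urandom > "$sample"
compression_percent "$sample"   # good random data should give a value close to 0
rm -f "$sample"
```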

Chi square distribution -- between 10% and 90%

The chi-square test is the most commonly used test for the randomness of data, and is extremely sensitive to errors in pseudorandom sequence generators. The chi-square distribution is calculated for the stream of bytes in the file and expressed as an absolute number and a percentage which indicates how frequently a truly random sequence would exceed the value calculated. We interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 90% or less than 10%, the sequence should be treated as suspect.
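
For the 15-value case, the statistic itself is easy to compute by hand. This is a minimal sketch assuming bash and awk; chi_square is a hypothetical helper name, and it prints only the raw statistic, not the percentage:

```shell
#!/usr/bin/env bash
# chi_square is a hypothetical helper name.
# Reads one integer per line on stdin, each in [0, k-1], and prints the
# chi-square statistic against a uniform distribution over k buckets.
chi_square() {
    awk -v k="$1" '
        { count[$1]++; n++ }
        END {
            expected = n / k
            for (i = 0; i < k; i++) {
                d = count[i] - expected
                chi2 += d * d / expected
            }
            printf "%.2f\n", chi2
        }'
}

for i in {1..15000}; do echo $(( RANDOM % 15 )); done | chi_square 15
```

With 15 buckets there are 14 degrees of freedom, for which the 10% to 90% band corresponds to a statistic between roughly 7.8 and 21.1.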

Arithmetic mean -- 127.5 for bytes; 7 in your case (values 0 to 14)

This is simply the result of summing all the bytes in the file and dividing by the file length. If the data is close to random, this should be about 127.5. If the mean departs from this value then the values are consistently high or low.
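
The byte mean can be computed with od and awk. A small sketch, assuming a POSIX od; byte_mean is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# byte_mean is a hypothetical helper name.
# Prints the arithmetic mean of all bytes in the given file.
byte_mean() {
    # -v keeps od from collapsing repeated lines; -tu1 prints unsigned bytes.
    od -An -v -tu1 "$1" | awk '
        { for (i = 1; i <= NF; i++) { sum += $i; n++ } }
        END { printf "%.2f\n", sum / n }'
}

sample=$(mktemp)
head -c 1000000 /dev/urandom > "$sample"
byte_mean "$sample"
rm -f "$sample"
```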

Monte Carlo value for Pi -- 3.14159265

Each successive sequence of six bytes is used as 24-bit X and Y coordinates within a square. If the distance of the randomly generated point is less than the radius of a circle inscribed within the square, the six-byte sequence is considered a hit. The percentage of hits can be used to calculate the value of Pi. For very large streams the value will approach the correct value of Pi if the sequence is close to random.
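
The idea can be sketched with od and awk. This version makes two assumptions that the description does not spell out: the distance is measured from the origin, so the hit region is a quarter circle of radius 2^24 - 1, and Pi is estimated as four times the hit ratio. monte_carlo_pi is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# monte_carlo_pi is a hypothetical helper name.
# Treats every six bytes of the file as one (x, y) point with 24-bit
# coordinates and prints the resulting estimate of Pi.
monte_carlo_pi() {
    od -An -v -tu1 "$1" | awk '
        { for (i = 1; i <= NF; i++) b[n++] = $i }
        END {
            r = 16777215                      # 2^24 - 1
            for (i = 0; i + 5 < n; i += 6) {
                x = b[i]   * 65536 + b[i+1] * 256 + b[i+2]
                y = b[i+3] * 65536 + b[i+4] * 256 + b[i+5]
                if (x * x + y * y <= r * r) hits++
                total++
            }
            printf "%.4f\n", 4 * hits / total
        }'
}

sample=$(mktemp)
head -c 600000 /dev/urandom > "$sample"
monte_carlo_pi "$sample"
rm -f "$sample"
```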

Serial correlation coefficient -- 0.0

This quantity measures the extent to which each byte in the file depends upon the previous byte. For random sequences, this value (this can be positive or negative) will, of course, be close to zero.
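
As a rough sketch, the ordinary Pearson correlation between each value and its successor gives a comparable number. This is an assumption on my part and may differ slightly from the exact formula a dedicated test tool uses; serial_correlation is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# serial_correlation is a hypothetical helper name.
# Reads one number per line on stdin and prints the Pearson correlation
# between each value and the value that follows it.
serial_correlation() {
    awk '
        NR > 1 {
            x = prev; y = $1; n++
            sx += x; sy += y
            sxy += x * y; sxx += x * x; syy += y * y
        }
        { prev = $1 }
        END {
            num = n * sxy - sx * sy
            den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
            if (den == 0) print "undefined"
            else printf "%.4f\n", num / den
        }'
}

for i in {1..10000}; do echo $(( RANDOM % 15 )); done | serial_correlation
```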

source : https://calomel.org/