Question

I am using this code to generate a random permutation of a vector, using a variation of the Fisher-Yates shuffle (I go from the first element to the last, not the other way around). The program uses a boost::random::mt11213b RNG globally, seeded once at startup with generator.seed(time(NULL)); hence the singleton wrapper RandomNumber here.

boost::random::uniform_int_distribution<unsigned long> 
    distribution(0, vec.size()-1);

for (unsigned long i=0;i<vec.size();i++)
    std::swap(vec[i], vec[distribution(RandomNumber::getInstance().generator)]);

Some experiments have led me to believe that there may be an issue with this algorithm. In a nutshell, this is what I did:

  1. Created a vector of integer with length 100
  2. Filled first 75 elements with 0 and the last 25 with 1
  3. Shuffled the vector.
  4. Took the first 5 elements of the vector and summed them.

I repeated this procedure a few thousand times (with a loop, not by hand :)), each time starting with a fresh vector. Then I computed the arithmetic mean of the sums, and it came out to about 0.98 instead of the expected 1.25.
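The measurement loop above can be sketched as follows. This is a hypothetical reproduction, not the asker's code: std::mt19937 stands in for the Boost generator, and the function name is illustrative. It keeps the biased shuffle exactly as posted (the distribution's range stays [0, n-1] on every iteration).

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Reproduce the experiment: 75 zeros followed by 25 ones, shuffled with the
// biased loop from the question, averaging the sum of the first 5 elements.
double biased_first_five_mean(std::size_t trials) {
    std::mt19937 gen(12345);  // stand-in for boost::random::mt11213b
    long long total = 0;
    for (std::size_t t = 0; t < trials; ++t) {
        std::vector<int> vec(100, 0);
        for (std::size_t i = 75; i < 100; ++i) vec[i] = 1;

        // The loop under suspicion: every swap draws from the full range.
        std::uniform_int_distribution<std::size_t> distribution(0, vec.size() - 1);
        for (std::size_t i = 0; i < vec.size(); ++i)
            std::swap(vec[i], vec[distribution(gen)]);

        for (std::size_t i = 0; i < 5; ++i) total += vec[i];
    }
    return static_cast<double>(total) / static_cast<double>(trials);
}
```

With a few thousand trials this lands noticeably below the expected 1.25, matching the observation in the question.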

The funny thing is that if I start with a vector that has already been shuffled once with the same algorithm instead of an ordered one, the result increases to 1.22, and if I don't discard the vector on each iteration but just shuffle it again, the result is around 1.25, which is the expected value.

I am unsure what could be wrong. The algorithm looks sound; the only things I can think of that could have gone wrong are the seeding phase and the

boost::random::uniform_int_distribution<unsigned long> 
    distribution(0, vec.size()-1);

line that is executed each time before a vector is shuffled (perhaps it should only be called once per program, but that doesn't make much sense).

Any help will be greatly appreciated!


Solution

If I had to guess at the cause: you're not shrinking the distribution's range each time around the loop. See the shuffle as described in The Art of Computer Programming (Algorithm P, "Shuffling", Vol. 2).

Once the first n elements have been placed, you shouldn't touch them again: element i should only be swapped with a position in [i, n-1]. If every iteration instead draws from the full range, you map n^n equally likely swap sequences onto n! permutations, and since n! does not evenly divide n^n, the result cannot be uniform.
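The fix can be sketched like this (a minimal version using std::mt19937 and a fresh std::uniform_int_distribution per iteration; the same change applies with the Boost generator, and constructing a distribution is cheap):

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Unbiased Fisher-Yates: element i is swapped only with a position in
// [i, n-1], so the distribution's lower bound advances every iteration.
template <typename T, typename URBG>
void fisher_yates_shuffle(std::vector<T>& vec, URBG& gen) {
    for (std::size_t i = 0; i + 1 < vec.size(); ++i) {
        std::uniform_int_distribution<std::size_t> dist(i, vec.size() - 1);
        std::swap(vec[i], vec[dist(gen)]);
    }
}
```

Note the loop can stop at the second-to-last element, since the last one has nowhere left to go.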

OTHER TIPS

No, your algorithm is biased. Consider the simple case of a vector of 4 numbers: of the 4^4 = 256 equally likely swap sequences, your algorithm produces each permutation the following number of times

result      probability * 256
{1,2,3,4}   10
{1,2,4,3}   10
{1,3,2,4}   10
{1,3,4,2}   14
{1,4,2,3}   11
{1,4,3,2}   9
{2,1,3,4}   10
{2,1,4,3}   15
{2,3,1,4}   14
{2,3,4,1}   14
{2,4,1,3}   11
{2,4,3,1}   11
{3,1,2,4}   11
{3,1,4,2}   11
{3,2,1,4}   9
{3,2,4,1}   11
{3,4,1,2}   11
{3,4,2,1}   10
{4,1,2,3}   8
{4,1,3,2}   9
{4,2,1,3}   9
{4,2,3,1}   8
{4,3,1,2}   10
{4,3,2,1}   10

A standard Fisher-Yates shuffle, by contrast, gives every permutation the same probability.
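The table can be reproduced by brute force: enumerate all 4^4 equally likely index sequences the biased loop can draw and tally the resulting permutations. This is a sketch, not the answerer's code:

```cpp
#include <map>
#include <utility>
#include <vector>

// Count how often each final permutation of {1,2,3,4} occurs across all
// 256 equally likely swap-index sequences of the biased algorithm
// (every iteration swaps vec[i] with a uniformly chosen vec[0..3]).
std::map<std::vector<int>, int> biased_counts() {
    std::map<std::vector<int>, int> counts;
    for (int a = 0; a < 4; ++a)
        for (int b = 0; b < 4; ++b)
            for (int c = 0; c < 4; ++c)
                for (int d = 0; d < 4; ++d) {
                    std::vector<int> v{1, 2, 3, 4};
                    const int idx[] = {a, b, c, d};
                    for (int i = 0; i < 4; ++i)
                        std::swap(v[i], v[idx[i]]);
                    ++counts[v];
                }
    return counts;  // 24 permutations; counts sum to 256 but are unequal
}
```

Since 24 does not divide 256, the counts cannot all be equal, which is the whole problem in miniature.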

If you want to shuffle a vector, use the standard library directly: std::shuffle (C++11). Note that std::random_shuffle is deprecated since C++14 and removed in C++17. See Using boost::random as the RNG for std::random_shuffle for example code.
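A minimal sketch of the standard-library route, applied to the question's 75-zeros / 25-ones vector. std::mt19937 is used here as a stand-in; boost::random::mt11213b also satisfies the uniform random bit generator requirements that std::shuffle expects.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Build the vector from the question and shuffle it with std::shuffle,
// which implements an unbiased Fisher-Yates internally.
std::vector<int> shuffled_vector(unsigned seed) {
    std::vector<int> vec(100, 0);
    std::fill(vec.begin() + 75, vec.end(), 1);  // last 25 elements are 1

    std::mt19937 gen(seed);
    std::shuffle(vec.begin(), vec.end(), gen);
    return vec;
}
```

Shuffling only permutes the elements, so the vector still holds exactly 25 ones afterwards.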

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow