Issue when shuffling a vector with boost::random
-
02-06-2021 - |
質問
I am using this code to generate a random permutation of a vector using a variation of Fisher-Yates randomization algorithm (I am going from the first element to the last, not the other way around). I am using a boost::random::mt11213b
RNG globally in a program that is seeded with generator.seed(time(NULL));
when program starts, hence a wrapper singleton RandomNumber
here.
boost::random::uniform_int_distribution<unsigned long>
distribution(0, vec.size()-1);
for (unsigned long i=0;i<vec.size();i++)
std::swap(vec[i], vec[distribution(RandomNumber::getInstance().generator)]);
Some experiments have led me to believe that there may be an issue in this algorithm, in a nutshell. This is what I did
- Created a vector of integer with length 100
- Filled first 75 elements with
0
and the last 25 with1
- Shuffled an array.
- Took the first 5 elements from the list and summed them.
I repeated this procedure a few thousand times (with a loop, not by hand :)) each time starting with a fresh vector. Then I computed the arithmetic mean of the sums and it came about 0.98
instead of the expected 1.25
.
The funny thing is that if I start with a vector that has been shuffled once with the same algorithm instead of an ordered one, the result increases to 1.22
and if I don't discard the vector on each iteration but rather just shuffle it again, the result is around 1.25
that is the expected value.
I am unsure as to what could be wrong. The algorithm looks sound, the only thing that I can think of that could have gone wrong is the seeding phase and the
boost::random::uniform_int_distribution<unsigned long>
distribution(0, vec.size()-1);
line that is called each time before a vector is shuffled (perhaps it should only be called once a program but that doesn't make mush sense)
Any help will be greatly appreciated!
解決
If I had to take a guess as to the cause, you're not changing the distribution size each time around the loop. The Art of Computer Programming algorithm is here.
Once you shuffle up to n elements, you don't want to touch the first n again, because repeated applications of pseudo random numbers don't making things more random, they make them less random.
他のヒント
No your algorithm is wrong. Consider the simple case with a vector of 4 numbers, your algorithm returns the following biased results
result probability * 256
{1,2,3,4} 10
{1,2,4,3} 10
{1,3,2,4} 10
{1,3,4,2} 14
{1,4,2,3} 11
{1,4,3,2} 9
{2,1,3,4} 10
{2,1,4,3} 15
{2,3,1,4} 14
{2,3,4,1} 14
{2,4,1,3} 11
{2,4,3,1} 11
{3,1,2,4} 11
{3,1,4,2} 11
{3,2,1,4} 9
{3,2,4,1} 11
{3,4,1,2} 11
{3,4,2,1} 10
{4,1,2,3} 8
{4,1,3,2} 9
{4,2,1,3} 9
{4,2,3,1} 8
{4,3,1,2} 10
{4,3,2,1} 10
While a standard Fisher-Yates algorithm will give a uniform probability for all results.
If you want to shuffle a vector, use std::random_shuffle
directly (see Using boost::random as the RNG for std::random_shuffle for some example codes).