Random numbers for multiple threads

Question 1

I'd use one instance to seed the others. I'm pretty sure you can do this safely fairly easily.

Even small changes in the state space cause fairly large changes downstream - if you can ensure they don't have exactly the same starting space (and no identical state prefix), I wouldn't worry about producing identical numbers. For instance, using just the values 1,2,3 to seed three threads would work fine - you don't even need to seed the entire space. Another advantage: by using clearly predictable seeds you can easily discredit the idea that you're cherry-picking any runs (assuming you're trying to demonstrate something).
It's trivial to seed in a way that means the resultant "children" are highly un-correlated. Just iterate in a breadth-first fashion; i.e. if you want to seed N x 623 int values, don't seed 623 values sequentially, but pick the first N and distribute, then the next N etc. Even if there's some correlation between the seeder and the children, the correlation between the various children should be virtually non-existant - and that's all you care about.
I'd prefer an algorithm that allows deterministic execution whenever possible, so depending on urandom is not attractive. This makes debugging easier.
Finally, and obviously - test. These PRNG are fairly robust, but by all means eyeball the results and do a few correlation tests inspired by what you're simulating. Most problems should be obvious - either you've seeded badly and there are obvious repeating subsequences, you you've seeded well, and then the quality is dictated by the PRNG limitations.
For final executions, after you're done testing, you can seed the first of 623 state values using urandom for peace of mind and/or the thread ID.

Question 2

I would go with #1, seed every prng from urandom. This ensures that the states are totally independent (as far as the seed data is independent). Typically there will be plenty of entropy available unless you have many threads. Also, depending on the algorithm used for /dev/urandom you almost certainly don't need to worry about it.

So you might use something like the following to create each prng:

#include <random>

std::mt19937 get_prng() {
    std::random_device r;
    std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
    return std::mt19937(seed);
}

You should verify that your implementation of std::random_device pulls from /dev/urandom under your configuration. And if it does use /dev/urandom by default then usually you can say std::random_device("/dev/random") if you want to use /dev/random instead.

Question 3

You could use a PRNG with a different algebraic structure to seed the different PRNGs. E.g. some sequence of MD5 hashes.

However I would opt for #5. If it works then it is fine. If it does not you can still optimize it.

The point is creating a good PRNG is much harder than one might expect. A good PRNG for threaded applications is most probably something that is still subject to research.

If the number of CPUs is low enough you could get away with leap frogging. E.g. if you have 4 cores, initialize all with the same values, but advance core 1 PRNG by 1, #2 by and #3 by 3. Then advance always 4 steps when you need a new number.

Question 4

Seed thread 1 with 1, seed thread 2 with 2, etc.

If you need monte carlo this will give you reproducible results, is easy to track and implement.

Question 5

Take a look at the following paper: Dynamic Creation of Pseudorandom Number Generators and the accompanying implementation: Dynamic Creator. It tackles this exact problem.

Question 6

If you really want to be mathematically correct, use the jump functions provided by the SFMT algorithm authors. Jump functions guarantee the minimum number of sequences between two different PRNG streams.

Practically speaking, however, a /dev/urandom initialization will suffice.

Question 7

I'd say #3 is the winner. Seed each thread with something like the processID or threadID; while it's technically possible you could have overlap, it's highly unlikely. Even consecutive numbers shouldn't be related in terms of seeds once you get out of the single digits (I don't know the Twister algorithm, but the worst PRNG I've seen was fine above 7). One million PRNGs isn't that many compared to the scope of most PRNG equations.

Finally, you could check fairly easily. Check the last seed generated by each thread against all numbers in each other thread. If the seed appears in the thread, then check the previous number generated in each thread; if they also match, then you have a collision and need to re-seed your streams and try again.

Question 8

There is an implementation (and published paper) specifically concerning the use of the Mersenne Twister for parallel computation. It is by the original authors of the MT. They refer to it as "Dynamic Creator", and it can be found here:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html

That would be a very good place to study your specific usage of MT19937, particularly the paper there.

Random numbers for multiple threads

Problem

Alternatives

Questions