Question

I need to run Monte Carlo simulations in parallel on different machines. The code is in C++, but the program is set up and launched by a Python script that sets a lot of things, in particular the random seed. The function setseed takes a 4-byte unsigned integer.

Using a simple

import time
setseed(int(time.time()))

is not very good, because I submit the jobs to a queue on a cluster: they remain pending for some minutes and then start, but the start time is unpredictable, and two jobs can start within the same second. So I switched to:

setseed(int(time.time()*100))

but I'm not happy. What is the best solution? Maybe I can combine information from: time, machine id, process id. Or maybe the best solution is to read from /dev/random (linux machines)?

How to read 4 bytes from /dev/random?

f = open("/dev/random","rb")
f.read(4)

gives me a string, but I want an integer!


Solution

Reading from /dev/random is a good idea. Just convert the 4-byte string into an integer:

with open("/dev/random", "rb") as f:
    rnd_str = f.read(4)

Either using struct:

import struct
rand_int = struct.unpack('I', rnd_str)[0]

Update: uppercase I (unsigned int) is needed; lowercase i would unpack a signed value.

Or multiply and add:

rand_int = 0
for c in bytearray(rnd_str):  # bytearray yields ints on both Python 2 and 3
    rand_int <<= 8
    rand_int += c
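
In Python 3 the same conversion can also be written with int.from_bytes. The following is a minimal sketch, using a fixed 4-byte sample (an assumption for illustration, not real random data) so that the two results can be compared. Note that struct's plain 'I' uses the machine's native byte order, whereas the shift-and-add loop is big-endian, so the two approaches only agree when the byte order is given explicitly. For a seed, either order works equally well.

```python
import struct

# Fixed sample bytes, for illustration only; real code would read them
# from /dev/random or a similar source.
rnd_str = b"\x12\x34\x56\x78"

# Explicit big-endian, unsigned 32-bit unpack.
via_struct = struct.unpack(">I", rnd_str)[0]

# Built-in conversion available in Python 3.
via_from_bytes = int.from_bytes(rnd_str, byteorder="big")
```

Both variables hold the same value as the shift-and-add loop would produce for these bytes.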

OTHER TIPS

You could simply copy over the four bytes into an integer, that should be the least of your worries.

But parallel pseudo-random number generation is a rather complex topic and very often not done well. Usually you generate seeds on one machine and distribute them to the others.
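
As a rough illustration of generating the seeds in one place, here is a minimal sketch. The helper name make_seeds is hypothetical, and how the seeds reach the jobs (command-line argument, file, environment variable) is left open. It only guarantees that the seeds are distinct; it does not make the resulting random streams provably independent, which is what libraries like SPRNG address.

```python
import random

def make_seeds(n_jobs):
    """Return n_jobs distinct unsigned 32-bit seeds (hypothetical helper)."""
    rng = random.SystemRandom()  # draws from the operating system's entropy source
    seeds = set()
    # Keep drawing until we have n_jobs different values, so no two jobs
    # can end up with the same seed.
    while len(seeds) < n_jobs:
        seeds.add(rng.randrange(2**32))
    return sorted(seeds)

seeds = make_seeds(8)
```

The submitting script could then hand seeds[i] to job number i, which passes it on to setseed.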

Take a look at SPRNG, which handles exactly your problem.

If this is Linux or a similar OS, you want /dev/urandom -- it always produces data immediately.

/dev/random may stall waiting for the system to gather randomness. It does produce cryptographic-grade random numbers, but that is overkill for your problem.
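
A minimal sketch of the non-blocking approach, assuming Python 3: os.urandom returns bytes from the operating system's non-blocking randomness source, so the device file does not need to be opened by hand. The function name urandom_seed is made up for this example.

```python
import os

def urandom_seed():
    """Return an unsigned 32-bit integer built from 4 bytes of OS randomness."""
    # Four bytes always map to a value in the range 0 .. 2**32 - 1.
    return int.from_bytes(os.urandom(4), byteorder="little")

seed = urandom_seed()
```

The result can be passed straight to setseed(seed).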

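You can use a random number as the seed. This has the advantage of being operating-system agnostic (no /dev/random needed) and requires no conversion from string to int. Why not simply use

random.randrange(2**32)

as the seed of each process? The range 0 to 2**32 - 1 matches the unsigned 4-byte integer that setseed expects. Python seeds its own generator from the operating system's randomness source when one is available, and from the current time otherwise, so even slightly different starting times give wildly different seeds.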

Alternatively, you could use the random.jumpahead method (Python 2 only; it was removed in Python 3) if you know roughly how many random numbers each process is going to use. The documentation of random.WichmannHill.jumpahead is useful here.

Licensed under: CC-BY-SA with attribution