Question

I'm trying to run several instances of a piece of code (2000 instances or so) concurrently in a computing cluster. The way it works is that I submit the jobs and the cluster will run them as nodes open up every so often, with several jobs per node. This seems to produce the same values for a good number of the instances in their random number generation, which uses a time-seed.

Is there a simple alternative I can use instead? Reproducibility and security are not important; quick generation of unique seeds is. What would be the simplest approach? A cross-platform one would be ideal, if possible.

Solution

On x86/x64, the value returned by the rdtsc instruction makes a pretty reliable (and random) seed.

With MSVC on Windows, it's accessible via the __rdtsc() intrinsic (declared in <intrin.h>).

In GNU C, it's accessible via:

unsigned long long rdtsc(){
    unsigned int lo,hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

The instruction measures the total pseudo-cycles since the processor was powered on. Given the high frequency of today's machines, it's extremely unlikely that two processors will return the same value even if they booted at the same time and are clocked at the same speed.

OTHER TIPS

I assume you have some process launching the other processes. Have it pass in the seed to use. Then you can have that master process just pass in a random number for each process to use as its seed. That way there's really only one arbitrary seed chosen... you can use time for that.

If you don't have a master process launching the others, then if each process at least has a unique index, then what you can do is have one process generate a series of random numbers in memory (if shared memory) or in a file (if shared disk) and then have each process pull the index'th random number out to use as their seed.

Nothing will give you a more even distribution of seeds than a series of random numbers from a single seed.
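A minimal sketch of the index-based variant described above, assuming every job is started with the same base seed and a unique job index (for example a task number handed out by the scheduler). The names splitmix64_next and seed_for_index are illustrative and not from the original answer, and SplitMix64 is swapped in here as the generator that produces the series:

```c
#include <stdint.h>

/* One step of the SplitMix64 generator: advances *state and returns
   the next 64-bit output. */
static uint64_t splitmix64_next(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Returns the index-th output (0-based) of the SplitMix64 stream that
   starts from base_seed. The state advances by a fixed increment per
   step, so we can jump straight to the wanted position instead of
   generating and storing the whole series. */
static uint64_t seed_for_index(uint64_t base_seed, uint64_t index)
{
    uint64_t state = base_seed + index * 0x9E3779B97F4A7C15ULL;
    return splitmix64_next(&state);
}
```

Each job would call seed_for_index(base_seed, my_index). For a fixed base seed, distinct indices give distinct 64-bit values because every mixing step is invertible; truncating the result to 32 bits for srand() gives up that guarantee.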

A combination of the PID and the time should be enough to get a unique seed. It's not 100% cross-platform, but getpid(2) on *nix platforms and GetCurrentProcessId on Windows will get you 99.9% of the way there. Something like this should work:

srand(((unsigned)time(NULL) & 0xFFFF) | ((unsigned)getpid() << 16));

You could also read data from /dev/urandom on *nix systems, but there's no equivalent to that on Windows.

unsigned seed;

read(open("/dev/urandom", O_RDONLY), &seed, sizeof seed);
srand(seed); // IRL, check for errors, close the fd, etc...
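As a rough sketch of the same /dev/urandom read with the error checks and cleanup filled in, assuming a *nix system where the device exists. The helper name urandom_seed is hypothetical, and stdio is used here instead of open/read purely for brevity:

```c
#include <stdio.h>

/* Reads sizeof(unsigned) bytes from /dev/urandom into *seed.
   Returns 0 on success, -1 if the device cannot be opened or read. */
static int urandom_seed(unsigned *seed)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;

    size_t got = fread(seed, sizeof *seed, 1, f);
    fclose(f);

    return got == 1 ? 0 : -1;
}
```

A caller could fall back to the PID-and-time seed above whenever urandom_seed returns -1.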

I would also recommend a better random number generator.

If C++11 can be used, then consider std::random_device. I suggest watching the linked video for a comprehensive guide.

Extracting the essential message from the video: you should never use srand and rand, but instead use std::random_device and std::mt19937. For most cases, the following is what you want:

#include <iostream>
#include <random>
int main() {
    std::random_device rd;
    std::mt19937 mt(rd());
    std::uniform_int_distribution<int> dist(0,99);
    for (int i = 0; i < 16; i++) {
        std::cout << dist(mt) << " ";
    }
    std::cout << std::endl;
}

Instead of straight time as measured in seconds from the C std lib time() function, could you instead use the processor's counter? Most processors have a free running tick count, for example in x86/x64 there's the Time Stamp Counter:

The Time Stamp Counter is a 64-bit register present on all x86 processors since the Pentium. It counts the number of ticks since reset.

(That page also has many ways to access this counter on different platforms -- gcc/ms visual c/etc)

Keep in mind that the timestamp counter is not without flaws: it may not be synced across processors (you probably don't care for your application), and power-saving features may clock the processor up or down (again, you probably don't care).

Just an idea... generate a GUID (which is 16 bytes) and sum its 4-byte or 8-byte chunks (depending on the expected width of the seed), allowing integer wrap-around. Use the result as a seed.

GUIDs typically encapsulate characteristics of the computer that generated them (such as the MAC address), which should make it rather improbable that two different machines will end up generating the same random sequence.

This is obviously not portable, but finding appropriate APIs/libraries for your system should not be too hard (e.g. UuidCreate on Win32, uuid_generate on Linux).
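The folding step could look roughly like this, assuming the GUID is already available as 16 raw bytes (how you obtain them is platform-specific and left out). The function name fold_guid_to_seed is made up for illustration, and reading the chunks as little-endian is an arbitrary choice:

```c
#include <stdint.h>

/* Sums the four 4-byte chunks of a 16-byte GUID into a 32-bit seed.
   Unsigned arithmetic wraps around on overflow, which is the
   "integer wrap-around" the idea relies on. */
static uint32_t fold_guid_to_seed(const unsigned char guid[16])
{
    uint32_t seed = 0;
    for (int i = 0; i < 16; i += 4) {
        /* Assemble one chunk byte by byte (little-endian order). */
        uint32_t chunk = (uint32_t)guid[i]
                       | ((uint32_t)guid[i + 1] << 8)
                       | ((uint32_t)guid[i + 2] << 16)
                       | ((uint32_t)guid[i + 3] << 24);
        seed += chunk;
    }
    return seed;
}
```

On Linux, libuuid's uuid_t is a 16-byte unsigned char array, so the output of uuid_generate should be usable as the argument directly.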

Windows

Windows provides CryptGenRandom() and RtlGenRandom(). Both fill a buffer with random bytes, which you can use as a seed.

You can find the docs on the MSDN pages.

Linux / Unixes

You can use OpenSSL's RAND_bytes() to get a requested number of random bytes on Linux. It will use /dev/random by default.

Putting it together:

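#include <stdint.h>

#ifdef _WIN32
  #include <windows.h>    /* must come before NTSecAPI.h */
  #include <NTSecAPI.h>   /* declares RtlGenRandom */
#else
  #include <openssl/rand.h>   /* link with -lcrypto */
#endif

/* Returns a 32-bit seed from the OS (Windows) or OpenSSL (elsewhere).
   Falls back to 0 if the underlying call fails. */
uint32_t get_seed(void)
{
  uint32_t seed = 0;

#ifdef _WIN32
  if (!RtlGenRandom(&seed, sizeof seed))
    seed = 0;
#else
  /* RAND_bytes expects an unsigned char buffer and returns 1 on success. */
  if (RAND_bytes((unsigned char *)&seed, sizeof seed) != 1)
    seed = 0;
#endif

  return seed;
}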

Note that OpenSSL provides a cryptographically secure PRNG by default, so you could use it directly. More info here.

Assuming you're on a reasonably POSIX-ish system, you should have clock_gettime. This will give the current time in nanoseconds, which means for all practical purposes it's impossible to ever get the same value twice. (In theory bad implementations could have much lower resolution, e.g. just multiplying milliseconds by 1 million, but even half-decent systems like Linux give real nanosecond results.)
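A small sketch of turning that reading into a seed, assuming a POSIX system with CLOCK_REALTIME available. The helper name nanotime_seed is hypothetical, and the feature-test macro is only needed when compiling in a strict ISO C mode:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Returns the current wall-clock time as nanoseconds since the epoch,
   or 0 if clock_gettime fails. */
static uint64_t nanotime_seed(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return 0;

    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

If the generator only accepts a 32-bit seed, keeping the low 32 bits preserves the fast-changing nanosecond part of the value.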

If uniqueness is important, you need to arrange for each node to know what IDs have been claimed by others. You could do this with a protocol asking "anyone claimed ID x?" or arranging in advance for each node to have a selection of IDs which have not been allocated to others.

(GUIDs use the machine's MAC, so would fall into the "arrange in advance" category.)

Without some form of agreement, you risk two nodes claiming the same ID.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow