Вопрос

CURAND comes with an array of random number generators, but I have failed to find any comparison of the performance (and randomness) properties of each of them; mostly, I'd be interested in which generator to use for which application to gain maximum performance. I'd be happy if someone could quickly outline the differences between them or link me a resource that does so.

Thanks in advance.

Это было полезно?

Решение

This picture shows the performance for different RNGs.

Speed of cuRAND RNGs

For randomness, it should be only related to the RNG type/algorithm. So you can refer to Intel MKL doc. There's detail info and research papers in it. The type names in both CURAND and MKL are very similar.

http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-3D7D2650-A414-4C95-AF33-BE291BAB2AC3.htm

Другие советы

First difference is efficiency. XORWOW is default generator, but isn't always most efficient. For instance, Philox is faster for generating normally distributed floats.

cuRAND

Another difference is, that in practice You can generate more than one float with each call with some generators. For example, with Philox You can generate even 4 floats normally or uniformly distributed with each call, while with XORWOW you can generate max two floats normally or uniformly distributed.

__device__ float4 
curand_normal4 (curandStatePhilox4_32_10_t *state)

Next difference is period of pseudorandom sequence (Total state space of the PRNG before you start to see repeats). Xorwow has period about 2^190 (with the state set up after 2^67 for the same seed)*. For Philox, subsequence and offset together define offset in a sequence with period 2^128. Note that if You run millions of threads with the same seed You could run out of state space per thread and start seeing repeats. ((2^190) / (10^6)) / (2^67) = 1.0633824 × 10^31

One more difference is size of the states. For Xorwow sizeof(curandState_t) is 48 bytes and sizeof(curandStatePhilox4_32_10_t) is 64 bytes. When You run millions of threads (each thread has its own curand state) you can run out of device memory. 1024^2*64 ~= 64 megabytes per million threads.

XORWOW, Philox, MRR32k3a, MTGP32 are Pseudo-random generators while both Sobols are Quasi-ranom generators.

*When calling curand_init with a seed, it scrambles that seed and then skips ahead 2^67 numbers (this is kind of expensive but has some nice properties)

sources: https://developer.nvidia.com/cuRAND http://cs.brown.edu/courses/cs195v/lecture/week11.pdf

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top