curandState in constant memory (cuda random)

Question

I don't think that makes sense. __constant__ memory is constant, and can't be modified directly by threads running on the GPU. curandState, however, needs to be modified each time a random number is generated by a thread (otherwise, you will get the same number generated, over and over).

There's nothing wrong with giving every particle its own state; that would be the typical usage for this scenario.
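As a minimal sketch of that typical usage (not the asker's actual code), the following keeps one curandState per particle in ordinary global memory. The names init_states, draw_uniform and sample_per_particle are hypothetical, as are the launch configuration and the choice of one thread per particle. curand_init, curand_uniform and curandState come from curand_kernel.h.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One state per particle, stored in ordinary (writable) global memory.
__global__ void init_states(curandState *states, unsigned long long seed, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Same seed, distinct sequence number per particle.
        curand_init(seed, i, 0, &states[i]);
    }
}

__global__ void draw_uniform(curandState *states, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        curandState local = states[i];    // work on a local copy
        out[i] = curand_uniform(&local);  // value in (0, 1]
        states[i] = local;                // write the advanced state back
    }
}

// Hypothetical host helper: returns one sample per particle,
// or an empty vector if n <= 0 or a CUDA call fails.
std::vector<float> sample_per_particle(int n, unsigned long long seed)
{
    if (n <= 0) return {};
    curandState *d_states = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_states, n * sizeof(curandState)) != cudaSuccess) return {};
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_states);
        return {};
    }

    const int threads = 128;  // assumed block size
    const int blocks = (n + threads - 1) / threads;
    init_states<<<blocks, threads>>>(d_states, seed, n);
    draw_uniform<<<blocks, threads>>>(d_states, d_out, n);

    std::vector<float> host(n);
    cudaError_t err = cudaMemcpy(host.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_states);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling sample_per_particle(256, 1234ULL) would give 256 values, one per particle. The write-back in draw_uniform is the step that could not happen if the states lived in __constant__ memory.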

The retrieval and update of curandState and the generation of random numbers are done by an NVIDIA library on the GPU, so you can reasonably assume the NVIDIA engineers have optimized those memory accesses to be efficient and coalesced.

__constant__ memory also services only one 32-bit value per SM per clock. That makes it useful when all threads access the same data element (i.e. broadcast), but not generally useful when each thread accesses a different element (e.g. a separate curandState), even if that access pattern would coalesce in ordinary global memory.
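For contrast, here is a rough sketch of the kind of access __constant__ memory is suited to: a single read-only value that every thread reads at the same address, with per-thread data kept in global memory. The symbol c_scale, the kernel scale_all and the helper scale_with_constant are hypothetical names chosen for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical read-only parameter shared by every thread.
__constant__ float c_scale;

__global__ void scale_all(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // c_scale: same address for all threads (broadcast read).
        // in[i], out[i]: different address per thread, so global memory.
        out[i] = c_scale * in[i];
    }
}

// Hypothetical host helper: returns scale * in[i] for each element,
// or an empty vector if the input is empty or a CUDA call fails.
std::vector<float> scale_with_constant(const std::vector<float> &in, float scale)
{
    const int n = static_cast<int>(in.size());
    if (n == 0) return {};
    const size_t bytes = n * sizeof(float);

    float *d_in = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return {};
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return {};
    }

    // The host sets the constant before launch; kernels can only read it.
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;  // assumed block size
    const int blocks = (n + threads - 1) / threads;
    scale_all<<<blocks, threads>>>(d_in, d_out, n);

    std::vector<float> host(n);
    cudaError_t err = cudaMemcpy(host.data(), d_out, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_in);
    if (err != cudaSuccess) return {};
    return host;
}
```

Note that the only write to c_scale comes from the host via cudaMemcpyToSymbol, which is why this pattern suits fixed parameters and not per-thread generator state.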