Python numpy - Reproducibility of random numbers

https://stackoverflow.com/questions/16220585

13-04-2022
|

Вопрос

We have a very simple program (single-threaded) where we we do a bunch of random sample generation. For this we are using several calls of the numpy random functions (like normal or random_sample). Sometimes the result of one random call determines the number of times another random function is called.

Now I want to set a seed in the beginning s.th. multiple runs of my program should yield the same result. For this I'm using an instance of the numpy class RandomState. While this is the case in the beginning, at some time the results become different and this is why I'm wondering.

When I am doing everything correctly, having no concurrency and thereby a linear call of the functions AND no other random number generator involded, why does it not work?

Решение

Okay, David was right. The PRNGs in numpy work correctly. Throughout every minimal example I created, they worked as they are supposed to.

My problem was a different one, but finally I solved it. Do never loop over a dictionary within a deterministic algorithm. It seems that Python orders the items arbitrarily when calling the .item() function for getting in iterator.

So I am not that disappointed that this was this kind of error, because it is a useful reminder of what to think about when trying to do reproducible simulations.

Другие советы

If reproducibility is very important to you, I'm not sure I'd fully trust any PRNG to always produce the same output given the same seed. You might consider capturing the random numbers in one phase, saving them for reuse; then in a second phase, replay the random numbers you've captured. That's the only way to eliminate the possibility of non-reproducibility -- and it solves your current problem too.

Лицензировано под: CC-BY-SA с атрибуция

Не связан с StackOverflow