Question

This might be a weird question, but here it goes:

I have a numerical simulation. It's not a particularly long program, but it would take a while to explain what it's doing. I run the simulation a thousand times and compute the average result and the variance, and the variance is quite small, on the order of 10^(-30).

However, I have noticed that when I run the program in Python 3.3, things get weird. You see, in Python 2.7 and Python 3.2 I always get the same answer, every time. Same averages, same tiny variances.

But when I run it in Python 3.3, I get a different answer every time. That is, a different average, and different (but still tiny) variances. This is extremely odd, because the laws of probability say that can't happen if the variance really is that small. So I'm wondering: what the hell changed in the interpreter between 3.2 and 3.3 that's causing my simulations to go crazy?

Here are some things I've thought of:

  • I might have a weird 32-bit/64-bit discrepancy between my versions of Python, but no: I checked, and they're all running 64-bit.
  • I might be having some errors in float/int conversions, but that should already be taken care of in Python 3 (division returns a float when appropriate), so the 3.2 and 3.3 results should be the same.
  • My simulations are represented as generators, so maybe something changed in 3.3 with generators, but I can't tell what that is.
  • There is some change in numerical floating point representations that I have no idea about.
  • There is some underlying change in one of those functions whose result is "unspecified" that affects the initial conditions of my algorithm. For example, somewhere in my code I order my data columns (originally stored in a dictionary) using "list(table.keys())", and there may have been a change from 3.2 to 3.3 in the order in which a dictionary's keys get listed. But if that were the case, the code should still do the same thing on every run, and it doesn't (it would seem quite odd to intentionally make the ordering of a list random!).

Does anyone have pointers to what changed from 3.2 to 3.3 that might be causing my problems?


Solution

Your last bullet point is most likely the cause. As of Python 3.3, hash randomization is enabled by default to address a security concern. Basically, you no longer know exactly how your strings will hash from one run to the next, and the hash determines their order in the dictionary.

Here's a demo:

d = {"a": 1, "b": 2, "c": 3}
print(d)

On my machine, with Python 3.4, saving that as test.py and running it three times gives three differently ordered results:

$ python3.4 test.py
{'a': 1, 'c': 3, 'b': 2}
$ python3.4 test.py
{'c': 3, 'b': 2, 'a': 1}
$ python3.4 test.py
{'b': 2, 'c': 3, 'a': 1}

Before hash randomization, string hashes were predictable, so an attacker with enough knowledge of your application could feed it keys that all collide, causing dictionary operations to run in O(n) time instead of the usual O(1). That could cause serious performance degradation for some applications.
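To make the performance concern concrete, here is a minimal sketch (not from the question's simulation) in which every key hashes to the same bucket, so dict insertion and lookup degrade from O(1) to O(n) per operation:

import time

class BadKey:
    """Illustrative key type: a constant __hash__ makes every key collide."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42                      # every key lands in the same bucket
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

def build_dict(n, key_type):
    """Time how long it takes to insert n keys of the given type."""
    start = time.perf_counter()
    {key_type(i): i for i in range(n)}
    return time.perf_counter() - start

for n in (1000, 2000, 4000):
    print(n, "colliding keys:", round(build_dict(n, BadKey), 4), "s,",
          "normal keys:", round(build_dict(n, int), 4), "s")

With the colliding keys the build time grows roughly quadratically with n, which is exactly the behavior hash randomization is meant to stop an attacker from triggering on purpose.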

You can disable the hash randomization as documented here. Before it became the default, a -R flag was also added to Python that enabled hash randomization on an opt-in basis. That flag is available at least as far back as Python 3.2.3, so you could use it to test this hypothesis.
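For example, assuming a 3.2.3 or later interpreter (where -R exists), you can re-run the test.py demo above with randomization switched on; if the ordering starts varying between runs on 3.2 as well, you've found the culprit:

$ python3.2 -R test.py   # repeat a few times and compare the ordering
$ python3.2 -R test.py
$ python3.2 -R test.py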

OTHER TIPS

Set the environment variable

PYTHONHASHSEED

to 0 and see whether that helps (that's to save you the trouble of digging through the link mgilson gave you ;-) ).
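For example, with the test.py demo from above, fixing the seed makes the ordering stable across runs (the particular ordering you get is still unspecified, so don't rely on it):

$ PYTHONHASHSEED=0 python3.4 test.py   # same ordering...
$ PYTHONHASHSEED=0 python3.4 test.py   # ...every time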

But do note that nothing has ever been defined about the order in which dictionaries are traversed. To get truly reproducible results, you need to impose your own order. For example, would there be any real problem in using

sorted(table)

instead? Then you could stop worrying about 32-bit vs 64-bit, hash randomization, future bugfixes changing the order, etc etc.
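As a minimal sketch (the table contents here are made up, not taken from the question's code), iterating over sorted keys pins the column order on every run and on every Python version:

# Hypothetical stand-in for the simulation's data columns.
table = {"velocity": [1.0, 2.0], "pressure": [0.5, 0.7], "density": [3.2, 3.1]}

# list(table.keys()) can come out in any order under hash randomization;
# sorted(table) always yields the column names alphabetically.
for column in sorted(table):
    print(column, table[column])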

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow