Question

In Python, I know that the value __hash__ returns for a given object is supposed to be the same for the lifetime of that object. But, out of curiosity, what happens if it isn't? What sort of havoc would this cause?

class BadIdea(object):
  def __hash__(self):
    return random.randint(0, 10000)

I know __contains__ and __getitem__ would behave strangely, and dicts and sets would act odd because of that. You also might end up with "orphaned" values in the dict/set.

What else could happen? Could it crash the interpreter, or corrupt internal structures?

Was it helpful?

Solution

Your main problem would indeed be with dicts and sets. If you insert an object into a dict/set, and that object's hash changes, then when you try to retrieve that object you will end up looking in a different spot in the dict/set's underlying array and hence won't find the object. This is precisely why dict keys should always be immutable.

Here's a small example: let's say we put o into a dict, and o's initial hash is 3. We would do something like this (a slight simplification but gets the point across):

Hash table:

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
|   |   |   | o |   |   |   |   |
+---+---+---+---+---+---+---+---+
              ^
              we put o here, since it hashed to 3

Now let's say the hash of o changes to 6. If we want to retrieve o from the dict, we'll look at spot 6, but there's nothing there! This will cause a false negative when querying the data structure. In reality, each element of the array above could have a "value" associated with it in the case of a dict, and there could be multiple elements in a single spot (e.g. a hash collision). Also, we'd generally take the hash value modulo the size of the array when deciding where to put the element. Irrespective of all these details, though, the example above still accurately conveys what could go wrong when the hash code of an object changes.

Could it crash the interpreter, or corrupt internal structures?

No, this won't happen. When we say an object's hash changing is "dangerous", we mean dangerous in the sense that it essentially defeats the purpose of hashing and makes the code difficult if not impossible to reason about. We don't mean dangerous in the sense that it could cause crashes.

OTHER TIPS

There's a great post on Github about it: What happens when you mess with hashing. First, you need to know that Python expects (quoted from the article):

  • The hash of an object does not change across the object's lifetime (in other words, a hashable object should be immutable).

  • a == b implies hash(a) == hash(b) (note that the reverse might not hold in the case of a hash collision).

Here's the code example which show the problem of a variant hash, with a slightly different class example, but the idea remains the same:

>>> class Bad(object): 
...     def __init__(self, arg): 
...         self.arg = arg 
...     def __hash__(self): 
...         return hash(self.arg) 
... 
>>> Bad(1) 
<__main__.Bad object at ...> 
>>> hash(Bad(1)) 
1 
>>> a = Bad(1) 
>>> b = {a:1} 
>>> a.arg = 2 
>>> hash(a) 
2 
>>> b[a] 
Traceback (most recent call last):
...
KeyError: <__main__.Bad object at ...>

Here, we implicitly changed the hash of a by mutating the argument of a that is used to compute the hash. As a result, the object is no longer found in a dictionary, which uses the hash to find the object.

Note that Python doesn't prevent me from doing this. I could make it if I want, by making __setattr__ raise AttributeError, but even then I could forcibly change it by modifying the object's __dict__. This is what is meant when we say that Python is a "consenting adults" language.

It won't make Python crash but unexpected behavior will happen with dict/set and everything based on object hash.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top