Question

Hash tables are supposed be high-performance mappings, and because Python dicts are implemented with hash tables, they're also highly performant. But I've encountered a strange result when looking at hash values of negative integers.

>>> for i in range(7):
...     print hash(i-4)
...     
-4
-3
-2
-2
0
1
2

But this apparently has no effect on dicts:

>>> d = dict()
>>> d[-1] = 'foo'
>>> d[-2] = 'bar'
>>> d
{-2: 'bar', -1: 'foo'}

Why does this happen, and why aren't dicts affected when I use them?

Was it helpful?

Solution

The fact that x has different hash value than y implies that x != y. But the converse is not true! So when x has hash value equal to y they are still checked for equality explicitly.

A situation when hash(x) == hash(y) and x != y is called a collision in the context of hash functions and is something that's likely to happen from time to time. You want to avoid it as much as possible but in general it is inevitable. You can read more about hash functions and collisions here.

OTHER TIPS

If you are asking why dicts aren't affected by a repeated hash value, that's because hash values don't need to be unique for hash tables to work.

Python implements simple hashing of integers where an integer's hash value resolves to itself. Since -1 is internally used to signal failure to produce the hash value, the value -1 is silently replaced with -2, which works just as well.

Hash tables perform better when hash values are different, but they can handle equal hash values. These are called hash collisions, and methods of dealing with them are one of the big ways in which hash tables are optimized and tuned.

hash(-1) == -2 because -1 is a special value used to signal errors in the C implementation. Hash codes can't take that value; if you try to define a class that gives a hash of -1, Python will detect it and use -2 instead.

-1 is an error code in the C code, and no hash function can return it lest it signal a C code error instead. C doesn't have exceptions, so the Python developers had to reserve one return value for signalling an error.

A dictionary doesn't use just a hash; it also tests for equality. Note that a hash table is small compared to the number of possible hash values, even when hash values are not equal they can still end up mapping to the same slot. If a hash value maps to the same slot and the keys are not equal, the hash is perturbed and a new slot is found.

Because -1 != -2, Python still keeps both keys apart.

See Overriding Python's Hashing Function in Dictionary and Why is the order in dictionaries and sets arbitrary? for some more details on how Python's dictionary uses a hash.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top