Question

This is kind of an academic point, but I feel I don't fully understand hash codes if I don't understand why this is recommended by books such as Effective Java and many SO questions.

Suppose:

public sealed class Point
{
    private readonly int x;
    private readonly int y;

    //constructor ommited

    //equals ommited
    
    public override int GetHashcode()
    {
       int hash = 17; //why should the initial value be non-zero?
       unchecked
       {
         hash = hash * 31 + x; //do not tell me why I should use primes - that is not the question
         hash = hash * 31 + y;
         return hash;
       }
    }
}

Now, supposedly, the reason for the initial value is that it reduces collisions where one of the components is zero.

I'm struggling to find any example where this helps.

Here is one example of a collision, but having an initial value makes no odds.

x   y   Hash Without initial value     Hash With initial value  
0   31  31                             16368                
1   0   31                             16368                

Ideally, I'm looking for a concrete example where the initial value prevents a collision.

My theory on why an initial value can never make a difference

//Given a prime p, initial value i, fields a,b,c, calculate hash h
h = i;
h = h*p + a;
h = h*p + b;
h = h*p + c;

Therefore:

h = ((i*p + a)*p + b)*p + c
  = (ipp + ap + b   )*p + c
  = ippp + app + bp + c

Therefore the inital value i will effect all hash codes in the same way by producing a constant value, in this case i*p3.

Was it helpful?

Solution 4

The choice of initial value can never make a difference to the hash.

Example:

//Given a prime p, initial value i, fields a,b,c, calculate hash h
h = i;
h = h*p + a;
h = h*p + b;
h = h*p + c;
h = h % 2^32;

Therefore:

h = (((ip  + a) * p + b) * p + c) % 2^32
  = (( ip² + ap     + b) * p + c) % 2^32
  = (  ip³ + ap²    + bp     + c) % 2^32
  = ip³ % 2^32 + (ap² + bp + c) % 2^32

Therefore the initial value i will effect all hash codes in the same way by adding a constant value to the hash, in this case i*p³ % 2^32.

OTHER TIPS

The initial value must be a prime number. Why? Because say you are hashing to get an index for an array of length = 20: [object.getHash()%20] is the index of the array where you will want to store your object. If you had used an even number: half of the addresses of your data structure would never be used...this is why you need to use an initial value: to minimize collisions...and maximize data structure usage

Using prime numbers have shown via experiment and testing that have good properties for hash functions.
Also hard-coded numbers you see in existing libraries e.g. 31 in Java have been found during testing that they are good options. As far as I know there is no some proof behind the choices of these "magic" numbers.They were selected only after field testing

Update:
If you use zero as initial value then your hash will be affected by member variables also zero.
E.g. hash = hash * 31 + x; will be 0 if x is 0 and your initial value is also 0.
Then you end up with y which could be 0 as well or a number that could happen to be very common in your application domain and end up with collisions

Initial value can be useful to distinguish between objects of different classes.

The hash function that you show above is just not very good, resulting very easily in collisions for objects with different property values. The idea of a hash function is that it results in a unique, or almost unique, value depending on the public properties.

So to get values that are as unique as possible:

  • use a good hash function that results in a nice distribution
  • use a proper initial value that distinguishes even more so that the chance of a Point and a Line returning the same hash becomes smaller.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top