Question

I need to create a 16 bit hash from a 32 bit number, and I'm trying to determine if a simple modulus 2^16 is appropriate.

The hash will be used in a 2^16 entry hash table for fast lookup of the 32 bit number.

My understanding is that if the data space has a fairly even distribution, that a simple mod 2^16 is fine - it shouldn't result in too many collisions.

In this case, my 32 bit number is the result of a modified adler32 checksum, using 2^16 as M.

So, in a general sense, is my understanding correct, that it's fine to use a simple mod n (where n is hashtable size) as a hashing function if I have an even data distribution?

And specifically, will adler32 give a random enough distribution for this?

Was it helpful?

Solution

Yes, if your 32-bit numbers are uniformly distributed over all possible values, then a modulo n of those will also be uniformly distributed over the n possible values.

Whether the results of your modified checksum algorithm are uniformly distributed is an entirely different question. That will depend on whether the data you are applying the algorithm to has enough data to roll over the sums several times. If you are applying the algorithm to short strings that don't roll over the sums, then the result will not be uniformly distributed.

If you want a hash function, then you should use a hash function. Neither Adler-32 nor any CRC is a good hash function. There are many very fast and effective hash functions available in the public domain. You can look at CityHash.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top