Question

I have been failing to understand the design of hash functions. I was going through an example. As you can see in the function comments, why should you choose 31 as the number to be multiplied. How do you decide? Is that something random or does it mean something?

unsigned int hash(hash_table_t *hashtable, char *str)
{
    unsigned int hashval;

    /* we start our hash out at 0 */
    hashval = 0;

    /* for each character, we multiply the old hash by 31 and add the current
     * character.  Remember that shifting a number left is equivalent to 
     * multiplying it by 2 raised to the number of places shifted.  So we 
     * are in effect multiplying hashval by 32 and then subtracting hashval.  
     * Why do we do this?  Because shifting and subtraction are much more 
     * efficient operations than multiplication.
     */
    for(; *str != '\0'; str++) hashval = *str + (hashval << 5) - hashval;

    /* we then return the hash value mod the hashtable size so that it will
     * fit into the necessary range
     */
    return hashval % hashtable->size;
}
Was it helpful?

Solution

The hash in question is known as the Bernstein Hash, Torek Hash, or simply the "times 33" hash. It is pretty popular due to its simplicity, speed, and decent distribution with English string data.

Your comments note it is actually multiplying by 31, which seemed arbitrary to you and actually is a bit arbitrary. The Apache Portable Runtime has a comment in their hash algorithm source which notes that many possible constants work well (33 being the most common). They all are odd and close to a power of 2, meaning they translate well into shifts and addition or subtraction.

Some other resources to help understand hashing:

OTHER TIPS

Here is a lecture on hash functions with 65k views. On youtube: http://www.youtube.com/watch?v=KW0UvOW0XIo

This is not exactly what you are asking for, however your questions reveals that your knowledge in hashing is limited. Better to read a tutorial or check a presentation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top