Question

I am making a hash table and I need to make a hash function that is not only dependent on the size of the string key, as the periodic table elements have only 1 to 3 characters. How can I make a hash function that gives me an index perhaps based of the bytes of each char of the string?

Was it helpful?

Solution

Pretty much every hash function on strings hashes the characters; it's extremely rare to see strings hashed purely by their lengths.

One simple family of hash functions is shift-add-XOR, which as the name implies uses a combination of bitshifts, additions, and XORs to obtain a hash function from a string. It's easy to implement and gives a reasonably good distribution of keys.

That said, if you are guaranteed that you're just using periodic table symbols, you might want to try to find a perfect hash function for the elements. This is a hash function custom-built for the set of data that you're using and never has any collisions. Tools like gperf can be used to create such functions.

Hope this helps!

OTHER TIPS

The simplest solution is to use an existing one, like FNV. Be careful, however—some very widespread hash functions perform poorly when given a lot of very short strings (the one java.lang.String uses, for example). For a generic hash function, I generally use something like:

size_t
hash( std::string const& value )
{
    size_t result = 2166136261;
    for ( std::string::const_iterator current = value.begin();
            current != value.end();
            ++ current ) {
        result = 127 * result + static_cast< unsigned char >( *current );
    }
    return result;
}

On machines with slow multiplication, this is slightly faster than FNV, and I've yet to find a case where the distribution was significantly poorer.

You mention that the maximum string length is three, however. In this case, you can probably use an even simpler technique:

size_t
hash( std::string const& value )
{
    union {
        size_t results;
        char input[ sizeof( size_t ) ];
    } working = 0;
    assert( value.size() <= sizeof( size_t ) );
    value.copy( working.input, sizeof( size_t ) );
    return working.results;
}

Both of these guarantee unique hash values for all strings of printable ASCII characters with a length of less than sizeof( size_t ).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top