Question

I'm looking to encode a set of hexadecimal values stored in strings using a hash function. Since the hex "alphabet" is composed of only 16 letters, what would be the best hash algorithm with the fewest collisions?

Solution

This question is a bit too general, as you left out any constraints on the hash function and what you're going to do with the hashes. (As a side note, hashing isn't an encoding.)

That being said, with an alphabet of 16 letters you need 4 bits to store each one, so you could pack every two letters into a single byte and XOR those bytes together to get an 8-bit hash.
Of course, that can be extended to any other word length, too (but you left out too much information to say which one would fit).

For instance, like this:

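#include <stdint.h>

/* XOR-fold a string of hex digits into an 8-bit hash, two digits per byte.
   A trailing unpaired digit is ignored. */
uint8_t
hexhash(const char *str)
{
        uint8_t res = 0;
        /* stop at the terminator or when only one digit is left */
        while (*str && *(str+1)) {
                /* high nibble from the first digit, low nibble from the second */
                res ^= (fromchar(*str) << 4) | fromchar(*(str+1));
                str += 2; /* advance to the next pair of digits */
        }
        return res;
}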

(where 'fromchar' is a function to return 0 for '0', 1 for '1', ..., 15 for 'f')
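In case it helps, here is a minimal sketch of what 'fromchar' could look like, together with a 16-bit variant to illustrate the "any other word length" remark. Both are assumptions on my part rather than anything the question asked for: accepting uppercase digits, mapping non-hex characters to 0, and the name 'hexhash16' are all just choices made for illustration. It also assumes an ASCII-compatible character set, where 'a'..'f' and 'A'..'F' are contiguous.

```c
#include <stdint.h>

/* Hypothetical helper: map one hex digit to its value 0..15.
   Assumption: anything that is not a hex digit maps to 0. */
uint8_t
fromchar(char c)
{
        if (c >= '0' && c <= '9')
                return (uint8_t)(c - '0');
        if (c >= 'a' && c <= 'f')
                return (uint8_t)(c - 'a' + 10);
        if (c >= 'A' && c <= 'F')
                return (uint8_t)(c - 'A' + 10);
        return 0;
}

/* Hypothetical 16-bit variant: XOR-fold four hex digits per 16-bit word.
   Trailing digits that do not fill a whole group of four are ignored. */
uint16_t
hexhash16(const char *str)
{
        uint16_t res = 0;
        /* short-circuit evaluation keeps us from reading past the terminator */
        while (str[0] && str[1] && str[2] && str[3]) {
                res ^= (uint16_t)((fromchar(str[0]) << 12) |
                                  (fromchar(str[1]) << 8)  |
                                  (fromchar(str[2]) << 4)  |
                                   fromchar(str[3]));
                str += 4;
        }
        return res;
}
```

Keep in mind that a plain XOR fold like this is cheap but weak: swapping two groups of digits, or repeating the same group twice, does not change (or cancels out of) the result, so whether it is good enough depends on what the hashes are used for.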

Licensed under: CC-BY-SA with attribution