Question

I have a set (static, known in compile time) of about 2 million values, 20 bytes each. What I need is a fast O(1) way to check if a given value is in this set. It seems that perfect hash function with a bit array is ideal for this, but I can't find a simple way to create it. There are some utilities such as gperf, but they are too complicated. Also, in my case it's not necessary to have a close to 100% load factor, even 10% is enough, but with guarantee of no collisions. Another requirement for this function is simplicity, without many conditions: it will run on GPU. What would you advice for this case?

Was it helpful?

Solution 2

After reading more information about perfect hashing, I've decided not to try implementing it, and instead used cuckoo hashtable. It's much simpler and requires at most 2 accesses to the table (or any other number >1, the most used are 2..5) instead of 1 for perfect hashing.

OTHER TIPS

See my answer here. The problem is a bit different, but the solution could be tailored to suit your needs. The original uses a 100% load factor, but that could be easily changed. It works by shuffling the array in-place at startup-time (this could be done at compile time, but that would imply compiling generated code).

WRT the hashfunction: if you don't know anything about the contents of 20byte objects, any reasonable hashfunction (FNV, Jenkins, or mine) will be good enough.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top