سؤال

A Bloom Filter needs k hash functions, which return a value between 0 and m (m is the length of the bit array). I have to implement such a bloom filter and I already read some theoretical papers about those filter (how they work, how many hash functions you need, how the error behaves etc.)

Now I have two questions about hash functions:

  • How do I find k hash functions - which hash functions should I use?
  • How do I find hashs functions which return a value between 0 and m? Alternatively, how can I map the output of a hash function to the range 0-m?
هل كانت مفيدة؟

المحلول

نصائح أخرى

You can get a bunch of hash values at one time with a cryptographic hash function like MD5 or SHA. Divide the cryptographic hash value into N m-bit pieces, and treat them just like you would the output of N normal hash functions.

  • The characteristics of the k hash functions that should be used are
    given on the wikipedia page: Bloom Filter and go in the
    algorithm description where it says - The requirement of designing k different independent hash functions...

  • For good tutorials on universal hashing : Nice MIT lecture

U could use any hash functions....there are many hash simple hash functions available in the net. Refer http://en.wikipedia.org/wiki/List_of_hash_functions

Use any hash function to get a hash value and in order to get it in the range of 0-m , use the modulus(%) operator. i.e bitlocation = hash % m;

it might not be efficient ,but it works...

You can use the Kirsch-Mitzenmacher optimization:

hash_i = hash1 + i * hash2

Where hash1 is 32 bit, hash2 is 32 bit. In a look you could use:

hash1 = upper 32 bit of a 64 bit hash
hash2 = lower 32 bit of a 64 bit hash
hash = hash1
for(int i=0; i<k; i++) {
    hash += hash2
}
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top