Are different salted hashes, equivalent to different hashing algorithms for a bloom filter?

StackOverflow https://stackoverflow.com/questions/22600143

سؤال

As your data set gets larger, you need more hashing algorithms to keep a low false positive rate of 1%.

If I want my bloom filter to grow dynamically at run time, it's unknown how many hashing algorithms I will need. If I use the same (say MD5) hasher, but with randomly generated salts, that are appended to the value before hashing it, will this have the same effect as using a different hasher (say MD5, SHA1, etc)?

I use .NET C# for reference, but the language is almost irrelevant for this question.

هل كانت مفيدة؟

المحلول

MD5 is a pretty expensive way to generate hashes for a Bloom filter. You probably want to use something that executes a bit faster such as a Jenkins hash or one of its variants, or something along these lines.

As you've noted, the Bloom filter requires a lot of hash functions. Coming up with 17 unique hash functions is difficult at best. Fortunately, there's a way to avoid having to do that. I used the technique described in the paper Less Hashing, Same Performance: Building a Better Bloom Filter. This turned out to be very easy in C#, and the performance was very good.

The math in the paper can be a bit hard to follow, but you can get the gist of it fairly easily. And the paper describes a couple of different ways to generate multiple hash code values simply and quickly.

Also, Bloom filters aren't in general easy to size dynamically. If you want the Bloom filter to grow, you need have to specifically build a scalable Bloom filter that supports it. A Google search on [scalable bloom filter] will provide a number of references and some code samples.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top