Question

I am new to the Bloom filter concept. Please let me know your thoughts on this. I have 3 types of categories. Each type contains billions of categories.

  1. Do I need 3 Bloom filter objects, or is there any way to manage all the category types in a single object?

  2. I am using the Apache Hadoop Bloom filter implementation, i.e. org.apache.hadoop.util.bloom.Filter. Is there any other implementation better than this?

  3. What should be the ideal bit array size to handle a billion records?

Solution

  1. Do I need 3 bloom filter objects: depending on what you want to do (you didn't describe that), yes.

  2. Is there any other implementation: sure! Try using Google.

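  3. Ideal bit array size: it depends on what you want to do, in particular on the false-positive rate you can tolerate. Try reading the Wikipedia article about Bloom filters. It gives formulas that relate the bit array size, the number of stored elements, the number of hash functions, and the false-positive probability.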
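
To make point 3 a bit more concrete, here is a minimal sketch of the standard sizing formulas, m = -n ln(p) / (ln 2)^2 for the number of bits and k = (m / n) ln 2 for the number of hash functions. The class and method names (BloomSizing, optimalBits, optimalHashes) are made up for illustration and are not part of Hadoop or any other library, and the 1% false-positive rate is only an assumed target, since the question does not state one.

```java
public class BloomSizing {

    // Number of bits m for n expected items and target false-positive rate p:
    // m = -n * ln(p) / (ln 2)^2, rounded up.
    static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    // Number of hash functions k = (m / n) * ln 2, rounded, and at least 1.
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000_000L; // one billion records, as in the question
        double p = 0.01;         // assumed target: 1% false positives

        long m = optimalBits(n, p);
        int k = optimalHashes(n, m);

        // Report the size in bits and in GiB (bits / 8 / 2^30).
        System.out.printf("bits=%d (~%.2f GiB), hashes=%d%n",
                m, m / 8.0 / (1L << 30), k);
    }
}
```

Under these assumptions a billion records at 1% comes out at roughly 9.6 billion bits (a bit over 1 GiB) with 7 hash functions. That is more than Integer.MAX_VALUE bits, so if your implementation takes the vector size as an int (which, as far as I know, the Hadoop BloomFilter constructor does), a single filter cannot hold it and you would have to relax the error rate or split the data across several filters.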

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow