سؤال

I understand what makes bloom filters an attractive data structure; however, I'm finding it difficult to really understand when you can use them since you still have to perform the expensive operation you're trying to avoid to be certain that you haven't found a false positive. Because of this wouldn't they generally just add a lot of overhead? For example the wikipedia article for bloom filters suggests they can be used for data synchronization. I see how it would be great the first time around when the bloom filter is empty but say you haven't changed anything and you go to synchronize your data again. Now every lookup to the bloom filter will report that the file has already been copied but wouldn't we still have to preform the slower lookup task we were trying to avoid to actually make sure that's correct?

هل كانت مفيدة؟

المحلول

Basically, you use Bloom filters to avoid the long and arduous task of proving an item doesn't exist in the data structure. It's almost always harder to determine if something is missing than if it exists, so the filter helps to shore up losses searching for things you won't find anyway. It doesn't always work, but when it does you reap a huge benefit.

نصائح أخرى

Bloom filters are very efficient in the case of membership queries, i.e., to find out whether an element belongs to the set. The number of elements in the set does not affect the query performance.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top