سؤال

We have a set S of keys.

For membership queries (is k in S?), bloom filters often help us quickly determine that a key is not in the set.

How can we filter range queries (is there a key from the range [k1,k2] in S?) ?

هل كانت مفيدة؟

المحلول

You can solve this problem in time O(log n) using either Segment Trees or Fenwick Trees.

With segment trees, you can ask the question is there a bit-set in the range [a..b]? This question can be answered in time O(log n). Also, you can set (or unset) a single bit in time O(log n).

Similarly with Fenwick Trees.

Assumption: Keys k1, k2, etc... are integers - we must make this assumption so that we can make sense of the range [k1..k2].

نصائح أخرى

Do you have control of the length of the ranges you want to test? If the range is small (k2 - k1 < some smallish number), then you could just use the bloom filter from before and check each k in the range.

You can make testing multiple values faster by making sure the low order byte / bits of one of the hashes are passed through the hash function intact.

E.g. you have hash function f(x) and hash function g(x). You define f(x) such that f(x) = hash_function(x div 16) + x mod 16.

When you search, you can search the 2 bytes (16 bits) surrounding the f(x) result for a 1 bit. If you find one, test the corresponding value for a hit.

This means that you can search for matches 16 values at a time with a fast two-byte retrieval.

Note that playing with the hash functions this way may affect your results in other ways.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top