سؤال

I'm trying to make a configurable bloom filter. In the constructor you set the predicted necessary capacity of the filter (n), the desired error rate (p), and a list of hash functions (of size k).

According to Wikipedia, the following relation holds (m being the number of bits):

p = (1 - k * n / m) ** k

Since I get p, n and k as parameters, I need to solve for m; I get the following:

m = k * n / (1 - p ** (1 / k))

However, there are a few things that make me think I did something wrong. For starters, p ** (1 / k) will tend towards 1 for a large enough k, which means the whole fraction is ill defined (because you can conceivably divide by 0).

Another thing you may notice is that as p (the allowed maximum error rate) grows, so does m, which is totally backwards.

Where did I go wrong?

هل كانت مفيدة؟

المحلول

You did solve the equation correctly, however note that Wikipedia states:

The probability of all of them being 1, which would cause
the algorithm to erroneously claim that the element is in
the set, is often given as:

p ~= (1 - (1 - 1 / m) ** (k * n)) ** k ~= (1 - Exp(-k * n / m)) ** k

This is very different from what you've stated:

p = (1 - k * n / m) ** k

So what you really want to start with is

p = (1 - (1 - 1 / m) ** (k * n)) ** k

I worked this out to be

(1 - 1 / m) ** (k * n) = 1 - p ** (1 / k)
1 - 1 / m = (1 - p ** (1 / k)) ** (1 / (k * n))
m - 1 = m * (1 - p ** (1 / k)) ** (1 / (k * n))
m - m * (1 - p ** (1 / k)) ** (1 / (k * n)) = 1
m * (1 - (1 - p ** (1 / k)) ** (1 / (k * n))) = 1
m = 1 / (1 - (1 - p ** (1 / k)) ** (1 / (k * n)))
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top