Calculating the right number of bits in a bloom filter
-
26-04-2021 - |
سؤال
I'm trying to make a configurable bloom filter. In the constructor you set the predicted necessary capacity of the filter (n
), the desired error rate (p
), and a list of hash functions (of size k
).
According to Wikipedia, the following relation holds (m
being the number of bits):
p = (1 - k * n / m) ** k
Since I get p
, n
and k
as parameters, I need to solve for m
; I get the following:
m = k * n / (1 - p ** (1 / k))
However, there are a few things that make me think I did something wrong. For starters, p ** (1 / k)
will tend towards 1
for a large enough k
, which means the whole fraction is ill defined (because you can conceivably divide by 0
).
Another thing you may notice is that as p
(the allowed maximum error rate) grows, so does m
, which is totally backwards.
Where did I go wrong?
المحلول
You did solve the equation correctly, however note that Wikipedia states:
The probability of all of them being 1, which would cause
the algorithm to erroneously claim that the element is in
the set, is often given as:
p ~= (1 - (1 - 1 / m) ** (k * n)) ** k ~= (1 - Exp(-k * n / m)) ** k
This is very different from what you've stated:
p = (1 - k * n / m) ** k
So what you really want to start with is
p = (1 - (1 - 1 / m) ** (k * n)) ** k
I worked this out to be
(1 - 1 / m) ** (k * n) = 1 - p ** (1 / k)
1 - 1 / m = (1 - p ** (1 / k)) ** (1 / (k * n))
m - 1 = m * (1 - p ** (1 / k)) ** (1 / (k * n))
m - m * (1 - p ** (1 / k)) ** (1 / (k * n)) = 1
m * (1 - (1 - p ** (1 / k)) ** (1 / (k * n))) = 1
m = 1 / (1 - (1 - p ** (1 / k)) ** (1 / (k * n)))