Question

I am trying to understand this sketch but am not able to. Correct me if I am wrong, but basically: let's say I have text data, i.e. words, and a hash function that takes a word and creates an integer hash, which I then convert to a binary bit vector. Right? Then I keep track of the first 1 I see from the left, and the position where that 1 is (say, k). The cardinality of this set is 2^k?

http://ravi-bhide.blogspot.com/2011/04/flajolet-martin-algorithm.html

But say I have just one word, and the hash function happens to generate the hash 2^5 for it. Then I am guessing there are 5 trailing 0's, so it will predict a cardinality of 2^5? That doesn't sound right. What am I missing?
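
To make the setup concrete, here is a minimal Python sketch of the bookkeeping described above; the trailing_zeros helper and the hard-coded hash value are illustrative assumptions of mine, not code from the linked blog post:

```python
def trailing_zeros(x: int) -> int:
    """Count trailing zero bits of x (returns 0 for x == 0 in this toy version)."""
    if x == 0:
        return 0
    r = 0
    while x & 1 == 0:
        x >>= 1
        r += 1
    return r

h = 2 ** 5                # suppose the single word hashes to 0b100000
r = trailing_zeros(h)     # r = 5
print(r, 2 ** r)          # raw estimate 2**r = 32
```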


Solution

For a single word, the distribution of R (the number of trailing zero bits) is geometric with p = 1/2, so its standard deviation is sqrt((1 - p)/p^2) = sqrt(2) ≈ 1.41.

So for a word with a hash ending in 100000b the algorithm will, indeed, yield 2^5/0.77351 = 32/0.77351 ≈ 41.37. But the probability of that is only 1/64, which is consistent with the statement that the standard deviation of R is close to 1.
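
As a rough numerical check of the numbers above (the code and names are mine; phi = 0.77351 is the correction constant used in this answer):

```python
import math
import random

phi = 0.77351  # Flajolet-Martin correction constant used above

# Estimate for a hash ending in 100000b, i.e. R = 5:
print(2 ** 5 / phi)                    # ~41.37

# Standard deviation of R for a single element: R is geometric with p = 1/2,
# so Var(R) = (1 - p) / p**2 = 2 and sd = sqrt(2) ~ 1.41.
p = 0.5
print(math.sqrt((1 - p) / p ** 2))     # ~1.414

# Probability that a random hash ends in exactly 100000b (its low 6 bits):
trials = 10 ** 6
hits = sum((random.getrandbits(32) & 0b111111) == 0b100000 for _ in range(trials))
print(hits / trials, 1 / 64)           # both ~0.0156
```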

OTHER TIPS

http://ravi-bhide.blogspot.com/2011/04/flajolet-martin-algorithm.html

We had a good, random hash function that acted on strings and generated integers. What can we say about the generated integers? Since they are themselves random, we would expect:

1/2 of them to have their binary representation end in 0 (i.e. be divisible by 2),
1/4 of them to have their binary representation end in 00 (i.e. be divisible by 4),
1/8 of them to have their binary representation end in 000 (i.e. be divisible by 8), and so on.

Turning the problem around: if the hash function generated an integer ending in 0^m (m zero bits), then, intuitively, the number of unique strings is around 2^m.
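
A quick empirical check of those fractions and of the 2^m intuition, using Python's random bits as a stand-in for a good hash function (an assumption of mine):

```python
import random

def trailing_zeros(x: int) -> int:
    r = 0
    while r < 32 and x & 1 == 0:
        x >>= 1
        r += 1
    return r

n = 100_000                                    # number of distinct "strings"
hashes = [random.getrandbits(32) for _ in range(n)]

for m in (1, 2, 3):
    frac = sum(h % (2 ** m) == 0 for h in hashes) / n
    print(f"end in {m} zero bit(s): {frac:.3f}  (expected {2.0 ** -m})")

m = max(trailing_zeros(h) for h in hashes)
print("max trailing zeros:", m, "-> 2**m =", 2 ** m, "vs. actual count", n)
```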

What is really important to remember is that the Flajolet-Martin algorithm is meant to count the distinct elements (say M distinct elements) in a set of N elements, when M is expected to be very large.

There is no point in using the algorithm if N or M is small enough for us to store all distinct elements in memory.

In the case where N and M are really large, the probability that the estimate 2^k is close to the true count is actually very reasonable.

There is an explanation of this at http://infolab.stanford.edu/~ullman/mmds/ch4.pdf (page 143).
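
To put a rough number on "very reasonable", here is a small calculation of my own (under the idealized assumption of a uniformly random hash) of how often the single-hash estimate 2^R lands within a given factor of the true count M:

```python
def prob_R_le(r: int, M: int) -> float:
    # P(max trailing zeros over M independent uniform hash values <= r);
    # a single hash has >= r+1 trailing zeros with probability 2**-(r+1).
    return (1.0 - 2.0 ** -(r + 1)) ** M

def prob_within_factor(M: int, c: float) -> float:
    # Probability that the estimate 2**R falls in [M/c, c*M].
    total = 0.0
    for r in range(64):
        p_r = prob_R_le(r, M) - (prob_R_le(r - 1, M) if r > 0 else 0.0)
        if M / c <= 2 ** r <= c * M:
            total += p_r
    return total

for M in (10 ** 3, 10 ** 5, 10 ** 7):
    print(M, prob_within_factor(M, 2), prob_within_factor(M, 4))
```

A single hash only gets you into the right ballpark; the full algorithm tightens the estimate by combining many hash functions, which the linked chapter covers.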

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow