Question

this is a question given in a PDF about streaming algorithms (this isnt an assignment but im trying to understand)

Exercise 4.4.1 : Suppose our stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Our hash functions will all be of the form h(x) = ax+ b mod 32 for some a and b. You should treat the result as a 5-bit binary integer. Determine the tail length for each stream element and the resulting estimate of the number of distinct elements if the hash function is:

(a) h(x) = 2x + 1 mod 32.

(b) h(x) = 3x + 7 mod 32.

(c) h(x) = 4x mod 32.

! Exercise 4.4.2 : Do you see any problems with the choice of hash functions in Exercise 4.4.1? What advice could you give someone who was going to use a hash function of the form h(x) = ax + b mod 2k ?

I have already resolved the first exercise, finding a max R tail length of 0 for (a) and 4 for (b) and (c), therefore the resulting estimation of distinct elements is respectively 1,16,16. (it is not asked to do averages/medians of the hash functions to find a better value)

However, I can't seem to figure the answer to the second exercice ? Is it simply to choose 'a' and 'b' in a certain manner ? or are these functions not good to generate equally randomly numbers with trailing 0s and no trailing 0s ?

Thank you in advance

you can observe the results of each hash functions by running this code : https://onlinegdb.com/rJXC4f4VL

Was it helpful?

Solution

However, I can't seem to figure the answer to the second exercice ? Is it simply to choose 'a' and 'b' in a certain manner ? or are these functions not good to generate equally randomly numbers with trailing 0s and no trailing 0s ?

You have the right idea. I believe what the question is trying to suggest is that this hash function has a problem with certain values of $a$ and $b$. Consider, in particular, $a = 16$. What will happen to the hash function in that case? How many possible values are there?

So, to pick a good value of $a$, we should probably make sure that $a = 16$ is not allowed. $a = 8$ is not good either. What do you suggest is a good choice for $a$?

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top