Question

I was reading "Programming Pearls" and I am really confused in one of the solution explanations--problem 9 in column 1.

The question was: When using bitmap data to represent a set of integers, the first phase initializes the set to empty. But initializing the space can itself take a great deal of time. Show how to circumvent this problem by designing a technique to initialize an entry of a vector to zero the first time it is accessed.

The answer was : The effect of initializing the vector data[0...n-1] can be accomplished with a signature contained in two additional n-element vectors, from and to, and an integer top. If the element data[i] has been initialized, then from[i] < top and to[*from*[i]] = i. Thus from is a simple signature, and to and top together make sure that from is not accidentally signed by the random contents of memory.

I have read this answer several times. I don't understand it.

Can someone explain it?

Thanks,

Was it helpful?

Solution

This page helped me out: http://comments.gmane.org/gmane.comp.programming.algogeeks/30667

But let me see if I can explain it.

Basically the problem is: initializing a vector to all 0s takes a non-trivial amount of time if the vector is sufficiently large. So, how can we avoid initialization to 0s by using more space? i.e. How can we differentiate random data in this vector from data that we intentionally put there?

Bentley's solution is to use a "from" (a map) and "to" (a signature [really just the reverse map to the index of from]) vectors of the same size as the data vector and "top" which is the number of elements in the data array so far. It is important that from[i] < top as explained below.

Using the example from the solution: We declare a data array and set the number of elements to be zero:

top = 0
data = int array of integers of size 1,000,000 
       (all random values since we did not initialize it)

Insert an element at index 1 (i=1 in this case). But now how do we know that that is not a random value? We use the map and the signature. The index of "from" is equal to the index of data.

from[i] = top
to[top] = i
data[i] = 0 (I don't think it matters whether you set it to 0 or your intended value of 3)
top++ (top is now 1)

So you might say that what if to[from[i]] == i by chance. Well since we state from[i] < top, this is not possible.

Examine the following two cases:

A) There are no elements inserted yet into the data array (i.e. top = 0) which implies from[i] < 0 which is not a valid array index. So this is not possible.

B) There is an element inserted (i.e. top > 0, lets say it is 1) Since from[i] < top => from[i] = 0. However, we inserted one element into the data array so we have explicitly set to[from[i]] = i.

The rests follows for top = 2...n

HTH

OTHER TIPS

I found the post in the following link to be very helpful in understanding the idea behind the constant time initialization trick.

Link: http://eli.thegreenplace.net/2008/08/23/initializing-an-array-in-constant-time/

Hope it helps-

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top