Question

I was reading "Programming Pearls" and I am really confused in one of the solution explanations.

The question was: "A file containing at most n positive integers, each less than n, where n = 10^7. Each positive integer could appear at most ten times. How would you sort the file?"

Given solution in the book: " If each integer appears at most ten times, then we can count its occurence in a four-bit half byte. Using the solution to Problem 5 (below) we can sort the complete file in a single pass with 10,000,000/2 bytes, or in k passes with 10,000,000/2k bytes"

Solution to problem 5 is: A two-pass algorithm first sorts the integers 0 through 4,999,999 using 5,000,000/8 = 625,000 words of storage, then sorts 5,000,000 through 9,999,999 in a second pass. A k-pass algorithm sorts at most n non-repeated positive integers less than n in time kn and space n/k.)

I am not getting how author is coming to 10,000,000/2k space to sort. I mean, based on the solution to problem 5, first we need 625K bytes of space to sort in first pass and additional 1/2 byte per integer to store the count right?

Could someone please help me understand this?

Thanks a lot.

Was it helpful?

Solution

Each positive integer could appear at most ten times.

You can use 4 bits per counter for this, not 2 bytes. If you group counters you can even lower this value, for example if you group 3 counters, that's 10*10*10=1000 combinations and you need 10 bits (=1024 values) for that.

OTHER TIPS

10,000,000 - because there are 10,000,000 possible values.

2 - because each byte consists of two half-bytes.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top