Question

I need to read data from a stream using the following algorithm:

-Count all consecutive set bits ("1"s) from the stream.

-Then, read k more bits from the stream. K is variable and changes throughout the program. Lets call the read data "m"

The decoded number then is

number = (consecutive_set_bits << k) + m;

This algorithm is executed a very large amount of times. Because of this, it is crucial that this piece of code be as fast as possible.

The main problem is that the number of coded numbers in a 1byte, two byte, four byte, etc. set is variable, and thus a trivial implementation (the only one that I have in my head right now) requires a loop that reads single bits from the stream. In the worst case, I have 14 iterations through the loop for just one coded coefficient.

Can I avoid this loop somehow?

Was it helpful?

Solution

The idea of sequentially extracting single bits is not too bad. If done right, it may be almost as fast as any other solution.

Bit sequences at arbitrary positions in a stream of granularity g, with g=16 for a stream of (16-bit) words for instance, can be handled block-wise on blocks of size g.

To extract the bits at positions s through e (with (e - s) <= g) from a stream as a 'right-aligned' number an example implementation may be:

shift = s % g

lowerBits = data[ floor( s / g ) ] >> shift
upperBits = data[ floor( e / g ) ] << (g - shift)

bitSequence = (lowerBits | upperBits) & ( (1 << (e-s)) -1 )[*]

[*] this last term only masks out any unneeded upper bits we may have got and makes them 0 in the final result.

(careful with the endianess of your data :))

Whether this will really speed things up or not cannot be determined in general. (Depends on the data being processed, the underlying computing hardware, the compiler used &c. Note that some divisions and one modulo operation are required which might slow down the algorithm significantly.)

Extracting bits one-by-one can be done quite efficiently in the same manner. For example:

blockIndex = floor( bitPosition / g )
bitIndex = bitPosition % g
nextBit = (data[ blockIndex ] >> bitIndex) & 1

This can of course be optimized to avoid the re-calculation of blockIndex and bitIndex if and when the bitPosition is always only incremented by 1.

Another common approach is to use a variable 'mask' to extract the single bits:

mask = 1
index = 0
while ( not all bits read ) { 
  block = data[index]
  if ( mask & block != 0 ) {
    // a 1 was encountered
  } else {
    // a 0 was encountered
  }
  mask = mask << 1
  if ( mask == 0 ) {
    mask = 1
    index = index + 1
  }
}

Note how the mask is used to both mask the current bit and keep track of when to advance to the next block of data. For this to work, mask must of course be of the same width g as the data blocks.

To sum it all up:

I don't think, in a general case, the solution can be more efficient than one loop iteration per bit read and any optimizations will only slightly change the performance in one direction or the other.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top