Question

While working on the XKCD April Fool's skein hash collision problem I ran across this strange, fast, multiplicative method of counting the set bits in a word:

c = (v * 0x200040008001ULL & 0x111111111111111ULL) % 0xf;

Why does this work / what's going on? Can we generalize this method (for example, to work for our 128-bit values from the problem)?

Also, I can't help but think it's related to this question about moving bits around using a clever magic number.


Solution

Strictly speaking, this doesn't count the set bits in a full 32-bit word: by the nature of the modulo operator, the output must be less than 0xf (a.k.a. 15), so it can report at most 14 set bits.

First, let's pay special attention to the modulo operator. Why 15? And why are we masking to the least significant bit in each nybble?

Well, note that the least significant bit of each nybble has the value 16^k for some k. Since 16 mod 15 is 1, 16^k mod 15 is also 1 for any non-negative integer k.

This is convenient since it means that 16^k1 + 16^k2 + ... + 16^kn is congruent to n (mod 15).

Put another way, the modulo operator is effectively counting the number of set least significant nybble bits due to the above math -- as long as no other bits in the nybbles are set. (They'd just get in the way.)
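As a small sketch of that counting step on its own (the function name `count_nybble_lsbs` is made up for illustration and is not part of the original expression), assuming the input has bits set only in least significant nybble positions:

```c
#include <stdint.h>

/* Hypothetical helper: x is assumed to have bits set only at nybble-LSB
   positions (bits 0, 4, 8, ...). Each such bit is worth 16^k, and
   16^k mod 15 == 1, so the remainder equals the number of set bits
   as long as fewer than 15 of them are set. */
unsigned count_nybble_lsbs(uint64_t x)
{
    return (unsigned)(x % 15);
}
```

For instance, 0x10101 has three such bits set and leaves a remainder of 3, while the full 15-bit mask 0x111111111111111 wraps around to 0.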

However, we don't want to just count specially formatted bits in nybbles. We want to count the number of bits set in an arbitrary value. The trick is to get those value bits into those specially formatted nybbles by moving the bits around. The ultimate order of the nybbles isn't important, as long as we can move each bit of the value to its own nybble. In theory, since we're using 64-bit values to do the counting, we can map each bit of a 16-bit value to its own nybble, giving 4 * 16 = 64 bits total, just within our 64-bit allowance. However, note that because we're using modulo 15, any value with 15 or 16 set bits would be reported as 0 or 1, respectively.

Now let's refocus on the strange constant: 0x200040008001ULL

Let's take note of which bits are set (where bit 0 is the least significant bit): 0, 15, 30, and 45. You may have noticed they're spaced at 15-bit intervals. This is convenient because for values less than 2^15 the multiplication just creates four shifted, non-overlapping copies of the value in a 64-bit word. But when values become greater than or equal to 2^15, the copies start overlapping additively, which is no longer useful for counting the bits. That's okay, though, because with that modulo operation we can't reliably count 15 set bits anyway. (However, if the result of the modulo operation is 0, we know that either all or none of the bits are set, again assuming we only get values less than 2^15.)
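To make the "shifted copies" claim concrete, here is a minimal sketch comparing the multiplication with explicit shifts. The function names are invented for this example; only the constant comes from the question.

```c
#include <stdint.h>

/* The multiplication from the question, without the mask and modulo. */
uint64_t copies_by_multiply(uint64_t v)
{
    return v * 0x200040008001ULL;
}

/* Four copies of v placed at bit offsets 0, 15, 30 and 45. For v < 2^15
   the copies do not overlap, so OR-ing them equals adding them. */
uint64_t copies_by_shift(uint64_t v)
{
    return v | (v << 15) | (v << 30) | (v << 45);
}

/* Returns 1 if both forms agree for every v below 2^15, otherwise 0. */
int copies_agree_below_2_15(void)
{
    for (uint64_t v = 0; v < (1ULL << 15); v++) {
        if (copies_by_multiply(v) != copies_by_shift(v))
            return 0;
    }
    return 1;
}
```

Once bit 15 is set alongside a lower bit (0x8001, say), the copies collide and the multiplication carries, so the two forms stop agreeing.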

So, we have shifted copies of our 15-bit number in our 64-bit register. The second step is that the mask extracts only the least significant bit of each nybble. Because the least significant bit of each nybble is equivalent to 1 (mod 15), the modulo operator effectively counts the number of least significant bits set in the nybbles.

The only detail remaining is to make sure that each bit in our 15-bit number lands in a least significant nybble bit slot exactly once.

Let's check:

The first bit set, 0, doesn't shift the value at all, leaving a copy of our value in bits 0 through 14.
This places our value bits 0, 4, 8, and 12 in least significant nybble bit slots (bits 0, 4, 8, and 12).

The second bit set, 15, places a copy of our value in bits 15 through 29.
This places our value bits 1, 5, 9, and 13 in bits 16, 20, 24, and 28.

The third bit set, 30, places a copy of our value in bits 30 through 44.
This places our value bits 2, 6, 10, and 14 in bits 32, 36, 40, and 44.

Finally, the fourth bit set, 45, places a copy of our value in bits 45 through 59.
This places our value bits 3, 7, and 11 in bits 48, 52, and 56; bit 15 of a 16-bit value would land in bit 60.

Bits accounted for:
0, 4, 8,  and 12
1, 5, 9,  and 13
2, 6, 10, and 14
3, 7, 11, and 15

It's easy to visually verify that this maps 16 bits. However, note the mask is actually 15 1's, not 16. So the bit placed in the last nybble (starting at bit 60, representing bit 15 of our value, the highest bit of a 16-bit value) is effectively ignored. That leaves value bits 0 through 14, each landing in exactly one masked slot.
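The same check can be done mechanically. The sketch below (with a made-up helper name, `slot_of_value_bit`) feeds a single value bit through the multiply and mask and reports where it ends up:

```c
#include <stdint.h>

/* Hypothetical helper: returns the bit position in the masked product
   where value bit i (0..14) survives, or -1 if it does not survive in
   exactly one slot. */
int slot_of_value_bit(int i)
{
    uint64_t masked = (((uint64_t)1 << i) * 0x200040008001ULL)
                      & 0x111111111111111ULL;
    int slot = -1;

    for (int pos = 0; pos < 64; pos++) {
        if ((masked >> pos) & 1) {
            if (slot != -1)
                return -1;      /* bit survived in more than one slot */
            slot = pos;
        }
    }
    return slot;
}
```

For example, value bit 1 should come out in bit 16 and value bit 14 in bit 44, matching the table above.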

With that, the total technique is complete:

  1. Use multiplication to map each bit into a least significant nybble bit.
  2. Use a mask to select only the desired nybble bits.
  3. Note that a least significant nybble bit is equivalent to 1 (mod 15).
  4. Therefore, (mod 15) will simply add those bits together... up to 14 bits set.
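
Putting the steps together, and as one possible answer to the generalization question, here is a sketch that wraps the original expression in a function and applies it to a wider value 14 bits at a time. The function names and the chunking loop are my own assumptions, not something from the original source, and this is not claimed to be the fastest approach.

```c
#include <stdint.h>

/* The expression from the question. Only reliable for v < 2^14, since
   the result of % 0xf can never be 15 or more. */
unsigned count_bits14(uint64_t v)
{
    return (unsigned)(((v * 0x200040008001ULL) & 0x111111111111111ULL) % 0xf);
}

/* Hypothetical generalization: split a 64-bit value into 14-bit chunks
   and add up the per-chunk counts. */
unsigned count_bits_chunked(uint64_t v)
{
    unsigned c = 0;

    while (v != 0) {
        c += count_bits14(v & 0x3fff);  /* low 14 bits */
        v >>= 14;                       /* move on to the next chunk */
    }
    return c;
}
```

A 128-bit value stored as two 64-bit halves could then be handled by calling `count_bits_chunked` on each half and adding the results.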
Licensed under: CC-BY-SA with attribution