How does this code work to count number of 1-bits?

Question 1

This algorithm uses x as both the source of the computation and as memory. First, think about what the bitmasks are:

0x55 = 01010101
0x33 = 00110011
0x0f = 00001111
0xff = 11111111

Let's go with an 8 bit example: a = 01101010

a & 0x55 = 01000000; (a >> 1) & 0x55 = 00010101

If we add these two together, we get the bit count of each two bit pair. This result is at most 0x10, as the mask has only one bit set for each addition.

Now, we use the 0x33 mask, remember each consecutive 2 bits contain the result from the first operation. With this mask, we mask out consecutive 4 bits and add them. This result is at most 0x100. This goes on with the other masks, until we have the result in x.

Try it on paper, it will be pretty clear after a few steps.

Question 2

It's a divide and conquer approach. Note that the first line will give you the sum on subsequent pairs, next the sum on subsequent quadruples,... and finally the number of bits.

Example (16 bit so consider your code without the last line)

1011001111000110

In the fist line we take & with even bits and odd bits shifted by one. Then we add.

For even bits:

num:  10 11 00 11 11 00 01 10
mask: 01 01 01 01 01 01 01 01
res:  00 01 00 01 01 00 01 00

For odd bits:

num>>1: 01 01 10 01 11 10 00 11
mask:   01 01 01 01 01 01 01 01
res:    01 01 00 01 01 00 00 01

We add those results and get sums in subsequent pairs

01 10 00 10 10 00 01 01

We repeat the same with following masks and corresponding shifts

2nd: 0011 0011 0011 0011
3rd: 0000 1111 0000 1111
4th: 0000 0000 1111 1111

And we get:

2nd: 0011 0010 0010 0010  // 3 set in the first four, 2 in other quadrupels
3rd: 0000 0101 0000 0100  // 5 bits set in the first eight, 4 in the rest
4th: 0000 0000 0000 1001  // 9 bits in total

Question 3

To make this easier to explain, let me suppose an integer is 4 bits long and each bit is represent as 1,2,3,4. In order to get the number of 1-bits, you can first get the sum of 1 and 2 and the sum of 3 and 4 and then add these sums up.

x & (0x5) will eliminate 1 and 3, and x & (0xa) will eliminate 2 and 4. x & (0x5) + (x & (0xa) >> 1) will use 1 2 bits to store the sum of 1 and 2 and use 3 4 bits to store the sum of 3 and 4. x & (0x5) + (x & (0xa) >> 1) is same as x & (0x5) + (x >> 1) & (0x5).

Now we have the sums of 1 2 and 3 4 all stored in x, we can get the result after we add them up. Same as above, x & (0x3) will get the sum of 3 4 and x & (0x12) will get the sum of 1 2. x & (0x3) + (x & (0x12)) >> 2 will get the final result. x & (0x3) + (x & (0x12)) >> 2 is same as x & (0x3) + (x >> 2) & (0x3).

So you can see what happen inside. In your case, the first line you can get the sum of 1 2 and 3 4 and 5 6 and go on. And in the second line you get the sum of 1 2 3 4 and 5 6 7 8 and go on. So by doing this 5 times, you get the number of all 32 bits.

Question 4

The first line treats the integer as an array of 16 2-bit integers, the result of the expression is 0 if 0 bits in the 2-bit pair were set, 1 if 1 bit in the 2-bit pair was set (01 bin or 10 bin), and 2 if both bits of the 2 bit pair were set. This is because we consider the lower of every two bits of x, and the lower of every two bits of x shifted right by one; adding them together. We know no carries will occur outside of the 2-bit pairs because our summands are limited to 0 or 1. The result is then stored back into x.

x = (x & (0x55555555)) + ((x >> 1) & (0x55555555));

The next line does the same thing, this time treating every 2-bits as a separate integer, storing their sum in every 4-bits the two summands used to occupy. After this operation the integer essentially becomes an array of 8 4-bit integers. Before the operation each summand is either 0, 1, or 2 (decimal) because it corresponds to the counts from the last line. Because of this we know each sum will be no greater than 4. Since each 4-bit int has a maximum value of 15 (decimal), we know this will also not overflow. As above, the result is stored back into x.

x = (x & (0x33333333)) + ((x >> 2) & (0x33333333));

We do the same as above, this time suming every pair of 4-bits into every set of 8 bits.

x = (x & (0x0f0f0f0f)) + ((x >> 4) & (0x0f0f0f0f));

We then sum ever pair of 8-bits into every pair of 16 bits (the upper and lower half of x).

x = (x & (0x00ff00ff)) + ((x >> 8) & (0x00ff00ff));

We now sum the upper and lower half of x, and we are left with a 32-bit value equal to the number of bits set in the value of x.

x = (x & (0x0000ffff)) + ((x >> 16) & (0x0000ffff));

The key here is that each step does an inplace pairwise count of bits until we are left with the total bit count in the 32-bit integer it self.