How does this color blending trick that works on color components in parallel work?

Question

(a ^ b) & 0x00010101 is what the least significant bits of the channels would have been in a + b if no carry had come from the right.

Subtracting it from the sum guarantees that the bit that is shifted into the most significant bit of the next channel is just the carry from that channel, untainted by this channel. Of course that also means that this channel is no longer effected by the carry from the next channel.

An other way to look this, not the way it does it but a way that may help you understand it, is that effectively the inputs are changed so that their sum is even for all channels. The carries then go nicely into the least significant bits (which are zero, because even), without disturbing anything. Of course what it actually does is sort of the other way around, first it just sums them, and only then does it ensure that the sums are even for all channels. But the order doesn't matter.

More concretely, there are 4 cases (before the carry from the next channel is applied):

the lsb of a channel is 0 and there is no carry from the next channel.
the lsb of a channel is 0 and there is a carry from the next channel.
the lsb of a channel is 1 and there is no carry from the next channel.
the lsb of a channel is 1 and there is a carry from the next channel.

The first two cases are trivial. The shift puts the carried bit back in channel it belongs to, it doesn't even matter whether it was 0 or 1.

Case 3 is more interesting. If the lsb is 1, that means the shift would shift that bit into the most significant bit of the next channel. That's bad. That bit has to be unset somehow - but you can't just mask it away because maybe you're in case 4.

Case 4 is the most interesting. If the lsb is 1 and there is a carry into that bit, it rolls over to a 0 and the carry is propagated. That can't be undone by masking, but it can be done by reversing the process, ie subtracting 1 from the lsb (which puts it back to 1 and undoes any damage done by the propagated carry).

As you can see, in both case 3 and case 4, the cure is subtracting 1 from the lsb, and those are also the cases in which the lsb really wanted to be 1 (though maybe it isn't any more, due to a carry from the next channel), and in both case 1 and 2, you don't have to anything (in other words, subtract 0). That exactly corresponds to subtracting "what the lsb would have been in a + b if no carry had come from the right".

Also, the blue channel can only fall into cases 1 or 3 (there is no next channel which could carry), and the shift would just discard that bit instead of putting it in the next channel (because there is none). So alternatively, you may write (note the mask has lost the least significant 1)

public static int blendRGB(int a, int b) {
    return (a + b - ((a ^ b) & 0x00010100)) >> 1;
}

Doesn't really make any difference, though.

To make it work for ARGB8888, you can switch to the good old "SWAR average":

// channel-by-channel average, no alpha blending
public static int blendARGB(int a, int b) {
    return (a & b) + (((a ^ b) & 0xFEFEFEFE) >>> 1);
}

Which is a variation on a recursive way to define addition: x + y = (x ^ y) + ((x & y) << 1) which computes the sum without carries, then adds the carries in separately. The base case is when one of the operands is zero.

Both halves are effectively shifted right by 1, in such a way that the carry out of the most significant bit is never lost. The mask ensures that bits don't move to a channel to the right, and simultaneously ensures that a carry won't propagate out of its channel.