You want to pack the low 7 bits of each input byte into a continuous bit stream, so that every 8 input bytes fit into 7 output bytes. Each output byte is built by combining the lowest 7 bits of the left byte, shifted left, with the lowest 7 bits of the right byte, shifted right:
```c
#include <stdint.h>

/* Packs 16 bytes of 7 significant bits each into 14 bytes (two blocks of
   8 -> 7). Bits shifted past bit 7 are dropped when the int result is
   stored in a uint8_t. */
void pack(const uint8_t in[16], uint8_t out[14])
{
    out[ 0] = (in[ 0] & 0x7f) << 1 | (in[ 1] & 0x7f) >> 6;
    out[ 1] = (in[ 1] & 0x7f) << 2 | (in[ 2] & 0x7f) >> 5;
    out[ 2] = (in[ 2] & 0x7f) << 3 | (in[ 3] & 0x7f) >> 4;
    out[ 3] = (in[ 3] & 0x7f) << 4 | (in[ 4] & 0x7f) >> 3;
    out[ 4] = (in[ 4] & 0x7f) << 5 | (in[ 5] & 0x7f) >> 2;
    out[ 5] = (in[ 5] & 0x7f) << 6 | (in[ 6] & 0x7f) >> 1;
    out[ 6] = (in[ 6] & 0x7f) << 7 | (in[ 7] & 0x7f) >> 0;
    out[ 7] = (in[ 8] & 0x7f) << 1 | (in[ 9] & 0x7f) >> 6;
    out[ 8] = (in[ 9] & 0x7f) << 2 | (in[10] & 0x7f) >> 5;
    out[ 9] = (in[10] & 0x7f) << 3 | (in[11] & 0x7f) >> 4;
    out[10] = (in[11] & 0x7f) << 4 | (in[12] & 0x7f) >> 3;
    out[11] = (in[12] & 0x7f) << 5 | (in[13] & 0x7f) >> 2;
    out[12] = (in[13] & 0x7f) << 6 | (in[14] & 0x7f) >> 1;
    out[13] = (in[14] & 0x7f) << 7 | (in[15] & 0x7f) >> 0;
}
```
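To make the pattern explicit, here is a sketch of the same packing written as a loop over blocks of 8 input bytes. `pack_loop` and its `n_in` parameter are hypothetical, and the input length is assumed to be a multiple of 8. Output byte `k` of a block takes input byte `k` shifted left by `k + 1` and input byte `k + 1` shifted right by `6 - k`, which reproduces the shift amounts of the unrolled version.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop form of the unrolled pack(): every group of 8 input
   bytes (7 significant bits each) becomes 7 output bytes.
   Assumes n_in is a multiple of 8 and out has room for n_in / 8 * 7 bytes. */
void pack_loop(const uint8_t *in, uint8_t *out, size_t n_in)
{
    for (size_t b = 0; b < n_in / 8; b++) {
        const uint8_t *src = in + 8 * b;  /* 8 input bytes of this block */
        uint8_t *dst = out + 7 * b;       /* 7 output bytes of this block */
        for (int k = 0; k < 7; k++) {
            /* Remaining low bits of src[k] go to the top of dst[k];
               the top bits of src[k + 1]'s 7-bit value fill the bottom. */
            dst[k] = (uint8_t)((src[k] & 0x7f) << (k + 1) |
                               (src[k + 1] & 0x7f) >> (6 - k));
        }
    }
}
```

For 16 input bytes, `pack_loop(in, out, 16)` should give the same 14 bytes as `pack(in, out)`.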
Although each of the two blocks follows a clear pattern, it is probably faster to write this out without a loop, because the unrolled code spends no time on loop control or on computing shift amounts at run time. The code might be sped up further by precalculating an auxiliary input array with the most significant bit of every byte already cleared, so you don't have to extract the lowest 7 bits (`x & 0x7f`) twice for each byte. (The last right shift by 0 doesn't do anything, and the compiler will optimise it away; I've kept it for symmetry.)
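A minimal sketch of the precalculation idea, assuming a 16-byte local scratch array is acceptable; `pack_premasked` is a hypothetical name. Each input byte is masked exactly once, and the packing lines then use the masked copies directly. A possible side benefit: because `uint8_t` writes to `out` may alias `in`, the compiler cannot always reuse a mask it computed earlier, whereas the local copy is read only once. Whether this is actually faster depends on the compiler and target, so it is worth measuring.

```c
#include <stdint.h>

/* Hypothetical variant of pack(): clear the top bit of every input byte
   once, then pack the 7-bit values without masking again. */
void pack_premasked(const uint8_t in[16], uint8_t out[14])
{
    uint8_t m[16];  /* input bytes with the most significant bit cleared */
    for (int i = 0; i < 16; i++)
        m[i] = in[i] & 0x7f;

    /* Same shifts as pack(); excess high bits are dropped on assignment. */
    out[ 0] = m[ 0] << 1 | m[ 1] >> 6;
    out[ 1] = m[ 1] << 2 | m[ 2] >> 5;
    out[ 2] = m[ 2] << 3 | m[ 3] >> 4;
    out[ 3] = m[ 3] << 4 | m[ 4] >> 3;
    out[ 4] = m[ 4] << 5 | m[ 5] >> 2;
    out[ 5] = m[ 5] << 6 | m[ 6] >> 1;
    out[ 6] = m[ 6] << 7 | m[ 7] >> 0;
    out[ 7] = m[ 8] << 1 | m[ 9] >> 6;
    out[ 8] = m[ 9] << 2 | m[10] >> 5;
    out[ 9] = m[10] << 3 | m[11] >> 4;
    out[10] = m[11] << 4 | m[12] >> 3;
    out[11] = m[12] << 5 | m[13] >> 2;
    out[12] = m[13] << 6 | m[14] >> 1;
    out[13] = m[14] << 7 | m[15] >> 0;
}
```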