Question

I do a lot of bit vector operations in my software. For example: suppose I need to store boolean information about a candidate 'n', I do the following:

uint64_t *information_vector;
uint32_t pos = n / 64;
uint32_t bit_pos = n % 64;

information_vector[pos] |= (1 << bit_pos);

and I follow similar procedure while reading that information:

uint32_t pos = n / 64;
uint32_t bit_pos = n % 64;
if (information_vector[pos] & (1 << bit_pos)) {
       // do something
}

In the meantime, I also write the information_vector to the disk and read it back again. Now, I am trying to solve a bug which is giving me nightmares, and it struck me that endianness might be the culprit, but I cannot explain how. Is there any way I can check? Is this kind of bit vector manipulation generally endian-safe across architectures?

I also see that somewhere in the code I set some other information in another bit vector for the same candidate as:

uint8_t byte_position = n / 8;
uint8_t bit_position = n % 8;
another_information_vector[byte_position] |= (1 << bit_position);

I usually find common set of attributes by and-ing these bit vectors.


Solution

For most cases, the safest variant is to operate at byte level, i.e. with a divisor of 8. On the other hand, this can be suboptimal: some architectures have no direct byte access, or byte access is expensive compared with word access.

On a little-endian machine, the same approach works unchanged for any reasonable divisor (8, 16, 32, 64), and every choice addresses the same bit in memory. For example, for bit index 22, byte-level access deals with bit 6 of the byte with index 2; short-word access deals with bit 6 of the short-word with index 1; and so forth.
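
The claim can be checked with a small sketch like the one below. The helper names (is_little_endian, set_bit_u8, set_bit_u16, set_bit_u64, same_layout_for_bit) are made up for illustration and are not from the question; the comparison only covers bit indices below 64.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 when the host stores the least significant byte first. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Set bit n using 8-bit cells: cell n / 8, bit n % 8. */
static void set_bit_u8(uint8_t *cells, unsigned n)
{
    cells[n / 8] |= (uint8_t)(1u << (n % 8));
}

/* Set bit n using 16-bit cells: cell n / 16, bit n % 16. */
static void set_bit_u16(uint16_t *cells, unsigned n)
{
    cells[n / 16] |= (uint16_t)(1u << (n % 16));
}

/* Set bit n using 64-bit cells; the constant is 64 bits wide before the shift. */
static void set_bit_u64(uint64_t *cells, unsigned n)
{
    cells[n / 64] |= UINT64_C(1) << (n % 64);
}

/* Returns 1 when all three cell widths give the same memory image for bit n (n < 64). */
static int same_layout_for_bit(unsigned n)
{
    uint8_t  as_bytes[8]  = {0};
    uint16_t as_shorts[4] = {0};
    uint64_t as_quads[1]  = {0};

    set_bit_u8(as_bytes, n);
    set_bit_u16(as_shorts, n);
    set_bit_u64(as_quads, n);

    return memcmp(as_bytes, as_shorts, 8) == 0 &&
           memcmp(as_bytes, as_quads, 8) == 0;
}
```

On a little-endian host same_layout_for_bit(22) should return 1. On a big-endian host the three images differ, because the 64-bit view puts bit 22 in the byte at offset 5 rather than offset 2.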

On a big-endian machine, getting the same width-independent layout requires replacing 1 << bit_position with 1 << (BITS_PER_CELL-1-bit_position), or (equivalently) HIGHEST_BIT >> bit_position, where HIGHEST_BIT is 0x80 for uint8_t, 0x80000000 for uint32_t, etc. Bit index 0 then means the MSB of byte 0, as opposed to the little-endian case where it means the LSB of byte 0.
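
A minimal sketch of the MSB-first numbering, assuming 8-bit and 32-bit cells. The macro and function names (HIGHEST_BIT_U8, HIGHEST_BIT_U32, msb_first_mask_u8, msb_first_mask_u32, set_bit_msb_first) are illustrative only.

```c
#include <stdint.h>

#define HIGHEST_BIT_U8  UINT8_C(0x80)
#define HIGHEST_BIT_U32 UINT32_C(0x80000000)

/* Mask for bit_position (0..7) in an 8-bit cell, counting from the MSB. */
static uint8_t msb_first_mask_u8(unsigned bit_position)
{
    return (uint8_t)(HIGHEST_BIT_U8 >> bit_position);
}

/* Mask for bit_position (0..31) in a 32-bit cell, counting from the MSB. */
static uint32_t msb_first_mask_u32(unsigned bit_position)
{
    return HIGHEST_BIT_U32 >> bit_position;
}

/* Set bit n in a byte array where bit index 0 is the MSB of byte 0. */
static void set_bit_msb_first(uint8_t *cells, unsigned n)
{
    cells[n / 8] |= msb_first_mask_u8(n % 8);
}
```

With this numbering, bit index 22 is still in byte 2, but it selects mask 0x02 there instead of 0x40. The byte-level version behaves the same on any host; only the wider-cell versions depend on the machine being big-endian to line up with it.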

(A similar effect can be seen on serial wires. In RS232 or Ethernet, bytes are transmitted from LSB to MSB. The individual/group bit in a MAC address is the very first one on the wire, but it is the LSB of the first octet.)

OTHER TIPS

This is certainly endian-safe on any architecture as long as the data stays in memory on the same CPU. Writing to disk on one architecture and then reading it back on a different architecture will depend on how you read and write it. This is no different from the problems you would have writing any multi-byte number to disk and reading it back: both ends have to interpret that number the same way. If in this example you just write the 8 bytes of each uint64_t to disk and then read them on an architecture with the other endianness, the bytes will come back swapped.
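
One common way around this is to pick a byte order for the file and convert explicitly, instead of dumping the in-memory representation. Below is a minimal sketch that fixes the on-disk order as least significant byte first; store_le64 and load_le64 are hypothetical helper names, not part of any library.

```c
#include <stdint.h>
#include <stddef.h>

/* Encode one 64-bit word as 8 bytes, least significant byte first. */
static void store_le64(unsigned char out[8], uint64_t word)
{
    for (size_t i = 0; i < 8; i++)
        out[i] = (unsigned char)((word >> (8 * i)) & 0xFF);
}

/* Decode 8 bytes written by store_le64 back into a 64-bit word. */
static uint64_t load_le64(const unsigned char in[8])
{
    uint64_t word = 0;
    for (size_t i = 0; i < 8; i++)
        word |= (uint64_t)in[i] << (8 * i);
    return word;
}
```

The 8-byte buffer is what would be passed to fwrite and fread. Because the conversion uses shifts rather than pointer casts, it produces the same file on little-endian and big-endian hosts.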

Generally speaking, if you always access your bit vector using the same type (in your case uint64_t), and the endianness of all systems on which you access the data is the same, then endianness will not be a problem.
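
As a sketch of what "always the same type" could look like, the two helpers below (bitvec_set and bitvec_test, names invented here) are the only places that touch the uint64_t cells. Note one detail that is separate from endianness: in C the literal 1 has type int, so 1 << bit_pos is undefined once bit_pos reaches the width of int (usually 32). The sketch widens the constant to 64 bits before shifting.

```c
#include <stdint.h>
#include <stddef.h>

/* Set bit n in a vector of 64-bit cells. */
static void bitvec_set(uint64_t *vec, size_t n)
{
    /* UINT64_C(1) makes the shifted constant 64 bits wide. */
    vec[n / 64] |= UINT64_C(1) << (n % 64);
}

/* Return 1 if bit n is set, 0 otherwise. */
static int bitvec_test(const uint64_t *vec, size_t n)
{
    return (int)((vec[n / 64] >> (n % 64)) & 1u);
}
```

If every read and write goes through one such pair, the bit numbering within a cell stays consistent no matter how the cell is laid out in memory.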

The easiest way to reassure yourself, though, is to cast the address of the object to unsigned char* and dereference it, which lets you see one byte at a time in the order the bytes are laid out in memory.

Update: I just noticed that your third block of code computes byte_position as n / 8 and bit_position as n % 8, i.e. it addresses that vector one byte at a time rather than one uint64_t at a time.

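If you are sometimes writing out an array of uint64_t, and sometimes treating it as an array of uint8_t, then the two views agree only on a little-endian system. On a big-endian system, bit n of the uint64_t view and bit n of the uint8_t view land in different bytes, so your results will probably be unexpected.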

The best way to avoid this problem is to keep your types consistent.

To make this problem more concrete, consider the following example:

#include <stdio.h>
#include <stdint.h>

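int main(void) {
    uint64_t myVector = (uint64_t)1 << 2; // set bit 2 (value 4) in the least significant byte
    uint8_t *ptr = (uint8_t *) &myVector; // view the same storage one byte at a time
    int i;
    for (i = 0; i < 8; i++)
        printf("%x\n", ptr[i]);           // bytes in increasing address order
    return 0;
}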

On my little-endian x86 system, this prints 4 followed by seven 0s, because the least significant byte is stored at the lowest address of the uint64_t and the most significant byte at the highest. This might run counter to your intuition if you are used to thinking of the bits laid out from most significant to least significant, left to right.

Licensed under: CC-BY-SA with attribution