Question

This occurred to me when looking at stencil computations in numpy. This python library compiles compute-intensive components, so I believe that it's a valid example. Here making selections on an array is done with small offsets (0, 1, -1) a lot of times. Would a c++ compiler take into account that it doesn't need a full integer/char/bool type and "pack" bits into bytes?

For example, could selecting every second element from 0 to 7, be represented as a single byte instead of 8 bytes:

0,1,0,1,0,1,0,1

instead of

0,0,0,0,0,0,0,0

0,0,0,0,0,0,0,1

0,0,0,0,0,0,0,0

0,0,0,0,0,0,0,1

0,0,0,0,0,0,0,0

0,0,0,0,0,0,0,1

0,0,0,0,0,0,0,0

0,0,0,0,0,0,0,1

Was it helpful?

Solution

Yes, such masks and stencils will be performed efficiently. But at least when writing C, this is not automatic.

In C and C++, every object must have an address. Bools are objects. The smallest addressable value is a char. So bools will generally be one byte large. While this matters for data structures (structs, arrays), it doesn't really matter for local variables – they will end up being used as a register anyway and usually won't take up physical memory.

As a special case, the C++ standard library has specialized std::vector<bool> to use one bit per entry. However, this leads to some complications. Whereas indexing like xs[i] would usually return a reference to the vector's value type, it can't do that here because bits are not individually addressable. Instead, it has to return a proxy object with an overloaded assignment operator. For fixed-sized arrays, std::bitset can be used.

Within structures, it is possible to use the bitfield feature to determine how many bits a field will use. For example:

struct Foo {
  bool x : 1;
  bool y : 1;
};

While the struct as a whole will likely be one byte large, the x and y fields only are one bit wide so they will only accept the values 0 and 1. Note that there are various restrictions on such field: it's not possible to get a reference/pointer to a field, and they do not have a defined offset in the struct.

Within the CPU, certain SIMD instructions (e.g. from AVX extensions) use bitmasks to select items in an array, making it fairly convenient and extremely efficient to use masked arrays. However, it can be difficult for compilers to use such extensions automatically. Typically, the program would use hand-written assembly – this is still somewhat common in highly optimized numeric libraries. Compilers define platform-specific types representing various SIMD concepts like “vector of eight floats” or “bitmask”, and can then use intrinsic functions that are implemented with certain assembly instructions. While you cannot make use of such features from Python directly, you can assume that underlying libraries like Numpy do make full use of such features if available on your CPU.

Licensed under: CC-BY-SA with attribution
scroll top