Question

I was working on this class last night as a type-safe wrapper for memory-aligned objects. I have the byte array and the math to access its memory for reading and writing as T. What I am curious about is how to provide the most efficient access to the aligned T.

I tried a public T & member called Value, which I initialize to the aligned T in the constructor initializer list, like this:

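#include <cstddef>
#include <cstdint>

template <typename T, std::size_t alignment = 64>
struct Aligned {
private:
    // Oversized buffer: room for T plus the worst-case padding up to the boundary.
    std::uint8_t bytes[sizeof(T) + alignment - 1];
public:
    T & Value;  // refers to the aligned T inside bytes
    Aligned(T const & value = T())
        : Value(*reinterpret_cast<T *>(((std::intptr_t)bytes + (alignment - 1)) & ~std::intptr_t(alignment - 1))) {
        Value = value;
    }
};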

That increases the size of the class by at least sizeof(T *) (plus any padding), since T & Value needs to store the address of the aligned T.

My other approach is to not store the address but to calculate it each time access is required, via accessor methods:

#include <array>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t alignment = 64>
struct Aligned {
private:
    // Not const: a const member could not be left uninitialized by the
    // constructor, and value(T const &) has to write through this buffer.
    std::array<std::uint8_t, sizeof(T) + alignment - 1> bytes;
public:
    T const & value() const {
        return *reinterpret_cast<T *>(((std::intptr_t)bytes.data() + (alignment - 1)) & ~std::intptr_t(alignment - 1));
    }
    void value(T const & x) {
        *reinterpret_cast<T *>(((std::intptr_t)bytes.data() + (alignment - 1)) & ~std::intptr_t(alignment - 1)) = x;
    }
    Aligned(T const & x = T()) {
        value(x);
    }
};

This approach will require pointer arithmetic and a pointer dereference (I think?) for each access but adds nothing to the size of the class.
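
For reference, the address arithmetic both versions rely on can be checked in isolation. This is only a sketch: align_up is a hypothetical helper name, not part of either class above, and it assumes alignment is a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Rounds `address` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two, so `alignment - 1` is a low-bit mask.
constexpr std::uintptr_t align_up(std::uintptr_t address, std::size_t alignment) {
    return (address + (alignment - 1)) & ~(std::uintptr_t(alignment) - 1);
}

// An already aligned address is left unchanged.
static_assert(align_up(0x1000, 64) == 0x1000, "aligned input stays put");
// One byte past a boundary rounds up to the next boundary.
static_assert(align_up(0x1001, 64) == 0x1040, "rounds up to next multiple");
// At most alignment - 1 bytes are skipped, hence the buffer size
// sizeof(T) + alignment - 1.
static_assert(align_up(0x103F, 64) == 0x1040, "last byte before a boundary");
```

Because + binds tighter than &, the unparenthesized expression in the classes above computes the same thing.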

Are there any other approaches or tricks to get both advantages?


Solution 2

I think option 1 looks neater, and I don't think there is any benefit with option 2.

However, if you need to know which one gives you the best performance, you really have to measure it. Me, or anyone else, looking at the code and saying "A looks better than B" is no good: compilers aren't 100% predictable, and sometimes the choice that looks good isn't the best one. This is something I say about ALL performance posts, and there is a good reason for that. I have personally looked at two pieces of code and thought "well, they are going to take the same time, they are almost identical", only to find that some subtle difference made one noticeably faster than the other.

Make sure you don't test only the trivial case. You need a few different variations of T, such as a struct with a fair number of members, large and small arrays, and simple types like int, long long and double.
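
As a starting point, a minimal timing sketch for the two options from the question might look like the following. The names AlignedByRef, AlignedByCalc, align_up and ns_per_access are made up for illustration, the wrappers assume a trivially destructible T, and the loop is hard-wired to int. A serious benchmark would repeat the runs, build with optimizations, and cover the other types mentioned above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>

// Rounds the start of `buffer` up to the next multiple of `alignment`
// (assumed to be a power of two).
template <std::size_t alignment>
void * align_up(void const * buffer) {
    auto const address = reinterpret_cast<std::uintptr_t>(buffer);
    return reinterpret_cast<void *>((address + (alignment - 1)) & ~std::uintptr_t(alignment - 1));
}

// Option 1: store a reference to the aligned T (costs one pointer of space).
template <typename T, std::size_t alignment = 64>
struct AlignedByRef {
    static_assert(std::is_trivially_destructible<T>::value, "sketch never runs ~T()");
private:
    unsigned char bytes[sizeof(T) + alignment - 1];
public:
    T & Value;
    AlignedByRef(T const & x = T()) : Value(*new (align_up<alignment>(bytes)) T(x)) {}
    T const & value() const { return Value; }
    void value(T const & x) { Value = x; }
};

// Option 2: recompute the aligned address on every access (no extra space).
template <typename T, std::size_t alignment = 64>
struct AlignedByCalc {
    static_assert(std::is_trivially_destructible<T>::value, "sketch never runs ~T()");
private:
    unsigned char bytes[sizeof(T) + alignment - 1];
    T * ptr() const { return static_cast<T *>(align_up<alignment>(bytes)); }
public:
    AlignedByCalc(T const & x = T()) { new (align_up<alignment>(bytes)) T(x); }
    T const & value() const { return *ptr(); }
    void value(T const & x) { *ptr() = x; }
};

// Times `iterations` write-then-read round trips on a wrapper holding an int
// and returns the average nanoseconds per round trip. The sum of all values
// read is handed back through `checksum` so the loop has an observable result.
template <typename Wrapper>
double ns_per_access(Wrapper & w, std::size_t iterations, std::uint64_t & checksum) {
    assert(iterations > 0);
    std::uint64_t sum = 0;
    auto const start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        w.value(static_cast<int>(i & 0xFF));
        sum += static_cast<std::uint64_t>(w.value());
    }
    auto const stop = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration<double, std::nano>(stop - start).count()
           / static_cast<double>(iterations);
}
```

Calling ns_per_access on an AlignedByRef<int> and an AlignedByCalc<int> with the same iteration count gives two numbers to compare. The checksums should match, which doubles as a sanity check that both wrappers behave the same.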

Other tips

If you have access to C++11, you can use the new alignas keyword to get the compiler to align a type or variable for you.

alignas(64) classA myA;
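
Applied to the wrapper from the question, that could look roughly like this. It is a sketch, not a drop-in replacement: is_aligned is a hypothetical helper added only for checking, and alignment is assumed to be a power of two no smaller than alignof(T).

```cpp
#include <cstddef>
#include <cstdint>

// alignas on the member lets the compiler place the T itself at an aligned
// address, so there is no byte buffer, no stored pointer, and no per-access
// address arithmetic.
template <typename T, std::size_t alignment = 64>
struct Aligned {
    static_assert(alignment >= alignof(T), "alignas must not weaken T's own alignment");
    alignas(alignment) T value;

    Aligned(T const & x = T()) : value(x) {}
};

// Returns true when `p` sits on a multiple of `alignment` bytes.
inline bool is_aligned(void const * p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

The trade-off is that the struct itself becomes over-aligned, so its size is rounded up to a multiple of alignment. Objects created with plain new are only guaranteed to honor that alignment from C++17 onward.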
Licensed under: CC-BY-SA with attribution