Problem

While recently reading a book about physics engine development, I came across a design decision I had never considered before. It relates to the way the CPU addresses raw bytes in memory.

Consider the following class:

class Foo
{
    public:
        float x;
        float y;
        float z;

        /* Constructors and Methods */

    private:
        float padding;
};

The author claims that the padding, which increases the size of the object to a quad word on the x86 architecture, yields a noticeable performance benefit, because four words "sit more cleanly in memory" than three. What does this mean? Padding an object with redundant data to increase performance seems paradoxical to me.

This also raises another question: what about objects that are one or two words in size? If my class is something like:

class Bar
{
    public:
        float x;
        float y;

        /* Constructors and Methods */

    private:
        /* padding ?? */
};

Should I add padding to this class so that it sits more cleanly in memory?

Solution

It is the compiler's responsibility to decide on reasonable padding (assuming typical access patterns). The compiler knows far more about your machine than you ever will. Besides, your machine will be with you for a couple of years; the program will be around for decades, running on a wide range of platforms and subject to a mind-boggling variety of usage patterns. What is best for today's i7 could very well be the worst choice for tomorrow's i8 or ARMv11.

Obfuscating code in pursuit of elusive "performance" falls squarely under premature optimization. Always remember that your time (writing, testing, debugging, and understanding the tweaked code again a week later) is much, much more expensive than the computer time possibly wasted (unless that code runs thousands of times a day on millions of machines). Tweaking code makes no sense until you have hard facts showing that performance is insufficient, and measurements telling you that shuffling that structure around is a bottleneck worth worrying about.

Other tips

Processors don't "read" memory byte by byte the way humans read text; they process it chunk by chunk, in sizes that vary by processor. This is called memory access granularity.

By memory-aligning your object, access time may be faster, and you can also avoid data fragmentation.

You can read more about data alignment here.

Edit: I'm not saying that it's a good or bad practice, just sharing what I know about it.

There are two really important things to say in answer to this question.

First, if you're going to tweak code for performance benefits, and if you've decided it's worthwhile (for whatever reason), you must first write a benchmark. You must be able to try both and measure the difference.

Second, tweaks of this kind depend on how the assembly language interacts with the hardware. You must be able to read assembly code and understand the different instruction sets and memory-access modes in order to understand why such tweaks might work.

Finally, your question has no answer in isolation. It depends on whether those objects are allocated individually or in collections, whether other objects sit next to them, and how the compiler generates code in each case. In all likelihood, alignment on a power-of-two boundary will be faster than misalignment, but a collection that fits in a cache is faster than one that doesn't. I wouldn't expect padding to 8 or 4 bytes to improve performance, but if it were important, I would try it and test the result.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow