Question

I'm making a vector/matrix library. (GCC, ARM NEON, iPhone)

typedef struct{ float v[4]; } Vector;
typedef struct{ Vector v[4]; } Matrix;

I pass structs by pointer to avoid the cost of copying data on every function call, so at first I designed the function like this:

void makeTranslation(const Vector* factor, Matrix* restrict result);

But if the function is inlined, is there still any performance reason to pass values by pointer? Are those variables copied as well? What about registers and caches? I tried redesigning the function like this:

inline Matrix makeTranslation(const Vector factor) __attribute__ ((always_inline));

What do you think about the calling cost in each case?

  • I added 'const' to the 2nd signature to reflect suggestions.
Solution

When a function is inlined, the call itself typically involves no copying of variables. Values will still be moved around, and sometimes placed on the stack, as a normal part of execution, but not as a direct result of the function call. (When you run out of registers, some values may be spilled to the stack, but only if needed.) So the overhead of the "call" basically disappears when a function is inlined: no more setting up and tearing down the stack frame, no more unconditional jump, no more pushing and popping parameters.

If you can rely on the always_inline attribute to always inline the function, then you should also not pass the Vector by pointer (if it isn't modified). Passing it by pointer requires taking the vector's address, which means the compiler must ensure the vector has an address, so it cannot live only in CPU registers. This can slow things down when it isn't needed. Once you take the address of something, the compiler has to give it an address unless it can prove that the address is never actually needed.

With pass-by-pointer, the code will generally need an instruction to get the object's address and at least one dereference to read a member's value. With pass-by-value this MAY still happen, but the compiler MAY be able to optimize all of it away.
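To make the comparison concrete, here is a minimal sketch of both designs side by side. The names makeTranslationPtr and makeTranslationVal, the function bodies, and the matrix layout (translation stored in the fourth Vector of the Matrix) are assumptions made for illustration; the question only gives the signatures. Whether the by-value version really stays in registers depends on the compiler and optimization level, so compare the generated assembly before relying on it.

```c
#include <string.h>

typedef struct { float v[4]; } Vector;
typedef struct { Vector v[4]; } Matrix;

/* Pointer version (hypothetical body): the caller has to hand over
   addresses for both the input vector and the result matrix. */
static inline void makeTranslationPtr(const Vector *factor,
                                      Matrix *restrict result)
{
    memset(result, 0, sizeof *result);
    for (int i = 0; i < 4; ++i)
        result->v[i].v[i] = 1.0f;             /* identity diagonal */
    for (int i = 0; i < 3; ++i)
        result->v[3].v[i] = factor->v[i];     /* assumed layout: translation in v[3] */
}

/* By-value version (hypothetical body): no address of the caller's
   objects is taken, so after inlining the compiler is free to keep
   them in registers if it can. */
static inline __attribute__((always_inline))
Matrix makeTranslationVal(const Vector factor)
{
    Matrix m;
    memset(&m, 0, sizeof m);
    for (int i = 0; i < 4; ++i)
        m.v[i].v[i] = 1.0f;                   /* identity diagonal */
    for (int i = 0; i < 3; ++i)
        m.v[3].v[i] = factor.v[i];            /* same assumed layout as above */
    return m;
}
```

Both versions build the same matrix; the only difference is how the data crosses the call boundary. Compiling a caller of each with -O2 -S and reading the output is the most reliable way to see which one the compiler handles better on your target.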

Don't forget that overusing inlining can significantly increase the size of the compiled binary. In some cases, large code segments (a result of inlined functions) cause more instruction cache misses, which results in slower performance: the CPU constantly has to go out to main memory to fetch parts of your program because the code is too big to fit in the small L1 cache. This may be especially important on embedded processors (like the one in the iPhone), because these processors typically have small caches.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow