Question

Clang has a C/C++ extension that allows you to treat vector values as first-class citizens:

typedef double double4 __attribute__((ext_vector_type(4)));
// easy assignment
double4 a = {1, 2, 3, 4};
double4 b = {4, 3, 2, 1};
// basic operators work component-wise
double4 c = a + b; // {5, 5, 5, 5}
// you can even swizzle elements!
double4 d = a.zyxw; // {3, 2, 1, 4}

I assume these vectors make use of the underlying platform's SIMD instructions (SSE on Intel Macs, NEON on ARM). However, I'm not sure how the Mac OS calling convention deals with vector types.

Will it be more efficient to pass vectors by reference or by copy? The difference might not be huge, but since I'll be passing around a lot of vectors, I figured I might as well pick up the right habit as soon as possible.


Solution

A quick test shows that in your example double4 arguments are passed on the stack but returned in registers xmm0 and xmm1, which is a bit odd. float4 arguments, on the other hand, are passed in registers xmm0 through xmm7, and results are returned in xmm0, as you would expect.
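A test along these lines can be sketched as follows. This is only an illustration, not the exact code behind the observation above: the function names are hypothetical, and the non-clang fallback typedefs are an assumption added so the file also builds with GCC. Compiling with `clang -O2 -S` and reading the generated assembly shows where each argument and result lives.

```c
/* Sketch for inspecting how vector arguments are passed.
   All function names are hypothetical. Build with: clang -O2 -S vectest.c */
#if defined(__clang__)
typedef double double4 __attribute__((ext_vector_type(4)));
typedef float  float4  __attribute__((ext_vector_type(4)));
#else
/* Assumed fallback for GCC: similar vector types, but no swizzles. */
typedef double double4 __attribute__((vector_size(32)));
typedef float  float4  __attribute__((vector_size(16)));
#endif

/* noinline keeps a real call boundary, so the convention stays visible. */
__attribute__((noinline)) double4 add_double4(double4 a, double4 b)
{
    return a + b; /* component-wise addition */
}

__attribute__((noinline)) float4 add_float4(float4 a, float4 b)
{
    return a + b;
}

/* Pointer variant for comparison: only addresses cross the call boundary. */
__attribute__((noinline)) void add_double4_ptr(const double4 *a,
                                               const double4 *b,
                                               double4 *out)
{
    *out = *a + *b;
}
```

In the assembly for each function, check whether the operands are read from stack-relative addresses or directly from xmm registers.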

Apple uses the System V Application Binary Interface, AMD64 Architecture Processor Supplement, for Mac OS X. If I interpret that document correctly, everything should be passed in registers. I am not sure what clang is doing here. Maybe this is still a work in progress and may change in the future? If it does, it may break your program when you try to mix code compiled with the old and the new behavior.

For performance, passing vectors by value with clang is not a problem. Unless your functions are extremely short, there should be no noticeable difference. If you do use very small functions, you should try to convince the compiler to inline them (e.g. by declaring them static).
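The inlining advice can be sketched like this. `dot4` and `dot4_ptr` are hypothetical helpers, and the GCC fallback typedef is an assumption for portability. Once the compiler inlines either helper, how its arguments would have been passed no longer matters.

```c
/* Sketch: small vector helpers declared static inline so the compiler
   can remove the call entirely. Helper names are hypothetical. */
#if defined(__clang__)
typedef double double4 __attribute__((ext_vector_type(4)));
#else
/* Assumed fallback for GCC. */
typedef double double4 __attribute__((vector_size(32)));
#endif

/* By value: the simplest signature to call and to read. */
static inline double dot4(double4 a, double4 b)
{
    double4 p = a * b; /* component-wise product */
    return p[0] + p[1] + p[2] + p[3];
}

/* By pointer: same result, shown only for comparison. */
static inline double dot4_ptr(const double4 *a, const double4 *b)
{
    double4 p = *a * *b;
    return p[0] + p[1] + p[2] + p[3];
}
```

With optimization enabled, both versions will likely compile to the same code at the call site, so the by-value form can be chosen for readability.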

EDIT: Regarding AVX extensions: if you enable them, the compiler uses registers ymm0 to ymm7 for arguments and ymm0 for results. In that case a double4 occupies a single ymm register instead of an xmm register pair.

Licensed under: CC-BY-SA with attribution