Question

I have written some AVX2 code to run on a Haswell i7 processor. The same codebase is also used on non-Haswell processors, where that code should be replaced with its SSE equivalent. I was wondering whether there is a way for the compiler to ignore the AVX2 instructions on non-Haswell processors. I need something like:

void useSSEorAVX(...) {
    if (compiler directive detected AVX2) {
        // AVX2 code (this part is ready)
    } else {
        // SSE code (this part is also ready)
    }
}

Right now I comment out the relevant code before compiling, but there must be a more efficient way to do this. I am using Ubuntu and gcc. Thanks for your help.


Solution 2

If you just want to do this at compile time, you can do this:

#if defined(__AVX2__)
    // AVX2 code
#elif defined(__SSE__)
    // SSE code
#else
    // scalar code
#endif

Note that when you compile with gcc -mavx2 ..., __AVX2__ is defined automatically; the same goes for __SSE__, which gcc already defines by default when targeting x86-64. You can check what your compiler predefines for any given set of command-line switches with the incantation gcc -dM -E -mavx2 - < /dev/null.
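As an illustration of the compile-time approach, here is a minimal sketch built around a hypothetical add_arrays function (the name and the element-wise addition are assumptions standing in for your own kernel). Which branch survives depends only on the flags passed to gcc: with -mavx2 the AVX2 path is compiled, on a default x86-64 build the SSE path, and on other targets only the scalar loop. compiled_path is likewise a hypothetical helper that reports the choice.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX2__) || defined(__SSE__)
#include <immintrin.h>
#endif

// Hypothetical kernel: dst[i] = a[i] + b[i]. The vector width is fixed at
// compile time by whichever instruction-set macro the compiler predefined.
void add_arrays(float *dst, const float *a, const float *b, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX2__)
    // 8 floats per step (_mm256_add_ps is an AVX instruction, present on every AVX2 CPU).
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i, _mm256_add_ps(_mm256_loadu_ps(a + i),
                                                _mm256_loadu_ps(b + i)));
#elif defined(__SSE__)
    // 4 floats per step.
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_loadu_ps(a + i),
                                          _mm_loadu_ps(b + i)));
#endif
    // Scalar tail, or the whole array when no vector path was compiled in.
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}

// Reports which branch the preprocessor kept.
const char *compiled_path() {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "scalar";
#endif
}
```

Building the same file once with gcc -mavx2 and once without gives two binaries with different code paths, which is exactly why this approach alone cannot pick the right path on whatever machine the program ends up running on.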

If you want to dispatch at run time, though, that's a little more complicated.
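One way to dispatch at run time with gcc, sketched below under the assumption of an x86-64 target, is to compile the whole file for the baseline instruction set, enable AVX2 for a single function with __attribute__((target("avx2"))), and choose between the versions with __builtin_cpu_supports("avx2"), which queries the CPU the program is actually running on. This uses gcc built-ins rather than anything from the answer above; the function names add_avx2, add_sse and add_arrays are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <immintrin.h>

// AVX2 version: the target attribute lets gcc emit AVX/AVX2 instructions in
// this function only, even though the file is built without -mavx2.
__attribute__((target("avx2")))
void add_avx2(float *dst, const float *a, const float *b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i, _mm256_add_ps(_mm256_loadu_ps(a + i),
                                                _mm256_loadu_ps(b + i)));
    for (; i < n; ++i)  // scalar tail
        dst[i] = a[i] + b[i];
}

// SSE version: SSE is part of the x86-64 baseline, so no attribute is needed.
void add_sse(float *dst, const float *a, const float *b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_loadu_ps(a + i),
                                          _mm_loadu_ps(b + i)));
    for (; i < n; ++i)  // scalar tail
        dst[i] = a[i] + b[i];
}

// Chooses a version on every call, based on the CPU the program runs on.
void add_arrays(float *dst, const float *a, const float *b, std::size_t n) {
    if (__builtin_cpu_supports("avx2"))
        add_avx2(dst, a, b, n);
    else
        add_sse(dst, a, b, n);
}
```

The target attribute and __builtin_cpu_supports are gcc/clang extensions, so this particular sketch will not build with Visual Studio.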

OTHER TIPS

I don't think it's a good idea to build separate executables unless you have to. In your case you can write a CPU dispatcher. I did this recently for GCC and Visual Studio.

Let's assume you have a function called product with an SSE version and an AVX2 version. Put the SSE version in a file product_SSE.cpp and the AVX2 version in a file product_AVX2.cpp, and compile each one separately with the matching switch (e.g. -msse2 and -mavx2). Then make a module like this:

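#include "instrset.h"  // assumed: Agner Fog's vector class library, declares instrset_detect()

extern "C" void product_SSE(float *a, float *b, float *c, int n);
extern "C" void product_AVX2(float *a, float *b, float *c, int n);
static void product_dispatch(float *a, float *b, float *c, int n);

// Starts out pointing at the dispatcher; the first call replaces it.
static void (*fp)(float *a, float *b, float *c, int n) = product_dispatch;

static void product_dispatch(float *a, float *b, float *c, int n) {
    int iset = instrset_detect();   // in the VCL, 8 = AVX2, higher = newer sets
    if (iset >= 8) {
        fp = product_AVX2;
    }
    else {
        fp = product_SSE;
    }
    fp(a, b, c, n);                 // finish this first call
}

void product(float *a, float *b, float *c, int n) {
    fp(a, b, c, n);
}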

Compile that module with the lowest common instruction set (e.g. with SSE2). The first time you call product, it calls product_dispatch, which sets the function pointer fp to either product_AVX2 or product_SSE and then calls the selected function through the pointer. Every later call to product jumps straight to product_AVX2 or product_SSE. This way you don't need separate executables.
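To see the "dispatch once, then jump straight through the pointer" behaviour in isolation, here is a self-contained sketch. It swaps instrset_detect() for gcc's __builtin_cpu_supports("avx2") so that it builds without the vector class library, and it replaces the real kernels with hypothetical scalar stand-ins that only record which one ran; last_kernel and dispatch_count are illustrative additions, not part of the answer.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real kernels: same signature, scalar body,
// plus a global that records which one ran last.
static const char *last_kernel = nullptr;

extern "C" void product_SSE(float *a, float *b, float *c, int n) {
    last_kernel = "SSE";
    for (int i = 0; i < n; ++i) c[i] = a[i] * b[i];
}

extern "C" void product_AVX2(float *a, float *b, float *c, int n) {
    last_kernel = "AVX2";
    for (int i = 0; i < n; ++i) c[i] = a[i] * b[i];
}

static int dispatch_count = 0;  // how many times the CPU check has run
static void product_dispatch(float *a, float *b, float *c, int n);

// Initially points at the dispatcher; the first call overwrites it.
static void (*fp)(float *, float *, float *, int) = product_dispatch;

static void product_dispatch(float *a, float *b, float *c, int n) {
    ++dispatch_count;
    fp = __builtin_cpu_supports("avx2") ? product_AVX2 : product_SSE;
    fp(a, b, c, n);  // complete the first call with the selected kernel
}

void product(float *a, float *b, float *c, int n) {
    fp(a, b, c, n);
}
```

After any number of calls to product, dispatch_count stays at 1: only the first call pays for the CPU check, and later calls cost one indirect jump.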

Licensed under: CC-BY-SA with attribution