Question

I have code that needs to run fast, and I am optimizing the heck out of an inner loop that runs several hundred trillion times.

In pursuit of this, I have been writing several different versions of the code in this inner loop: some using naive methods, some using SSE intrinsics, and so on. The idea is that when I run it on a particular hardware combination, I can run a test, see which combination of implementation and compiler options works best, and use that one.

At first, when there were only two different methods, I used simple conditional compilation inside the loop, as follows:

do
{
    #ifdef naive_loop
    //more code here
    #endif
    #ifdef partially_unrolled_loop
    //more code here
    #endif
}
while( runNumber < maxRun );

Later, as the number of variations I tried grew, it turned into this:

#ifdef naive_loop
void CalcRunner::loopFunction()
{
//code goes here
}
#endif
#ifdef partially_unrolled_loop
void CalcRunner::loopFunction()
{
//code goes here
}
#endif
#ifdef sse_intrinsics
void CalcRunner::loopFunction()
{
//code goes here
}
#endif
//etc

However, this is making my file enormous and annoying to read. Is there a more elegant way to do this?


Solution

You can use templates and template specialization to do the job. Each variant gets an empty tag type, and the class is specialized once per tag. For example:

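// Empty tag types, one per variant
struct naive_loop {};
struct partially_unrolled_loop {};

// Primary template is only declared; each variant is a specialization
template <typename T>
class CalcRunner;

template <>
class CalcRunner<naive_loop>
{
public:
   void loopFunction(void){...}
};

template <>
class CalcRunner<partially_unrolled_loop>
{
public:
   void loopFunction(void){...}
};

// Now pick the variant you want at compile time

typedef CalcRunner<partially_unrolled_loop> CalcRunner_t;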

int main()
{
   CalcRunner_t runner;
   runner.loopFunction();
}
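
Since the goal in the question is to test which variant is fastest on a given machine, the same idea can be extended so that every variant is compiled into one binary and timed side by side. The following is only a minimal sketch under stated assumptions: the names `NaiveLoop`, `PartiallyUnrolledLoop`, `LoopKernel`, `SelectedKernel`, `time_variant`, and the macro `USE_PARTIALLY_UNROLLED_LOOP` are all hypothetical, and a simple integer sum stands in for the real inner loop.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical tag types: one empty struct per loop variant.
struct NaiveLoop {};
struct PartiallyUnrolledLoop {};

// Primary template is declared only; each variant supplies a specialization.
template <typename Variant>
struct LoopKernel;

template <>
struct LoopKernel<NaiveLoop> {
    // Stand-in for the real inner loop: a plain element-by-element sum.
    static long long sum(const std::vector<int>& v) {
        long long total = 0;
        for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
        return total;
    }
};

template <>
struct LoopKernel<PartiallyUnrolledLoop> {
    // Same result, two elements per iteration, with a tail step for odd sizes.
    static long long sum(const std::vector<int>& v) {
        long long a = 0, b = 0;
        std::size_t i = 0;
        for (; i + 1 < v.size(); i += 2) { a += v[i]; b += v[i + 1]; }
        if (i < v.size()) a += v[i];
        return a + b;
    }
};

// Assumed build flag (e.g. -DUSE_PARTIALLY_UNROLLED_LOOP) picks the
// variant used by production code; the default is the naive one.
#if defined(USE_PARTIALLY_UNROLLED_LOOP)
typedef LoopKernel<PartiallyUnrolledLoop> SelectedKernel;
#else
typedef LoopKernel<NaiveLoop> SelectedKernel;
#endif

// Runs one variant `repeats` times and returns the elapsed seconds.
// The last computed sum is written to `result` so callers can check
// that all variants agree.
template <typename Variant>
double time_variant(const std::vector<int>& v, int repeats, long long& result) {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) result = LoopKernel<Variant>::sum(v);
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

A test driver could call `time_variant<NaiveLoop>(data, n, r)` and `time_variant<PartiallyUnrolledLoop>(data, n, r)` on the target machine, compare the timings, and then rebuild with the matching flag so that `SelectedKernel::sum` is the fastest variant. Because the variant is a template argument, the call is resolved at compile time and adds no dispatch cost inside the loop. Timings from such a small harness are only indicative, since an optimizing compiler may hoist or remove work whose result is unused.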

OTHER TIPS

Why don't you just put the different implementations in different files, and conditionally include the proper one? That's what people usually do for multiplatform code.

Licensed under: CC-BY-SA with attribution