Question

20-30 years ago arithmetic operations like division were one of the most costly operations for CPUs. Saving one division in a piece of repeatedly called code was a significant performance gain. But today CPUs have fast arithmetic operations and since they heavily use instruction pipelining, conditionals can disrupt efficient execution. If I want to optimize code for speed, should I prefer arithmetic operations in favor of conditionals?

Example 1

Suppose we want to implement operations modulo n. What will perform better:

int c = a + b;
result = (c >= n) ? (c - n) : c;

or

result = (a + b) % n;

?

Example 2

Let's say we're converting 24-bit signed numbers to 32-bit. What will perform better:

int32_t x = ...;
result = (x & 0x800000) ? (x | 0xff000000) : x;

or

result = (x << 8) >> 8;

?

Was it helpful?

Solution

All the low hanging fruits are already picked and pickled by authors of compilers and guys who build hardware. If you are the kind of person who needs to ask such question, you are unlikely to be able to optimize anything by hand.

While 20 years ago it was possible for a relatively competent programmer to make some optimizations by dropping down to assembly, nowadays it is the domain of experts, specializing in the target architecture; also, optimization requires not only knowing the program, but knowing the data it will process. Everything comes down to heuristics, tests under different conditions etc.

Simple performance questions no longer have simple answers.

OTHER TIPS

If you want to optimise for speed, you should just tell your compiler to optimise for speed. Modern compilers will generally outperform you in this area.

I've sometimes been surprised trying to relate assembly code back to the original source for this very reason.

Optimise your source code for readability and let the compiler do what it's best at.

I expect that in example #1, the first will perform better. The compiler will probably apply some bit-twiddling trick to avoid a branch. But you're taking advantage of knowledge that it's extremely unlikely that the compiler can deduce: namely that the sum is always in the range [0:2*n-2] so a single subtraction will suffice.

For example #2, the second way is both faster on modern CPUs and simpler to follow. A judicious comment would be appropriate in either version. (I wouldn't be surprised to see the compiler convert the first version into the second.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top