Question

In this answer, somebody writes:

[..] most compilers won't optimize a + b + c + d to (a + b) + (c + d) (this is an optimization since the second expression can be pipelined better)

The original question was about how certain expressions involving float values can or cannot be re-ordered due to the imprecision of floating-point arithmetic.

I'm more interested in the quoted part, though: why, say with unsigned int values, would it be easier to generate code that exploits CPU pipelines if a+b+c+d is rewritten as (a+b)+(c+d)?

Solution

a+b and c+d can be calculated in parallel.

Like this:

x = a+b
y = c+d
return x+y // requires x and y

vs

x = a+b
y = x+c // requires x
return y+d // requires y (and thus x)

When calculating y, one has to wait for the result of x to come in first; there is a data dependency between them. See Instruction-level parallelism on Wikipedia.
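
To make the dependency chains concrete, here is a minimal C sketch of the two evaluation orders (the function names are mine, purely for illustration):

unsigned int sum_chain(unsigned int a, unsigned int b,
                       unsigned int c, unsigned int d)
{
    unsigned int x = a + b;   /* first add                    */
    unsigned int y = x + c;   /* must wait for x              */
    return y + d;             /* must wait for y (and thus x) */
}

unsigned int sum_tree(unsigned int a, unsigned int b,
                      unsigned int c, unsigned int d)
{
    unsigned int x = a + b;   /* independent of c and d       */
    unsigned int y = c + d;   /* can run in parallel with x   */
    return x + y;             /* waits only for x and y       */
}

If each addition takes one cycle, sum_chain has a critical path of three dependent additions, while the longest path in sum_tree is only two.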

OTHER TIPS

With unsigned int? It wouldn't. Integer additions can be reordered freely without any risk of affecting the result, so any half-decent compiler should generate the same code for both expressions; the two groupings only mean something different when the operands are floats.
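
As a small side illustration (my own example, not from the answer): with floats the two groupings really can give different results, which is why the compiler must not reassociate them by default.

#include <stdio.h>

int main(void)
{
    float a = 1e20f, b = -1e20f, c = 3.14f;

    float left  = (a + b) + c;  /* the huge terms cancel first: about 3.14 */
    float right = a + (b + c);  /* 3.14 is absorbed by -1e20: result 0     */

    printf("%g %g\n", left, right);  /* prints two different values */
    return 0;
}

With unsigned int operands, by contrast, both groupings are guaranteed to produce the same (wrapped) sum, so the compiler is free to pick whichever shape it likes.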

If your compiler generates an SSA intermediate representation, it might come out looking like:

AB = a + b;
ABC = AB + c;
ABCD = ABC + d;

in the first case, and:

AB = a + b;
CD = c + d;
ABCD = AB + CD;

In the first case, each term depends on the previous one, so even if the CPU is capable of adding multiple terms at once, it has to wait for the result of the previous operation before starting the next. In the second case, a processor like a modern x86 with multiple ALU pipelines can calculate AB and CD independently and at the same time.
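
The same idea carries over to longer sums. The sketch below is mine (not part of the original answer): it splits an array reduction across two accumulators, so each accumulator forms its own short dependency chain and a CPU with more than one ALU pipeline can keep two additions in flight per iteration.

#include <stddef.h>

unsigned int sum_array(const unsigned int *v, size_t n)
{
    unsigned int s0 = 0, s1 = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        s0 += v[i];      /* depends only on the previous s0 */
        s1 += v[i + 1];  /* depends only on the previous s1 */
    }
    if (i < n)
        s0 += v[i];      /* leftover element when n is odd  */

    return s0 + s1;      /* combine partial sums, like (a+b)+(c+d) */
}

For unsigned int an optimizing compiler may well perform this kind of transformation itself; for floats it generally will not unless it is explicitly allowed to reassociate (for example with GCC's or Clang's -ffast-math).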
