Question

In the GCC (version 4.8.2) manual, the following is stated:

-ftree-loop-if-convert-stores:
Attempt to also if-convert conditional jumps containing memory writes. This transformation can be unsafe for multi-threaded programs as it transforms conditional memory writes into unconditional memory writes. For example,

   for (i = 0; i < N; i++)
      if (cond)
        A[i] = expr;

is transformed to

   for (i = 0; i < N; i++)
       A[i] = cond ? expr : A[i];

potentially producing data races.

I wonder, however, whether there is a performance gain from using the ?: operator instead of the if statement.

  • In the first piece of code, A[i] is set to expr only if the condition is met. If it is not met, then the code inside the statement is skipped.
  • In the second one, A[i] seems to be written regardless of the condition; the condition only affects the value it is set to.

With the ?: operator we still perform a check, but we add the overhead of a write when the condition is not met. Have I missed something?

Solution

What it says is that conditional jumps are converted to conditional move instructions, the cmove (CMOVcc) family of instructions. These can improve speed because, unlike a mispredicted jump, they do not stall the processor pipeline.

With a jump instruction, the processor does not know in advance which instructions to load next, so it makes a prediction and loads one branch into the pipeline. If the prediction is correct, all is well: the following instructions are already executing in the pipeline. If the prediction turns out to be wrong once the jump is evaluated, all the instructions already in the pipeline behind it are useless, so the pipeline must be flushed and the correct instructions loaded. Modern processors have pipelines roughly 16-30 stages deep, so a branch misprediction degrades performance severely. Conditional moves avoid this because they do not insert branches into the program flow.
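As a rough illustration of the two shapes of code being discussed, here is a minimal sketch in C. The function names `sum_above_branchy` and `sum_above_select` are made up for this example, and whether a given compiler actually emits a jump for the first and a conditional move for the second depends on the compiler, its flags and the target, so treat this as an assumption to check in the generated assembly rather than a guarantee.

```c
#include <stddef.h>

/* Branchy form: the addition is skipped when the test fails, so the
   compiler may emit a conditional jump for every element. */
long sum_above_branchy(const int *a, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > threshold)
            sum += a[i];
    }
    return sum;
}

/* Select form: a value is chosen first and then always added, which
   maps naturally onto a register-to-register conditional move. */
long sum_above_select(const int *a, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        long add = (a[i] > threshold) ? a[i] : 0;
        sum += add;
    }
    return sum;
}
```

Both functions return the same result for the same input; they differ only in whether the work in the loop body is skipped or always performed. With unpredictable data the select form tends to benefit most, since that is when branch prediction fails most often.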

But does cmove always write?

From Intel x86 Instruction Set Reference:

The CMOVcc instructions check the state of one or more of the status flags in the EFLAGS register [..] and perform a move operation if the flags are in a specified state (or condition). [..] If the condition is not satisfied, a move is not performed and execution continues with the instruction following the CMOVcc instruction.

Edit

On reading the GCC manual further, I got confused: as far as I know, the compiler does not optimize by transforming C code into other C code, but works on internal data structures such as control flow graphs, so I don't really know what they mean by their example. I suppose they show the C equivalent of the newly generated flow. I am no longer sure whether this optimization is about generating cmoves.

Edit 2

Since cmove can only write to a register, never to memory, this

if (cond)
  A[i] = expr;

cannot generate cmove.

However this

 A[i] = cond ? expr : A[i];

can.

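Suppose the value of expr is already in bx.

load A[i] into ax      ; unconditional read
cmp                    ; evaluate cond, setting the flags
cmove ax, bx           ; ax = expr only if cond holds
store ax into &A[i]    ; unconditional write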

So in order to use cmove, you have to read the value of A[i] and write it back even if cond is false, which is equivalent not to the if statement but to the ternary operator.
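To make the difference concrete, the two loop forms from the GCC manual can be written as plain C functions. This is only a sketch: the names `store_if` and `store_select`, the `int` element type, and passing `cond` and `expr` as per-element arrays are assumptions made for illustration, not something taken from the manual.

```c
#include <stddef.h>

/* Original form: A[i] is stored to only when cond[i] is nonzero. */
void store_if(int *A, const int *cond, const int *expr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cond[i])
            A[i] = expr[i];
}

/* If-converted form: at the source level A[i] is loaded and stored on
   every iteration; cond[i] only selects which value gets stored. */
void store_select(int *A, const int *cond, const int *expr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        A[i] = cond[i] ? expr[i] : A[i];
}
```

In a single thread both functions leave A with the same contents. The second one, however, stores to every element, including those for which cond is false. If another thread updates such an element between the load and the store, its update can be overwritten with the stale value, which is the data race the manual warns about.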

License: CC-BY-SA with attribution
Not affiliated with StackOverflow