Branch Predicting [duplicate]

Question 1

It's for C. I'm asking for ASM

Which means that you'll have to implement the semantics of __builtin_expect yourself. It isn't difficult, just awkward. If the branch predictor doesn't have history for the branch then it will assume that the branch is taken when the branch is backwards and not taken when it is forward.

So you may have to restructure your code, using the opposite condition in the branch instruction and move code to accommodate that. An unconditional forward jump with a conditional backward branch is common, the way a C compiler implements a for() loop for example.

Question 2

"In the event that a branch hint is necessary, the following instruction prefixes can be added before a branch instruction to change the way the static predictor behaves: 0x3E – statically predict a branch as taken 0x2E – statically predict a branch as not taken"

http://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts

Question 3

The simplest heuristic is to use the branch instruction which matches the most common path. If, for example, you are coding a loop it is usually best to have the branch instruction follow the looping path rather than the exit condition. The goal is to use the branch instruction which evaluates to true most frequently.

If you choose exactly the wrong instruction, let's say you code a loop that almost always exits immediately, the processor may cache that the sense of this branch is backwards and adjust it's prediction. This is only likely to happen for heavily used code though.

Question 4

Some important assembly branch prediction optimization, Source: wiki

There is a one-cycle stall in instruction fetch when a branch is predicted taken. It is therefore referable to structure code so that the most likely code path is the one where the branch is not taken.

Unconditional branches may be mispredicted, and load instructions which follow branches may be decoded and cause cache accesses. Avoid placing load instructions after branches unless you intend the CPU to prefetch the addresses they reference.

Although instructions are decoded in pairs, the branch predictor can only predict one branch target per cycle. If you have a conditional branch which is immediately followed by another branch, and the first branch is likely to be taken, place a no-operation instruction between the branches to prevent decoding and prediction of the second branch. Inserting NOPs is detrimental if the first branch is infrequently taken.