While Pentium 4 is the only generation which actually respects the branch-hint instructions, most CPUs do have some form of static branch prediction, which can be used to achieve the same effect. This answer is a bit tangential to the original question, but I think this would be valuable information to anyone who comes to this page.
The Intel optimisation guide and Agner Fog's guide (which have been mentioned here already) both have excellent descriptions of this feature.
Intel has this to say about generations newer than Core 2:
Make the fall-through code following a conditional branch be the likely target for a branch with a forward target
So conditional branches which jump forward in the code are predicted not taken by the static prediction algorithm.
This is consistent with what GCC seems to have generated using __builtin_expect: the 'expected' return 1 / return 2 code is placed in the not-taken paths from the conditional branches, which will be statically predicted as not taken.
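As a minimal sketch of how such a hint is usually written in C: the function name classify and the likely/unlikely macro names are hypothetical, not taken from the question. The macros are a common convention built on GCC's __builtin_expect(exp, c), which returns exp and tells the compiler that exp is expected to equal c. GCC typically lays out the expected path as the fall-through, but the exact layout depends on the compiler version and optimisation level.

```c
/* Common convention (not part of GCC itself): normalise the condition
   to 0 or 1 so it can be compared against the expected value. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example function. With optimisation enabled, GCC usually
   keeps the 'likely' code on the fall-through (not-taken) side of each
   conditional branch and moves the 'unlikely' code out of line. */
int classify(int x)
{
    if (unlikely(x < 0))
        return -1;      /* rare path: typically reached by a forward taken branch */
    if (likely(x != 0))
        return 1;       /* common path: typically falls straight through */
    return 0;
}
```

Compiling with gcc -O2 -S and reading the generated assembly is the simplest way to check which side actually ended up as the fall-through path.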
Additionally:
Branches that do not have a history in the Branch Target Buffer are predicted using a static prediction algorithm:
Predict unconditional branches to be taken.
So in the 'expected' not-taken paths where GCC has placed unconditional jmps to the end of the function, those jumps will be statically predicted as taken (i.e. not skipped).
Intel also says:
make the fall-through code following a conditional branch be the unlikely target for a branch with a backward target
So conditional branches which jump backwards in the code are predicted taken by the static prediction algorithm.
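Taken together, the rules quoted so far amount to "backward taken, forward not taken" (often abbreviated BTFN) for conditional branches, plus "taken" for unconditional ones. A minimal model of that decision, where the enum and function names are made up for illustration; the real hardware applies the rule to decoded instructions, not through a function call:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a branch that has no history in the
   Branch Target Buffer. */
typedef enum { BR_CONDITIONAL, BR_UNCONDITIONAL } branch_kind;

/* Static prediction following the Intel rules quoted above:
   unconditional branches are predicted taken; conditional branches are
   predicted taken only if the target lies at a lower address (a backward
   branch, e.g. a loop), and not taken otherwise. */
bool static_predict_taken(branch_kind kind, uint64_t branch_addr,
                          uint64_t target_addr)
{
    if (kind == BR_UNCONDITIONAL)
        return true;
    return target_addr < branch_addr;   /* backward => taken */
}
```

Applied to the GCC output discussed above, the forward conditional branches into the out-of-line code are predicted not taken and the unconditional jmps to the end of the function are predicted taken, so the expected path is predicted correctly even the first time it runs.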
According to Agner Fog, most Pentiums also follow this algorithm:
On PPro, P2, P3, P4 and P4E, a control transfer instruction which has not been seen before, or which is not in the Branch Target Buffer, is predicted to fall through if it goes forwards, and to be taken if it goes backwards (e.g. a loop). Static prediction takes longer time than dynamic prediction on these processors.
However, the Core 2 family (and Pentium M) has a completely different policy:
These processors do not use static prediction. The predictor simply makes a random prediction the first time a branch is seen, depending on what happens to be in the BTB entry that is assigned to the new branch. There is simply a 50% chance of making the right prediction of jump or no jump, but the predicted target is correct.
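One way to picture this is a small table of direction bits, indexed by the low bits of the branch address and never cleared when a new branch takes over an entry, so the first prediction for a new branch is whatever the previous occupant left behind. This is a deliberately simplified sketch: the table size, the indexing scheme, the single-bit state and the name aliased_predict_step are assumptions for illustration, not Core 2's actual BTB organisation, and only the jump/no-jump direction is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_BITS 4                 /* assumed size: 16 entries */
#define TABLE_SIZE (1u << TABLE_BITS)

/* One "last outcome" bit per entry, zero-initialised. Branches whose
   low address bits collide share an entry, and an entry is NOT reset
   when a new branch starts using it. */
static bool last_taken[TABLE_SIZE];

/* Returns the prediction for the branch at branch_addr, then records
   the actual outcome in the (possibly shared) entry. */
bool aliased_predict_step(uint64_t branch_addr, bool taken)
{
    unsigned idx = (unsigned)(branch_addr & (TABLE_SIZE - 1));
    bool prediction = last_taken[idx];   /* may have been written by another branch */
    last_taken[idx] = taken;
    return prediction;
}
```

For example, if a branch at 0x10 was last taken, a never-before-seen branch at 0x20 maps to the same entry and is predicted taken on its first execution purely by accident.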
AMD processors apparently have a different policy again:
A branch is predicted not taken the first time it is seen. A branch is predicted always taken after the first time it has been taken. Dynamic prediction is used only after a branch has been taken and then not taken. Branch hint prefixes have no effect.
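This description reads like a small per-branch state machine, sketched below. The type and function names are hypothetical; treating a branch as still predicted not taken until its first taken execution is an assumed reading of the quote; and the 2-bit saturating counter standing in for "dynamic prediction" is an assumed placeholder, not AMD's actual predictor.

```c
#include <stdbool.h>

/* Hypothetical per-branch state following the AMD description above. */
typedef enum {
    NOT_YET_TAKEN,  /* new, or (assumed) only ever seen not taken: predict not taken */
    ALWAYS_TAKEN,   /* taken at least once and not since not-taken: predict taken */
    DYNAMIC         /* taken and then not taken: use the dynamic predictor */
} amd_state;

typedef struct {
    amd_state state;
    unsigned counter;   /* assumed 2-bit saturating counter, used only in DYNAMIC */
} amd_branch;

/* Returns the prediction for this execution, then updates the state
   with the actual outcome. */
bool amd_step(amd_branch *b, bool taken)
{
    bool prediction;
    switch (b->state) {
    case NOT_YET_TAKEN:
        prediction = false;
        if (taken)
            b->state = ALWAYS_TAKEN;
        break;
    case ALWAYS_TAKEN:
        prediction = true;
        if (!taken) {
            b->state = DYNAMIC;
            b->counter = 1;     /* assumed starting point: weakly not taken */
        }
        break;
    default: /* DYNAMIC */
        prediction = b->counter >= 2;
        if (taken && b->counter < 3)
            b->counter++;
        else if (!taken && b->counter > 0)
            b->counter--;
        break;
    }
    return prediction;
}
```

A zero-initialised amd_branch starts in NOT_YET_TAKEN, so the first execution of any branch is predicted not taken, matching the first sentence of the quote.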
There is one additional factor to consider: CPUs generally like to execute in a linear fashion, so even correctly-predicted taken branches are often more expensive than correctly-predicted not-taken branches.