Which kind of branch prediction is more important?

https://cs.stackexchange.com/questions/73

16-10-2019
|

Question

I have observed that there are two different types of states in branch prediction.

In superscalar execution, where the branch prediction is very important, and it is mainly in execution delay rather than fetch delay.
In the instruction pipeline, where the fetching is more problem since the instructions do not actually get executed till later.

Which of these is very important (as in which of these really matters in the CPU now-a-days)? If both are equally important or in case the second one is more important then Why do we not have two instruction pipeline (probably of half the length ) and then depending on the branches, just choose one of them and then again start the population from the beginning?

Solution

I don't know which case is prevalent, but I would like to offer some thoughts on your proposal of double pipelines.

First of all, you would need double the wire which would consume double the energy and produce double the heat while active. While not needed, on the other hand, it would be completely useless. So one could argue that it is not a good use of resources that are rare on modern processors.

More fundamentally, unless you prefer one branch based on probability, how do you know which version to write through? If none, you might not save anything because other processors might have to wait on your decision, anyways. If you stick with one of them, you have essentially the same rollback probability and cost as you have now.

Let us do a rough calculation. For sake of simplicity, let us assume that handling two pipelines instead of one does not cause additional management overhead. The expected cost (e.g. energy, heat) is $C = c + p(c_r + c)$ with one pipeline ($c$ the cost of executing either alternative, $p$ the probability for rollback and $c_r$ the cost for rollback without the cost for executing the other alternative) but either $2c$ or even $2c + pc_r$ -- either is a lot larger than $C$ if $p$ and $c_r$ are relatively small, and $p$ certainly is (as far as I know, modern branch predictions have accuracies over 90%). And we don't get much for this cost! Expected execution times are $t + p(t_r + t)$ with one pipeline and $t$ resp. $t + pt_r$ with two; as $p$ is small, time savings are negligible.

OTHER TIPS

In one sense, the effect of branch prediction is more critical in the fetching of instructions since an instruction which is not fetched cannot be executed.

With respect to executing both paths of a branch, this is called eager execution and has been researched somewhat substantially. Augustus K. Uht and Vijay Sindagi's "Disjoint Eager Execution: An Optimal Form of Speculative Execution" (1995) might be a worth a look.

Eager execution has several problems. For deep speculation, the number of paths that must be tracked can grow exponentially (each forked branch path may encounter a branch). Branch prediction is also often very accurate (>90% correct), so always executing both paths would be wasteful. Eager execution can also "contaminate" caches with useless content. (The above mentioned paper proposed intelligently limited eager execution to avoid some of these issues.) Limited eager fetching the alternate path has fewer issues and can be somewhat attractive in reducing misprediction recovery delay in shorter pipelines.

Another approach that has been proposed is dynamically predicating "hammock" branches (short forward branches that join back to the main path of instruction flow). Artur Klauser et al.'s "Dynamic Hammock Predication for Non-predicated Instruction Set Architectures" (1998) might be worth a read for that idea. (Hyesoon Kim et al.'s "Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution" proposes adding to an ISA branches which facilitate predication of hammocks and extends this method of predication to hard-to-predict loop branches.)

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange