Question

Increasing the size of a branch prediction table implies that two branches in a program are less likely to share a common predictor. A single predictor predicting a single branch instruction is generally more accurate than the same predictor serving more than one branch instruction.

List a sequence of branch taken and not-taken actions to show a simple example of a 2-bit predictor sharing (several different branch instructions are mapped into the same entry of the prediction table) that reduces the branch misprediction rate, compared to the situation where separate predictor entries are used for each branch. (Note: Be sure to show the outcomes of two different branch instructions and specifically indicate the order of these outcomes and which branch they correspond to)

Can someone explain to me what this question is asking for specifically? Also, what does "2-bit predictor sharing (several different branch instructions are mapped into the same entry of the prediction table)" and "separate predictor entries are used for each branch" mean? I've been reading and rereading my notes but I couldn't figure it out. I tried to find some branch prediction examples online but couldn't come across any.

Solution

"2-bit predictor" could be referring to either of two things, but much more likely one than the other.

The unlikely possibility is that they mean a branch table with only four entries, so two bits are used to associate a particular branch with an entry in the table. That's unlikely because a 4-entry table is so small that lots of branches would share the same table entries, so the branch predictor wouldn't be much more accurate than static branch prediction (e.g., always predicting backward branches as taken, since they're typically used to form loops).

The much more likely possibility is using two bits to indicate whether a branch is likely to be taken or not. Some of the earliest microprocessors that included branch prediction (e.g., the Pentium and PowerPC 604) worked roughly this way. The basic idea is that you keep a two-bit saturating counter, and make a prediction based on its current state. Intel called the states strongly not taken, weakly not taken, weakly taken, and strongly taken. These would be numbered as (say) 0, 1, 2 and 3, so you can use a two-bit counter to track the states. Every time a branch is taken, you increment the number (unless it's already 3), and every time it's not taken, you decrement it (again, unless it's already 0). When you need to predict a branch, you predict it not taken if the counter is 0 or 1, and taken if it's 2 or 3.¹
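To make the state machine concrete, here's a minimal Python sketch of one such saturating counter. The class and method names are my own, not from any real design, and the initial state is an arbitrary choice:

```python
class TwoBitPredictor:
    """A 2-bit saturating counter: 0 = strongly not taken,
    1 = weakly not taken, 2 = weakly taken, 3 = strongly taken."""

    def __init__(self, state=1):
        self.state = state  # start weakly not taken (arbitrary)

    def predict(self):
        # States 0 and 1 predict not taken; states 2 and 3 predict taken.
        return self.state >= 2

    def update(self, taken):
        # Saturating increment on taken, saturating decrement on not taken.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)
```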

A separate predictor entry used for each branch means each branch instruction in the program has its own entry in the branch prediction table. The alternative is some sort of mapping from branch instructions to table entries. For example, if you had a table with 2^20 entries, you could take 20 bits from a branch instruction's address and use those bits as the index into the table. Assuming a machine with 32-bit addressing and 32-bit instructions, you'd have up to 1024 branch instructions that could map to any one entry in the table (32 - 20 - 2 = 10, and 2^10 = 1024). In reality you expect only a small percentage of instructions to be branches, some of the address space to be used for data, etc., so probably only a few branches would map to any one entry in the table.
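Expressed as code, that address-to-entry mapping might look like the following sketch (the constants match the hypothetical numbers above: 32-bit addresses, 4-byte instructions, and a 2^20-entry table):

```python
TABLE_BITS = 20       # 2**20-entry prediction table
ALIGN_BITS = 2        # 32-bit (4-byte) aligned instructions

def table_index(branch_address):
    # Drop the two alignment bits, then keep the low 20 bits as the index.
    return (branch_address >> ALIGN_BITS) & ((1 << TABLE_BITS) - 1)

# Two branches 2**22 bytes apart collide on the same table entry:
assert table_index(0x0040_0000) == table_index(0x0080_0000)
```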

As far as the basic question of what it's asking for: they want a sequence of branch outcomes that will (by what amounts to coincidence) be predicted more accurately when two branches map to the same slot in the branch predictor table than when each maps to a separate slot. To go into just slightly more detail (but hopefully without giving away the whole puzzle), start with a pattern of outcomes where the branch predictor will usually be wrong. What the predictor basically does is assume that if the branch was taken last time, it's more likely to be taken this time (and conversely, if it wasn't taken last time, it probably won't be taken this time either).

So, you start with a pattern of outcomes exactly the opposite of that. Then you add a second branch that maps to the same spot in the prediction table, and whose outcomes adjust the counter so that it reflects the first branch's upcoming outcome rather than its previous one. You can experiment with candidate interleavings using a small harness like the one below.
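Here's a sketch of such a harness, built on the TwoBitPredictor class above. The (branch_id, taken) sequence format is my own invention, and the winning sequence is deliberately left for you to find:

```python
def mispredictions(sequence, shared):
    """Count mispredictions for a sequence of (branch_id, taken) pairs.
    If shared, every branch hits the same table entry; otherwise each
    branch gets its own predictor."""
    predictors = {}
    misses = 0
    for branch, taken in sequence:
        key = 0 if shared else branch  # shared: all branches map to entry 0
        p = predictors.setdefault(key, TwoBitPredictor())
        if p.predict() != taken:
            misses += 1
        p.update(taken)
    return misses

# Fill in your own interleaving of branches 'A' and 'B', e.g.:
# seq = [('A', True), ('B', False), ('A', False), ...]
# print(mispredictions(seq, shared=True), mispredictions(seq, shared=False))
```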


¹ Technically, the Pentium didn't actually work this way, but it's how it was documented to work, and probably how it was intended to work; the discrepancy in how it actually worked seems to have been a bug.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow