Question

I want to confirm whether the following timing diagram is correct for a MIPS 5 stage pipeline ( * = stalls ):

               |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
LW R1, 0(R4)   | IF | ID | EX |  M |  W |    |    |    |    |    |    |    |    |    |    |    |
LW R2, 400(R4) |    | IF | ID | EX |  M |  W |    |    |    |    |    |    |    |    |    |    |
ADD1 R3,R1,R2  |    |    | IF | ID |  * |  * | EX |  M |  W |    |    |    |    |    |    |    |
SW R3, 0(R4)   |    |    |    | IF |  * |  * | ID |  * |  * | EX |  M |  W |    |    |    |    |
SUB R4,R4,#4   |    |    |    |    |    |    | IF |  * |  * | ID | EX |  M |  W |    |    |    |
BNEZ R4, L1    |    |    |    |    |    |    |    |    |    | IF | ID |  * |  * | EX |  M |  W |

Comments:

1) ADDI : ID is stalled 2 cycles so previous WB can complete

2) SW : 2 stalls because ID can't begin, 2 more stalls for ID for previous WB to complete

3) SUB : IF can't start till cycle # 7, ID has to wait till cycle 10 because of previous instruction

4) BNEZ : IF can't start till cycle # 10, 2 stalls for ID so previous WB can complete

Was it helpful?

Solution

If instruction results are only passed through the register file (no forwarding/bypassing) and registers are only read just before EX, then your diagram looks correct. (You might consider the empty stages before IF for SUB and BNEZ to be stall cycles since normally the next instruction's IF would immediately follow the IF stage of the previous instruction. On the other hand, that might be viewed as cluttering the diagram.)

However, a 5-stage pipeline is typically optimized to avoid most of the above stalls by forwarding results directly from the end of the EX stage (or the end of the M stage for loads) of the result-producing instruction to the beginning of the EX stage of the dependent instruction. (For store instructions, the value to be stored in memory might only be needed by the M or even the W stage, so a designer might consider adding forwarding for this case. With this simple pipeline, this would only matter for a pair of instructions providing a memory move since loads are the only instructions with a latency greater than one. For a two-wide superscalar, such could allow something like "ADD R3, R2, R1; SW R3, 0(R4);" to begin execution in the same cycle.)

With such an optimized pipeline, the ADD only has one stall cycle (after ID) by forwarding the result from the end of the M stage of "LW R2, 400(R4)" to the start of the ADD's EX.

               |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 |
LW R1, 0(R4)   | IF | ID | EX |  M |  W |    |    |    |    |    |    |
LW R2, 400(R4) |    | IF | ID | EX |  M |  W |    |    |    |    |    |
ADD1 R3,R1,R2  |    |    | IF | ID |  * | EX |  M |  W |    |    |    |
SW R3, 0(R4)   |    |    |    | IF |  * | ID | EX |  M |  W |    |    |
SUB R4,R4,#4   |    |    |    |    |  * | IF | ID | EX |  M |  W |    |
BNEZ R4, L1    |    |    |    |    |    |    | IF | ID | EX |  M |  W |

Such optimizations add complexity to the design, but avoiding unnecessary stalls can noticeably improve performance.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top