How To Diagram The Pipelined Execution of A Code

https://stackoverflow.com/questions/19643871

01-07-2022

Question

I'm kind of clueless on this one. I'm in a Computer Architecture course and we are given the following assembly code:

      ADDI $S4, $zero, 3
loop: LW $S1, 0($S5)
      ADD   $S6, $S1, $S6
      SW $S6, 0($S5)
      ADDI $S5, $S5, 4
      ADDI $S4, $S4, -1
      BNZ $S4, loop

We are supposed to diagram the pipelined execution of the code using stalls to account for hazards, and then diagram it implementing forwarding. The diagram is a chart (columns labeled 1 2 3 4.... which I believe is the cycle; the rows are labeled as instruction, so I'm assuming each line of the code given).

So what I've gotten out of the pipelining is that it allows you to execute multiple instructions at a time. However, it will have to stall if a register is called before a previous instruction has written to that register. This follows IF->ID->EX->MEM->WB, where the register is read in the ID, and it is written to in the WB. So how I see this, is that each cycle, an instruction moves to the next step in that process, at the same time as having the next instruction execute. To me, that makes it sound like the code would have to keep stalling until the first instruction finishes writing to the register, and then the instruction set can continue. I have no idea if I am close to being correct on what I just said, nor do I know how to fill out a chart to show the information...

With that being said, looking at the code, I would think that there should be a stall on line 4 (SW $S6, 0($S5)) because $S6 would still be in stage... EX?, which means it has yet to be written, and the SW would have to stall once for that stage, once for MEM, and another for WB.

Any help on where I should be going with this would be much appreciated.

Thanks

Solution

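As to your original question, you are right that there is no stall-free forwarding option between the first lw and the add: the loaded word only exists after MEM, so the add waits one cycle and then receives it by forwarding. Between that add and the following sw, the result is available after EX and can be forwarded from there, which in the textbook five-stage pipeline needs no further stall. The same reasoning applies to the remaining instructions.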
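To make the hazard reasoning above concrete, here is a rough Python sketch that builds the cycle chart for one pass through the code, once with stalls only and once with forwarding. It is a model under stated assumptions, not your course's exact pipeline: the register file is written in the first half of WB and read in the second half of ID, forwarded operands are needed at the start of EX (also for the sw data and the branch), and control hazards are ignored. The names `PROGRAM`, `schedule` and `render` are made up for this illustration.

```python
# Sketch of a classic 5-stage in-order pipeline: IF ID EX MEM WB.
STAGES = ("IF", "ID", "EX", "MEM", "WB")

# (text, destination register or None, source registers, is_load)
PROGRAM = [
    ("ADDI $S4, $zero, 3", "$S4", [], False),
    ("LW   $S1, 0($S5)", "$S1", ["$S5"], True),
    ("ADD  $S6, $S1, $S6", "$S6", ["$S1", "$S6"], False),
    ("SW   $S6, 0($S5)", None, ["$S6", "$S5"], False),
    ("ADDI $S5, $S5, 4", "$S5", ["$S5"], False),
    ("ADDI $S4, $S4, -1", "$S4", ["$S4"], False),
    ("BNZ  $S4, loop", None, ["$S4"], False),
]


def schedule(program, forwarding):
    """Return one dict per instruction with the first cycle spent in each stage."""
    rows = []
    last_writer = {}  # register -> row of the most recent instruction writing it
    for text, dest, sources, is_load in program:
        prev = rows[-1] if rows else None
        fetch = prev["ID"] if prev else 1    # IF frees up when the previous one enters ID
        decode = prev["EX"] if prev else 2   # ID frees up when the previous one enters EX
        execute = decode + 1
        for reg in sources:
            producer = last_writer.get(reg)
            if producer is None:
                continue
            if forwarding:
                # Assumption: ALU results are ready after EX, loaded words after MEM.
                ready = producer["MEM"] if producer["is_load"] else producer["EX"]
                execute = max(execute, ready + 1)
            else:
                # Assumption: the last ID cycle may coincide with the producer's WB.
                execute = max(execute, producer["WB"] + 1)
        row = {"text": text, "is_load": is_load, "IF": fetch, "ID": decode,
               "EX": execute, "MEM": execute + 1, "WB": execute + 2,
               "stalls": execute - decode - 1}
        rows.append(row)
        if dest:
            last_writer[dest] = row
    return rows


def cell(row, cycle):
    """Stage name on the first cycle in a stage, '--' while held there, '' otherwise."""
    bounds = [row[s] for s in STAGES] + [row["WB"] + 1]
    for stage, start, end in zip(STAGES, bounds, bounds[1:]):
        if start <= cycle < end:
            return stage if cycle == start else "--"
    return ""


def render(rows):
    """Build the chart: one row per instruction, one column per cycle."""
    total = rows[-1]["WB"]
    lines = [f"{'instruction':<20}" + "".join(f"{c:>4}" for c in range(1, total + 1))]
    for row in rows:
        cells = "".join(f"{cell(row, c):>4}" for c in range(1, total + 1))
        lines.append(f"{row['text']:<20}" + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    for forwarding in (False, True):
        rows = schedule(PROGRAM, forwarding)
        print("forwarding" if forwarding else "stalls only")
        print(render(rows))
        print("data-hazard stall cycles:", sum(r["stalls"] for r in rows), "\n")
```

In the printed chart, `--` marks a cycle in which an instruction is held in its current stage. If your course resolves branches in ID or draws stalls differently, adjust the two assumptions in `schedule` and check the result against the diagrams in your textbook.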

As to your additional question in the comments, the loop stalls mostly on the lw, because the loaded word is only available after WB, or after MEM with forwarding. The loop loads and stores three values (the counter starts at 3); so instead of looping, rename registers and start your code with several consecutive lw into, say, $t0 - $t2. Once a result has been written back, or can be forwarded, add it, and sw it as soon as it is available. So yes, your code will look much longer, but execute faster.
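As a rough illustration of that unrolling idea, the Python sketch below compares the dynamic instruction stream of the original loop (three iterations) with one possible unrolled version that loads into $t0 - $t2 first. The unrolled instruction list is my own guess at what such code could look like, not a reference solution; it does not reproduce the final value of $S4. The cycle count assumes a five-stage pipeline with forwarding, loaded words usable one cycle later than ALU results, and no branch penalty (which flatters the loop). `count_cycles`, `LOOP_TRACE` and `UNROLLED` are hypothetical names.

```python
# (text, destination register or None, source registers, is_load)
LOOP_BODY = [
    ("LW   $S1, 0($S5)", "$S1", ["$S5"], True),
    ("ADD  $S6, $S1, $S6", "$S6", ["$S1", "$S6"], False),
    ("SW   $S6, 0($S5)", None, ["$S6", "$S5"], False),
    ("ADDI $S5, $S5, 4", "$S5", ["$S5"], False),
    ("ADDI $S4, $S4, -1", "$S4", ["$S4"], False),
    ("BNZ  $S4, loop", None, ["$S4"], False),
]
# Instructions actually executed by the original code: setup plus 3 iterations.
LOOP_TRACE = [("ADDI $S4, $zero, 3", "$S4", [], False)] + LOOP_BODY * 3

# One possible unrolled version with renamed registers (an assumption).
UNROLLED = [
    ("LW   $t0, 0($S5)", "$t0", ["$S5"], True),
    ("LW   $t1, 4($S5)", "$t1", ["$S5"], True),
    ("LW   $t2, 8($S5)", "$t2", ["$S5"], True),
    ("ADD  $S6, $t0, $S6", "$S6", ["$t0", "$S6"], False),
    ("SW   $S6, 0($S5)", None, ["$S6", "$S5"], False),
    ("ADD  $S6, $t1, $S6", "$S6", ["$t1", "$S6"], False),
    ("SW   $S6, 4($S5)", None, ["$S6", "$S5"], False),
    ("ADD  $S6, $t2, $S6", "$S6", ["$t2", "$S6"], False),
    ("SW   $S6, 8($S5)", None, ["$S6", "$S5"], False),
    ("ADDI $S5, $S5, 12", "$S5", ["$S5"], False),
]


def count_cycles(program):
    """Total cycles on a 5-stage pipeline with forwarding (data hazards only)."""
    ready = {}  # register -> earliest cycle in which a consumer may be in EX
    ex = 2      # EX cycle of the previous instruction; the first one executes in cycle 3
    for _text, dest, sources, is_load in program:
        # One cycle after the previous instruction, or later if an operand is not ready.
        ex = max([ex + 1] + [ready.get(reg, 0) for reg in sources])
        if dest:
            # ALU results can be forwarded after EX, loaded words only after MEM.
            ready[dest] = ex + (2 if is_load else 1)
    return ex + 2  # the last instruction still needs MEM and WB


if __name__ == "__main__":
    print("original loop:", count_cycles(LOOP_TRACE), "cycles")
    print("unrolled code:", count_cycles(UNROLLED), "cycles")
```

The point of the comparison is that the unrolled stream has no instruction that needs a loaded word in the very next cycle, and it drops the counter and branch instructions altogether.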

By the way, you seem to be using Patterson/Hennessy. There are very good diagrams in that book illustrating this. Maybe have a look.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow