HW 10 - Datapath, Pipeline, and Hazards
Due at 10:30pm on Monday, November 23, 2015

Collaboration
You must complete this assignment independently. You may not discuss your answers with any students in this or the other section of the course.
Submitting your work
Email me your answers with the subject [CSC 211.01] Assignment 10

Problem 1

Consider a pipelined MIPS processor executing the instruction add $10, $1, $2. Assume register $1 holds the value 5 and register $2 has the value 3.

Part A
Which pipeline registers hold the value 8, the result of the addition, at some point during the execution of this instruction?
Part B
Which pipeline registers hold the value of the MemWrite control line, which enables writes to data memory?
Part C
How many bubbles must be added to the pipeline before this processor can execute the next instruction, add $1, $10, $3, assuming this processor does not support forwarding?
Part D
How many bubbles must be added to the pipeline before this processor can execute the next instruction if the processor does support forwarding?

Problem 2

Each pipeline stage has some latency, and passing a value through each pipeline register adds some additional latency. Answer the following questions using the latency values below:

IF ID EX MEM WB Each Pipeline Register
200ps 150ps 150ps 300ps 150ps 20ps
Part A
What is the latency of a lw instruction in a non-pipelined processor with these latencies?
Part B
What is the minimum clock cycle time for a non-pipelined processor with these latencies?
Part C
What is the latency of a lw instruction in a pipelined processor with these latencies?
Part D
What is the minimum clock cycle time for a pipelined processor with these latencies?
Part E
Ignoring hazards, what is the speed-up achieved by introducing pipelining?
Part F
If we could split one stage into two stages, each with half the latency of the original stage, which stage would you split?
Part G
What is the minimum clock cycle time for this new pipeline?

Problem 3

Consider the following sequence of instructions executed in a standard five-stage pipelined datapath:

lw  $1, 40($6)
add $2, $3, $1
add $1, $2, $6
sw  $2, 20($4)
and $1, $1, $4
Part A
Assuming there is no forwarding, manually insert the minimum number of nop instructions to eliminate all data hazards.
Part B
Draw a diagram of the execution of this instruction sequence with forwarding, similar to figure 4.59 in the text.

Problem 4 (Adapted from Patterson & Hennessy, exercise 4.24)

Consider the following sequences of branch outcomes. T means the branch is taken; N means the branch is not taken.

Sequence 1: T, T, T, N

Sequence 2: T, N, T, T, N

Part A
What is the accuracy of the always-taken branch predictor for each sequence?
Part B
What is the accuracy of the always-not-taken branch predictor for each sequence?
Part C
What is the accuracy of the 2-bit branch predictor for each sequence? Assume the predictor starts off in state 01 (weak not taken).
Part D
What is the accuracy of the 2-bit branch predictor for each sequence, if the sequence repeats thousands of times?
Part E
Write a new sequence of six branch outcomes that a 2-bit branch predictor will always predict correctly, assuming the predictor starts off in state 01.
Part F
Write a new sequence of six branch outcomes that a 2-bit branch predictor will always predict incorrectly, assuming the predictor starts off in state 01.

Problem 5 (Adapted from Patterson & Hennessy 4.28)

In this exercise we compare the performance of 1-issue and 2-issue processors, taking into account program transformations that can be made to optimize for 2-issue execution.

Consider the following loop:

loop:
  lw   $t0, 0($s0)
  addi $s0, $s0, 4
  sw   $t0, 0($s0)
  addi $s0, $s0, 4
  bne  $s0, $s1, loop
  nop
Part A
Show the schedule for two iterations of this loop on the static two-issue datapath shown in figure 4.69 of the text, assuming the processor has perfect branch prediction and forwarding.
Part B
Unroll this loop so each iteration performs four iterations of the original loop and rearrange instructions to improve the performance on the same static two-issue datapath.
Part C
Draw a pipelined execution diagram (as in Problem 3B) for your revised code on a static two-issue datapath. Show two iterations of the unrolled loop, assuming the processor has perfect branch prediction and forwarding.

License & Acknowledgements

This work is derived from that of Janet Davis, and is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.