HW 10 - Datapath, Pipeline, and Hazards
Due at 10:30pm on Monday, November 23, 2015
- Collaboration
- You must complete this assignment independently. You may not discuss your answers with any students in this or the other section of the course.
- Submitting your work
- Email me your answers with the subject
[CSC 211.01] Assignment 10
Problem 1
Consider a pipelined MIPS processor executing the instruction add $10, $1, $2
. Assume register $1
holds the value 5 and register $2
has the value 3.
- Part A
- Which pipeline registers hold the value 8, the result of the addition, at some point during the execution of this instruction?
- Part B
- Which pipeline registers hold the value of the
MemWrite
control line, which enables writes to data memory?
- Part C
- How many bubbles must be added to the pipeline before this processor can execute the next instruction,
add $1, $10, $3
, assuming this processor does not support forwarding?
- Part D
- How many bubbles must be added to the pipeline before this processor can execute the next instruction if the processor does support forwarding?
Problem 2
Each pipeline stage has some latency, and passing a value through each pipeline register adds some additional latency. Answer the following questions using the latency values below:
IF |
ID |
EX |
MEM |
WB |
Each Pipeline Register |
200ps |
150ps |
150ps |
300ps |
150ps |
20ps |
- Part A
- What is the latency of a
lw
instruction in a non-pipelined processor with these latencies?
- Part B
- What is the minimum clock cycle time for a non-pipelined processor with these latencies?
- Part C
- What is the latency of a
lw
instruction in a pipelined processor with these latencies?
- Part D
- What is the minimum clock cycle time for a pipelined processor with these latencies?
- Part E
- Ignoring hazards, what is the speed-up achieved by introducing pipelining?
- Part F
- If we could split one stage into two stages, each with half the latency of the original stage, which stage would you split?
- Part G
- What is the minimum clock cycle time for this new pipeline?
Problem 3
Consider the following sequence of instructions executed in a standard five-stage pipelined datapath:
lw $1, 40($6)
add $2, $3, $1
add $1, $2, $6
sw $2, 20($4)
and $1, $1, $4
- Part A
- Assuming there is no forwarding, manually insert the minimum number of
nop
instructions to eliminate all data hazards.
- Part B
- Draw a diagram of the execution of this instruction sequence with forwarding, similar to figure 4.59 in the text.
Problem 4 (Adapted from Patterson & Hennessy, exercise 4.24)
Consider the following sequences of branch outcomes. T means the branch is taken; N means the branch is not taken.
Sequence 1: T, T, T, N
Sequence 2: T, N, T, T, N
- Part A
- What is the accuracy of the always-taken branch predictor for each sequence?
- Part B
- What is the accuracy of the always-not-taken branch predictor for each sequence?
- Part C
- What is the accuracy of the 2-bit branch predictor for each sequence? Assume the predictor starts off in state 01 (weak not taken).
- Part D
- What is the accuracy of the 2-bit branch predictor for each sequence, if the sequence repeats thousands of times?
- Part E
- Write a new sequence of six branch outcomes that a 2-bit branch predictor will always predict correctly, assuming the predictor starts off in state 01.
- Part F
- Write a new sequence of six branch outcomes that a 2-bit branch predictor will always predict incorrectly, assuming the predictor starts off in state 01.
Problem 5 (Adapted from Patterson & Hennessy 4.28)
In this exercise we compare the performance of 1-issue and 2-issue processors, taking into account program transformations that can be made to optimize for 2-issue execution.
Consider the following loop:
loop:
lw $t0, 0($s0)
addi $s0, $s0, 4
sw $t0, 0($s0)
addi $s0, $s0, 4
bne $s0, $s1, loop
nop
- Part A
- Show the schedule for two iterations of this loop on the static two-issue datapath shown in figure 4.69 of the text, assuming the processor has perfect branch prediction and forwarding.
- Part B
- Unroll this loop so each iteration performs four iterations of the original loop and rearrange instructions to improve the performance on the same static two-issue datapath.
- Part C
- Draw a pipelined execution diagram (as in Problem 3B) for your revised code on a static two-issue datapath. Show two iterations of the unrolled loop, assuming the processor has perfect branch prediction and forwarding.
License & Acknowledgements
This work is derived from that of Janet Davis, and is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.