## Pipelining

 Question 1

Instruction execution in a processor is divided into 5 stages, Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Execute (EX), and Write Back (WB). These stages take 5, 4, 20, 10 and 3 nanoseconds (ns) respectively. A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementations of the processor are contemplated:

(i) a naive pipeline implementation (NP) with 5 stages and
(ii) an efficient pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns and 8 ns respectively.

The speedup (correct to two decimal places) achieved by EP over NP in executing 20 independent instructions with no hazards is _________.

 A 1.51 B 1.52 C 1.53 D 1.54
Computer-Organization       Pipelining       Gate 2017 set-01
Question 1 Explanation:
Naive Pipeline implementation:
The stage delays are 5, 4, 20, 10 and 3. And buffer delay = 2ns
So clock cycle time = max of stage delays + buffer delay
= max(5, 4, 20, 10,3)+2
= 20+2
= 22ns
Execution time for n-instructions in a pipeline with k-stages = (k+n-1) clock cycles
= (k+n-1)* clock cycle time
In this case execution time for 20 instructions in the pipeline with 5-stages
= (5+20-1)*22ns
= 24*22
= 528ns
Efficient Pipeline implementation:
OF phase is split into two stages OF1, OF2 with execution times of 12ns, 8ns
New stage delays in this case = 5, 4, 12, 8, 10, 3
Buffer delay is the same 2ns.
So clock cycle time = max of stage delays + buffer delay
= max(5, 4, 12, 8, 10,3) + 2
= 12+2
= 14ns
Execution time = (k+n-1) clock cycles
= (k+n-1)* clock cycle time
In this case no. of pipeline stages, k = 6
No. of instructions = 20
Execution time = (6+20-1)*14 = 25*14 = 350ns
Speedup of the efficient pipeline over the naive pipeline
= Naive pipeline execution time / efficient pipeline execution time
= 528 / 350
≈ 1.51
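The arithmetic above can be checked with a short Python sketch. The function name `pipeline_time_ns` is only illustrative; it assumes the same model as the explanation (cycle time = slowest stage + buffer delay, and k + n - 1 cycles for n hazard-free instructions).

```python
def pipeline_time_ns(stage_delays_ns, buffer_delay_ns, n_instructions):
    """Total time for n hazard-free instructions on a k-stage pipeline."""
    cycle_ns = max(stage_delays_ns) + buffer_delay_ns   # slowest stage sets the clock
    k = len(stage_delays_ns)
    return (k + n_instructions - 1) * cycle_ns

np_time = pipeline_time_ns([5, 4, 20, 10, 3], 2, 20)     # naive pipeline, 5 stages
ep_time = pipeline_time_ns([5, 4, 12, 8, 10, 3], 2, 20)  # efficient pipeline, OF split in two
speedup = np_time / ep_time

print(np_time, ep_time, round(speedup, 2))  # 528 350 1.51
```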
 Question 2

The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage (with delay 800 picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. The throughput increase of the pipeline is ________ percent.

 A 33.33% B 33.34% C 33.35% D 33.36%
Computer-Organization       Pipelining       2016 set-01
Question 2 Explanation:
In a pipelined processor the throughput is 1/clock cycle time.
Cycle time = max of all stage delays.
In the first case max stage delay = 800.
So throughput = 1/800 initially.
After replacing this stage with two stages of delays 600, 350... the cycle time = maximum stage delay = 600.
So the new throughput = 1/600.
The new throughput > old throughput.
And the increase in throughput = 1/600 - 1/800.
We calculate the percentage increase in throughput w.r.t initial throughput, so the % increase in throughput
= (1/600 - 1/800) / (1/800) * 100
= ((800 / 600) - 1) * 100
= ((8/6) -1) * 100
= 33.33%
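As a quick check of this calculation, here is a small Python sketch. The helper name `throughput_increase_percent` is made up for illustration, and it assumes, as the explanation does, that throughput is 1 / (largest stage delay) with no register overhead.

```python
def throughput_increase_percent(old_stages_ps, new_stages_ps):
    """Percentage gain in throughput when the stage delays change."""
    old_cycle = max(old_stages_ps)
    new_cycle = max(new_stages_ps)
    # throughput = 1 / cycle, so new/old throughput = old_cycle / new_cycle
    return (old_cycle / new_cycle - 1) * 100

increase = throughput_increase_percent(
    [800, 500, 400, 300],        # original 4-stage pipeline
    [600, 350, 500, 400, 300],   # first stage replaced by two stages
)
print(round(increase, 2))  # 33.33
```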
 Question 3

Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies τ1, τ2 and τ3 such that τ1 = 3τ2/4 = 2τ3. If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is _________ GHz, ignoring delays in the pipeline registers.

 A 4 B 5 C 6 D 7
Computer-Organization       Pipelining       GATE 2016 set-2
Question 3 Explanation:
Given 3 stage pipeline, with 3 GHz processor.
Given τ1 = 3τ2/4 = 2τ3.
Putting τ1 = 6t, we get τ2 = 8t and τ3 = 3t, so τ2 > τ1 > τ3.
The longest stage time is 8t, so the frequency is 1/(8t).
⇒ 1/(8t) = 3 GHz
⇒ 1/t = 24 GHz
The longest stage, τ2 = 8t, is split into two stages of 4t each.
The new processor has 4 stages: 6t, 4t, 4t, 3t.
Now the longest stage time is 6t.
So, the new frequency is 1/(6t).
Substituting 1/t = 24 GHz gives the new frequency as 24/6 = 4 GHz.
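The same steps can be sketched in Python with exact fractions. The variable names are illustrative, and the unit t is the one introduced in the explanation (τ1 = 6t, τ2 = 8t, τ3 = 3t).

```python
from fractions import Fraction

old_freq_ghz = 3
stages = [Fraction(6), Fraction(8), Fraction(3)]   # tau1, tau2, tau3 in units of t

# The clock period equals the longest stage: 8t = 1/3 ns, so t = 1/24 ns.
unit_ns = Fraction(1, old_freq_ghz) / max(stages)

# Split the longest stage into two equal halves.
i = stages.index(max(stages))
new_stages = stages[:i] + [stages[i] / 2, stages[i] / 2] + stages[i + 1:]

new_freq_ghz = 1 / (max(new_stages) * unit_ns)
print(new_stages, new_freq_ghz)
```

With these inputs the new longest stage is 6t, and `new_freq_ghz` evaluates to 4.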
 Question 4
 A 11 B 12 C 13 D 14
Computer-Organization       Pipelining       GATE 2015 -(Set-2)
Question 4 Explanation:
I ⇒ Instruction Fetch and Decode
O ⇒ Operand Fetch
P ⇒ Perform the operation
W ⇒ write back the result
 Question 5
 A 4 B 5 C 6 D 7
Computer-Organization       Pipelining       GATE 2015(Set-03)
Question 5 Explanation:
Minimum average latency is based on an advanced concept in pipelining.
S1 is needed at time 1 and 5, so its forbidden latency is 5-1 = 4.
S2 is needed at time 2 and 4, so its forbidden latency is 4-2 = 2.
So, forbidden latency = (2,4,0) (0 by default is forbidden)
Allowed latency = (1,3,5) (any value more than 5 also).
Collision vector (4,3,2,1,0) = 10101 which is the initial state as well.
From the initial state we can have a transition after 1 or 3 cycles, reaching new states with collision vectors
(10101 >> 1) OR 10101 = 11111 and (10101 >> 3) OR 10101 = 10111 respectively.
These two become states 2 and 3 respectively.
For 5 cycles we come back to state 1 itself.
From state 2 (11111), the collision vector is 11111.
A transition is possible only at a latency whose bit (counted from the right, starting at 0) is 0.
Here there is no such bit, so a transition happens only on the 5th cycle, which goes to the initial state. (Any transition after 5 or more cycles goes to the initial state, as we have 5 time slices.)
From state 3 (10111), the collision vector is 10111.
So, we can have a transition on 3, which gives (10111 >> 3) OR 10101 = 10111, the third state itself. For 5, we get the initial state.
Thus all the transitions are complete.

| State \ Time | 1 | 3 | 5 |
|---|---|---|---|
| 1 (10101) | 2 | 3 | 1 |
| 2 (11111) | - | - | 1 |
| 3 (10111) | - | 3 | 1 |

So, the minimum-length cycle is of length 3, either from 3-3 or from 1-3. So the minimum average latency is also 3.
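The state diagram and the minimum average latency can be reproduced with a small Python sketch. It assumes the forbidden latencies (0, 2, 4) and the 5 time slices stated in the explanation; the function names are illustrative. Bit p of a state is set when latency p is forbidden, and a brute-force search over simple cycles is enough for a diagram this small.

```python
from fractions import Fraction

WIDTH = 5  # number of time slices, as stated in the explanation

def build_state_diagram(forbidden, width=WIDTH):
    """Return (initial_state, transitions) for the collision-vector state diagram."""
    initial = sum(1 << p for p in forbidden)
    transitions = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in transitions:
            continue
        edges = {}
        for latency in range(1, width + 1):
            if latency == width:
                nxt = initial                 # latency >= width returns to the initial state
            elif (state >> latency) & 1:
                continue                      # forbidden latency: would cause a collision
            else:
                nxt = (state >> latency) | initial
            edges[latency] = nxt
            pending.append(nxt)
        transitions[state] = edges
    return initial, transitions

def minimum_average_latency(transitions):
    """Smallest mean latency over all simple cycles of the state diagram."""
    best = None

    def walk(start, node, visited, total, hops):
        nonlocal best
        for latency, nxt in transitions[node].items():
            if nxt == start:
                mean = Fraction(total + latency, hops + 1)
                if best is None or mean < best:
                    best = mean
            elif nxt not in visited:
                walk(start, nxt, visited | {nxt}, total + latency, hops + 1)

    for state in transitions:
        walk(state, state, {state}, 0, 0)
    return best

initial, transitions = build_state_diagram({0, 2, 4})
mal = minimum_average_latency(transitions)
for state, edges in transitions.items():
    print(format(state, "05b"), {lat: format(nxt, "05b") for lat, nxt in edges.items()})
print("minimum average latency:", mal)
```

For this example the sketch finds the three states 10101, 11111 and 10111, and a minimum average latency of 3.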
 Question 6
 A Only S1 is true B Only S2 is true C Only S1 and S3 are true D Only S2 and S3 are true
Computer-Organization       Pipelining       GATE 2015(Set-03)
Question 6 Explanation:
S1: False. Antidependency means WAR dependency. There is no WAR dependency between I2 and I5.
S2: True. There is WAR dependency between I2 and I4.
S3: False. Because WAR or antidependency can be resolved by register renaming.
 Question 7
Consider a 6-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles is ______________________.
 A 4 B 5 C 6 D 7
Data-Structures       Pipelining       GATE 2014(Set-01)
Question 7 Explanation:
For 6 stages, non-pipelined execution takes 6 cycles per instruction.
With pipelining, 25% of the instructions incur 2 stall cycles.
So the average pipelined time per instruction = 1 + (25/100) × 2 = 3/2 = 1.5 cycles.
Speedup = non-pipelined time / pipelined time = 6/1.5 = 4
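A minimal Python sketch of the same calculation, assuming perfectly balanced stages and no pipelining overhead as the question states (the function name is illustrative):

```python
def speedup_with_stalls(stages, stall_fraction, stall_cycles):
    """Speedup of a balanced pipeline over non-pipelined execution."""
    non_pipelined_cpi = stages                           # one instruction passes every stage in turn
    pipelined_cpi = 1 + stall_fraction * stall_cycles    # ideal CPI of 1 plus the average stall
    return non_pipelined_cpi / pipelined_cpi

print(speedup_with_stalls(6, 0.25, 2))  # 4.0
```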
 Question 8
 A 13 B 15 C 17 D 19
Computer-Organization       Pipelining       2010
Question 8 Explanation:
It is given that there is operand forwarding. In the case of operand forwarding the updated value from previous instruction’s PO stage is forwarded to the present instruction’s PO stage. Here there’s RAW dependency between I1-I2 for R5 and between I2-I3 for R2. These dependencies are resolved by using operand forwarding as shown in the below timeline diagram. The total number of clock cycles needed is 15.
 Question 9
 A 16 B 23 C 28 D 30
Computer-Organization       Pipelining       2009
Question 9 Explanation:
 Question 10
 A I and II only B I and III only C II and III only D I, II and III
Computer-Organization       Pipelining       Gate-2008
Question 10 Explanation:
I. False. Bypassing can't handle all RAW hazards.
II. True. Register renaming can eliminate all WAR hazards as well as WAW hazards.
III. If this statement had said
"Control hazard penalties can be completely eliminated by dynamic branch prediction", then it would be false. But it only says "Control hazard penalties can be eliminated by dynamic branch prediction". So, it is true.
Hence, none of the given options is correct.
 Question 11
 A The instruction following the conditional branch instruction in memory is executed. B The first instruction in the fall through path is executed. C The first instruction in the taken path is executed. D The branch takes longer to execute than any other instruction.
Computer-Organization       Pipelining       Gate-2008
Question 11 Explanation:
In order to avoid the pipeline delay due to conditional branch instruction, a suitable instruction is placed below the conditional branch instruction such that the instruction will be executed irrespective of whether branch is taken or not and won't affect the program behaviour. Hence option A is the answer.
 Question 12
 A I1 B I2 C I3 D I4
Computer-Organization       Pipelining       Gate-2008
Question 12 Explanation:
It is the method to maximize the use of the pipeline by finding and executing an instruction that can be safely executed whether the branch is taken or not. So, when a branch instruction is encountered, the hardware puts the instruction following the branch into the pipe and begins executing it. Here we do not need to worry about whether the branch is taken or not, as we do not need to clear the pipe because no matter whether the branch is taken or not, we know the instruction is safe to execute.
From the given set of instructions I3 is updating R1, and the branch condition is based on the value of R1 so I3 can’t be executed in the delay slot.
Instruction I1 is updating the value of R2 and R2 is used in I3. So I1 also can’t be executed in the delay slot.
Instruction I2 is updating R4, and at the memory location represented by R4 the value of R1 is stored. So if I2 is executed in the delay slot then the memory location where R1 is to be stored as part of I4 will be in a wrong place. Hence between I2 and I4, I2 can’t be executed after I4. Hence I2 can’t be executed in the delay slot.
Instruction I4 can be executed in the delay slot as this is storing the value of R1 in a memory location and executing this in the delay slot will have no effect. Hence option D is the answer.
 Question 13

A non-pipelined single cycle processor operating at 100 MHz is converted into a synchronous pipelined processor with five stages requiring 2.5 nsec, 1.5 nsec, 2 nsec, 1.5 nsec and 2.5 nsec, respectively. The delay of the latches is 0.5 nsec. The speedup of the pipeline processor for a large number of instructions is

 A 4.5 B 4 C 3.33 D 3
Computer-Organization       Pipelining       Gate 2008-IT
Question 13 Explanation:
For the non-pipelined system, time per instruction = 2.5 + 1.5 + 2.0 + 1.5 + 2.5 = 10 ns (consistent with the 100 MHz clock, whose period is 10 ns)
For the pipelined system, cycle time = max(stage delay) + latch delay = 2.5 + 0.5 = 3.0 ns
Speedup = time in non-pipelined system / time in pipelined system = 10/3 = 3.33
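The numbers can be verified with a few lines of Python. This sketch takes the non-pipelined time from the 100 MHz clock (10 ns per instruction), which here happens to equal the sum of the stage delays used in the explanation; the variable names are illustrative.

```python
clock_mhz = 100
stage_delays_ns = [2.5, 1.5, 2.0, 1.5, 2.5]
latch_delay_ns = 0.5

non_pipelined_ns = 1000 / clock_mhz                      # period of a 100 MHz clock: 10 ns
pipelined_ns = max(stage_delays_ns) + latch_delay_ns     # slowest stage plus latch: 3 ns

# For a large number of instructions, the fill time is negligible,
# so speedup is the ratio of per-instruction times.
speedup = non_pipelined_ns / pipelined_ns
print(non_pipelined_ns, pipelined_ns, round(speedup, 2))  # 10.0 3.0 3.33
```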
 Question 14
 A 7 B 8 C 10 D 14
Computer-Organization       Pipelining       Gate-2007
Question 14 Explanation:
Since operand forwarding is there, by default we consider the operand forwarding from EX stage to EX stage. So, total no. of clock cycles needed to execute the given 3 instructions is 8.
 Question 15
A CPU has a five-stage pipeline and runs at 1 GHz frequency. Instruction fetch happens in the first stage of the pipeline. A conditional branch instruction computes the target address and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instructions following a conditional branch until the branch outcome is known. A program executes 10^9 instructions out of which 20% are conditional branches. If each instruction takes one cycle to complete on average, the total execution time of the program is:
 A 1.0 second B 1.2 seconds C 1.4 seconds D 1.6 seconds
Computer-Organization       Pipelining       Gate-2006
Question 15 Explanation:
Total number of instructions = 10^9
20% of the 10^9 instructions are conditional branches
⇒ 20/100 × 10^9
⇒ 2 × 10^8
The branch outcome is known in the third stage, so each conditional branch stalls fetching for 2 cycles.
Total cycle penalty = 2 × 2 × 10^8 = 4 × 10^8
Clock speed = 1 GHz, i.e., 10^9 cycles per second
Each instruction takes 1 cycle on average, i.e., 10^9 cycles for 10^9 instructions.
Total execution time of the program
= (10^9 / 10^9) + ((4 × 10^8) / 10^9) = 1 + 0.4 = 1.4 seconds
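A short Python sketch of the cycle count, assuming (as in the explanation) that a branch resolved in stage 3 costs 2 stall cycles because fetch happens in stage 1. The variable names are illustrative.

```python
instructions = 10**9
branches = instructions * 20 // 100        # 20% conditional branches: 2 * 10**8
fetch_stage, resolve_stage = 1, 3
stalls_per_branch = resolve_stage - fetch_stage   # 2 cycles with no fetching

clock_hz = 10**9                            # 1 GHz
total_cycles = instructions + branches * stalls_per_branch
time_s = total_cycles / clock_hz

print(total_cycles, time_s)  # 1400000000 1.4
```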
 Question 16
 A 8 B 10 C 12 D 15
Computer-Organization       Pipelining       Gate-2005
Question 16 Explanation:
With operand forwarding from the memory stage, the total is 8 clock cycles. Without operand forwarding, the total is 11 clock cycles.
There is no '11' among the options.
So the number of cycles = 8
 Question 17

A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 nanoseconds respectively. Registers that are used between the stages have a delay of 5 nanoseconds each. Assuming constant clocking rate, the total time taken to process 1000 data items on this pipeline will be

 A 120.4 microseconds B 160.5 microseconds C 165.5 microseconds D 590.0 microseconds
Computer-Organization       Pipelining       Gate-2004
Question 17 Explanation:
Clock cycle time = maximum stage delay + register delay = 160 + 5 = 165 ns.
The first data item takes the complete four cycles. After that, each of the remaining 999 items completes in 1 more cycle. So the time required to process 1000 data items is
1 × 4 × clock time + 999 × 1 × clock time
= 1 × 4 × 165 ns + 999 × 1 × 165 ns
= 165495 ns
≈ 165.5 μs
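To double-check the unit conversion, here is a brief Python sketch under the same assumptions (clock period = slowest stage + register delay, four cycles for the first item and one per item after that). The names are illustrative.

```python
stage_delays_ns = [150, 120, 160, 140]
register_delay_ns = 5
items = 1000

cycle_ns = max(stage_delays_ns) + register_delay_ns           # 165 ns
total_ns = (len(stage_delays_ns) + items - 1) * cycle_ns      # (4 + 999) cycles
total_us = total_ns / 1000

print(cycle_ns, total_ns, round(total_us, 1))  # 165 165495 165.5
```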
 Question 18
 A I and II only B II and III only C III only D All the three
Computer-Organization       Pipelining       Gate-2003
Question 18 Explanation:
I corresponds to a data hazard.
II corresponds to a control hazard.
III corresponds to a structural hazard.
→ Hazards are problems with the instruction pipeline in CPU microarchitectures.
 Question 19
The performance of a pipelined processor suffers if
 A the pipeline stages have different delays B consecutive instructions are dependent on each other C the pipeline stages share hardware resources D All of the above
Computer-Organization       Pipelining       Gate-2002
Question 19 Explanation:
Ideally, the speedup from pipelining equals the number of pipeline stages involved. Usually, however, the stages will not be perfectly balanced; besides, the pipelining itself involves some overhead.
For the best performance, pipeline stages should not have different delays, consecutive instructions should not depend on each other, and stages should not share hardware resources. Each of the listed conditions therefore hurts performance.
 Question 20
Comparing the time T1 taken for a single instruction on a pipelined CPU with time T2 taken on a non-pipelined but identical CPU, we can say that
 A T1 ≤ T2 B T1 ≥ T2 C T1 < T2 D T1 is T2 plus the time taken for one instruction fetch cycle
Computer-Organization       Pipelining       Gate-2000
Question 20 Explanation:
PIPELINING SYSTEM:
Pipelining is an implementation technique where multiple instructions are overlapped in execution. It has a high throughput (amount of instructions executed per unit time). In pipelining, many instructions are executed at the same time and execution is completed in fewer cycles. The pipeline is filled by the CPU scheduler from a pool of work which is waiting to occur. Each execution unit has a pipeline associated with it, so as to have work pre-planned. The efficiency of pipelining system depends upon the effectiveness of CPU scheduler.
NON- PIPELINING SYSTEM:
All the actions (fetching, decoding, executing of instructions and writing the results into the memory) are grouped into a single step. It has a low throughput.
Only one instruction is executed at a time, and execution requires more cycles. In a non-pipelined system, the CPU scheduler merely chooses from the pool of waiting work when an execution unit signals that it is free. Its efficiency does not depend on the CPU scheduler.
 Question 21
 A Theory Explanation is given below.
Computer-Organization       Pipelining       Gate-2001