
Adapting SRC instructions for Pipelined, Control Signals

Advanced Computer Architecture-CS501
Advanced Computer Architecture
Lecture 19
Reading Material
Vincent P. Heuring & Harry F. Jordan
Chapter 5
Computer Systems Design and Architecture
Pipelined Version of the SRC
Adapting SRC instructions for Pipelined Execution
Control Signals for Pipelined SRC
Pipelined Version of the SRC
In this lecture, a pipelined version of the SRC is presented. The SRC uses a five-stage
pipeline. Those five stages are given below:
1. Instruction Fetch
2. Instruction decode/operand fetch
3. ALU operation
4. Memory access
5. Register write
As shown in the next diagram, there are pipeline registers between each pair of adjacent stages.
After the instruction has been fetched, it is stored in IR2, and the incremented value of the
program counter is held in PC2. When the register values have been read, the first
register value is stored in X3 and the second register value is stored in Y3. IR3 holds the
op-code and ra. If it is a store-to-memory instruction, MD3 holds the register value to be
stored in memory.
After the instruction has been executed in the ALU, register Z4 holds the result, and the
op-code and ra are passed on to IR4. During the write-back stage, register Z5 holds the
value to be written back into the register file, while the op-code and ra are passed into IR5.
There are also two separate memories and several multiplexers involved in the pipeline
operation. These will be shown at appropriate places in later figures.
The number after a particular register name indicates the stage where the value of this
register is used.
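To make the naming convention concrete, the Python sketch below models only the instruction-register chain IR2 to IR5. It is a simplified, hypothetical model: the class `IRChain` and its `clock` method are illustrative names, not part of the text, and each register holds a whole instruction string, whereas in the SRC the later registers keep only the op-code and ra. On every clock, each instruction moves one stage forward, so several instructions are in flight at once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IRChain:
    """Hypothetical model of the SRC instruction registers IR2..IR5.

    The digit in each field name is the stage that uses the value.
    """

    IR2: Optional[str] = None  # used in stage 2 (decode/operand fetch)
    IR3: Optional[str] = None  # used in stage 3 (ALU operation)
    IR4: Optional[str] = None  # used in stage 4 (memory access)
    IR5: Optional[str] = None  # used in stage 5 (register write)

    def clock(self, fetched: Optional[str]) -> None:
        """Advance every instruction by one stage on a clock edge."""
        # All pipeline registers load on the same edge, so copy from the back
        # of the pipeline first; otherwise a value would skip a stage.
        self.IR5 = self.IR4
        self.IR4 = self.IR3
        self.IR3 = self.IR2
        self.IR2 = fetched  # stage 1 places the newly fetched instruction here


if __name__ == "__main__":
    program = ["add r1, r2, r3", "ld r4, 8(r5)", "sub r6, r7, r8", "addi r9, r9, 1"]
    chain = IRChain()
    for instruction in program:
        chain.clock(instruction)
    # After four clocks, the first instruction has reached IR5.
    print(chain)
```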
Last Modified: 01-Nov-06
Adapting SRC Instructions for Pipelined Execution
As mentioned earlier, the SRC instructions fall into the following three categories:
1. ALU Instructions
2. Load/Store instructions
3. Branch Instructions
We will now discuss how to design a common pipeline for all three categories of instructions.
1. ALU instructions
ALU instructions are usually of the form:
op-code ra, rb, rc
op-code ra, rb, constant
In the diagram shown, X3 and Y3 are temporary registers that hold values between
pipeline stages. X3 is loaded with an operand value from the register file. Y3 is loaded with
either a register value from the register file or a constant from the instruction. The
operands are then available to the ALU, whose function is determined by decoding the
op-code bits. The result of the ALU operation is stored in register Z4, passed on to Z5
during the memory access stage, and written to the destination register in the register
write stage. No memory operation takes place in the memory access stage for ALU
instructions. Note that Z5, IR3, IR4, and IR5 are not shown
explicitly in the figure. The purpose of not including these registers is to keep the
drawing simple. However, these registers will transfer values as instructions progress
through the pipeline. This comment also applies to some other figures in this discussion.
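The flow of an ALU instruction through the five stages can be sketched in Python as a sequence of register transfers. This is a behavioural sketch under stated assumptions: the function `run_alu_instruction` and the `ALU_OPS` table are hypothetical, only four operations are modelled, immediate forms such as addi are represented by passing `constant`, and the stages run one after another for a single instruction rather than overlapping with other instructions.

```python
MASK32 = 0xFFFFFFFF  # SRC registers are 32 bits wide

# Hypothetical subset of ALU functions, selected by decoding the op-code.
ALU_OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
}


def run_alu_instruction(regs, op, ra, rb, rc=None, constant=None):
    """Execute `op ra, rb, rc` or `op ra, rb, constant` one stage at a time."""
    # Stage 2 (operand fetch): X3 <- R[rb]; Y3 <- R[rc] or the constant.
    x3 = regs[rb]
    y3 = regs[rc] if rc is not None else constant & MASK32
    # Stage 3 (ALU operation): the op-code selects the ALU function.
    z4 = ALU_OPS[op](x3, y3)
    # Stage 4 (memory access): no memory operation; the result moves to Z5.
    z5 = z4
    # Stage 5 (register write): the result reaches the destination register.
    regs[ra] = z5


if __name__ == "__main__":
    regs = [0] * 32
    regs[2], regs[3] = 5, 7
    run_alu_instruction(regs, "add", ra=1, rb=2, rc=3)         # r1 = 5 + 7
    run_alu_instruction(regs, "add", ra=4, rb=2, constant=-1)  # r4 = 5 + (-1)
    print(regs[1], regs[4])  # 12 4
```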
2. Load/Store instructions
Load/Store instructions are usually of the form:
op-code ra, constant(rb)
The instruction is loaded into IR2, and the incremented value of the PC is loaded into PC2.
In the next stage, X3 is loaded with the value in PC2 if the relative addressing mode is
used, or with the value in R[rb] if the displacement addressing mode is used. Similarly, c1
is transferred to Y3 for the relative addressing mode, and c2 is transferred to Y3 for the
displacement addressing mode. The ALU then adds X3 and Y3 to form the memory
address. A store instruction is complete once the memory location has been written in the
memory access stage. A load instruction is complete once the value read from memory
has been written back to the register file. The following figure shows the schematic for a
load instruction; a similar schematic can be drawn for the store instruction.
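The load/store path described above can be sketched in Python as an address calculation followed by a memory access. The function names are hypothetical. The sketch assumes the SRC's 22-bit c1 and 17-bit c2 constant fields, both sign-extended; it ignores the SRC's special treatment of rb = 0, treats memory as a dictionary in which unwritten locations read as 0, and reads the store value straight from R[ra] instead of passing it through MD3.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def effective_address(mode, pc2, regs, rb, c1=0, c2=0):
    """Stages 2 and 3: form the memory address for a load or store."""
    if mode == "relative":
        x3, y3 = pc2, sign_extend(c1, 22)  # X3 <- PC2, Y3 <- c1
    elif mode == "displacement":
        x3, y3 = regs[rb], sign_extend(c2, 17)  # X3 <- R[rb], Y3 <- c2
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return (x3 + y3) & MASK32  # ALU output, held in Z4


def load(memory, regs, ra, address):
    """Stages 4 and 5: read memory into Z5, then write the register file."""
    z5 = memory.get(address, 0)  # assumption: unwritten locations read as 0
    regs[ra] = z5


def store(memory, regs, ra, address):
    """Stage 4: write R[ra] to memory; the store is complete after this."""
    memory[address] = regs[ra]


if __name__ == "__main__":
    regs = [0] * 32
    regs[5] = 0x1000
    regs[1] = 99
    memory = {}
    addr = effective_address("displacement", pc2=0x200, regs=regs, rb=5, c2=8)
    store(memory, regs, ra=1, address=addr)  # like st r1, 8(r5)
    load(memory, regs, ra=2, address=addr)   # like ld r2, 8(r5)
    print(hex(addr), regs[2])
```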
3. Branch Instructions
Branch instructions usually involve calculating the target address and evaluating a
condition. The condition is evaluated from the c2 field of the IR together with the value
in R[rc]. If the condition is true, the PC is loaded with the value in R[rb]; otherwise it is
incremented by 4 as usual. The following figure shows these details.
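The branch decision can be sketched in Python as a condition check on R[rc] followed by a choice of the next PC. The function names are hypothetical, and the condition codes are an assumption taken from the SRC branch instructions (brnv, br, brzr, brnz, brpl, brmi encoded as 0 to 5); codes outside that range are assumed never to branch.

```python
MASK32 = 0xFFFFFFFF

# Assumed SRC condition encoding: brnv, br, brzr, brnz, brpl, brmi.
NEVER, ALWAYS, ZERO, NONZERO, PLUS, MINUS = range(6)


def branch_condition(cond_code, rc_value):
    """Evaluate the branch condition from the condition code and R[rc]."""
    rc_value &= MASK32
    negative = (rc_value >> 31) == 1  # sign bit of the 32-bit value
    outcomes = {
        NEVER: False,
        ALWAYS: True,
        ZERO: rc_value == 0,
        NONZERO: rc_value != 0,
        PLUS: not negative,
        MINUS: negative,
    }
    return outcomes.get(cond_code, False)  # assumption: other codes never branch


def branch_next_pc(pc, cond_code, rb_value, rc_value):
    """Return the next PC: R[rb] if the condition holds, else PC + 4."""
    if branch_condition(cond_code, rc_value):
        return rb_value & MASK32
    return (pc + 4) & MASK32


if __name__ == "__main__":
    # brzr-style branch: taken only when R[rc] is zero.
    print(hex(branch_next_pc(0x100, ZERO, rb_value=0x400, rc_value=0)))
    print(hex(branch_next_pc(0x100, ZERO, rb_value=0x400, rc_value=3)))
```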
The Complete Pipelined Data Path
The pipelined data path implementation diagrams shown earlier for the three SRC
instruction categories must be combined and refined to get a working system. These
details get complicated very quickly. A detailed combined diagram is shown in Figure
5.7 of the textbook.
Control Signals for the Pipelined SRC
We define the following signals for the SRC by grouping similar op-codes:
In most cases, the signals defined above are used in the same stage where they are
generated. If that is not the case, a number used after the signal name indicates the stage
where the signal is generated.
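The idea of grouping op-codes into control signals can be sketched as a table from signal names to sets of op-codes. The signal names branch, load, store, rl, dsp and alu are those used later in this lecture; the op-code membership of each group is an assumption based on the SRC instruction set, not a copy of the lecture's own table. cond is left out because it depends on a register value as well as on the op-code.

```python
# Assumed op-code groups for the grouped control signals (hypothetical
# reconstruction from the SRC instruction set, not the lecture's table).
GROUPS = {
    "branch": {"br", "brl"},
    "load": {"ld", "ldr"},
    "store": {"st", "str"},
    "rl": {"ldr", "str", "lar"},  # relative addressing
    "dsp": {"ld", "st", "la"},    # displacement addressing
    "alu": {"add", "addi", "sub", "neg", "and", "andi", "or", "ori", "not",
            "shr", "shra", "shl", "shc"},
}


def decode_signals(opcode):
    """Return each grouped control signal as True/False for one op-code."""
    return {name: opcode in members for name, members in GROUPS.items()}


if __name__ == "__main__":
    for op in ("ldr", "st", "addi", "brl"):
        active = [name for name, on in decode_signals(op).items() if on]
        print(op, active)
```

With this grouping, an op-code can activate more than one signal: ldr, for example, activates both load and rl, which is what lets a single multiplexer equation serve a whole family of instructions.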
Using these definitions, we can develop RTL statements for describing the pipeline
activity as well as the equations for the multiplexer select signals for different stages of
the pipeline. This is shown in the next diagram.
Control Signals for Different Pipeline Stages
Consider the RTL description of the Mp1 signal, which controls the input to the PC. It
simply means that unless both the branch and cond signals are activated, the PC is
incremented by 4; if both are activated, the value of R1 is copied into the PC.
The multiplexer Mp2 is used to decide which register is read from the register file. If
the store signal is activated, R[ra] (selected by the ra field of the instruction) is read from
the register file so that its value may be stored into memory; otherwise R[rc] is read from
the register file.
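The Mp1 selection can be written as a small Python function. This is a hedged sketch: the function and parameter names are hypothetical, and `r1` stands for the register-file output R1 named in the RTL, which carries the branch target R[rb].

```python
MASK32 = 0xFFFFFFFF


def mp1_next_pc(pc, branch, cond, r1):
    """Mp1: select the value loaded into the PC.

    branch: branch signal for the instruction being decoded
    cond:   result of evaluating the branch condition
    r1:     register-file output carrying the branch target
    """
    if branch and cond:
        return r1 & MASK32    # taken branch: PC <- R1
    return (pc + 4) & MASK32  # every other case: PC <- PC + 4


if __name__ == "__main__":
    print(hex(mp1_next_pc(0x100, branch=True, cond=True, r1=0x400)))
    print(hex(mp1_next_pc(0x100, branch=True, cond=False, r1=0x400)))
```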
The multiplexer Mp3 decides what is loaded into X3, the first operand of the ALU. If
either rl or branch is activated, the incremented PC value held in PC2 is transferred to X3;
otherwise, if dsp or alu is activated, the value of R[rb] read from the register file is
transferred to X3. In the same way, multiplexer Mp4 is used to select an input for Y3.
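Mp3 and Mp4 can be sketched the same way. Mp3 follows the description above, with `r1` standing for the register-file output R1. Mp4 is only summarised in the text, so its cases here are an assumption pieced together from the earlier load/store and ALU discussion: c1 for relative addressing, c2 for displacement addressing and for a hypothetical immediate-form signal `imm`, and otherwise the second register-file output, called `r2` here. All function names are hypothetical, and the constants are assumed to be sign-extended already.

```python
def mp3_select_x3(signals, pc2, r1):
    """Mp3: choose the first ALU operand loaded into X3."""
    if signals["rl"] or signals["branch"]:
        return pc2  # incremented PC held in PC2
    if signals["dsp"] or signals["alu"]:
        return r1   # register-file output R1
    return None     # assumption: X3 is not used by other instructions


def mp4_select_y3(signals, r2, c1, c2):
    """Mp4 (assumed behaviour): choose the second ALU operand for Y3."""
    if signals["rl"]:
        return c1   # relative addressing constant
    if signals["dsp"] or signals["imm"]:
        return c2   # displacement or immediate constant
    return r2       # register form of an ALU instruction


if __name__ == "__main__":
    ldr_like = {"rl": True, "branch": False, "dsp": False, "alu": False, "imm": False}
    print(mp3_select_x3(ldr_like, pc2=0x200, r1=7), mp4_select_y3(ldr_like, 9, 12, 34))
```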
The multiplexer Mp5 is used to decide which value is written back to the register file. If
the load signal is activated, data from memory is transferred to Z5; otherwise, data from
Z4 (the result of the ALU operation) is transferred to Z5. The value in Z5 is then written
back to the register file.
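The memory access stage and the Mp5 selection can be sketched together in Python. The function names are hypothetical; the sketch assumes that for loads and stores Z4 holds the effective address computed by the ALU, that memory is a dictionary in which unwritten locations read as 0, and that the value to be stored arrives as `store_data`.

```python
def mp5_select_z5(load, memory_data, z4):
    """Mp5: choose the value that Z5 passes to the register-write stage."""
    return memory_data if load else z4


def memory_stage(load, store, memory, z4, store_data=0):
    """Stage 4: optional memory access, then Mp5 loads Z5."""
    if store:
        memory[z4] = store_data  # Z4 holds the effective address
    # assumption: unwritten locations read as 0; non-loads read nothing
    memory_data = memory.get(z4, 0) if load else 0
    return mp5_select_z5(load, memory_data, z4)


if __name__ == "__main__":
    memory = {0x100: 42}
    print(memory_stage(load=True, store=False, memory=memory, z4=0x100))  # load
    print(memory_stage(load=False, store=False, memory=memory, z4=17))    # ALU result
```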