ZeePedia buy college essays online

Advance Computer Architecture

<<< Previous I/O Subsystems, Components, Memory Mapped vs Isolated, Serial and Parallel Transfers Next >>>
Advanced Computer Architecture-CS501
Advanced Computer Architecture
Lecture No. 23
Reading Material
Vincent P. Heuring & Harry F. Jordan
Chapter 8
Computer Systems Design and Architecture
8.1, 8.2
Introduction to I/O Subsystems
Major Components of an I/O Subsystems
Computer Interface
Memory Mapped I/O versus Isolated I/O
Considerations during I/O Subsystem Design
Serial and Parallel Transfers
I/O Buses
Introduction to I/O Subsystems
This module is about the computer's input and output. As we have seen in the case of
memory subsystems, that when we use the terms " read" and "write", then these terms
are from the CPU's point of view. Similarly, when we use the terms "input" and "output"
then these are also from the CPU's point of view. It means that when we are talking about
an input cycle, then the CPU is receiving data from a peripheral device and the peripheral
device is providing data. Similarly, when we talk about an output cycle then the CPU is
sending data to a peripheral device and the peripheral device is receiving data. I/O
Subsystems are similar to memory subsystems in many aspects. For example, both
exchange bits or bytes. This transfer is usually controlled by the CPU. The CPU sends
address information to the memory and the I/O subsystems. Then these subsystems
decode the address and decide which device should be involved in the transfer. Finally
the appropriate data is exchanged between the CPU and the memory or the I/O device.
Memory and I/O subsystems differ in the following ways:
1. Wider range of data transfer speed:
I/O devices can be very slow such as a keyboard in which case the interval between
two successive bytes (or keystrokes) can be in seconds. On the other extreme, I/O
devices can be very fast such as a disk drive sending data to the CPU or a stream of
packets arriving over a network, in which case the interval between two successive
bytes can be in microseconds or even nanoseconds. While I/O devices can have such
a wide range of data transfer speed compared to the CPU's speed, the case of memory
devices is not so. Even if a memory device is slow compared to the CPU, the CPU's
speed can be made compatible by inserting wait states in the bus cycle.
Page 234
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
2. Asynchronous activity:
Memory subsystems are almost always synchronous. This means that most memory
transfers are governed by the CPU's clock. Generally this is not the case with I/O
subsystems. Additional signals, called handshaking signals, are needed to take care of
asynchronous I/O transfers.
3. Larger degradation in data quality:
Data transferred by I/O subsystems can carry more noise. As an example, telephone
line noise can become part of the data transferred by a modem. Errors caused by
media defects on hard drives can corrupt the data. This implies that effective error
detection and correction techniques must be used with I/O subsystems.
4. Mechanical nature of many I/O devices:
Many I/O devices or a large portion of I/O devices use mechanical parts which
inherently have a high failure rate. In case an I/O device fails, interruptions in data
transfer will occur, reducing the throughput. As an example, if a printer runs out of
paper, then additional bytes cannot be sent to it. The CPU's data should be buffered
(or kept in a temporary place) till the paper is supplied to the printer, otherwise the
CPU will not be able to do anything else during this time.
To deal with these differences, special software programs called device drivers are made
a part of the operating system. In most cases, device drivers are written in assembly
You would recall that in case of memory subsystems, each location uses a unique address
from the CPU's address space. This is generally not the case with I/O devices. In most
cases, a group or block of contiguous addresses is assigned to an I/O device, and data is
exchanged byte-by-byte. Internal buffers (memory) within the device store this data if
In the past, people have paid a lot of attention to improve the CPU's performance, as a
result of which the performance improvement of I/O subsystems was ignored. (I/O
subsystems were even called the "orphans" of computer architecture by some people).
Perhaps, many benchmark programs and metrics that were developed to evaluate
computer systems focused on the CPU or the memory performance only. Performance of
I/O subsystems is as important as that of the CPU or the memory, especially in today's
world. For example, the transaction processing systems used in airline reservation
systems or the automated teller machines in banks have a very heavy I/O traffic,
requiring improved I/O performance. To illustrate this point, look at the following
Suppose that a certain program takes 200 seconds of elapsed time to execute. Out of
these 200 seconds, 180 seconds is the CPU time and the rest is I/O time. If the CPU
performance improves by 40% every year for the next seven years because of
developments in technology, but the I/O performance stays the same, let us look at the
following table, which shows the situation at the end of each year. Remember that
Elapsed time = CPU time + I/O time.
This gives us the I/O time = 200 ­ 180 = 20 seconds at the beginning, which is 10 % of
the elapsed time.
Page 235
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
Year # CPU  I/O  Elapsed
I/O Time x100 %
Elapsed Time
Time Time  Time
10 %
13.42 %
17.85 %
23.25 %
29.85 %
37.03 %
45.45 %
54.05 %
It can be easily seen that over seven years, the I/O time will become more than 50 % of
the total time under these conditions. Therefore, the improvement of I/O performance is
as important as the improvement of CPU performance. I/O performance will also be
discussed in detail in a later section.
Major components of an I/O
I/O subsystems have two major parts:
The I/O interface, which is the
electronic  circuitry  that  connects
the CPU to the I/O device.
Peripherals,  which  are  the
devices used to communicate with the
keyboard, the monitor, etc.
Computer Interface
A Computer Interface is a piece of hardware whose primary purpose is to connect
together any of the following types of computer elements in such a way that the signal
levels and the timing requirements of the elements are matched by the interface. Those
elements are:
 The processor unit
 The memory subsystem(s)
 Peripheral (or I/O) devices
 The buses (also called "links")
In other words, an interface is an electronic circuit that matches the requirements of the
two subsystems between which it is connected. An interface that can be used to connect
Page 236
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
the microcomputer bus to peripheral devices is called an I/O Port. I/O ports serve the
following three purposes:
 Buffering (i.e., holding temporarily) the data to and from the computer bus.
 Holding control information that dictates how a transfer is to be conducted.
Holding status information so that the processor can monitor the activity of the
interface and its associated I/O element.
This control information is usually provided by the CPU and is used to tell the device
how to perform the transfer, e.g., if the CPU wants to tell a printer to start a new page,
one of the control signals from the CPU can be used for a paper advance command,
thereby telling the printer to start printing from the top of the next page. In the same way
the CPU may send a control signal to a tape drive connected in the system asking it to
activate the rewind mechanism so that the start of the tape is positioned for access by the
CPU. Status information from various devices helps the CPU to know what is going on in
the system. Once again, using the printer as an example, if the printer runs out of paper,
this information should be sent to the CPU immediately. In the same way, if a hard drive
in the system crashes, or if a sector is damaged and cannot be read, this information
should also be conveyed to the CPU as soon as possible
The term "buffer" used in the above discussion also needs to be understood. In most
cases, the word buffer refers to I/O registers in an interface where data, status or control
information is temporarily stored. A block of memory locations within the main memory
or within the peripheral devices is also called a buffer if it is used for temporary storage.
Special circuits used in the interfaces
for voltage/current matching, at the
input and the output, are also called
The given figure shows a block
diagram of a typical I/O subsystem
connected with the other components
in a computer.  The thick horizontal
line is the system bus that serves as a
back-bone in the entire computer
system. It is used to connect the
memory subsystems as well as the I/O
subsystems together. The CPU also
connects to this bus through a "bus
interface unit", which is not shown in
this figure.  Four I/O modules are
shown in the figure. One module is
used to connect a keyboard and a
mouse to the system bus. A second
module connects a monitor to the
system bus. Another module is used
with a hard disk and a fourth I/O
module is used by a modem. All these
Page 237
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
modules are examples of I/O ports. A somewhat detailed view of these modules is shown
in the next figure.
As we already know that the system bus actually consists of three buses, namely the
address bus, the data bus and the control bus. These three buses are being applied to the
I/O module in this figure. At the bottom, we see a set of data, status and control lines
from each "device interface logic" block. Each of these sets connects to a peripheral
device. I/O decoding logic is also shown in this figure.
Memory Mapped I/O versus Isolated I/O
Although this concept was explained earlier as
well, it will be useful to review it again in this
context. In isolated I/O, a separate address
space of the CPU is reserved for I/O
operations.  This address space is totally
different from the address space used for
memory devices. In other words, a CPU has
two distinct address spaces, one for memory
and one for input/output. Unique CPU
instructions are associated with the I/O space,
which means that if those instructions are
executing on the CPU, then the accessed
address space will be the I/O space and hence
the devices mapped on the I/O space.
The x86 family with the in and the out
instructions is a well known example
of  this  situation.  Using  the  in
instruction, the Pentium processor can
receive information from a peripheral
device, and using the out instruction,
the  Pentium  processor  can  send
information to a peripheral device.
Thus, the I/O devices are mapped on
the I/O space in case of the Pentium
processor. In some processors, like the
SRC, there is no separate I/O space. In
this case, some address space out of
the memory address space must be used to map I/O devices. The benefit will be that all
the instructions which access memory can be used for I/O devices. There is no need for
including separate I/O instructions in the ISA
of the processor. However, the disadvantage
will be that the I/O interface will become
complex. If partial decoding is used to reduce
the complexity of the I/O interface, then a lot
of memory addresses will be consumed. The
given figure shows the memory address space
Page 238
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
as well as the I/O address space for the Pentium processor. The I/O space is of size 64
Kbytes, organized as eight banks of 8 Kbytes each.
A similar diagram for the FALCON-A was shown earlier and is repeated here for easy
The next question to be answered is how the CPU will differentiate between these two
address spaces. How will the system components know whether a particular transfer is
meant for memory or an I/O device? The answer is simple: by using signals
from the control bus, the CPU will indicate which address space is meant during a
particular transfer. Once again, using the Pentium as an example, if the in instruction is
executing on the processor, the IOR# signal will become active and the MEMR# signal
will be deactivated. For a mov instruction, the control logic will activate the MEMR#
signal instead of the IOR# signal.
Considerations during I/O Subsystem Design
Certain things must be taken care of during the design of an I/O subsystem.
Data location:
The designer must identify the device where the data to be accessed is available, the
address of this device and how to collect the data from this device. For example, if a
database needs to be searched for a record that is stored in the fourth sector of the second
track of the third platter on a certain hard drive in the system, then this information is
related to data location. The particular hard drive must be selected out of the possibly
many hard drives in the system, and the address of this record in terms of platter number,
track number and sector number must be given to this hard drive.
Data transfer:
This includes the direction of transfer of data; whether it is out of the CPU or into the
CPU, whether the data is being sent to the monitor or the hard drive, or whether it is
being received from the keyboard or the mouse. It also includes the amount of data to be
transferred and the rate at which it should be transferred. If a single mouse click is to be
transferred to the CPU, then the amount of data is just one bit; on the other hand, a block
of data for the hard drive may be several kilo bytes. Similarly, the rate of the transfer of
data to a printer is very different from the transfer rate needed for a hard drive.
Data synchronization:
This means that the CPU should input data
from an input device only when the device is
ready to provide data and send data to an
output device only when it is ready to receive
There are three basic schemes which can be
used for synchronization of an I/O data
 Synchronous transmission
 Semi-synchronous transmission
 Asynchronous transmission
Synchronous transmission:
Page 239
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
This can be understood by looking at the waveforms shown in Figure A.
M stands for the bus master and S stands for the slave device on the bus. The master and
the slave are assumed to be permanently connected together, so that there is no need for
the selection of the particular slave device out of the many devices that may be present in
the system. It is also assumed that the slave device can perform the transfer at the speed
of the master, so no handshaking signals are needed.
At the start of the transfer operation, the master activates the Read signal, which indicates
to the slave that it should respond with data. The data is provided by the slave, and the
master uses the Enable signal to latch it. All
activity takes place synchronously with the
Figure A
system clock (not shown in the figure). A
familiar example of synchronous transfer is a register-to-register transfer within a CPU.
Semi-synchronous transmission:
Figure B explains this type of transfer. All activity is still synchronous with the system
clock, but in some situations, the slave device
may not be able to provide the data to the
master within the allotted time. The additional
time needed by the slave, can be provided by
adding an integral number of clock periods to
Figure A
the master's cycle time.
The slave indicates its readiness by activating
the complete signal. Upon receiving this
signal, the master activates the Enable signal
to latch the data provided by the slave.
Transfers between the CPU and the main
memory are examples of semi-synchronous
Asynchronous transmission:
This type of transfer does not require a
common clock. The master and the slave
Figure B
operate at different speeds. Handshaking
signals are necessary in this case, and are used
to coordinate the data transfer between the
master and the slave as shown in the Figure C.
When the master wants to initiate a data
transfer, it activates its Ready signal. The
slave detects this signal, and if it can provide
data to the master, it does so and also activates
its Acknowledge signal. Upon receiving the
Acknowledge signal, the master uses the
Enable signal to latch the incoming data .The
master then deactivates its Ready line, and in
response to it, the slave removes its data and
deactivates its Acknowledge line.
In all the three cases discussed above, the
Figure C
Page 240
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
waveforms correspond to an "input" or a "read"
operation. A similar explanation will apply to an "output" or a "write" operation. It
should also be noted that the latching of the incoming data can be done by the master
either by using the rising edge of the Enable signal or by using its falling-edge. This will
depend on the way the intermediate circuitry between the master and the slave is
Serial and Parallel Transfers
There are two ways in which data can be transferred between the CPU and an I/O device:
serial and parallel.
Serial Transfer, or serial communication of data between the CPU and the I/O devices,
refers to the situation when all the data bits in a "piece of information", (which is a byte
or word mostly), are transferred one bit at a time, over a single pair of wires.
 Easy to implement, especially by using UARTs7 or USARTs8.
 Low cost because of less wires.
 Longer distance between transmitter and receiver.
 Slow by its very nature.
 Inefficient because of the associated overhead, as we will see when we discuss the
serial wave forms.
Parallel Transfer, or parallel communication of data between the CPU and the I/O
devices, refers to the situation when all the bits of data (8 or 16 usually), are transferred
over separate lines simultaneously, or in parallel.
 Fast (compared to serial communication)
 High cost (because of more lines).
 Cost increases with distance.
 Possibility of interference (noise) increases with distance.
Remember that the terms "serial" and "parallel" are with respect to the computer I/O
ports and not with respect to the CPU. The CPU always transfers data in parallel.
Types of serial communication
There are two types of serial communication:
 Special bit patterns separate the characters.
 "Dead time" between characters can be of any length.
 Clocks at both ends need not have the same frequency (within permissible
Universal Asynchronous Receiver Transmitter.
Universal Synchronous Asynchronous Receiver Transmitter.
Page 241
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
Characters are sent back to back.
Must include special "sync" characters at the beginning of each message.
Must have special "idle" characters in the data stream to fill up the time
when no information is being sent.
Characters must be precisely spaced.
Activity at both ends must be coordinated by a single clock. (This implies
that the clock must be transmitted with data).
The "maximum information rate" of a synchronous line is higher than that of an
asynchronous line with the same "bit rate", because the asynchronous transmission must
use extra bits with each character. Different protocols are used for serial and parallel
transfer. A protocol is a set of rules understood by both the sender and the receiver. In
some cases, these protocols can be predefined for a certain system.  As an alternate,
some available standard protocols can be used.
Error conditions related to serial communication
(Some related to synchronous transmission, some to asynchronous, and some to both).
 Framing Error: is said to occur when a 0 is received instead of a stop bit (which is
always a 1). It means that after the detection of the beginning of a character with a
start bit, the appropriate number of stop bits was not detected. [A]
 Parity Error: is said to occur when the parity* of the received data is not the same
as it should be. [B] (PARITY is equivalent to the number of 1's; it is either EVEN
or ODD. A PARITY BIT is an extra bit added to the data, for the purpose of error
detection and correction. If even parity is used, the parity bit is set so that the total
number of 1's, including the parity bit, is even. The same applies to odd parity.)
 Overrun Error: means that the prior character that was received, was not yet read
from the USART's "receive data register" by the CPU, and is overwritten by the
new received character. Thus the first character was lost, and should be
retransmitted. [A]
 Under-run Error: If a character is not available at the beginning of an interval, an
under-run is said to occur. The transmitter will insert an idle character till the end
of the interval. [S]
Computer Bus or
System Bus
I/O Buses
The block diagram of a general purpose
computer system that has been referred to
repeatedly in this course has three buses
in addition to the three most important
blocks. These three buses are collectively
referred to as the system bus or the
The bus interface
computer bus9. The block diagram is
unit is usually
between the CPU
In some cases, the external CPU bus is the same as the system bus, especially in the case of System bus.
and small,
dedicated systems. However, for most systems, there is a "bus interface unit" between the CPU and the
system bus. The bus interface unit is not shown in the figure.
Page 242
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
repeated here for an easy reference in Figure 1.
Another organization that is used in modern computers is shown in Figure 2. It has a
memory bus for connecting the CPU to the memory subsystem. This bus
is separate from the I/O bus that is used to connect peripherals and I/O devices to the
Figure 1
Examples of I/O buses include the PCI bus and the ISA bus. These I/O buses provide an
"abstract interface" that can be used for interfacing a large variety of peripherals to the
system with minimum hardware. It is also possible to standardize I/O buses, as done by
several agencies, so that third party manufacturers can build add-on sub systems for
existing architectures.
The location of these I/O buses may be different in different
Earlier generation computers used a
single bus over which the CPU could
communicate with the memory as well
as the I/O devices. This meant that the
bandwidth of the bus was shared
between the memory and I/O devices.
However, with the passage of time,
Figure 1
computer architects drifted towards
separate memory and I/O buses,
thereby giving more flexibility to users
wanting to upgrade their existing
A main disadvantage of I/O buses (and
the buses in general) is that every bus has a fixed bandwidth which is shared by all
Figure 2
devices on the bus. Additionally, electrical constraints like transmission line effects and
bus length further reduce the bandwidth. As a result of this, the designer has to make a
decision whether to sacrifice interface simplicity (by connecting more devices to the bus)
at the cost of bandwidth, or connect fewer devices to the bus and keep things simple to
get a better bandwidth. This can be explained with the help of an example.
Example # 1
Problem statement:
Consider an I/O bus that can transfer 4 bytes of data in one bus cycle. Suppose that a
designer is considering to attach the following two components to this bus:
Hard drive, with a transfer rate of 40 Mbytes/sec
Video card, with a transfer rate of 128 Mbytes/sec.
What will be the implications?
The maximum frequency of the bus is 30 MHz10. This means that the maximum
bandwidth of this bus is 30 x 4 = 120 Mbytes/sec. Now, the demand for bandwidth from
these two components will be 128 + 40 =168 Mbytes/sec which is more than the 120
These numbers correspond to an I/O bus that is relatively old. Modern systems use much faster buses
than this.
Page 243
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
Mbytes/sec that the bus can provide. Thus, if the designer uses these two components
with this bus, one or both of these components will be operating at reduced bandwidth.
Bus arbitration:
Arbitration is another issue in the use of I/O buses. Most commercially available I/O
buses have protocols defining a number of things, for example how many devices can
access the bus, what will happen if multiple devices want to access the bus at the same
time, etc. In such situations, an "arbitration scheme" must be established. As an example,
in the SCSI11 specifications, every device in the system is assigned an ID which identifies
the device to the "bus arbiter". If multiple devices send a request for the bus, the device
with the highest priority will be given access to the bus first. Such a scheme is easy to
implement because the arbiter can easily decide which device should be given access to
the bus, but its disadvantage is that the device with a low priority will
not be able to get access to the bus12. An alternate scheme would be to give the highest
priority to the device that has been waiting for the longest time for the bus. As a result of
this arbitration, the access time, or the latency, of such buses will be further reduced.
Details about the PCI and some other buses will be presented in a separate section.
Example # 2
Problem statement:
If a bus requires 10 nsec for bus requests, 10 nsec for arbitration and the average time to
complete an operation is 15 nsec after the access to the bus has been granted, is it
possible for such a bus to perform 50 million IOPS?
For 50 million IOPS, the average time for each IOP is 1 / (50 x 106) =20 nsec. Given the
information about the bus, the sum of the three times is 10 + 10 + 15 = 35 nsec for a
complete I/O operation. This means that the bus can perform a maximum of 1 / (35 x 10-
) = 28.6 million IOPS.
Thus, it will not be able to perform 50 million IOPS.
Small Computer System Interface.
Such a situation is called "starvation".
Page 244
Last Modified: 01-Nov-06
Table of Contents:
  1. Computer Architecture, Organization and Design
  2. Foundations of Computer Architecture, RISC and CISC
  3. Measures of Performance SRC Features and Instruction Formats
  4. ISA, Instruction Formats, Coding and Hand Assembly
  5. Reverse Assembly, SRC in the form of RTL
  6. RTL to Describe the SRC, Register Transfer using Digital Logic Circuits
  7. Thinking Process for ISA Design
  8. Introduction to the ISA of the FALCON-A and Examples
  9. Behavioral Register Transfer Language for FALCON-A, The EAGLE
  10. The FALCON-E, Instruction Set Architecture Comparison
  11. CISC microprocessor:The Motorola MC68000, RISC Architecture:The SPARC
  12. Design Process, Uni-Bus implementation for the SRC, Structural RTL for the SRC instructions
  13. Structural RTL Description of the SRC and FALCON-A
  14. External FALCON-A CPU Interface
  15. Logic Design for the Uni-bus SRC, Control Signals Generation in SRC
  16. Control Unit, 2-Bus Implementation of the SRC Data Path
  17. 3-bus implementation for the SRC, Machine Exceptions, Reset
  18. SRC Exception Processing Mechanism, Pipelining, Pipeline Design
  19. Adapting SRC instructions for Pipelined, Control Signals
  20. SRC, RTL, Data Dependence Distance, Forwarding, Compiler Solution to Hazards
  21. Data Forwarding Hardware, Superscalar, VLIW Architecture
  22. Microprogramming, General Microcoded Controller, Horizontal and Vertical Schemes
  23. I/O Subsystems, Components, Memory Mapped vs Isolated, Serial and Parallel Transfers
  24. Designing Parallel Input Output Ports, SAD, NUXI, Address Decoder , Delay Interval
  25. Designing a Parallel Input Port, Memory Mapped Input Output Ports, wrap around, Data Bus Multiplexing
  26. Programmed Input Output for FALCON-A and SRC
  27. Programmed Input Output Driver for SRC, Input Output
  28. Comparison of Interrupt driven Input Output and Polling
  29. Preparing source files for FALSIM, FALCON-A assembly language techniques
  30. Nested Interrupts, Interrupt Mask, DMA
  31. Direct Memory Access - DMA
  32. Semiconductor Memory vs Hard Disk, Mechanical Delays and Flash Memory
  33. Hard Drive Technologies
  34. Arithmetic Logic Shift Unit - ALSU, Radix Conversion, Fixed Point Numbers
  35. Overflow, Implementations of the adder, Unsigned and Signed Multiplication
  36. NxN Crossbar Design for Barrel Rotator, IEEE Floating-Point, Addition, Subtraction, Multiplication, Division
  37. CPU to Memory Interface, Static RAM, One two Dimensional Memory Cells, Matrix and Tree Decoders
  38. Memory Modules, Read Only Memory, ROM, Cache
  39. Cache Organization and Functions, Cache Controller Logic, Cache Strategies
  40. Virtual Memory Organization
  41. DRAM, Pipelining, Pre-charging and Parallelism, Hit Rate and Miss Rate, Access Time, Cache
  42. Performance of I/O Subsystems, Server Utilization, Asynchronous I/O and operating system
  43. Difference between distributed computing and computer networks
  44. Physical Media, Shared Medium, Switched Medium, Network Topologies, Seven-layer OSI Model