ZeePedia

Nested Interrupts, Interrupt Mask, DMA

<< Preparing source files for FALSIM, FALCON-A assembly language techniques
Direct Memory Access - DMA >>
img
Advanced Computer Architecture-CS501
________________________________________________________
Advanced Computer Architecture
Lecture No. 30
Reading Material
Vincent P. Heuring & Harry F. Jordan
Chapter 8
Computer Systems Design and Architecture
8.3.3, 8.4
Summary
·
Nested Interrupts
·
Interrupt Mask
· DMA
Nested Interrupts
(Read from Book, Jordan Page 397)
Interrupt Mask
(Read from Book, Jordan Page 397)
Priority Mask
(Read from Book, Jordan Page 398)
Examples
Example # 123
Assume that three I/O devices are connected to a 32-bit, 10 MIPS CPU. The first device
is a hard drive with a maximum transfer rate of 1MB/sec. It has a 32-bit bus. The second
device is a floppy drive with a transfer rate of 25KB/sec over a 16-bit bus, and the third
device is a keyboard that must be polled thirty times per second. Assuming that the
polling operation requires 20 instructions for each I/O device, determine the percentage
of CPU time required to poll each device.
Solution:
The hard drive can transfer 1MB/sec or 250 K 32-bit words every second. Thus, this hard
drive should be polled using at least this rate.
Using 1K=210, the number of CPU instructions required would be
250 x 210 x 20 = 5120000 instructions per second.
23
Adopted from [H&P org]
Page 309
Last Modified: 01-Nov-06
img
Advanced Computer Architecture-CS501
________________________________________________________
Percentage of CPU time required for polling is
(5.12 x 106)/ (10 x106) = 51.2%
The floppy disk can transfer 25K/2= 12.5 x 210 half-words per second. It should be
polled with at least this rate. The number of CPU instructions required will be 12.5 x 210
x 20 = 256,000 instructions per second.
Therefore, the percentage of CPU time required for polling is
(0.256 x 106)/ (10 x 106) = 2.56%.
For the keyboard, the number of instructions required for polling is
30 x 20 = 600 instructions per second.
Therefore, the percentage of CPU time spent in polling is
600 / (10 x 106) = 0.006%
It is clear from this example that while it is acceptable to use polling for a keyboard or a
floppy drive, it is very risky to use polling for the hard drive. In general, for devices with
a high data rate, the use of polling is not adequate.
Example # 22
a. What should be the polling frequency for an I/O device if the average delay
between the time when the device wants to make a request and the time when it is
polled, is to be at most 10 ms?
b. If it takes 10,000 cycles to poll the I/O device, and the processor operates at
100MHz, what % of the CPU time is spent polling?
c. What if th24e system wants to provide an average delay of 1msec?
Solution:
a. Assuming that the I/O requests are distributed evenly in time, the average time
that a device will have to wait for the processor to poll is half the time between
polling attempts. Therefore, to provide an average delay of 10 ms, the processor
will have to poll every 20 ms, or 50 times per second.
b. If each polling attempt takes 10,000 cycles, then the processor will spend 500,000
cycles polling each second. The % of CPU time spent in polling is then
(0.5x106)/(100x106)=0.5%
c. To provide an average delay of 1ms, the polling frequency must be increased. The
processor will have to poll every 2ms, or 500 times per second. This will consume
5,000,000 cycles for polling. The % of CPU time spent polling then becomes
5/100=5%.
24
Adopted from [Schaum]
Page 310
Last Modified: 01-Nov-06
img
Advanced Computer Architecture-CS501
________________________________________________________
Example # 325
What percentage of time will a 20MIPS processor spend in the busy wait loop of an 80-
character line printer when it takes 1 msec to print a character and a total of 565
instructions need to be executed to print an 80 character line. Assume that two
instructions are executed in the polling loop.
Solution:
Out of the total 565 instructions executed to print a line, 80x2=160 are required for
polling. For a 20MIPS processor, the execution of the remaining 405 instructions takes
405/ (20x106) = 20.25µsec. Since the printing of 80 characters takes 80ms, (80-0.02025)
=79.97msec is spent in the polling loop before the next 80 characters can be printed. This
is 79.97/80=99.96% of the total time.
Example # 426
Consider a 20 MIPS processor with several input devices attached to it, each running at
1000 characters per second. Assume that it takes 17 instructions to handle an interrupt. If
the hardware interrupt response takes 1µsec, what is the maximum number of devices
that can be handled simultaneously?
Solution:
A service for one character requires 17/ (20x106) +1µsec=1.85µsec. Since each device
runs at 1000 characters per second, 1.85 ms of handling time is required by each device
every second. Therefore the maximum number of devices that can be handled is 1/
(1.85x10-3) = 540.
Example # 527
Assume that a floppy drive having a transfer rate of 25KB per second is attached to a 32
bit, 10MIPS CPU using an interrupt driven interface. The drive has a 16-bit data bus.
Assume that the interrupt overhead is 20 instructions. Calculate the fraction of CPU time
required to service this drive when it is active.
Solution:
Since the floppy drive has a 16-bit data bus, it can transfer two bytes at one time. Thus its
transfer rate is 25/2 = 12.5K half-words (16-bits each) per second. This corresponds to an
overhead of 20 instructions or 12.5K x 20 = 12.5 x 210 x 20 = 256000 instructions per
second.
Example # 628
A processor with a 500 MHz clock requires 1000 clock cycles to perform a context
switch and start an ISR. Assume each interrupt takes 10,000 cycles to execute the ISR
25
Adopted from [H&J]
26
Adopted from [H&J]
27
Adopted from [H&P org]
28
Adopted from [Schaum]
Page 311
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
________________________________________________________
and the device makes 200 interrupt requests per second. Also, assume that the processor
polls every 0.5msec during the time when there are no interrupts. Further assume that
polling an I/O device requires 500 cycles. Compute the following:
a. How many cycles per second does the processor spend handling I/O from the
device if only interrupts are used?
b. What fraction of the CPU time is used in interrupt handling for part (a)?
c. How many cycles per second are spent on I/O if polling is also used with
interrupts?
d. How often should the processor poll so that polling incurs the same overhead as
interrupts?
Solution:
a. The device makes 200 interrupt requests per second, each of which takes
10,000 + 2x1000 (context switching to the ISR and back from it)
= 12,000 cycles.
Thus, a total of 200x12,000=2,400,000 cycles per second are spent handling I/O
using
interrupts.
b. The percentage of the processor time used in interrupt handling is
2,400,000/(500x106) or 0.48%.
c. There are 200 interrupt requests per second, or one interrupt request every 5 ms.
Every interrupt consumes a total of 12,000 cycles, as calculated in part (a). For a
500 MHz CPU, this is
12000/(500 x 106 ) = 24 microseconds.
For 200 interrupts per second, this is 4.8 msec.
This leaves 1000 - 4.8 = 995.2 msec for polling.
Since the processor polls once every 0.5 msec during the time when there is no
interrupt, this corresponds to
995/0.5 = 1990 times per second.
The total number of cycles required for polling is
1990 x 500 = 995,000 cycles per second.
Thus, the total time spent on I/O when using polling with
interrupts is
Page 312
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
________________________________________________________
2,400,000 + 995,000 = 3,395,000 cycles per second.
d. The interrupt overhead is 1000 cycles per second for a context switch to the ISR
and 1000 cycles per second back from it. This is a total of 2 x 1000 cycles per
second. With 200 interrupts per second, this is
200 x 2000 = 400,000 cycles per second.
The polling overhead is 500 cycles per second. Thus, for the same overhead
as
interrupts, the polling operation should be performed
400,000 / 500 = 800 times per second,
or 1/800 = every 1.25 msec.
Direct Memory Access (DMA)
Direct memory access is a technique, where by the CPU passes its control to the memory
subsystem or one of its peripherals, so that a contiguous block of data could be
transferred from peripheral device to memory subsystem or from memory subsystem to
peripheral device or from one peripheral device to another peripheral device.
Advantage of DMA
The transfer rate is pretty fast and conceptually you could imagine that through disabling
the tri-state buffers, the system bus is isolated and a direct connection is established
between the I/O subsystem and the memory subsystem and then the CPU is free. It is idle
at that time or it could do some other activity. Therefore, the DMA would be quite useful,
if a large amount of data needs to be transferred, for example from a hard disk to a printer
or we could fill up the buffer of a printer in a pretty short time.
As compared to interrupt driven I/O or the programmed I/O, DMA would be much faster.
What is the consequence? The consequence is that we need to have another chip, which is
a DMA controller. "A DMA controller could be a CPU in itself and it could control the
total activity and synchronize the transfer of data". DMA could be considered as a
technique of transferring data from I/O to memory and from memory to I/O without the
intervention of the CPU. The CPU just sets up an I/O module or a memory
subsystem, so that it passes control and the data could be passed on from I/O to memory
or from memory to I/O or within the memory from one subsystem to another subsystem
without interaction of the CPU. After this data transfer is complete, the control is passed
from I/O back to the CPU.
Now we can illustrate further the advantage of DMA using following example.
Example of DMA
If we write instruction load as follows:
load [2], [9]
This instruction is illegal and not available in the SRC processor. The symbols [2] and [9]
represent memory locations. If we want to have this transfer to be done then two steps
would be required. The instruction would be:
load r1, [9]
store r1, [2]
Page 313
Last Modified: 01-Nov-06
Advanced Computer Architecture-CS501
________________________________________________________
Thus it is not possible to transfer from one memory location to another without involving
the CPU. The same applies to transfer between memory and peripherals connected to I/O
ports. For example we cannot have:
out [6], datap
It has to be done again in two steps:
load r1, [6]
out r1, datap
Similar comments apply to the "in" instruction. Thus the real cause of the limited transfer
rate is the CPU itself. It acts as an unnecessary middle man. The example illustrates that
in general, every data word travels over the system bus twice and this is not necessary,
and therefore, the DMA in such cases is pretty useful.
DMA Approach
The DMA approach is to turn off i.e. through tri-state buffers and therefore, electrically
disconnect from the system bus, the CPU and let a peripheral device or a memory
subsystem or any other module or another block of the same module communicate
directly with the memory or with another peripheral device. This would have the
advantage of having higher transfer rates which could approach that of limited by the
memory itself.
Disadvantage of DMA
The disadvantage however, would be that an additional DMA controller would be
required, that could make the system a bit more complex and expensive. Generally, the
DMA requests have priority over all other bus activities including interrupts. No
interrupts may be recognized during a DMA cycle.
Page 314
Last Modified: 01-Nov-06
Table of Contents:
  1. Computer Architecture, Organization and Design
  2. Foundations of Computer Architecture, RISC and CISC
  3. Measures of Performance SRC Features and Instruction Formats
  4. ISA, Instruction Formats, Coding and Hand Assembly
  5. Reverse Assembly, SRC in the form of RTL
  6. RTL to Describe the SRC, Register Transfer using Digital Logic Circuits
  7. Thinking Process for ISA Design
  8. Introduction to the ISA of the FALCON-A and Examples
  9. Behavioral Register Transfer Language for FALCON-A, The EAGLE
  10. The FALCON-E, Instruction Set Architecture Comparison
  11. CISC microprocessor:The Motorola MC68000, RISC Architecture:The SPARC
  12. Design Process, Uni-Bus implementation for the SRC, Structural RTL for the SRC instructions
  13. Structural RTL Description of the SRC and FALCON-A
  14. External FALCON-A CPU Interface
  15. Logic Design for the Uni-bus SRC, Control Signals Generation in SRC
  16. Control Unit, 2-Bus Implementation of the SRC Data Path
  17. 3-bus implementation for the SRC, Machine Exceptions, Reset
  18. SRC Exception Processing Mechanism, Pipelining, Pipeline Design
  19. Adapting SRC instructions for Pipelined, Control Signals
  20. SRC, RTL, Data Dependence Distance, Forwarding, Compiler Solution to Hazards
  21. Data Forwarding Hardware, Superscalar, VLIW Architecture
  22. Microprogramming, General Microcoded Controller, Horizontal and Vertical Schemes
  23. I/O Subsystems, Components, Memory Mapped vs Isolated, Serial and Parallel Transfers
  24. Designing Parallel Input Output Ports, SAD, NUXI, Address Decoder , Delay Interval
  25. Designing a Parallel Input Port, Memory Mapped Input Output Ports, wrap around, Data Bus Multiplexing
  26. Programmed Input Output for FALCON-A and SRC
  27. Programmed Input Output Driver for SRC, Input Output
  28. Comparison of Interrupt driven Input Output and Polling
  29. Preparing source files for FALSIM, FALCON-A assembly language techniques
  30. Nested Interrupts, Interrupt Mask, DMA
  31. Direct Memory Access - DMA
  32. Semiconductor Memory vs Hard Disk, Mechanical Delays and Flash Memory
  33. Hard Drive Technologies
  34. Arithmetic Logic Shift Unit - ALSU, Radix Conversion, Fixed Point Numbers
  35. Overflow, Implementations of the adder, Unsigned and Signed Multiplication
  36. NxN Crossbar Design for Barrel Rotator, IEEE Floating-Point, Addition, Subtraction, Multiplication, Division
  37. CPU to Memory Interface, Static RAM, One two Dimensional Memory Cells, Matrix and Tree Decoders
  38. Memory Modules, Read Only Memory, ROM, Cache
  39. Cache Organization and Functions, Cache Controller Logic, Cache Strategies
  40. Virtual Memory Organization
  41. DRAM, Pipelining, Pre-charging and Parallelism, Hit Rate and Miss Rate, Access Time, Cache
  42. Performance of I/O Subsystems, Server Utilization, Asynchronous I/O and operating system
  43. Difference between distributed computing and computer networks
  44. Physical Media, Shared Medium, Switched Medium, Network Topologies, Seven-layer OSI Model