DRAM, Pipelining, Pre-charging and Parallelism, Hit Rate and Miss Rate, Access Time, Cache

<< Virtual Memory Organization

Performance of I/O Subsystems, Server Utilization, Asynchronous I/O and operating system >>

Advanced Computer Architecture-CS501

________________________________________________________

Advanced Computer Architecture

Lecture No. 41

Reading Material

Vincent P. Heuring & Harry F. Jordan

Computer Systems Design and Architecture

Summary

Numerical Examples related to

�

DRAM

�

Pipelining, Pre-charging and Parallelism

�

Cache

�

Hit Rate and Miss Rate

�

Access Time

Example 1

If a DRAM has 512 rows and its refresh time is 9ms, what should be the frequency of

row refresh operation on the average?

Solution

Refresh time= 9ms

Number of rows=512

Therefore we have to do 512 row refresh operations in a 9 ms interval, in other words

one row refresh operation every (9x10-3)/512 =1.76x10-5seconds.

Example 2

Consider a DRAM with 1024 rows and a refresh time of 10ms.

a. Find the frequency of row refresh operations.

b. What fraction of the DRAM's time is spent on refreshing if each refresh takes 100ns.

Solution

Total number of rows = 1024

Refresh period = 10ms

One row refresh takes place after every

10ms/1024=9.7micro seconds

Each row refresh takes 100ns, so fraction of the DRAM's time taken by row refreshes is,

100ns/9.7 micro sec= 1.03%

Page 358

Last Modified: 01-Nov-06

Advanced Computer Architecture-CS501

________________________________________________________

Example 3

Consider a memory system having the following specifications. Find its total cost and

cost per byte of memory.

Memory type

Total bytes

Cost per byte

SRAM

256 KB

30$ per MB

DRAM

128 MB

1$ per MB

Disk

1 GB

10$ per GB

Solution

Total cost of system

256 KB( � MB) of SRAM costs = 30 x � = $7.5

128 MB of DRAM costs= 1 x 128= $128

1 GB of disk space costs= 10 x 1=$10

Total cost of the memory system

= 7.5+128+10=$145.5

Cost per byte

Total storage= 256 KB + 128 MB + 1 GB

= 256 KB + 128x1024KB + 1x1024x1024KB

=1,179,904 KB

Total cost = $145.5

Cost per byte=145.5/(1,179,904x1024)

= $1.2x10-7$/B

Example 4

Find the average access time of a level of memory hierarchy if the hit rate is 80%. The

memory access takes 12ns on a hit and 100ns on a miss.

Solution

Hit rate =80%

Miss rate=20%

Thit=12 ns

Tmiss=100ns

Average Taccess=(hit rate*Thit)+(miss rate*Tmiss)

=(0.8*12ns)+(0.2*100ns)

= 29.6ns

Page 359

Last Modified: 01-Nov-06

Advanced Computer Architecture-CS501

________________________________________________________

Example 5

Consider a memory system with a cache, a main memory and a virtual memory. The

access times and hit rates are as shown in table. Find the average access time for the

hierarchy.

Main memory

cache

virtual memory

Hit rate

99%

80%

100%

Access time

100ns

5ns

8ms

Solution

Average access time for requests that reach the main memory

= (100ns*0.99)+(8ms*0.01)

= 80,099 ns

Average access time for requests that reach the cache

=(5ns*0.8)+(80,099ns*0.2)

=16,023.8ns

Example 6

Given the following memory hierarchy, find the average memory access time of the

complete system

Memory type

Average access time

Hit rate

SRAM

5ns

80 %

DRAM

60ns

80%

Disk

10ms

100%

Solution

Page 360

Last Modified: 01-Nov-06

Advanced Computer Architecture-CS501

________________________________________________________

For each level, average access time=( hit rate x access time for that level) + ((1-hit rate) x

average access time for next level)

Average access time for the complete system

= (0.8x5ns) + 0.2 x((0.8x60ns) + (0.2)(1x10ms))

= 4 + 0.2(48+2000000)

=4 + 400009.6

= 400013.6 ns

Example 7

Find the bandwidth of a memory system that has a latency of 25ns, a pre charge time of

5ns and transfers 2 bytes of data per access.

Solution

Time between two memory references

=latency + pre charge time

= 25 ns+ 5ns

= 30ns

Throughput = 1/30ns

=3.33x107 operations/second

Bandwidth = 2x 3.33x107

= 6.66x107 bytes/s

Example 8

Consider a cache with 128 byte cache line or cache block size. How many cycles does it

take to fetch a block from main memory if it takes 20 cycles to transfer two bytes of data?

Solution

The number of cycles required for the complete transfer of the block

=20 x 128/2

= 1280 cycles

Using large cache lines decreases the miss rate but it increases the amount of time a

program takes to execute as obvious from the number of clock cycles required to transfer

a block of data into the cache.

Example 9

Find the number of cycles required to transfer the same 128 byte cache line if page-mode

DRAM with a CAS-data delay of 8 cycles is used for main memory. Assume that the

cache lines always lie within a single row of the DRAM, and each line lies in a different

row than the last line fetched.

Solution

Page 361

Last Modified: 01-Nov-06

Advanced Computer Architecture-CS501

________________________________________________________

Memory requests to fetch each cache line=128/2= 64

Only the first fetch require the complete 20 cycles, and the other 63 will take only 8 clock

cycles. Hence the no. of cycles required to fetch a cache line

=20 + 8 x 63

= 524

Example 10

Consider a 64KB direct-mapped cache with a line length of 32 bytes.

a. Determine the number of bits in the address that refer to the byte within a cache

line.

b. Determine the number of bits in the address required to select the cache line.

Solution

Address breakdown

n=log2 of number of bytes in line

m=log2 of number of lines in cache

a. For the given cache, the number of bits in the address to determine the byte

within the line= n = log232 = 5

b. There are 64K/32= 2048 lines in the given cache. The number of bits required to

select the required line = m =log22048 = 11

Hence n=5 and m=11 for this example.

Example 11

Consider a 2-way set-associative cache with 64KB capacity and 16 byte lines.

a. How many sets are there in the cache?

b. How many bits of address are required to select a set in the cache?

c. Repeat the above two calculations for a 4-way set-associative cache with

same size.

Solution

a. A 64KB cache with 16 byte lines contains 4096 lines of data. In a 2-way set

associative cache, each set contains 2 lines, so there are 2048 sets in the cache.

b. Log2(2048)=11. Hence 11 bits of the address are required to select the set.

c. The cache with 64KB capacity and 16 byte line has 4096 lines of data. For a 4-

way set associative cache, each set contains 4 lines, so the number of sets in the

Page 362

Last Modified: 01-Nov-06

Advanced Computer Architecture-CS501

________________________________________________________

cache would be 1024 and Log 2 (1024) =10. Therefore 10 bits of the address are

required to select a set in the cache.

Example 12

Consider a processor with clock cycle per instruction (CPI) = 1.0 when all memory

accesses hit in the cache. The only data accesses are loads and stores, and these constitute

60% of all the instructions. If the miss penalty is 30 clock cycles and the miss rate is

1.5%, how much faster would the processor be if all instructions were cache hits?

Solution

Without any misses, the computer performance is

CPU execution time = (CPU clock cycles + Memory stall cycles) x Clock cycle

=(IC x CPI+ 0)x Clock cycle = IC x 1.0 x Clock cycle

Now for the computer with the real cache, first we compute the number of memory stall

cycles:

Memory accesses

= IC x Instruction x Miss Rate x Miss Penalty

Memory stall cycles

= IC x (l + 0.6) x 0.015 x 30

= IC x 0.72

where the middle term (1 + 0.6) represents one instruction access and 0.6 data accesses

per instruction. The total performance is thus

CPU execution time cache = (IC x 1.0 + IC x 0.72) x Clock cycle

= 1.72 x IC x Clock cycles

The performance ratio is the inverse of the execution times

CPU execution time cache = 1.72 x IC x clock cycle

CPU execution time

1.0 x IC x clock cycle

The computer with no cache misses is 1.72 times faster

Example 13

Consider the above example but this time assume a miss rate of 20 per 1000 instructions.

What is memory stall time in terms of instruction count?

Solution

Re computing the memory stall cycles:

Memory stall cycles=Number of misses x Miss penalty

=IC * Misses * Miss penalty

Page 363

Last Modified: 01-Nov-06

Advanced Computer Architecture-CS501

________________________________________________________

Instruction

=IC / 1000 * Misses * Miss penalty

Instruction * 1000

=IC / 1000 * 20 * 30

= IC /1000 * 600= IC * 0.6

Example 14

What happens on a write miss?

Solution

The two options to handle a write miss are as follows:

Write Allocate

The block is allocated on a write miss, followed by the write hit actions. This is just like

read miss.

No-Write Allocate

Here write misses do not affect the cache. The block is modified only in the lower level

memory.

Example 15

Assume a fully associative write-back cache with many cache entries that starts empty.

Below is a sequence of five memory operations (the address is in square brackets):

Write Mem[300];

Read Mem[400];

Write Mem[400];

WriteMem[300];

What is the number of hits and misses when using no-write allocate versus write allocate?

Solution

For no-write allocate, the address 300 is not in the cache, and there is no allocation on

write, so the first two writes will result in misses. Address 400 is also not in the cache, so

the read is also a miss. The subsequent write to address 400 is a hit. The last write to 300

is still a miss. The result for no-write allocate is four misses and one hit.

For write allocate, the first accesses to 300 and 400 are misses, and the rest are hits since

300 and 400 are both found in the cache. Thus, the result for write allocate is two misses

and three hits.

Example 16

Page 364

Last Modified: 01-Nov-06

Advanced Computer Architecture-CS501

________________________________________________________

Which has the lower miss rate?

a 32 KB instruction cache with a 32 KB data cache or a 64 KB unified cache?

Use the following Miss per 1000 instructions.

size

Instruction

Data cache

Unified

cache

32 KB

1.5

42.2

64 KB

0.7

38.5

41.2

Assumptions

�

The percentage of instruction references is about 75%.

�

Assume 40% of the instructions are data transfer instructions.

�

Assume a hit takes 1 clock cycle.

�

The miss penalty is 100 clock cycles.

�

A load or store hit takes 1 extra clock cycle on a unified cache if there is only one

cache port to satisfy two simultaneous requests.

� Also the unified cache might lead to a structural hazard.

� Assume write-through caches with a write buffer and ignore stalls due to the write

buffer.

What is the average memory access time in each case?

Solution

First let's convert misses per 1000 instructions into

miss rates.

Misses

Miss rate =

1000 Instructions

Memory accesses

Instruction

Since every instruction access has exactly one memory access to fetch the instruction, the

instruction miss rate is

Miss rate32 KB instruction = 1.5/1000 = 0.0015

1.00

Since 40% of the instructions are data transfers, the data miss rate is

Miss Rate 32 kb data = 40 /1000

= 0.1

0.4

Page 365

Last Modified: 01-Nov-06

Advanced Computer Architecture-CS501

________________________________________________________

The unified miss rate needs to account for instruction and data accesses:

Miss Rate 64 kb unified = 42.2 /1000 = 0.031

1.00+ 0.4

As stated above, about 75% of the memory accesses are instruction references. Thus, the

overall miss rate for the split caches is

(75% x 0.0015) + (25% x 0.1) = 0.026125

Thus, a 64 KB unified cache has a slightly lower effective miss rate than two 16 KB

caches. The average memory access time formula can be divided into instruction and data

accesses:

Average memory access time

= % instructions x (Hit time + Instruction miss rate x Miss Penalty) + % data x (Hit time

+ Data miss rate x Miss Penalty)

Therefore, the time for each organization is:

Average memory access time split

= 75%x(l +0.0015x 100) + 25%x(l +0.1x100)

= (75% x 1.15) + (25% x 11)

= 0.8625+2.75= 3.61

Average memory access time unified

= 75% x (1+0.031 x 100) +25% x (1 + 1+0.031 x 100)

= (75% x 4.1) + (25% x 5.1) = 3.075+1.275

= 4.35

Hence split caches have a better average memory access time despite having a worse

effective miss rate. Split cache also avoids the problem of structural hazard present in a

unified cache.

Page 366

Last Modified: 01-Nov-06

Table of Contents: