Data Storage Concepts, Physical Storage Media, Memory Hierarchy


Database Management System (CS403)

Lecture No. 34

Reading Material

"Database Management Systems", 2nd edition, Raghu Ramakrishnan, Johannes Gehrke, McGraw-Hill

"Modern Database Management", Fred McFadden, Jeffrey Hoffer, Benjamin/Cummings

Overview of Lecture

o Data Storage Concepts

o Physical Storage Media

o Memory Hierarchy

In the previous lecture we discussed forms and their design. In this lecture we begin our discussion of storage media.

Classification of Physical Storage Media

Storage media are classified according to the following characteristics:

o Speed of access

o Cost per unit of data

o Reliability

Storage can also be classified as either:

o Volatile storage

o Non-volatile storage

Computer storage whose contents are lost when the power is turned off is called volatile storage. Computer storage whose contents are retained when the power is turned off is called non-volatile storage.

Cache, pronounced "cash", is a special high-speed storage mechanism. It can be either a reserved section of main memory or an independent high-speed storage device. Two types of caching are commonly used in personal computers: memory caching and disk caching.

A memory cache, sometimes called a cache store or RAM cache, is a portion of memory made of high-speed static RAM (SRAM) instead of the slower and cheaper dynamic RAM (DRAM) used for main memory. Memory caching is effective because most programs access the same data or instructions over and over. By keeping as much of this information as possible in SRAM, the computer avoids accessing the slower DRAM.

Some memory caches are built into the architecture of microprocessors. The Intel 80486 microprocessor, for example, contains an 8K memory cache, and the Pentium has a 16K cache. Such internal caches are often called Level 1 (L1) caches. Most PCs also have a second level of cache, called a Level 2 (L2) cache, which in older designs was external to the microprocessor. L2 caches sit between the CPU and the DRAM. Like L1 caches, L2 caches are composed of SRAM, but they are much larger.

Disk caching works on the same principle as memory caching, but instead of using high-speed SRAM, a disk cache uses conventional main memory. The most recently accessed data from the disk (as well as adjacent sectors) is stored in a memory buffer. When a program needs to access data from the disk, it first checks the disk cache to see if the data is there. Disk caching can dramatically improve the performance of applications, because accessing a byte of data in RAM can be thousands of times faster than accessing a byte on a hard disk.
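The read path just described (check memory first, go to the disk only on a miss, and also keep adjacent sectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular operating system implements its disk cache: the disk is simulated as a Python list of sector contents, `SimpleDiskCache` and its `read_ahead` parameter are hypothetical names, and the buffer never evicts anything.

```python
class SimpleDiskCache:
    """Hypothetical disk cache: look in main memory first, go to the disk only on a miss."""

    def __init__(self, disk, read_ahead=2):
        self.disk = disk              # simulated disk: a list of sector contents
        self.read_ahead = read_ahead  # following sectors copied into memory on each miss
        self.buffer = {}              # sector number -> data held in main memory
        self.hits = 0
        self.misses = 0

    def read(self, sector):
        if sector in self.buffer:     # cache hit: no disk access needed
            self.hits += 1
            return self.buffer[sector]
        self.misses += 1              # cache miss: read the sector and the ones after it
        last = min(sector + self.read_ahead, len(self.disk) - 1)
        for s in range(sector, last + 1):
            self.buffer.setdefault(s, self.disk[s])
        return self.buffer[sector]


disk = [f"sector-{i}" for i in range(10)]
cache = SimpleDiskCache(disk, read_ahead=2)
for s in [0, 1, 2, 0, 3]:
    cache.read(s)
print(cache.hits, cache.misses)  # 3 2
```

Because the first miss also brought sectors 1 and 2 into memory, only two of the five requests actually reach the simulated disk.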

When the requested data is found in the cache, the access is called a cache hit, and the effectiveness of a cache is judged by its hit rate: the fraction of all accesses that are hits. Many cache systems use a technique known as smart caching, in which the system can recognize certain types of frequently used data. The strategies for determining which information should be kept in the cache constitute some of the more interesting problems in computer science.
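One widely used strategy for deciding what stays in a cache is least-recently-used (LRU) replacement: when the cache is full, the entry that has gone longest without being accessed is evicted. LRU is chosen here only as an example; the text does not prescribe a policy, and `lru_hit_rate` is a hypothetical helper. The sketch replays a sequence of block requests and reports the resulting hit rate.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay block requests through an LRU cache of `capacity` entries; return hits / requests."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block in requests:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / len(requests) if requests else 0.0


requests = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1]
print(lru_hit_rate(requests, capacity=3))  # 0.5
print(lru_hit_rate(requests, capacity=2))  # 0.0
```

With room for three blocks the frequently reused blocks 1 and 2 stay resident; with room for only two, every request evicts a block that is needed again soon, and the hit rate drops to zero.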

The main memory of the computer is also known as RAM, standing for Random Access Memory. It is constructed from integrated circuits and needs electrical power to maintain its information; when power is lost, the information is lost too. The CPU can access it directly. The time to read or write any particular byte is independent of where in memory that byte is located, and is currently approximately 50 nanoseconds (a nanosecond is one thousand-millionth of a second). This is broadly comparable with the speed at which the CPU needs to access data. Main memory is expensive compared to external memory, so it has limited capacity. The capacity available for a given price is increasing all the time; for example, when this material was written, many home personal computers had 16 megabytes (million bytes) of main memory, while 64 megabytes was commonplace on commercial workstations. The CPU normally transfers data to and from main memory in groups of two, four or eight bytes, even if the operation it is undertaking requires only a single byte.
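A small sketch of that last point, assuming naturally aligned transfers (each word starts at an address that is a multiple of its size): a one-byte read brings in the whole word that contains it. `word_containing` is a hypothetical helper, and real memory controllers differ in detail.

```python
def word_containing(address, word_size=8):
    """Return (first, last) byte addresses of the aligned word that holds `address`."""
    if word_size not in (2, 4, 8):
        raise ValueError("word_size must be 2, 4 or 8 bytes")
    start = address - (address % word_size)  # round down to a multiple of word_size
    return start, start + word_size - 1


for size in (2, 4, 8):
    print(size, word_containing(1003, size))
# 2 (1002, 1003)
# 4 (1000, 1003)
# 8 (1000, 1007)
```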

Flash memory is a form of EEPROM that allows multiple memory locations to be erased or written in one programming operation. Conventional EEPROM allows only one location at a time to be erased or written, so flash can operate at higher effective speeds when the system uses it to read and write different locations at the same time. All types of flash memory and EEPROM wear out after a certain number of erase operations, because of wear on the insulating oxide layer around the charge-storage mechanism used to store data.

Flash memory is non-volatile: it stores information on a silicon chip in a way that does not need power to maintain it. If you turn off the power to the chip, the information is retained without consuming any power. In addition, flash offers fast read access times and solid-state shock resistance. These characteristics make flash popular for storage on battery-powered devices such as cellular phones and PDAs. Flash memory is based on the Floating-Gate Avalanche-Injection Metal Oxide Semiconductor (FAMOS) transistor, which is essentially an NMOS transistor with an additional conductor suspended between the gate and the source/drain terminals.

A disk is a round plate on which data can be encoded. There are two basic types of disks: magnetic disks and optical disks. On magnetic disks, data is encoded as microscopic magnetized needles on the disk's surface. You can record and erase data on a magnetic disk any number of times, just as you can with a cassette tape.

Magnetic disks come in a number of different forms:

Floppy Disk: A typical 5¼-inch floppy disk can hold 360K or 1.2MB (megabytes). 3½-inch floppies normally store 720K, 1.2MB or 1.44MB of data.

Hard Disk: Hard disks can store anywhere from 20MB to more than 10GB. Hard disks are also from 10 to 100 times faster than floppy disks.


Optical disks record data by burning microscopic holes in the surface of the disk with a laser. To read the disk, another laser beam shines on the disk and detects the holes by changes in the reflection pattern.

Optical disks come in three basic forms:

CD-ROM: Most optical disks are read-only. When you purchase them, they are already filled with data. You can read the data from a CD-ROM, but you cannot modify, delete, or write new data.

WORM: Stands for write-once, read-many. WORM disks can be written once and then read any number of times; however, you need a special WORM disk drive to write data onto a WORM disk.

Erasable optical (EO): EO disks can be read from, written to, and erased just like magnetic disks.

The machine that spins a disk is called a disk drive. Within each disk drive are one or more heads (often called read/write heads) that actually read and write data.

Accessing data from a disk is not as fast as accessing data from main memory, but disks are much cheaper. And unlike RAM, disks retain data even when the computer is turned off. Consequently, disks are the storage medium of choice for most types of data. Another storage medium is magnetic tape, but tapes are used only for backup and archiving because they are sequential-access devices: to access data in the middle of a tape, the tape drive must pass through all the preceding data.

RAID, short for Redundant Array of Independent (or Inexpensive) Disks, is a category of disk drives that employ two or more drives in combination for fault tolerance and performance. RAID disk drives are used frequently on servers but aren't generally necessary for personal computers.

Fundamental to RAID is "striping", a method of concatenating multiple drives into one logical storage unit. Striping involves partitioning each drive's storage space into stripes, which may be as small as one sector (512 bytes) or as large as several megabytes. These stripes are then interleaved round-robin, so that the combined space is composed alternately of stripes from each drive. In effect, the storage space of the drives is shuffled like a deck of cards. The type of application environment, I/O-intensive or data-intensive, determines whether large or small stripes should be used.
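The round-robin interleaving described above amounts to a simple address calculation. The sketch below assumes `n_drives` drives and a stripe size measured in sectors; the function name `locate` and this exact layout are illustrative assumptions, not the on-disk format of any particular RAID controller.

```python
def locate(logical_sector, n_drives, stripe_sectors):
    """Map a logical sector of a striped volume to (drive, physical sector on that drive)."""
    stripe_number, offset_in_stripe = divmod(logical_sector, stripe_sectors)
    drive = stripe_number % n_drives   # stripes are dealt to the drives round-robin
    row = stripe_number // n_drives    # full rounds of stripes placed before this one
    return drive, row * stripe_sectors + offset_in_stripe


# Four drives, stripes of 2 sectors (1 KB with 512-byte sectors).
for ls in range(10):
    print(ls, locate(ls, n_drives=4, stripe_sectors=2))
```

Logical sectors 0 and 1 land on drive 0, sectors 2 and 3 on drive 1, and so on, until sector 8 wraps back to drive 0 at physical sector 2: the "shuffled deck of cards" in concrete form.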

Most multi-user operating systems today, such as NT, UNIX and NetWare, support overlapped disk I/O operations across multiple drives. However, to maximize throughput for the disk subsystem, the I/O load must be balanced across all the drives so that each drive can be kept busy as much as possible. In a multiple-drive system without striping, the disk I/O load is never perfectly balanced: some drives will contain data files that are frequently accessed, and some drives will only rarely be accessed. In I/O-intensive environments, performance is optimized by striping the drives in the array with stripes large enough that each record potentially falls entirely within one stripe. This ensures that the data and I/O will be evenly distributed across the array, allowing each drive to work on a different I/O operation and thus maximizing the number of simultaneous I/O operations the array can perform.

In data-intensive environments and single-user systems that access large records, small stripes (typically one 512-byte sector in length) can be used so that each record spans all the drives in the array, each drive storing part of the data from the record. This makes long record accesses faster, since the data transfer occurs in parallel on multiple drives. Unfortunately, small stripes rule out multiple overlapped I/O operations, since each I/O will typically involve all drives. However, operating systems such as DOS, which do not allow overlapped disk I/O, will not be negatively impacted. Applications such as on-demand video/audio, medical imaging and data acquisition, which use long record accesses, will achieve optimum performance with small-stripe arrays.
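The large-versus-small stripe trade-off can be checked with the same round-robin layout by asking which drives a single record touches. `drives_touched` is a hypothetical helper; offsets and lengths are in bytes, and the 4 KB record and 64 KB "large" stripe are sample values chosen for illustration.

```python
def drives_touched(offset, length, stripe_bytes, n_drives):
    """Return the sorted drive numbers touched by a record of `length` bytes at byte `offset`."""
    first_stripe = offset // stripe_bytes
    last_stripe = (offset + length - 1) // stripe_bytes
    return sorted({s % n_drives for s in range(first_stripe, last_stripe + 1)})


RECORD = 4096  # a 4 KB record starting at byte 0
print(drives_touched(0, RECORD, stripe_bytes=512, n_drives=4))        # [0, 1, 2, 3]
print(drives_touched(0, RECORD, stripe_bytes=64 * 1024, n_drives=4))  # [0]
```

With 512-byte stripes every drive takes part in the transfer (fast for one long record, but no drive is free for another request); with 64 KB stripes the record stays on one drive, leaving the other three available for overlapped I/O.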

RAID-0

RAID Level 0 is not redundant, and hence does not truly fit the "RAID" acronym. In level 0, data is split across drives, resulting in higher data throughput. Since no redundant information is stored, performance is very good, but the failure of any disk in the array results in data loss. This level is commonly referred to as striping.

RAID-1

RAID Level 1 provides redundancy by writing all data to two or more drives. A level 1 array tends to be faster on reads and slower on writes than a single drive, but if a drive fails, no data is lost. This is a good entry-level redundant system, since only two drives are required; however, since one drive is used to store a duplicate of the data, the cost per megabyte is high. This level is commonly referred to as mirroring.
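A toy model of mirroring: every write goes to all working member drives, and a read can be served by any drive that is still working. `MirroredVolume` is a hypothetical class that models each drive as a Python dictionary; it ignores much of what a real controller does, such as rebuilding a replaced drive.

```python
class MirroredVolume:
    """Hypothetical RAID-1 volume: each drive is a dict mapping block number -> data."""

    def __init__(self, n_drives=2):
        self.drives = [{} for _ in range(n_drives)]
        self.failed = set()  # indexes of drives that have failed

    def write(self, block, data):
        # Every surviving drive receives a copy, so one logical write costs one I/O per drive.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                drive[block] = data

    def read(self, block):
        # Any surviving copy will do; a real controller could spread reads across drives.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                return drive[block]
        raise OSError("all mirrored drives have failed")


vol = MirroredVolume()
vol.write(7, b"payroll")
vol.failed.add(0)   # simulate failure of the first drive
print(vol.read(7))  # b'payroll'
```

The extra I/O per write and the choice of drive per read mirror the text's remark that level 1 tends to be slower on writes and faster on reads.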

RAID-2

RAID Level 2, which uses Hamming error-correction codes, is intended for use with drives that do not have built-in error detection. All SCSI drives support built-in error detection, so this level is of little use with SCSI drives.
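To show what a Hamming code provides, here is the classic Hamming(7,4) code: 4 data bits are protected by 3 check bits, and any single flipped bit can be located and corrected. This is a textbook illustration of the code family only; how RAID-2 hardware distributes code bits across drives is not modeled, and the function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode data bits [d1, d2, d3, d4] as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the bad bit's position
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], error_pos


code = hamming74_encode([1, 0, 1, 1])
code[4] ^= 1                   # corrupt bit 5 (d2)
print(hamming74_decode(code))  # ([1, 0, 1, 1], 5)
```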

RAID-3

RAID Level 3 stripes data at the byte level across several drives, with parity stored on one drive. It is otherwise similar to level 4. Byte-level striping requires hardware support for efficient use.

RAID-4

RAID Level 4 stripes data at the block level across several drives, with parity stored on one drive. The parity information allows recovery from the failure of any single drive. The read performance of a level 4 array is very good (the same as level 0). Writes, however, require that parity data be updated each time. This slows small random writes in particular, though large or sequential writes are fairly fast. Because only one drive in the array stores redundant data, the cost per megabyte of a level 4 array can be fairly low.
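The parity used by levels 3, 4 and 5 is a bytewise exclusive-OR (XOR) of the data in a stripe. The sketch below shows the three operations the text relies on: computing parity, rebuilding a failed drive's block from the survivors, and the read-modify-write update that makes small writes slow (new parity = old parity XOR old data XOR new data). Blocks are modeled as short equal-length `bytes` values with sample contents, and `xor_blocks` is a hypothetical helper.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data drives plus a dedicated parity drive (RAID-4 layout).
data = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]
parity = xor_blocks(*data)

# Rebuild: if drive 1 fails, XOR of the surviving data blocks and the parity restores it.
rebuilt = xor_blocks(data[0], data[2], parity)
assert rebuilt == data[1]

# Small write: update drive 2's block without reading drives 0 and 1.
new_block = b"\x00\xff"
new_parity = xor_blocks(parity, data[2], new_block)  # old parity ^ old data ^ new data
data[2] = new_block
assert new_parity == xor_blocks(*data)
print(parity.hex(), new_parity.hex())  # cc55 ffee
```

The small write needs two reads (old data, old parity) and two writes (new data, new parity), and in RAID-4 every such update lands on the same parity drive.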

RAID-5

RAID Level 5 is similar to level 4, but distributes parity among the drives. This can speed small writes in multiprocessing systems, since the parity disk does not become a bottleneck. Because parity blocks must be skipped on each drive during reads, however, read performance can be lower than that of a level 4 array. The cost per megabyte is the same as for level 4.
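Distributing parity means the parity block moves to a different drive from one stripe to the next. The function below uses one simple rotation (parity on the last drive for stripe 0, moving one drive to the left per stripe); real controllers use several different rotation schemes, so treat this layout and the name `raid5_layout` as illustrative assumptions.

```python
def raid5_layout(stripe, n_drives):
    """List what each drive holds for `stripe`: 'P' for parity, else a logical data-block number."""
    parity_drive = (n_drives - 1 - stripe) % n_drives  # rotate parity leftwards
    layout = []
    block = stripe * (n_drives - 1)  # first data block number stored in this stripe
    for drive in range(n_drives):
        if drive == parity_drive:
            layout.append("P")
        else:
            layout.append(block)
            block += 1
    return layout


for s in range(4):
    print(s, raid5_layout(s, n_drives=4))
# 0 [0, 1, 2, 'P']
# 1 [3, 4, 'P', 5]
# 2 [6, 'P', 7, 8]
# 3 ['P', 9, 10, 11]
```

Over four stripes each drive holds parity exactly once, so parity updates for small writes are spread across all drives instead of queuing on one.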

File organization is the manner in which data records are stored and retrieved on physical devices. The techniques used to find and retrieve stored records are called access methods.

Sequential File Organization

Records are arranged on storage devices in some sequence based on the value of some field, called the sequence field. The sequence field is often the key field that identifies the record.

Sequential file organization is simple, easy to understand and manage, and best for providing sequential access. It is not feasible for direct or random access, and inserting or deleting a record in the middle of the sequence involves cumbersome record searches and rewriting of the file.
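The cost of inserting into a sequential file can be seen in a small sketch. It keeps records sorted on a sequence field (a hypothetical employee number) in a text file, one comma-separated record per line, and inserts a record by reading the whole file and writing every record back in order. The file name, record format and sample data are assumptions made for illustration.

```python
import os
import tempfile


def insert_record(path, record):
    """Insert (key, name) into a file sorted on key by rewriting the whole file."""
    with open(path) as f:
        records = [line.rstrip("\n").split(",", 1) for line in f if line.strip()]
    records = [(int(key), name) for key, name in records]
    records.append(record)
    records.sort(key=lambda r: r[0])  # restore sequence-field order
    tmp = path + ".tmp"
    with open(tmp, "w") as f:         # every record is written out again
        for key, name in records:
            f.write(f"{key},{name}\n")
    os.replace(tmp, path)             # swap the rewritten file into place
    return len(records)               # number of records rewritten


path = os.path.join(tempfile.mkdtemp(), "employees.txt")
with open(path, "w") as f:
    f.write("101,Asad\n105,Sara\n110,Bilal\n")
print(insert_record(path, (107, "Nadia")))  # 4
```

Adding one record rewrote all four; for a file of a million records the same insertion would rewrite a million, which is why sequential files suit batch processing better than frequent random updates.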

RAID-0 is the fastest and most efficient array type but offers no fault tolerance.

RAID-1 is the array of choice for performance-critical, fault-tolerant environments. In addition, RAID-1 is the only choice for fault tolerance if no more than two drives are desired.

RAID-2 is seldom used today, since error-correcting code (ECC) is embedded in almost all modern disk drives.

RAID-3 can be used in data-intensive or single-user environments that access long sequential records, to speed up data transfer. However, RAID-3 does not allow multiple I/O operations to be overlapped and requires synchronized-spindle drives to avoid performance degradation with short records.

RAID-4 offers no advantages over RAID-5 and does not support multiple simultaneous write operations.

RAID-5 is the best choice in multi-user environments that are not write-performance sensitive. However, at least three, and more typically five, drives are required for RAID-5 arrays.
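The cost-per-megabyte remarks in the level descriptions come down to how much raw space is usable. The sketch assumes `n` identical drives of `size_gb` gigabytes each, treats RAID-1 as an n-way mirror (usable space of one drive), and leaves out RAID-2, whose number of check disks depends on the Hamming code used; `usable_capacity` is a hypothetical helper.

```python
def usable_capacity(level, n, size_gb):
    """Usable capacity in GB for n identical drives at RAID level 0, 1, 3, 4 or 5."""
    if level == 0:
        return n * size_gb          # striping only, no redundancy
    if level == 1:
        if n < 2:
            raise ValueError("RAID-1 needs at least two drives")
        return size_gb              # every drive holds the same data
    if level in (3, 4, 5):
        if n < 3:
            raise ValueError("parity RAID needs at least three drives")
        return (n - 1) * size_gb    # one drive's worth of space holds parity
    raise ValueError("unsupported RAID level")


print(usable_capacity(0, 4, 100))  # 400
print(usable_capacity(1, 2, 100))  # 100
print(usable_capacity(5, 5, 100))  # 400
```

Two mirrored 100 GB drives yield 100 GB (half the raw space), while five drives in RAID-5 yield 400 GB of 500 GB, which is why parity levels have a lower cost per megabyte than mirroring.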
