Physical Record and De-normalization, Partitioning

<< Physical Database Design: DESIGNING FIELDS, CODING AND COMPRESSION TECHNIQUES

Vertical Partitioning, Replication, MS SQL Server >>

Database Management System (CS403)

Lecture No. 23

Reading Material

"Database Systems Principles, Design and Implementation" written by Catherine Ricardo,

Maxwell Macmillan.

"Database Management Systems", 2nd edition, Raghu Ramakrishnan, Johannes Gehrke,

McGraw-Hill

Overview of Lecture

Physical Record and De-normalization

Partitioning

In the previous lecture, we have studied different data types and the coding techniques.

We have reached now on implementing our database in which relations are now

normalized. Now we will make this database efficient form implementation point of

view.

Physical Record and Denormalization

Denormalization is a technique to move from higher to lower normal forms of

database modeling in order to speed up database access. Denormalization process is

applied for deriving a physical data model from a logical form. In logical data base

design we group things logically related through same primary key. In physical

database design fields are grouped, as they are stored physically and accessed by

DBMS. In general it may decompose one logical relation into separate physical

records, combine some or do both. There is a valid reason for denormalization that is

to enhance the performance. However, there are several indicators, which will help to

identify systems, and tables, which are potential denormalization candidates. These

are:

Many critical queries and reports exist which rely upon data from more than one table.

Often times these requests need to be processed in an on-line environment.

Repeating groups exist which need to be processed in a group instead of individually.

Many calculations need to be applied to one or many columns before queries can be

successfully answered.

Tables need to be accessed in different ways by different users during the same

timeframe.

Certain columns are queried a large percentage of the time. Consider 60% or greater

to be a cautionary number flagging denormalization as an option.

187

Database Management System (CS403)

We should be aware that each new RDBMS release usually bring enhanced

performance and improved access options that may reduce the need for

denormalization. However, most of the popular RDBMS products on occasion will

require denormalized data structures. There are many different types of denormalized

tables, which can resolve the performance problems caused when accessing fully

normalized data. Denormalization must balance the need for good system response

time with the need to maintain data, while avoiding the various anomalies or problems

associated with denormalized table structures. Denormalization goes hand-in-hand

with the detailed analysis of critical transactions through view analysis. View

analysis must include the specification of primary and secondary access paths for

tables that comprise end-user views of the database. A fully normalized database

schema can fail to provide adequate system response time due to excessive table join

operations

Denormalization Situation 1:

Merge two Entity types into one with one to one relationship. Even if one of the entity

type is optional, so joining can lead to wastage of storage, however if two accessed

together very frequently their merging might be a wise decision. So those two

relations must be merged for better performance, which have one to one relationship.

Denormalization Situation 2:

Many to many binary relationships mapped to three relations. Queries needing data

from two participating ETs need joining of three relations that is expensive. Join is an

expensive operation from execution point of view. It takes time and lot of resources.

Now suppose there are two relations STUDENT and COURSE and there exits a many

to many relationship in between them. So there are three relations STUDENT,

COURSE and ENROLLED in between them. Now if we want to see that a student

has enrolled how many courses. So to get this we will have to join three relations, first

the STUDENT and ENROLLED and then joining it with COURSE, which is quite

expensive. The relation created against relationship is merged with one of the relation

created against participating ETs. Now the join operation will be performed only once.

Consider the following many to many relationship:-

EMP (empID, eName,pjId,Sal)

PROJ (pjId,pjName)

WORK (empId.pjId,dtHired,Sal)

This is a many to many relationship in between EMP and PROJ with a relationship of

WORK. So now if we by de-normalizing these relations and merge the WORK

relation with PROJ relation, which is comparatively smaller one. But in this case it is

violating 2NF and anomalies of 2NF would be there. But there would be only one join

operation involved by joining two tables, which increases the efficiency.

EMP (empID, eName,pjId,Sal)

PROJ (pjId,pjName, empId,dtHired,Sal)

So now it is up to you that you want to weigh the drawbacks and advantages of

denormalization.

Denormalization Situation 3:

Reference Data: One to many situation when the ET on side does not participate in

any other relationship, then many side ET is appended with reference data rather than

the foreign key. In this case the reference table should be merged with the main table.

188

Database Management System (CS403)

We can see it with STUDENT and HOBBY relations. One student can have one

hobby and one hobby can be adopted by many students. Now in this case the hobby

can be merged with the student relation. So in this case although redundancy of data

would be there, but there would not be any joining of two relations, which will have a

better performance.

Partitioning

De-normalization leads to merging different relations, whereas partitioning splits

same relation into two. The general aims of data partitioning and placement in

database are to

1. Reduce workload (e.g. data access, communication costs, search space)

Balance

workload

3. Speed up the rate of useful work (e.g. frequently accessed objects in main

memory)

There are two types of partitioning:-

Horizontal Partitioning

Vertical Partitioning

Horizontal Partitioning:

Table is split on the basis of rows, which means a larger table is split into smaller

tables. Now the advantage of this is that time in accessing the records of a larger table

is much more than a smaller table. It also helps in the maintenance of tables, security,

authorization and backup. These smaller partitions can also be placed on different

disks to reduce disk contention. Some of the types of horizontal partitioning are as

under:-

Range Partitioning:

In this type of partitioning range is imposed on any particular attribute. So in this

sway different partitions are made on the basis of those ranges with the help of select

statement. For Example for those students whose ID is from 1-1000 are in partition 1

and so on. This will improve the overall efficiency of the database. In range partition

the partitions may become unbalanced. So in this way few partitions may be

overloaded.

Hash Partitioning:

It is a type of horizontal partitioning. In this type particular algorithm is applied and

DBMS knows that algorithm. So hash partitioning reduces the chances of unbalanced

partitions to a large extent.

List Partitioning:

In this type of partitioning the values are specified for every partition. So there is a

specified list for all the partitions. So there is no range involved in this rather there is a

list of values.

Summary:

De-normalization can lead to improved processing efficiency. The objective is to

improve system response time without incurring a prohibitive amount of additional

189

Database Management System (CS403)

data maintenance requirements. This is especially important for client-server

systems. Denormalization requires thorough system testing to prove the effect that

denormalized table structures have on processing efficiency. Furthermore, unseen ad

hoc data queries may be adversely affected by denormalized table

structures. Denormalization must be accomplished in conjunction with a detailed

analysis of the tables required to support various end-user views of the database. This

analysis must include the identification of primary and secondary access paths to data.

Similarly before carrying out partitioning of the table thorough analysis of the

relations is must.

Exercise:

Critically examine the tables drawn for Examination system and see if there is a

requirement of denormalization and partitioning and then carry out the process.

190

Table of Contents: