ZeePedia

Second, Third Normal Form, Boyce - Codd Normal Form, Higher Normal Forms

<< Functional Dependency, Inference Rules, Normal Forms
Normalization Summary, Example, Physical Database Design >>
img
Database Management System (CS403)
VU
Lecture No. 20
Reading Material
"Database Systems Principles, Design and Implementation"
Section 7.7 ­ 7.10
written by Catherine Ricardo, Maxwell Macmillan.
"Database Management Systems", 2nd edition, Raghu Ramakrishnan, Johannes Gehrke,
McGraw-Hill
Overview of Lecture:
o Second and Third Normal Form
o Boyce - Codd Normal Form
o Higher Normal Forms
In the previous lecture we have discussed functional dependency, the inference rules
and the different normal forms. From this lecture we will study in length the second
and third normal forms.
Second Normal Form
A relation is in second normal form if and only if it is in first normal form and all
nonkey attributes are fully functionally dependent on the key. Clearly if a relation is
in 1NF and the key consists of a single attribute, the relation is automatically 2NF.
The only time we have to be concerned 2NF is when the key is composite. A relation
that is not in 2NF exhibits the update, insertion and deletion anomalies we will now
see it with an example. Consider the following relation.
CLASS (crId, stId, stName, fId, room, grade)
crId, stId
stName, fId, room, grade
stId
stName
crId
fId, room
Now in this relation the key is course ID and student ID. The requirement of 2NF is
that all non-key attributes should be fully dependent on the key there should be no
169
img
Database Management System (CS403)
VU
partial dependency of the attributes. But in this relation student ID is dependent on
student name and similarly course ID is partially dependent on faculty ID and room,
so it is not in second normal form. At this level of normalization, each column in a
table that is not a determiner of the contents of another column must itself be a
function of the other columns in the table. For example, in a table with three columns
containing customer ID, product sold, and price of the product when sold, the price
would be a function of the customer ID (entitled to a discount) and the specific
product. If a relation is not in 2NF then there are some anomalies, which are as under:
-
·
Redundancy
·
Insertion Anomaly
·
Deletion Anomaly
·
Updation Anomaly
The general requirements of 2NF are:-
·
Remove subsets of data that apply to multiple rows of a table and place them
in separate rows.
·
Create relationships between these new tables and their predecessors through
the use of foreign keys.
Consider the following table which has the anomalies:-
crId
StId
stName
fId
room
grade
C3456
S1020
Suhail Dar
F2345
104
B
C5678
S1020
Suhail Dar
F4567
106
C3456
S1038
Shoaib Ali
F2345
104
A
C5678
S1015
Tahira Ejaz
F4567
106
B
Now the first thing is that the table is in 1NF because there are no duplicate values in
any tuple and all cells contain atomic value. The first thing is the redundancy. Like in
this table of CLASS the course ID C3456 is being repeated for faculty ID F2345 and
similarly the room no 104 is being repeated twice. Second is the insertion anomaly.
Suppose we want to insert a course in the table, but this course has not been registered
to any student. But we cannot enter the student ID, because no student has registered
this course yet. So we can also not insert this course. This is called as insertion
anomaly which is wrong state of database. Next is the deletion anomaly. Suppose
there is a course which has been enrolled by one student only. Now due to some
170
img
Database Management System (CS403)
VU
reason, we want to delete the record of student. But here the information about the
course will also be deleted, so in this way this is the incorrect state of database in
which infact we want to delete the information about the student record but along with
this the course information has also been deleted. So it is not reflecting the actual
system. Now the next is updation anomaly. Suppose a course has been registered by
50 students and now we want to change the class rooms of all the students. So in this
case we will have to change the records of all the 50 students. So this is again a
deletion anomaly. The process for transforming a 1NF table to 2NF is:
·
Identify any determinants other than the composite key, and the columns they
determine.
·
Create and name a new table for each determinant and the unique columns it
determines.
·
Move the determined columns from the original table to the new table. The
determinate becomes the primary key of the new table.
·
Delete the columns you just moved from the original table except for the
determinant which will serve as a foreign key.
·
The original table may be renamed to maintain semantic meaning.
Now to remove all these anomalies from the table we will have to decompose this
table, into different tables as under:
CLASS (crId, stId, stName, fId, room, grade)
crId, stId
stName, fId, room, grade
stId
stName
crId
fId, room
Now this table has been decomposed into three tables as under:-
STD (stId, stName)
COURSE (crId, fId, room)
CLASS (crId, stId, grade)
So now these three tables are in second normal form. There are no anomalies
available now in this form and we say this as 2NF.
Third Normal Form
A relational table is in third normal form (3NF) if it is already in 2NF and
every non-key column is non-transitively dependent upon its primary key. In
171
img
Database Management System (CS403)
VU
other words, all nonkey attributes are functionally dependent only upon the
primary key.
Transitive Dependency
Transitive dependency is one that carries over another attribute. Transitive
dependency occurs when one non-key attribute determines another non-key
attribute. For third normal form we concentrate on relations with one
candidate key, and we eliminate transitive dependencies. Transitive
dependencies cause insertion, deletion, and update anomalies. We will now
see it with an example:-
STD(stId, stName, stAdr, prName, prCrdts)
stId
stName, stAdr, prName, prCrdts
prName
prCrdts
Now here the table is in second normal form. As there is no partial dependency of any
attributes here. The key is student ID . The problem is of transitive dependency in
which a non-key attribute can be determined by a non-key attribute. Like here the
program credits can be determined by program name, which is not in 3NF. It also
causes same four anomalies, which are due to transitive dependencies. For Example:-
STUDENT
stId
stName
stAdr
prName
prCrdts
S1020
Sohail Dar
I-8 Islamabad
MCS
64
S1038
Shoaib Ali
G-6 Islamabad
BCS
132
S1015
Tahira Ejaz
L Rukh Wah
MCS
64
S1015
Tahira Ejaz
L Rukh Wah
MCS
64
S1018
Arif Zia
E-8,
BIT
134
Islamabad.
Now in this table all the four anomalies are exists in the table. So we will have to
remove these anomalies by decomposing this table after removing the transitive
dependency. We will see it as under: -
STD (stId, stName, stAdr, prName, prCrdts)
stId
stName, stAdr, prName, prCrdts
prName
prCrdts
The process of transforming a table into 3NF is:
172
img
Database Management System (CS403)
VU
·
Identify any determinants, other the primary key, and the columns they
determine.
·
Create and name a new table for each determinant and the unique columns it
determines.
·
Move the determined columns from the original table to the new table. The
determinate becomes the primary key of the new table.
·
Delete the columns you just moved from the original table except for the
determinate which will serve as a foreign key.
·
The original table may be renamed to maintain semantic meaning.
STD (stId, stName, stAdr, prName)
PROGRAM (prName, prCrdts)
We have now decomposed the relation into two relations of student and program. So
the relations are in third normal form and are free of all the anomalies
Boyce - Codd Normal Form
A relation is in Boyce-Codd normal form id and only if every determinant is a
candidate key. A relation R is said to be in BCNF if whenever X -> A holds in R, and
A is not in X, then X is a candidate key for R. It should be noted that most relations
that are in 3NF are also in BCNF. Infrequently, a 3NF relation is not in BCNF and
this happens only if
(a) the candidate keys in the relation are composite keys (that is, they are not single
attributes),
(b)
there
is
more
than
one
candidate
key
in
the
relation,
and
(c) the keys are not disjoint, that is, some attributes in the keys are common.
The BCNF differs from the 3NF only when there are more than one candidate keys
and the keys are composite and overlapping. Consider for example, the relationship:
enrol (sno, sname, cno, cname, date-enrolled)
Let us assume that the relation has the following candidate keys:
173
img
Database Management System (CS403)
VU
(sno,cno)
(sno,cname)
(sname,cno)
(sname, cname)
(we have assumed sname and cname are unique identifiers). The relation is in 3NF
but not in BCNF because there are dependencies
sno -> sname
cno -> cname
Where attributes are part of a candidate key are dependent on part of another
candidate key. Such dependencies indicate that although the relation is about some
entity or association that is identified by the candidate keys e.g. (sno, cno), there are
attributes that are not about the whole thing that the keys identify. For example, the
above relation is about an association (enrolment) between students and subjects and
therefore the relation needs to include only one identifier to identify students and one
identifier to identify subjects. Provided that two identifiers about the students (sno,
sname) and two keys about subjects (cno, cname) means that some information about
the students and subjects which is not needed is being provided. This provision of
information will result in repetition of information and the anomalies that we
discussed at the beginning of this chapter. If we wish to include further information
about students and courses in the database, it should not be done by putting the
information in the present relation but by creating new relations that represent
information about entities student and subject.
These difficulties may be overcome by decomposing the above relation in the
following three relations:
(sno, sname)
(cno, cname)
(sno, cno, date-of-enrolment)
174
img
Database Management System (CS403)
VU
We now have a relation that only has information about students, another only about
subjects and the third only about enrolments. All the anomalies and repetition of
information have been removed.
Higher Normal Forms
After BCNF are the fourth, a fifth and domain key normal form exists. Although till
BCNF normal form tables are in required form, but if we want we can move on to
fourth and fifth normal forms as well. 4NF deals with multivalued dependency, fifth
deals with possible loss less decompositions; DKNF reduces further chances of any
possible inconsistency.
Summary
The goal of normalization is to create a set of relational tables that are free of
redundant data and that can be consistently and correctly modified. This means that
all tables in a relational database should be in the third normal form (3NF). A
relational table is in 3NF if and only if all non-key columns are (a) mutually
independent and (b) fully dependent upon the primary key. Mutual independence
means that no non-key column is dependent upon any combination of the other
columns. The first two normal forms are intermediate steps to achieve the goal of
having all tables in 3NF. In order to better understand the 2NF and higher forms, it is
necessary to understand the concepts of functional dependencies and loss less
decomposition.
Exercise:
The tables of Examination System which were brought in 1NF in previous lecture
bring those tables into 2 and 3NF.
175
Table of Contents:
  1. Introduction to Databases and Traditional File Processing Systems
  2. Advantages, Cost, Importance, Levels, Users of Database Systems
  3. Database Architecture: Level, Schema, Model, Conceptual or Logical View:
  4. Internal or Physical View of Schema, Data Independence, Funct ions of DBMS
  5. Database Development Process, Tools, Data Flow Diagrams, Types of DFD
  6. Data Flow Diagram, Data Dictionary, Database Design, Data Model
  7. Entity-Relationship Data Model, Classification of entity types, Attributes
  8. Attributes, The Keys
  9. Relationships:Types of Relationships in databases
  10. Dependencies, Enhancements in E-R Data Model. Super-type and Subtypes
  11. Inheritance Is, Super types and Subtypes, Constraints, Completeness Constraint, Disjointness Constraint, Subtype Discriminator
  12. Steps in the Study of system
  13. Conceptual, Logical Database Design, Relationships and Cardinalities in between Entities
  14. Relational Data Model, Mathematical Relations, Database Relations
  15. Database and Math Relations, Degree of a Relation
  16. Mapping Relationships, Binary, Unary Relationship, Data Manipulation Languages, Relational Algebra
  17. The Project Operator
  18. Types of Joins: Theta Join, Equi–Join, Natural Join, Outer Join, Semi Join
  19. Functional Dependency, Inference Rules, Normal Forms
  20. Second, Third Normal Form, Boyce - Codd Normal Form, Higher Normal Forms
  21. Normalization Summary, Example, Physical Database Design
  22. Physical Database Design: DESIGNING FIELDS, CODING AND COMPRESSION TECHNIQUES
  23. Physical Record and De-normalization, Partitioning
  24. Vertical Partitioning, Replication, MS SQL Server
  25. Rules of SQL Format, Data Types in SQL Server
  26. Categories of SQL Commands,
  27. Alter Table Statement
  28. Select Statement, Attribute Allias
  29. Data Manipulation Language
  30. ORDER BY Clause, Functions in SQL, GROUP BY Clause, HAVING Clause, Cartesian Product
  31. Inner Join, Outer Join, Semi Join, Self Join, Subquery,
  32. Application Programs, User Interface, Forms, Tips for User Friendly Interface
  33. Designing Input Form, Arranging Form, Adding Command Buttons
  34. Data Storage Concepts, Physical Storage Media, Memory Hierarchy
  35. File Organizations: Hashing Algorithm, Collision Handling
  36. Hashing, Hash Functions, Hashed Access Characteristics, Mapping functions, Open addressing
  37. Index Classification
  38. Ordered, Dense, Sparse, Multi-Level Indices, Clustered, Non-clustered Indexes
  39. Views, Data Independence, Security, Vertical and Horizontal Subset of a Table
  40. Materialized View, Simple Views, Complex View, Dynamic Views
  41. Updating Multiple Tables, Transaction Management
  42. Transactions and Schedules, Concurrent Execution, Serializability, Lock-Based Concurrency Control, Deadlocks
  43. Incremental Log with Deferred, Immediate Updates, Concurrency Control
  44. Serial Execution, Serializability, Locking, Inconsistent Analysis
  45. Locking Idea, DeadLock Handling, Deadlock Resolution, Timestamping rules