ZeePedia

Relational Data Model, Mathematical Relations, Database Relations

<< Conceptual, Logical Database Design, Relationships and Cardinalities in between Entities
Database and Math Relations, Degree of a Relation >>
img
Database Management System (CS403)
VU
Lecture No. 14
Reading Material
"Database Systems Principles, Design and Implementation"
Section 6.1 ­ 6.3.3
written by Catherine Ricardo, Maxwell Macmillan.
"Database  Management  Systems",  2nd  edition,
Raghu
Ramakrishnan, Johannes Gehrke, McGraw-Hill
Overview of Lecture
Logical Database Design
o
Introduction to Relational Data Model
o
Basic properties of a table
o
Mathematical and database relations
o
From this lecture we are going to discuss the logical database design phase of
database development process. Logical database design, like conceptual database
design is our database design; it represents the structure of data that we need to store
to fulfill the requirements of the users or organization for which we are developing the
system. However there are certain differences between the two that are presented in
the table below:
Conceptual Database Design
Logical Database Design
Developed in a semantic data model In legacy data models (relational
1
(generally E-R data model)
generally in current age)
Free of particular DBMS in which
Free of data model in which going to be
2
going to be implemented; many/any
implemented; many/any possible
possible
Obtained
by
translating
the
3
Results from Analysis Phase
conceptual
database
design
into
another data model
124
img
Database Management System (CS403)
VU
4
Represented graphically
Descriptive
5
More expressive
Relatively less expressive
Going to be transformed and then
6
Going to be implemented
implemented
You can think more, give a try
Table 1: Differences between Conceptual and Logical Database Designs
As we have already discussed in previous lectures and as is given in row 2 of the
above table, the conceptual database design can be transformed into any data model,
like, hierarchical, network, relational or object-oriented. So the study of the logical
database design requires first involves the study of the data model/(s) that we can
possibly use for the purpose. However, in the current age, since early eighties, the
most popular choice for the logical database design is the relational data model; so
much popular that today it can be considered to be the only choice. Why? Because of
its features we are going to discuss in today's lecture. That is why rather than studying
different data models we will be studying only the relational data model. Once we
study this, the development of logical database design is transformation of conceptual
database design to relational one and the process is very simple and straightforward.
So from today's lecture our discussion starts on the relational data model. Just for the
sake of revision we repeat the definition of data model "a set of constructs/tools used
to develop a database design; generally consists of three components which are
constructs, manipulation language and integrity constraints". We have discussed it
earlier that the later part of the definition (three components) fits precisely with the
relational data model (RDM), that is, it has these components defined clearly.
Relational Data Model
The RDM is popular due to its two major strengths and they are:
o Simplicity
o Strong Mathematical Foundation
The RDM is simple, why, there is just one structure and that is a relation or a table.
Even this single structure is very easy to understand, so a user of even of a moderate
125
img
Database Management System (CS403)
VU
genius can understand it easily. Secondly, it has a strong mathematical foundation that
gives many advantages, like:
o Anything included/defined in RDM has got a precise meaning since it is based
on mathematics, so there is no confusion.
o If we want to test something regarding RDM we can test it mathematically, if it
works mathematically it will work with RDM (apart from some exceptions).
o The mathematics not only provided the RDM the structure (relation) but also
well defined manipulation languages (relational algebra and relational calculus).
o It provided RDM certain boundaries, so any modification or addition we want to
make in RDM, we have to see if it complies with the relational mathematics or
not. We cannot afford to cross these boundaries since we will be losing the huge
advantages provided by the mathematical backup.
"An IBM scientist E.F. Codd proposed the relational data model in 1970. At that
time most database systems were based on one of two older data models (the
hierarchical model and the network model); the relational model revolutionized
the database field and largely replaced these earlier models. Prototype relational
database management systems were developed in pioneering research projects at
IBM and UC-Berkeley by the mid-70s, and several vendors were offering
relational database products shortly thereafter. Today, the relational model is by
far the dominant data model and is the foundation for the leading DBMS
products, including IBM's DB2 family, Informix, Oracle, Sybase, Microsoft's
Access and SQLServer, FoxBase, and Paradox. Relational database systems are
ubiquitous in the marketplace and represent a multibillion dollar industry" [1]
The RDM is mainly used for designing/defining external and conceptual schemas;
however to some extent physical schema is also specified in it. Separation of
conceptual and physical levels makes data and schema manipulation much easier,
contrary to previous data models. So the relational data model also truly supports
"Three Level Schema Architecture".
Introduction to the Relational Data model
The RDM is based on a single structure and that is a relation. Speaking in terms of the
E-R data model, both the entity types and relationships are represented using relations
126
img
Database Management System (CS403)
VU
in RDM. The relation in RDM is similar to the mathematical relation however
database relation is also represented in a two dimensional structure called table. A
table consists of rows and columns. Rows of a table are also called tuples. A row or
tuple of a table represents a record or an entity instance, where as the columns of the
table represent the properties or attributes.
stID
stName
clName
doB
sex
S001
M. Suhail
MCS
12/6/84
M
S002
M. Shahid
BCS
3/9/86
M
S003
Naila S.
MCS
7/8/85
F
S004
Rubab A.
MBA
23/4/86
F
S005
Ehsan M.
BBA
22/7/88
M
Table 2: A database relation represented in the form of a table
In the above diagram, a table is shown that consists of five rows and five columns.
The top most rows contain the names of the columns or attributes whereas the rows
represent the records or entity instances. There are six basic properties of the database
relations which are:
·
Each cell of a table contains atomic/single value
A cell is the intersection of a row and a column, so it represents a value of an
attribute in a particular row. The property means that the value stored in a single cell
is considered as a single value. In real life we see many situations when a
property/attribute of any entity contains multiple values, like, degrees that a person
has, hobbies of a student, the cars owned by a person, the jobs of an employee. All
these attributes have multiple values; these values cannot be placed as the value of a
single attribute or in a cell of the table. It does not mean that the RDM cannot
handle such situations, however, there are some special means that we have to adopt
in these situations, and they can not be placed as the value of an attribute because an
attribute can contain only a single value. The values of attributes shown in table 1
are all atomic or single.
127
img
Database Management System (CS403)
VU
·
Each column has a distinct name; the name of the attribute it represents
Each column has a heading that is basically the name of the attribute that the
column represents. It has to be unique, that is, a table cannot have duplicated
column/attribute names. In the table 2 above, the bold items in the first row
represent the column/attribute names.
·
The values of the attributes come from the same domain
Each attribute is assigned a domain along with the name when it is defined. The
domain represents the set of possible values that an attribute can have. Once the
domain has been assigned to an attribute, then all the rows that are added into the
table will have the values from the same domain for that particular column. For
example, in the table 2 shown above the attribute doB (date of birth) is assigned the
domain "Date", now all the rows have the date value against the attribute doB. This
attribute cannot have a text or numeric value.
·
The order of the columns is immaterial
If the order of the columns in a table is changed, the table still remains the same.
Order of the columns does not matter.
·
The order of the rows is immaterial
As with the columns, if rows' order is changed the table remains the same.
·
Each row/tuple/record is distinct, no two rows can be same
Two rows of a table cannot be same. The value of even a single attribute has to be
different that makes the entire row distinct.
There are three components of the RDM, which are, construct (relation), manipulation
language (SQL) and integrity constraints (two). We have discussed the relation so far;
the last two components will be discussed later. In the next section we are going to
128
img
Database Management System (CS403)
VU
discuss the mathematical relations briefly that will help to link the mathematical
relations with the database relations and will help in a better understanding of the later.
Mathematical Relations
Consider two sets
A = {x, y}
B = {2, 4, 6}
Cartesian product of these sets (A x B) is a set that consists of ordered pairs where
first element of the ordered pair belongs to set A where as second element belongs to
set B, as shown below:
A X B= {(x,2), (x,4), (x,6), (y,2), (y,4), (y,6)}
A relation is some subset of this Cartesian product, For example,
·
R1= {(x,2), (y,2),(x,6),(x,4)}
·
R2 = {(x,4), (y,6), (y,4)}
The same notion of Cartesian product and relations can be applied to more than two
sets, e.g. in case of three sets, we will have a relation of ordered triplets
Applying the same concept in a real world scenario, consider two sets Name and Age
having the elements:
·
Name = {Ali, Sana, Ahmed, Sara}
·
Age = {15,16,17,18,.......,25}
Cartesian product of Name & Age
Name X Age= {(Ali,15), (Sana,15), (Ahmed,15), (Sara,15), ...., (Ahmed,25),
(Sara,25)}
Now consider a subset CLASS of this Cartesian product
CLASS = {(Ali, 18), (Sana, 17), (Ali, 20), (Ahmed, 19)}
This subset CLASS is a relation mathematically, however, it may represent a class in
the real world where each ordered pair represents a particular student mentioning the
name and age of a student. In the database context each ordered pair represents a tuple
and elements in the ordered pairs represent values of the attributes. Think in this way,
if Name and Age represent all possible values for names and ages of students, then
any class you consider that will definitely be a subset of the Cartesian product of the
Name and Age. That is, the name and age combination of all the students of any class
129
img
Database Management System (CS403)
VU
will be included in the Cartesian product and if we take out particulars ordered pairs
that are related to a class then that will be a subset of the Cartesian product, a relation.
Database Relations
Let A1, A2, A3, ..., An be some attributes and D1, D2, D3,..., Dn be their domains A
relation scheme relates certain attributes with their domain in context of a relation. A
relation scheme can be represented as:
R = (A1:D1, A2:D2, ......, An:Dn), for example,
STD Scheme = (stId:Text, stName: Text, stAdres:Text, doB:Date) OR
STD(stId, stName, stAdres, doB)
Whereas the stId, stName, stAdres and doB are the attribute names and Text, Text,
Text and Date are their respective domains. A database relation as per this relation
scheme can be:
STD={(stId:S001,
stName:Ali, stAdres:
Lahore, doB:12/12/76),
(stId:S003,
stName:A. Rehman, stAdres: RWP, doB:2/12/77)} OR
STD={(S001, Ali, Lahore, 12/12/76), (S003, A. Rehman, RWP, 2/12/77)}
The above relation if represented in a two dimensional structure will be called a table
as is shown below:
stId
stName
stAdres doB
S001
Ali
Lahore 12/12/76
S002
A. Rehman RWP
2/12/77
With this, today's lecture is finished; the discussion on RDM will be continued in the
next lecture.
Summary
In this lecture we have started the discussion on the logical database design that we
develop from the conceptual database design. The later is generally developed using
E-R data model, whereas for the former RDM is used. RDM is based on the theory of
mathematical relations; a mathematical relation is subset of the Cartesian product of
two or more sets. Relations are physically represented in the form of two-dimensional
130
img
Database Management System (CS403)
VU
structure called table, where rows/tuples represent records and columns represent the
attributes.
Exercise:
Define different attributes (assigning name and domain to each) for an entity
STUDENT, then apply the concept of Cartesian product on the domains of these
attributes, then consider the records of your class fellows and see if it is the subset of
the Cartesian product.
131
Table of Contents:
  1. Introduction to Databases and Traditional File Processing Systems
  2. Advantages, Cost, Importance, Levels, Users of Database Systems
  3. Database Architecture: Level, Schema, Model, Conceptual or Logical View:
  4. Internal or Physical View of Schema, Data Independence, Funct ions of DBMS
  5. Database Development Process, Tools, Data Flow Diagrams, Types of DFD
  6. Data Flow Diagram, Data Dictionary, Database Design, Data Model
  7. Entity-Relationship Data Model, Classification of entity types, Attributes
  8. Attributes, The Keys
  9. Relationships:Types of Relationships in databases
  10. Dependencies, Enhancements in E-R Data Model. Super-type and Subtypes
  11. Inheritance Is, Super types and Subtypes, Constraints, Completeness Constraint, Disjointness Constraint, Subtype Discriminator
  12. Steps in the Study of system
  13. Conceptual, Logical Database Design, Relationships and Cardinalities in between Entities
  14. Relational Data Model, Mathematical Relations, Database Relations
  15. Database and Math Relations, Degree of a Relation
  16. Mapping Relationships, Binary, Unary Relationship, Data Manipulation Languages, Relational Algebra
  17. The Project Operator
  18. Types of Joins: Theta Join, Equi–Join, Natural Join, Outer Join, Semi Join
  19. Functional Dependency, Inference Rules, Normal Forms
  20. Second, Third Normal Form, Boyce - Codd Normal Form, Higher Normal Forms
  21. Normalization Summary, Example, Physical Database Design
  22. Physical Database Design: DESIGNING FIELDS, CODING AND COMPRESSION TECHNIQUES
  23. Physical Record and De-normalization, Partitioning
  24. Vertical Partitioning, Replication, MS SQL Server
  25. Rules of SQL Format, Data Types in SQL Server
  26. Categories of SQL Commands,
  27. Alter Table Statement
  28. Select Statement, Attribute Allias
  29. Data Manipulation Language
  30. ORDER BY Clause, Functions in SQL, GROUP BY Clause, HAVING Clause, Cartesian Product
  31. Inner Join, Outer Join, Semi Join, Self Join, Subquery,
  32. Application Programs, User Interface, Forms, Tips for User Friendly Interface
  33. Designing Input Form, Arranging Form, Adding Command Buttons
  34. Data Storage Concepts, Physical Storage Media, Memory Hierarchy
  35. File Organizations: Hashing Algorithm, Collision Handling
  36. Hashing, Hash Functions, Hashed Access Characteristics, Mapping functions, Open addressing
  37. Index Classification
  38. Ordered, Dense, Sparse, Multi-Level Indices, Clustered, Non-clustered Indexes
  39. Views, Data Independence, Security, Vertical and Horizontal Subset of a Table
  40. Materialized View, Simple Views, Complex View, Dynamic Views
  41. Updating Multiple Tables, Transaction Management
  42. Transactions and Schedules, Concurrent Execution, Serializability, Lock-Based Concurrency Control, Deadlocks
  43. Incremental Log with Deferred, Immediate Updates, Concurrency Control
  44. Serial Execution, Serializability, Locking, Inconsistent Analysis
  45. Locking Idea, DeadLock Handling, Deadlock Resolution, Timestamping rules