
Physical Record and De-normalization, Partitioning

Database Management System (CS403) - VU
Lecture No. 23
Reading Material
"Database Systems Principles, Design and Implementation" written by Catherine Ricardo,
Maxwell Macmillan.
"Database Management Systems", 2nd edition, Raghu Ramakrishnan, Johannes Gehrke,
McGraw-Hill
Overview of Lecture
Physical Record and De-normalization
Partitioning
In the previous lecture we studied different data types and coding techniques. We have now reached the stage of implementing our database, whose relations are normalized. In this lecture we will make this database efficient from the implementation point of view.
Physical Record and Denormalization
Denormalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access. The denormalization process is applied while deriving a physical data model from a logical one. In logical database design we group things that are logically related through the same primary key. In physical database design, fields are grouped as they are stored physically and accessed by the DBMS. In general, this may decompose one logical relation into separate physical records, combine some, or do both. The valid reason for denormalization is to enhance performance. There are several indicators which help to identify systems and tables that are potential denormalization candidates. These are:
Many critical queries and reports exist which rely upon data from more than one table; often these requests need to be processed in an on-line environment.
Repeating groups exist which need to be processed as a group instead of individually.
Many calculations need to be applied to one or many columns before queries can be successfully answered.
Tables need to be accessed in different ways by different users during the same timeframe.
Certain columns are queried a large percentage of the time; consider 60% or greater to be a cautionary number flagging denormalization as an option.
We should be aware that each new RDBMS release usually brings enhanced
performance and improved access options that may reduce the need for
denormalization. However, most of the popular RDBMS products on occasion will
require denormalized data structures. There are many different types of denormalized
tables, which can resolve the performance problems caused when accessing fully
normalized data. Denormalization must balance the need for good system response
time with the need to maintain data, while avoiding the various anomalies or problems
associated with denormalized table structures. Denormalization goes hand-in-hand
with the detailed analysis of critical transactions through view analysis. View
analysis must include the specification of primary and secondary access paths for
tables that comprise end-user views of the database. A fully normalized database
schema can fail to provide adequate system response time due to excessive table join
operations.
Denormalization Situation 1:
Merge two entity types with a one-to-one relationship into one. If one of the entity types is optional, merging can lead to wastage of storage; however, if the two are accessed together very frequently, merging them might be a wise decision. So two relations that have a one-to-one relationship and are frequently accessed together should be merged for better performance.
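As a minimal sketch of this situation, assume hypothetical EMPLOYEE and PARKING_PERMIT tables in a one-to-one relationship (all names and data types here are illustrative, not from the lecture):

-- Before: two relations in a one-to-one relationship; reading both needs a join
CREATE TABLE EMPLOYEE (
    empId INT PRIMARY KEY,
    eName VARCHAR(50)
);
CREATE TABLE PARKING_PERMIT (
    permitNo INT PRIMARY KEY,
    empId    INT UNIQUE REFERENCES EMPLOYEE(empId),
    lotName  VARCHAR(20)
);

-- After merging: one table and no join; the permit columns stay NULL
-- for employees without a permit (the storage wastage noted above)
CREATE TABLE EMPLOYEE_MERGED (
    empId    INT PRIMARY KEY,
    eName    VARCHAR(50),
    permitNo INT UNIQUE,
    lotName  VARCHAR(20)
);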
Denormalization Situation 2:
A many-to-many binary relationship is mapped to three relations. Queries needing data from the two participating entity types require joining all three relations, which is expensive: join is an expensive operation from the execution point of view, taking time and a lot of resources. Suppose there are two relations STUDENT and COURSE with a many-to-many relationship between them, so there are three relations: STUDENT, COURSE, and ENROLLED in between. Now if we want to see how many courses a student has enrolled in, we have to join all three relations, first STUDENT with ENROLLED and then the result with COURSE, which is quite expensive. In this situation the relation created against the relationship is merged with one of the relations created against the participating entity types, so the join operation will be performed only once. Consider the following many-to-many relationship:
EMP (empId, eName)
PROJ (pjId, pjName)
WORK (empId, pjId, dtHired, Sal)
This is a many-to-many relationship between EMP and PROJ through the relationship WORK. Now suppose we denormalize these relations by merging the WORK relation into PROJ, the comparatively smaller one. This violates 2NF, and the anomalies of 2NF will be there. But there will be only one join operation, over two tables, which increases the efficiency:
EMP (empId, eName)
PROJ (pjId, pjName, empId, dtHired, Sal)
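A minimal SQL sketch of the queries involved (attribute names as above; standard join syntax):

-- Before denormalization: two joins are needed
SELECT E.eName, P.pjName, W.dtHired
FROM   EMP  E
JOIN   WORK W ON W.empId = E.empId
JOIN   PROJ P ON P.pjId  = W.pjId;

-- After merging WORK into PROJ: a single join suffices
SELECT E.eName, P.pjName, P.dtHired
FROM   EMP  E
JOIN   PROJ P ON P.empId = E.empId;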
So it is up to you to weigh the drawbacks and advantages of denormalization.
Denormalization Situation 3:
Reference data: in a one-to-many situation where the entity type on the one side does not participate in any other relationship, the many-side entity type is appended with the reference data itself rather than the foreign key. In this case the reference table should be merged with the main table.
188
img
Database Management System (CS403)
VU
We can see this with the STUDENT and HOBBY relations. One student can have one hobby, and one hobby can be adopted by many students. In this case HOBBY can be merged with the STUDENT relation. Although redundancy of data will be there, there will not be any joining of two relations, which gives better performance.
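A minimal sketch of this merge (column names and data types are assumed for illustration, not taken from the lecture):

-- Before: HOBBY is pure reference data; every lookup needs a join
CREATE TABLE HOBBY (
    hobbyId   INT PRIMARY KEY,
    hobbyName VARCHAR(30)
);
CREATE TABLE STUDENT (
    stId    INT PRIMARY KEY,
    stName  VARCHAR(50),
    hobbyId INT REFERENCES HOBBY(hobbyId)
);

-- After denormalization: the hobby name is stored redundantly in
-- STUDENT itself, so no join is needed
CREATE TABLE STUDENT_MERGED (
    stId      INT PRIMARY KEY,
    stName    VARCHAR(50),
    hobbyName VARCHAR(30)
);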
Partitioning
De-normalization leads to merging different relations, whereas partitioning splits the same relation into two or more. The general aims of data partitioning and placement in a database are to:
1. Reduce workload (e.g. data access, communication costs, search space)
2. Balance workload
3. Speed up the rate of useful work (e.g. by keeping frequently accessed objects in main memory)
There are two types of partitioning:
Horizontal Partitioning
Vertical Partitioning
Horizontal Partitioning:
The table is split on the basis of rows, which means a larger table is split into smaller tables. The advantage of this is that the time needed to access the records of a smaller table is much less than for a larger one. It also helps in the maintenance of tables, security, authorization and backup. These smaller partitions can also be placed on different disks to reduce disk contention. Some of the types of horizontal partitioning are as under:
Range Partitioning:
In this type of partitioning a range is imposed on a particular attribute, and in this way different partitions are made on the basis of those ranges with the help of the select statement. For example, students whose ID is from 1-1000 are in partition 1, and so on. This will improve the overall efficiency of the database. In range partitioning the partitions may become unbalanced, so a few partitions may be overloaded.
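A minimal sketch of range partitioning, using PostgreSQL-style declarative syntax (the exact statements differ in other DBMSs such as Oracle or MS SQL Server; the table is assumed for illustration):

-- Parent table partitioned by ranges of the student ID
CREATE TABLE STUDENT (
    stId   INT,
    stName VARCHAR(50)
) PARTITION BY RANGE (stId);

-- IDs 1-1000 go to partition 1, IDs 1001-2000 to partition 2
CREATE TABLE STUDENT_P1 PARTITION OF STUDENT
    FOR VALUES FROM (1) TO (1001);
CREATE TABLE STUDENT_P2 PARTITION OF STUDENT
    FOR VALUES FROM (1001) TO (2001);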
Hash Partitioning:
It is a type of horizontal partitioning in which a particular algorithm is applied to an attribute to decide the partition, and the DBMS knows that algorithm. Because the algorithm spreads rows across partitions evenly, hash partitioning reduces the chances of unbalanced partitions to a large extent.
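A corresponding sketch in the same PostgreSQL-style syntax (four partitions are an arbitrary choice for illustration):

-- Rows are assigned to partitions by hashing the student ID
CREATE TABLE STUDENT_H (
    stId   INT,
    stName VARCHAR(50)
) PARTITION BY HASH (stId);

CREATE TABLE STUDENT_H0 PARTITION OF STUDENT_H
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE STUDENT_H1 PARTITION OF STUDENT_H
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... and similarly for remainders 2 and 3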
List Partitioning:
In this type of partitioning the values are specified explicitly for every partition, so there is a specified list of values for each partition. No range is involved; rather, there is a list of values.
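A sketch in the same syntax (the campus attribute and its values are assumed for illustration):

-- Rows are routed by an explicit list of values per partition
CREATE TABLE STUDENT_L (
    stId   INT,
    campus VARCHAR(20)
) PARTITION BY LIST (campus);

CREATE TABLE STUDENT_L1 PARTITION OF STUDENT_L
    FOR VALUES IN ('Lahore');
CREATE TABLE STUDENT_L2 PARTITION OF STUDENT_L
    FOR VALUES IN ('Karachi', 'Islamabad');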
Summary:
De-normalization can lead to improved processing efficiency. The objective is to
improve system response time without incurring a prohibitive amount of additional
data maintenance requirements. This is especially important for client-server
systems. Denormalization requires thorough system testing to prove the effect that
denormalized table structures have on processing efficiency. Furthermore, unseen ad
hoc data queries may be adversely affected by denormalized table
structures. Denormalization must be accomplished in conjunction with a detailed
analysis of the tables required to support various end-user views of the database. This
analysis must include the identification of primary and secondary access paths to data.
Similarly, before carrying out partitioning of a table, a thorough analysis of the relations is a must.
Exercise:
Critically examine the tables drawn for the Examination system, see whether there is a requirement for denormalization or partitioning, and then carry out the process.