ZeePedia

Types of Joins: Theta Join, Equi–Join, Natural Join, Outer Join, Semi Join

<< The Project Operator
Functional Dependency, Inference Rules, Normal Forms >>
img
Database Management System (CS403)
VU
Lecture No. 18
Reading Material
"Database Systems Principles, Design and Implementation"
Section 6.6.1 ­ 6.6.3
written by Catherine Ricardo, Maxwell Macmillan.
Overview of Lecture:
o Types of Joins
o Relational Calculus
o Normalization
In the previous lecture we have studied the basic operators of relational algebra along
with different examples. From this lecture we will study the different types of joins,
which are very important and are used extensively in relational calculus.
Types of Joins
Join is a special form of cross product of two tables. It is a binary operation that
allows combining certain selections and a Cartesian product into one operation. The
join operation forms a Cartesian product of its two arguments, performs a selection
forcing equality on those attributes that appear in both relation schemas, and finally
removes duplicate attributes. Following are the different types of joins: -
1. Theta Join
2. Equi Join
3. Semi Join
4. Natural Join
5. Outer Joins
We will now discuss them one by one
Theta Join:
In theta join we apply the condition on input relation(s) and then only those selected
rows are used in the cross product to be merged and included in the output. It means
that in normal cross product all the rows of one relation are mapped/merged with all
the rows of second relation, but here only selected rows of a relation are made cross
product with second relation. It is denoted as under: -
RX S
157
img
Database Management System (CS403)
VU
If R and S are two relations then is the condition, which is applied for select
operation on one relation and then only selected rows are cross product with all the
rows of second relation. For Example there are two relations of FACULTY and
COURSE, now we will first apply select operation on the FACULTY relation for
selection certain specific rows then these rows will have across product with
COURSE relation, so this is the difference in between cross product and theta join.
We will now see first both the relation their different attributes and then finally the
cross product after carrying out select operation on relation.
From this example the difference in between cross product and theta join becomes
clear.
FACULTY
facId
facName
dept
salary
rank
F234
Usman
CSE
21000
lecturer
F235
Tahir
CSE
23000
Asso Prof
F236
Ayesha
ENG
27000
Asso Prof
F237
Samad
ENG
32000
Professor
COURSE
crCode
crTitle
fId
C3456
Database Systems
F234
C3457
Financial Management
C3458
Money & Capital Market
F236
C3459
Introduction to Accounting
F237
(σ rank = `Asso Prof'(FACULTY)) X COURSE
facName dept salary
rank
crCode crTitle
fId
facId
F235
Tahir
CSE
23000
Asso
C3456
Database Systems
F234
Prof
F235
Tahir
CSE
23000
Asso
C3457
Financial Management
Prof
F235
Tahir
CSE
23000
Asso
C3458
Money & Capital Market
F236
Prof
F235
Tahir
CSE
23000
Asso
C3459
Introduction
to
F237
Prof
Accounting
F236
Ayesha
ENG
27000
Asso
C3456
Database Systems
F234
Prof
F236
Ayesha
ENG
27000
Asso
C3457
Financial Management
Prof
F236
Ayesha
ENG
27000
Asso
C3458
Money & Capital Market
F236
Prof
158
img
Database Management System (CS403)
VU
F236
Ayesha
ENG
27000
Asso
C3459
Introduction
to
F237
Prof
Accounting
Fig. 1: Two tables with an example of theta join
In this example after fulfilling the select condition of Associate professor on faculty relation
then it is cross product with course relation
Equi­Join:
This is the most used type of join. In equi­join rows are joined on the basis of values
of a common attribute between the two relations. It means relations are joined on the
basis of common attributes between them; which are meaningful. This means on the
basis of primary key, which is a foreign key in another relation. Rows having the
same value in the common attributes are joined. Common attributes appear twice in
the output. It means that the attributes, which are common in both relations, appear
twice, but only those rows, which are selected. Common attribute with the same name
is qualified with the relation name in the output. It means that if primary and foreign
keys of two relations are having the same names and if we take the equi ­ join of both
then in the output relation the relation name will precede the attribute name. For
Example, if we take the equi ­ join of faculty and course relations then the output
would be as under: -
FACULTY..facId=COURSE.fId COURSE
FACULTY
facId  facName dept
salary  rank
crCode
crTitle
fID
F234
Usman
CSE
21000
lecturer
C3456
Database Systems
F234
F236
Ayesha
ENG
27000
Asso Prof
C3458
Money & Capital Market
F236
F237
Samad
ENG
32000
Professor
C3459
Introduction to Accounting
F237
Fig. 2: Equi-join on tables of figure 1
In the above example the name of common attribute between the two tables is
different, that is, it is facId in FACULTY and fId in COURSE, so it is not required to
qualify; however there is no harm doing it still. Now in this example after taking
equi­join only those tuples are selected in the output whose values are common in
both the relations.
Natural Join:
This is the most common and general form of join. If we simply say join, it means the
natural join. It is same as equi­join but the difference is that in natural join, the
common attribute appears only once. Now, it does not matter which common attribute
should be part of the output relation as the values in both are same. For Example if we
take the natural join of faculty and course the output would be as under: -
159
img
Database Management System (CS403)
VU
FACULTY
facId, fId COURSE
facId
facName dept
salary
rank
crCode crTitle
F234
Usman
CSE
21000
Lecturer
C3456
Database Systems
F236
Ayesha
ENG
27000
Asso Prof
C3458
Money & Capital Market
F237
Samad
ENG
32000
Professor
C3459
Introduction to Accounting
Fig. 4: Natural join of FACULTY and COURSE tables of figure 1
In this example the common attribute appears only once, rest the behavior is same.
Following are the different types of natural join:-
Left Outer Join:
In left outer join all the tuples of left relation remain part of the output. The tuples that
have a matching tuple in the second relation do have the corresponding tuple from the
second relation. However, for the tuples of the left relation, which do not have a
matching record in the right tuple have Null values against the attributes of the right
relation. The example is given in figure 5 below. It can be described in another way.
Left outer join is the equi-join plus the non matching rows of the left side relation
having Null against the attributes of right side relation.
Right Outer Join:
In right outer join all the tuples of right relation remain part of the output relation,
whereas on the left side the tuples, which do not match with the right relation, are left
as null. It means that right outer join will always have all the tuples of right relation
and those tuples of left relation which are not matched are left as Null.
COURSE
STUDENT
stId
bkId
bkTitile
stId
stName
B10001
Intro to Database Systems
S104
S101
Ali Tahir
B10002
Programming Fundamentals
S101
S103
Farah Hasan
B10003
Intro Data Structures
S101
S104
Farah Naz
B10004
Modern Operating Systems
S103
S106
Asmat Dar
B10005
Computer Architecture
S107
Liaqat Ali
B10006
Advanced Networks
S104
COURSE left outer join STUDENT
bkId  bkTitile
BOOK.stId STUDENT.stId
stName
B10001
Intro to Database Systems
S104
S104
Farah Naz
B10002
Programming
S101
S101
Ali Tahir
Fundamentals
B10003
Intro Data Structures
S101
S101
Ali Tahir
B10004
Modern
Operating
S103
S103
Farah Hasan
Systems
B10006
Advanced Networks
S104
S104
Farah Naz
B10005
Computer Architecture
Null
Null
Null
160
img
Database Management System (CS403)
VU
COURSE right outer join STUDENT
bkId bkTitile
BOOK.stId STUDENT.stId
stName
B10001
Intro to Database Systems
S104
S104
Farah Naz
B10002
Programming
S101
S101
Ali Tahir
Fundamentals
B10003
Intro Data Structures
S101
S101
Ali Tahir
B10004
Modern Operating Systems
S103
S103
Farah Hasan
B10006
Advanced Networks
S104
S104
Farah Naz
Null
Null
Null
S106
Asmat Dar
Null
Null
Null
S107
Liaqat Ali
Fig. 5: Input tables and left outer join and right outer join
Outer Join:
In outer join all the tuples of left and right relations are part of the output. It means
that all those tuples of left relation which are not matched with right relation are left
as Null. Similarly all those tuples of right relation which are not matched with left
relation are left as Null.
COURSE outer join STUDENT
bkId bkTitile
BOOK.stId STUDENT.stId
stName
B10001
Intro to Database Systems
S104
S104
Farah Naz
B10002
Programming
S101
S101
Ali Tahir
Fundamentals
B10003
Intro Data Structures
S101
S101
Ali Tahir
B10004
Modern Operating Systems
S103
S103
Farah Hasan
B10006
Advanced Networks
S104
S104
Farah Naz
B10005
Computer Architecture
Null
Null
Null
Null
Null
Null
S106
Asmat Dar
Null
Null
Null
S107
Liaqat Ali
Fig. 6: outer join operation on tables of figure 5
Semi Join:
In semi join, first we take the natural join of two relations then we project the
attributes of first table only. So after join and matching the common attribute of both
relations only attributes of first relation are projected. For Example if we take the
semi join of two relations faculty and course then the resulting relation would be as
under:-
FACULTY
COURSE
161
img
Database Management System (CS403)
VU
facId
facName Dept
Salary
Rank
F234
Usman
CSE
21000
lecturer
F236
Ayesha
ENG
27000
Asso Prof
F237
Samad
ENG
32000
Professor
Fig. 7: Semi-join operation on tables of figure 1
Now the resulting relation has attributes of first relation only after taking the natural
join of both relations.
Relational Calculus
Relational Calculus is a nonprocedural formal relational data manipulation language
in which the user simply specifies what data should be retrieved, but not how to
retrieve it. It is an alternative standard for relational data manipulation languages. The
relational calculus is not related to the familiar differential and integral calculus in
mathematics, but takes its name from a branch of symbolic logic called the predicate
calculus. It has two following two forms: -
·  Tuple Oriented Relational Calculus
·
Domain Oriented Relational Calculus
Tuple Oriented Relational Calculus:
In tuple oriented relational calculus we are interested primarily in finding relation
tuples for which a predicate is true. To do so we need tuple variables. A tuple variable
is a variable that takes on only the tuples of some relation or relations as its range of
values. It actually corresponds to a mathematical domain. We specify the range of a
tuple variable by a statement such as: -
RANGE OF S IS STUDENT
Here, S is the tuple variable and STUDENT is the range, so that S always represents a
tuple of STUDENT. It is expressed as
{S | P (S)}
We will read it as find the set of all tuples S such that P(S) is true, where P implies the
predicate condition now suppose range of R is STUDENT
{R | R.Credits > 50}
We will say like find the stuId, stuName, majors etc of all students having more than
50 credits.
Domain Oriented Relational Calculus:
Normalization
There are four types of anomalies, which are of concern, redundancy, insertion,
deletion and updation. Normalization is not compulsory, but it is strongly
162
img
Database Management System (CS403)
VU
recommended that normalization must be done. Because normalized design makes the
maintenance of database much easier. While carrying out the process of normalization,
it should be applied on each table of database. It is performed after the logical
database design. This process is also being followed informally during conceptual
database design as well.
Normalization Process
There are different forms or levels of normalization. They are called as first, second
and so on. Each normalized form has certain requirements or conditions, which must
be fulfilled. If a table or relation fulfills any particular form then it is said to be in that
normal form. The process is applied on each relation of the database. The minimum
form in which all the tables are in is called the normal form of entire database. The
main objective of normalization is to place the database in highest form of
normalization.
Summary
In this lecture we have studied the different types of joins, with the help of which we
can join different tables. We can get different types of outputs from joins. Then we
studied relational calculus in which we briefly touched upon tuple and domain
oriented relational calculus. Lastly we started the process of normalization which is a
very important topic and we will discuss in detail this topic in the coming lectures.
Exercise:
Draw two tables of PROJECT and EMPLOYEE along with different attribute, include
a common attribute between the two to implement the PK/FK relationship and
populate both the tables. Then apply all types of joins and observe the difference in
the output relations
163
Table of Contents:
  1. Introduction to Databases and Traditional File Processing Systems
  2. Advantages, Cost, Importance, Levels, Users of Database Systems
  3. Database Architecture: Level, Schema, Model, Conceptual or Logical View:
  4. Internal or Physical View of Schema, Data Independence, Funct ions of DBMS
  5. Database Development Process, Tools, Data Flow Diagrams, Types of DFD
  6. Data Flow Diagram, Data Dictionary, Database Design, Data Model
  7. Entity-Relationship Data Model, Classification of entity types, Attributes
  8. Attributes, The Keys
  9. Relationships:Types of Relationships in databases
  10. Dependencies, Enhancements in E-R Data Model. Super-type and Subtypes
  11. Inheritance Is, Super types and Subtypes, Constraints, Completeness Constraint, Disjointness Constraint, Subtype Discriminator
  12. Steps in the Study of system
  13. Conceptual, Logical Database Design, Relationships and Cardinalities in between Entities
  14. Relational Data Model, Mathematical Relations, Database Relations
  15. Database and Math Relations, Degree of a Relation
  16. Mapping Relationships, Binary, Unary Relationship, Data Manipulation Languages, Relational Algebra
  17. The Project Operator
  18. Types of Joins: Theta Join, Equi–Join, Natural Join, Outer Join, Semi Join
  19. Functional Dependency, Inference Rules, Normal Forms
  20. Second, Third Normal Form, Boyce - Codd Normal Form, Higher Normal Forms
  21. Normalization Summary, Example, Physical Database Design
  22. Physical Database Design: DESIGNING FIELDS, CODING AND COMPRESSION TECHNIQUES
  23. Physical Record and De-normalization, Partitioning
  24. Vertical Partitioning, Replication, MS SQL Server
  25. Rules of SQL Format, Data Types in SQL Server
  26. Categories of SQL Commands,
  27. Alter Table Statement
  28. Select Statement, Attribute Allias
  29. Data Manipulation Language
  30. ORDER BY Clause, Functions in SQL, GROUP BY Clause, HAVING Clause, Cartesian Product
  31. Inner Join, Outer Join, Semi Join, Self Join, Subquery,
  32. Application Programs, User Interface, Forms, Tips for User Friendly Interface
  33. Designing Input Form, Arranging Form, Adding Command Buttons
  34. Data Storage Concepts, Physical Storage Media, Memory Hierarchy
  35. File Organizations: Hashing Algorithm, Collision Handling
  36. Hashing, Hash Functions, Hashed Access Characteristics, Mapping functions, Open addressing
  37. Index Classification
  38. Ordered, Dense, Sparse, Multi-Level Indices, Clustered, Non-clustered Indexes
  39. Views, Data Independence, Security, Vertical and Horizontal Subset of a Table
  40. Materialized View, Simple Views, Complex View, Dynamic Views
  41. Updating Multiple Tables, Transaction Management
  42. Transactions and Schedules, Concurrent Execution, Serializability, Lock-Based Concurrency Control, Deadlocks
  43. Incremental Log with Deferred, Immediate Updates, Concurrency Control
  44. Serial Execution, Serializability, Locking, Inconsistent Analysis
  45. Locking Idea, DeadLock Handling, Deadlock Resolution, Timestamping rules