Introduction to Databases and Traditional File Processing Systems

<< Table of Contents

Advantages, Cost, Importance, Levels, Users of Database Systems >>

Database Management System (CS403)

Lecture No. 01

Reading Material

"Database Systems Principles, Design and Implementation"

Chapter 1.

written by Catherine Ricardo, Maxwell Macmillan.

Overview of Lecture

o Introduction to the course

o Database definitions

o Importance of databases

o Introduction to File Processing Systems

o Advantages of the Database Approach

Introduction to the course

This course is first (fundamental) course on database management systems. The course

discusses different topics of the databases. We will be covering both the theoretical and

practical aspects of databases. As a student to have a better understanding of the subject,

it is very necessary that you concentrate on the concepts discussed in the course.

Areas to be covered in this Course:

o Database design and application development: How do we represent a real-world

system in the form of a database? This is one major topic covered in this course. It

comprises of different stages, we will discuss all these stages one by one.

o Concurrency and robustness: How does a DBMS allow many users to access data

concurrently, and how does it protect against failures?

o Efficiency and Scalability: How does the database cope with large amounts of data?

Database Management System (CS403)

o Study of tools to manipulate databases: In order to practically implement, that is, to

perform different operations on databases some tools are required. The operations

on databases include right from creating them to add, remove and modify data in

the database and to access by different ways. The tools that we will be studying are

a manipulation language (SQL) and a DBMS (SQL Server).

Database definitions:

Definitions are important, especially in technical subjects because definition describes

very comprehensively the purpose and the core idea behind the thing. Databases have

been defined differently in literature. We are discussing different definitions here, if we

concentrate on these definitions, we find that they support each other and as a result of

the understanding of these definitions, we establish a better understanding of use,

working and to some extent the components of a database.

Def 1: A shared collection of logically related data, designed to meet the information

needs of multiple users in an organization. The term database is often erroneously

referred to as a synonym for a "database management system (DBMS)". They are

not equivalent and it will be explained in the next section.

Def 2: A collection of data: part numbers, product codes, customer information, etc. It

usually refers to data organized and stored on a computer that can be searched and

retrieved by a computer program.

Def 3: A data structure that stores metadata, i.e. data about data. More generally we can

say an organized collection of information.

Def 4: A collection of information organized and presented to serve a specific purpose.

(A telephone book is a common database.) A computerized database is an updated,

organized file of machine readable information that is rapidly searched and

retrieved by computer.

Def 5: An organized collection of information in computerized format.

Def 6: A collection of related information about a subject organized in a useful manner

that provides a base or foundation for procedures such as retrieving information,

drawing conclusions, and making decisions.

Def 7: A Computerized representation of any organizations flow of information and

storage of data.

Each of the above given definition is correct, and describe database from slightly variant

perspectives. From exam point of view, anyone will do. However, within this course, we

will be referring first of the above definitions more frequently, and concepts discussed in

the definition like, logically related data, shared collection should be clear. Another

Database Management System (CS403)

important thing that you should be very clear about is the difference between database

and the database management system (DBMS). See, the database is the collection of data

about anything, could be anything. Like cricket teams, students, busses, movies,

personalities, stars, seas, buildings, furniture, lab equipment, hobbies, hotels, pets,

countries, and many more anything about which you want to store data. What we mean

by data; simply the facts or figures. Following table shows the things and the data that we

may want to store about them:

Thing

Data (Facts or figures)

Cricket Player

Country, name, date of birth, specialty, matches played, runs etc.

Scholars

Name, data of birth, age, country, field, books published etc.

Movies

Name, director, language (Punjabi is default in case ) etc.

Food

Name, ingredients, taste, preferred time, origin, etc.

Vehicle

Registration number, make, owner, type, price, etc.

There could be infinite examples, and please note that the data that is listed about

different things in the above table is not the only data that can be defined or stored about

these things. As has been explained in the definition one above, there could be so many

facts about each thing that we are storing data about; what exactly we will store depends

on the perspective of the person or organization who wants to store the data. For example,

if you consider food, data required to be stored about the food from the perspective of a

cook is different from that of a person eating it. Think of a food, like, Karhahi Ghost, the

facts about Karhahi ghosht that a cook will like to store may be, quantity of salt, green

and red chilies, garlic, water, time required to cook and like that. Where as the customer

is interested in chicken or meat, then black or red chilies, then weight, then price and like

that. Well, definitely there are some things common but some are different as well. The

thing is that the perspective or point of view creates the difference in what we store;

however, the main thing is that the database stores the data.

The database management system (DBMS), on the other hand is the software or tool that

is used to manage the database and its users. A DBMS consist of different components or

subsystem that we will study about later. Each subsystem or component of the DBMS

performs different function(s), so a DBMS is collection of different programs but they all

work jointly to manage the data stored in the database and its users. In many books and

may be in this course sometimes database and database management system are used

interchangeably but there is a clear difference and we should be clear about them.

Sometimes another term is used, that is, the database system, again, this term has been

used differently by different people, however in this course we use the term database

system as a combination of database and the database management system. So database is

collection of data, DBMS is tool to manage this data, and both jointly are called database

system.

Database Management System (CS403)

Importance of the Databases

Databases are important; why? Traditionally computer applications are divided into

commercial and scientific (or engineering) ones. Scientific applications involve more

computations, that is, different type of calculations that vary from simple to very complex.

Today such applications exist, like in the fields of space, nuclear, medicine that take

hours or days of computations on even computers of the modern age. On the other hand,

the applications that are termed as commercial or business applications do not involve

much computations, rather minor computation but mainly they perform the input/output

operations. That is, these applications mainly store the data in the computer storage, then

access and present it to the users in different formats (also termed as data processing) for

example, banks, shopping, production, utilities billing, customer services and many

others. As is clear from the example systems mentioned, the commercial applications

exist in the day to day life and are related directly with the lives of common people. In

order to manage the commercial applications more efficiently databases are the ultimate

choice because efficient management of data is the sole objective of the databases. So

such applications are being managed by databases even in a developing country like

Pakistan, yet to talk about the developed countries. This way databases are related

directly or indirectly almost every person in society.

Databases are not only being used in the commercial applications rather today many of

the scientific/engineering application are also using databases less or more.

databases are concern of the effectively latter form of applications are more Commercial

applications involve The goal of this course is to present an in-depth introduction to

databases, with an emphasis on how to organize information in the database and to

maintain it and retrieve it efficiently, that is, how to design a database and use it

effectively.

Databases and Traditional File Processing Systems

Traditional file processing system or simple file processing system refers to the first

computer-based approach of handling the commercial or business applications. That is

why it is also called a replacement of the manual file system. Before the use computers,

the data in the offices or business was maintained in the files (well in that perspective

some offices may still be considered in the pre-computer age). Obviously, it was

laborious, time consuming, inefficient, especially in case of large organizations.

Computers, initially designed for the engineering purposes were though of as blessing,

since they helped efficient management but file processing environment simply

transformed manual file work to computers. So processing became very fast and efficient,

but as file processing systems were used, their problems were also realized and some of

them were very severe as discussed later.

It is not necessary that we understand the working of the file processing environment for

the understanding of the database and its working. However, a comparison between the

characteristics of the two definitely helps to understand the advantages of the databases

Database Management System (CS403)

and their working approach. That is why the characteristics of the traditional file

processing system environment have been discussed briefly here.

Fig. 1: A typical file processing environment

The diagram presents a typical traditional file processing environment. The main point

being highlighted is the program and data interdependence, that is, program and data

depend on each other, well they depend too much on each other. As a result any change

in one affects the other as well. This is something that makes a change very painful or

problematic for the designers or developers of the system. What do we mean by change

and why do we need to change the system at all. These things are explained in the

following.

The systems (even the file processing systems) are created after a very detailed analysis

of the requirements of the organizations. But it is not possible to develop a system that

does not need a change afterwards. There could be many reasons, mainly being that the

users get the real taste of the system when it is established. That is, users tell the analysts

or designers their requirements, the designers design and later develop the system based

on those requirements, but when system is developed and presented to the users, it is only

then they realize the outcome of the effort. Now it could be slightly and (unfortunately)

sometimes very different from what they expected or wanted it to be. So the users ask

changes, minor or major. Another reason for the change is the change in the requirements.

For example, previously the billing was performed in an organization on the monthly

Database Management System (CS403)

basis, now company has decided to bill the customers after every ten days. Since the bills

are being generated from the computer (using file processing system), this change has to

be incorporated in the system. Yet another example is that, initially bills did not contain

the address of the customer, now the company wants the address to be placed on the bill,

so here is change. There could be many more examples, and it is so common that we can

say that almost all systems need changes, so system development is always an on-going

process.

So we need changes in the system, but due to program-data interdependence these

changes in the systems were very hard to make. A change in one will affect the other

whether related or not. For example, suppose data about the customer bills is stored in the

file, and different programs use this file for different purposes, like adding data into the

bills file, to compute the bill and to print the bill. Now the company asks to add the

customers' address in the bills, for this we have to change the structure of the bill file and

also the program that prints the bill. Well, this was necessary, but the painful thing is that

the other programs that are using these bills files but are not concerned with the printing

of the bills or the change in the bill will also have to be changed, well; this is needless

and causes extra, unnecessary effort.

Another major drawback in the traditional file system environment is the non-sharing of

data. It means if different systems of an organization are using some common data then

rather than storing it once and sharing it, each system stores data in separate files. This

creates the problem of redundancy or wastage of storage and on the other hand the

problem on inconsistency. The change in the data in one system sometimes is not

reflected in the same data stored in other system. So different systems in organization;

store different facts about same thing. This is inconsistency as is shown in figure below.

Fig. 2: Some more problems in File System Environment

Database Management System (CS403)

Previous section highlighted the file processing system environment and major problems

found there. The following section presents the benefits of the database systems.

Advantages of Databases

It will be helpful to reiterate our database definition here, that is, database is a shared

collection of logically related data, designed to meet the information needs of multiple

users in an organization. A typical database system environment is shown in the figure 3

below:

Fig. 3: A typical Database System environment

The figure shows different subsystem or applications in an educational institution, like

library system, examination system, and registration system. There are separate, different

application programs for every application or subsystem. However, the data for all

applications is stored at the same place in the database and all application programs,

relevant data and users are being managed by the DBMS. This is a typical database

system environment and it introduces the following advantages:

o Data Sharing

The data for different applications or subsystems is placed at the same place. This

introduces the major benefit of data sharing. That is, data that is common among

different applications need not to be stored repeatedly, as was the case in the file

processing environment. For example, all three systems of an educational institution

shown in figure 3 need to store the data about students. The example data can be seen

Database Management System (CS403)

from figure 2. Now the data like registration number, name, address, father name that

is common among different applications is being stored repeatedly in the file

processing system environment, where as it is being stored just once in database

system environment and is being shared by all applications. The interesting thing is

that the individual applications do not know that the data is being shared and they do

not need to. Each application gets the impression as if the data is being for stored for

it. This brings the advantage of saving the storage along with others discussed later.

o Data Independence

Data and programs are independent of each other, so change is once has no or

minimum effect on other. Data and its structure is stored in the database where as

application programs manipulating this data are stored separately, the change in one

does not unnecessarily effect other.

o Controlled Redundancy

Means that we do not need to duplicate data unnecessarily; we do duplicate data in

the databases, however, this duplication is deliberate and controlled.

o Better Data Integrity

Very important feature; means the validity of the data being entered in the database.

Since the data is being placed at a central place and being managed by the DBMS, so

it provides a very conducive to check or ensure that the data being entered into the

database is actually valid. Integrity of data is very important, since all the processing

and the information produced in return are based on the data. Now if the data entered

is not valid, how can we be sure that the processing in the database is correct and the

results or the information produced is valid? The businesses make decisions on the

basis of information produced from the database and the wrong information leads to

wrong decisions, and business collapse. In the database system environment, DBMS

provides many features to ensure the data integrity, hence provides more reliable data

processing environment.

Dear students, that is all for this lecture. Today we got the introduction of the course,

importance of the databases. Then we saw different definitions of database and studied

what is data processing then studied different features of the traditional file processing

environment and database (DB) system environment. At the end of lecture we were

discussing the advantages of the DB approach. There some others to be studied in the

next lecture. Suggestions are welcome.

Exercises

o Think about the data that you may want to store about different things around you

o List the changes that may arise during the working of any system, lets say Railway

Reservation System

Table of Contents: