Introduction to Computing - CS101
VU
LESSON 36
DATA MANAGEMENT
During the last Lesson ...
(Intelligent Systems)
We looked at the distinguishing features of intelligent systems w.r.t. other software systems
We looked at the role of intelligent systems in scientific, business, consumer and other applications
We discussed several techniques for designing intelligent systems
(Artificial) Intelligent Systems:
SW programs or SW/HW systems designed to perform complex tasks employing strategies that mimic
some aspect of human thought
Not a Suitable Hammer for All Nails!
If the nature of the computations required in a task is not well understood,
or there are too many exceptions to the rules,
or known algorithms are too complex or inefficient,
then AI has the potential of offering an acceptable solution
Selected Applications:
Games: Chess, SimCity
Image recognition
Medical diagnosis
Robots
Business intelligence
Neural Networks:
Original inspiration was the human brain; emphasis now on usefulness as a computational tool.
Genetic Algorithms:
Based on Darwin's evolutionary principle of 'survival of the fittest'
GAs require the ability to recognize a good solution, but not how to get to that solution
Rule-based Systems:
Based on the principles of the logical reasoning ability of humans
Fuzzy Logic:
Based on the principles of the approximate reasoning faculty that humans use when faced with linguistic ambiguity
The Right Technique:
Selection of the right AI technique requires intimate knowledge about the problem as well as the
techniques under consideration
Real problems may require a combination of techniques (AI and/or non-AI) for an optimal solution
Three exciting areas of AI applications:
Robotics:
Automatic machines that perform various tasks that were previously done by humans
Autonomous Web Agents:
Computer programs that perform various actions continuously and autonomously on behalf of their principal
Decision Support Systems:
Interactive software designed to improve the decision-making capability of its users
They do not make decisions; they just assist in the process
Today's Goals: (Data Management)
First of a two-Lesson sequence
Today we will become familiar with the issues and problems related to data-intensive computing
We will find out about flat-files, the simplest databases
Next time, in our 4th Lesson on productivity software, we will discuss relational databases and implement a simple relational database
Keeping track of a few dozen data items is straightforward
However, dealing with situations that involve a significant number of data items requires more attention to the data handling process
Dealing with millions - even billions - of inter-related data items requires even more careful thought
36.1 BholiBooks.com:
Consider the situation of a large, online bookstore
They have an inventory of millions of books, with new titles constantly arriving, and old ones being
phased out on a regular basis
The price for a book is not a static feature; it varies every once in a while
Thousands of books are shipped each day, changing the inventory constantly
Some are returned, again changing the inventory situation constantly
The cost of each shipped order depends on:
Prices of individual books
Size of the order
Location of the customer
Mode of shipment
For each order, the customer's particulars - name, address, phone number, credit card number - are required
Generally, that data is not deleted after the completion of the transaction; instead, it is kept for future
reference
All the transaction activity and the inventory changes result in:
Thousands of data items changing every day
Thousands of additional data items being added every day
Keeping track & taking care (i.e. management) of all that constantly changing and expanding data is not
a trivial task and requires disciplined attention and actions for ensuring the smooth & profitable
operation of the bookstore
36.2 Issues in Data Management:
Data entry
Data updates
Data integrity
Data security
Data accessibility
Data Entry:
New titles are added every day
New customers are being added every day
Some of the above may require manual entry of new data into the computer systems
That new data needs to be added accurately
That can be achieved, for one, by user-interfaces that prevent the input of invalid data
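As a rough illustration of an entry form that refuses invalid data, the sketch below checks a new-title entry before it is accepted. The function name, the form keys (title, price, in_stock) and the rules themselves are assumptions made for this example; the lesson does not prescribe them.

```python
import math


def validate_new_title(form):
    """Return a list of error messages for a new-title form (empty list = valid).

    `form` is a dict of strings, as they would arrive from a data-entry screen.
    The keys and rules used here are hypothetical.
    """
    errors = []

    # The title must contain something other than white space.
    if not form.get("title", "").strip():
        errors.append("Title must not be empty")

    # The price must parse as a finite number greater than zero.
    try:
        price = float(form.get("price", ""))
        if not math.isfinite(price) or price <= 0:
            errors.append("Price must be greater than zero")
    except ValueError:
        errors.append("Price must be a number")

    # Only the two stock flags used in this example are accepted.
    if form.get("in_stock") not in ("Y", "N"):
        errors.append("InStock must be Y or N")

    return errors
```

A user interface built on such a check would save the record only when the returned list is empty, and would otherwise show the messages and ask for the entry again.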
Data Updates:
Old titles are deleted on a regular basis
Inventory changes every instant
Book prices change
Shipping costs change
Customers' personal data change
Various discount schemes are always commencing and concluding
All those actions require updates to existing data
Those changes need to be entered accurately
That can also be achieved by user-interfaces that prevent the input of invalid data
Data Security:
All the data that BholiBooks has in its computer systems is quite critical to its operation
The security of the customers' personal data is of utmost importance. Hackers are always looking for
that type of data, especially for credit card numbers
Enough leaks of that type, and customers will stop doing business with BholiBooks
This problem can be managed by using appropriate security mechanisms that provide access to
authorized persons/computers only
Security can also be improved through:
Encryption
Private or virtual-private networks
Firewalls
Intrusion detectors
Virus detectors
Data Integrity:
Integrity refers to maintaining the correctness and consistency of the data
Correctness: Free from errors
Consistency: No conflict among related data items
Integrity can be compromised in many ways:
Typing errors
Transmission errors
Hardware malfunctions
Program bugs
Viruses
Fire, flood, etc.
Ensuring Data Integrity:
Type Integrity is implemented by specifying the type of a data item:
Example: A credit card number typically consists of 16 digits. An update attempting to assign a value with more or fewer digits than the card type allows, or one including a non-numeral, should be rejected
Limit Integrity is enforced by limiting the values of data items to specified ranges to prevent illegal
values
Example: The age of a person should not be negative
Referential Integrity requires that an item referenced by the data for some other item must itself exist in
the database
Example: If an airline reservation is requested for a particular flight, then the corresponding flight
number must actually exist
Physical Integrity is ensured through hardware redundancy, backups, etc
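The first three kinds of integrity can each be expressed as a small check. The sketch below is only illustrative: the function names, the required_digits parameter, the upper age limit of 150 and the sample flight numbers are all assumptions, not part of the lesson.

```python
def check_type_integrity(card_number, required_digits):
    """Type integrity: accept only a string of exactly `required_digits` numerals."""
    return (
        isinstance(card_number, str)
        and len(card_number) == required_digits
        and all(ch in "0123456789" for ch in card_number)
    )


def check_limit_integrity(age, lowest=0, highest=150):
    """Limit integrity: the value must lie in the allowed range.

    The lesson only requires a non-negative age; the upper bound is an
    assumption added for the example.
    """
    return lowest <= age <= highest


def check_referential_integrity(reservation, known_flights):
    """Referential integrity: the flight referenced by the reservation must exist."""
    return reservation["flight_number"] in known_flights
```

For example, with known_flights = {"PK301", "PK302"} (made-up flight numbers), a reservation for "PK301" passes the referential check while one for "XX999" is rejected.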
Data Accessibility:
If the transaction and inventory data is placed in a disorganized fashion on a hard disk, it becomes very
difficult to later search for a stored data item
What is required is that:
Data be stored in an organized manner
Additional info about the data be stored so that the data access times are minimized
What if two customers check on the availability of a certain title simultaneously?
On seeing its availability, they both order the title - for which, unfortunately, only a single copy is available
The same is the case when two airline customers try booking the only available seat
A solution to this concurrency control problem: Lock access to data while someone is using it
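The locking idea can be sketched with two threads standing in for the two customers. The Inventory class and the simulate_two_customers helper are hypothetical names invented for this example; a real DBMS provides its own locking and this is not how any particular product implements it.

```python
import threading


class Inventory:
    """A tiny in-memory stock list protected by a lock (illustrative only)."""

    def __init__(self, stock):
        self._stock = dict(stock)
        self._lock = threading.Lock()

    def order(self, title):
        """Check availability and take one copy as a single locked step."""
        with self._lock:  # no other thread can be in here at the same time
            if self._stock.get(title, 0) > 0:
                self._stock[title] -= 1
                return True
            return False


def simulate_two_customers(title="The Terrible Twins"):
    """Two customers try to order the only available copy at the same time."""
    inventory = Inventory({title: 1})
    results = []

    def customer():
        results.append(inventory.order(title))

    threads = [threading.Thread(target=customer) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)
```

Because the availability check and the stock change happen while the lock is held, exactly one of the two orders succeeds, whichever customer gets there first.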
We can write our own SW that can take care of all the issues that we just discussed
OR
We can save ourselves lots of time, cost, and effort by buying ourselves a Database Management
System (DBMS) that takes care of most, if not all, of the issues
36.3 DBMS:
DBMSes are popularly, but incorrectly, also known as 'Databases'
A DBMS is the SW system that operates a database, and is not the database itself
Some people even consider the database to be a component of the DBMS, and not an entity outside the
DBMS
[Figure: a User/Program works with the DBMS, which in turn operates the Database]
A DBMS takes care of the storage, retrieval, and management of large data sets on a database
It provides SW tools needed to organize & manipulate that data in a flexible manner
It includes facilities for:
Adding, deleting, and modifying data
Making queries about the stored data
Producing reports summarizing the required contents
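To make these facilities concrete, here is a minimal sketch using SQLite, a real DBMS that ships with Python as the sqlite3 module. The table layout, the sample rows and the function name are assumptions chosen for illustration; the details of relational databases are left for the next Lesson.

```python
import sqlite3


def demo_dbms_facilities():
    """Add, modify, delete, query and summarize data in a temporary database."""
    conn = sqlite3.connect(":memory:")  # throwaway database held in memory
    conn.execute(
        "CREATE TABLE books (title TEXT PRIMARY KEY, author TEXT, "
        "price INTEGER, in_stock INTEGER)"
    )

    # Adding data
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?, ?)",
        [
            ("Good Bye Mr. Bhola", "Altaf Khan", 1000, 1),
            ("The Terrible Twins", "Bhola Champion", 199, 1),
            ("Accounting Secrets", "Zamin Geoffry", 29, 1),
        ],
    )

    # Modifying data: a price change
    conn.execute(
        "UPDATE books SET price = ? WHERE title = ?", (250, "The Terrible Twins")
    )

    # Deleting data: an old title is phased out
    conn.execute("DELETE FROM books WHERE title = ?", ("Accounting Secrets",))

    # Making a query about the stored data
    cheap = conn.execute(
        "SELECT title FROM books WHERE price < ? ORDER BY title", (500,)
    ).fetchall()

    # Producing a summary report: number of titles and their total price
    report = conn.execute("SELECT COUNT(*), SUM(price) FROM books").fetchone()

    conn.close()
    return cheap, report
```

Note that the program never says where or how the rows are stored; it only states what it wants added, changed, removed or retrieved.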
Database:
A collection of data organized in such a fashion that the computer can quickly search for a desired data
item
All data items in it are generally related to each other and share a single domain
Databases allow for easy manipulation of the data
They are designed for easy modification & reorganization of the information they contain
They generally consist of a collection of interrelated computer files
Example: VU Student Database:
Student's name
Student's photograph
Father's name
Phone number
Street address
eMail address
Courses being taken
Courses already taken & grades
Pre-VU educational record
Example: BholiBooks' Customer DB:
Name, address, phone & fax, eMail
Credit card type, number, expiration date
Shipping preference
Books on order
All books that were ever shipped to the customer
Book preference
Example: BholiBooks' Inventory DB:
Book title, author, publisher, binding, date of publication, price
Book summary, table of contents
Customers', editors', and newspaper reviews
Number in stock
Number on order
Special offer details
36.4 OS Independence:
A DBMS stores data in a database, which is a collection of interrelated files
Storage of files on the computer is managed by the computer OS's file system
Intimate knowledge of the OS & its file system is required to provide rapid access to the data
The DBMS takes care of those details
It hides the actual storage details of data files from the user
It provides an OS-independent view of the data to the user, making data manipulation and management
much more convenient
What can be stored in a database?
In the old days, databases were limited to numbers, Booleans, and text
These days, anything goes
As long as it is digital data, it can be stored:
Numbers, Booleans, text
Sounds
Images
Video
In the very, very old days ...:
Even large amounts of data were stored in text files, known as flat-file databases
All related info was stored in a single long, tab- or comma-delimited text file
Each group of info - called a record - in that file was separated by a special character; the vertical bar '|' was a popular option
Each record consisted of a group of fields, each field containing some distinct data item
[Figure: a flat-file database, with labels marking a record, a field, and the record delimiter]
Example flat-file database (a single long line of text):
Title, Author, Publisher, Price, InStock|Good Bye Mr. Bhola, Altaf Khan, BholiBooks, 1000, Y|The Terrible Twins, Bhola Champion, BholiBooks, 199, Y|Calculus & Analytical Geometry, Smith Sahib, Good Publishers, 325, N|Accounting Secrets, Zamin Geoffry, Sangg-e-Kilometer Publishers, 29, Y|
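The record and field structure of such a file can be recovered by splitting the text twice, first on the record delimiter and then on the field delimiter. The sketch below assumes that the first record holds the field names and that no field contains a comma or a vertical bar; the names FLAT_FILE and parse_flat_file are made up for this example.

```python
# The example file from this Lesson, written as one long string.
FLAT_FILE = (
    "Title, Author, Publisher, Price, InStock|"
    "Good Bye Mr. Bhola, Altaf Khan, BholiBooks, 1000, Y|"
    "The Terrible Twins, Bhola Champion, BholiBooks, 199, Y|"
    "Calculus & Analytical Geometry, Smith Sahib, Good Publishers, 325, N|"
    "Accounting Secrets, Zamin Geoffry, Sangg-e-Kilometer Publishers, 29, Y|"
)


def parse_flat_file(text, record_delimiter="|", field_delimiter=","):
    """Split a flat file into records, each a dict mapping field name to value."""
    # Drop the empty chunk left over after the final record delimiter.
    chunks = [c for c in text.split(record_delimiter) if c.strip()]

    # Assumption: the first record lists the field names.
    header = [name.strip() for name in chunks[0].split(field_delimiter)]

    records = []
    for chunk in chunks[1:]:
        values = [value.strip() for value in chunk.split(field_delimiter)]
        records.append(dict(zip(header, values)))
    return records
```

Every value comes back as text, including the price; a flat file carries no information about data types, so the program reading it has to know which fields are numbers.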
36.5 The Trouble with Flat-File Databases:
The text file format makes it hard to search for specific information or to create reports that include only certain fields from each record
Reason: One has to search sequentially through the entire file to gather desired info, such as 'all books by a certain author'
However, for small sets of data - say, consisting of several tens of kB - flat files can provide reasonable performance
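The sequential search described above can be sketched as follows. The function also counts how many records it had to examine, to show that every record is visited regardless of how many match. The names SAMPLE_FLAT_FILE and books_by_author are invented for this example, and the field order (title first, author second) is taken from the example file.

```python
# A shortened copy of the example file, written as one long string.
SAMPLE_FLAT_FILE = (
    "Title, Author, Publisher, Price, InStock|"
    "Good Bye Mr. Bhola, Altaf Khan, BholiBooks, 1000, Y|"
    "The Terrible Twins, Bhola Champion, BholiBooks, 199, Y|"
    "Calculus & Analytical Geometry, Smith Sahib, Good Publishers, 325, N|"
    "Accounting Secrets, Zamin Geoffry, Sangg-e-Kilometer Publishers, 29, Y|"
)


def books_by_author(flat_text, author):
    """Return (matching titles, number of records examined).

    There is no index to consult, so the whole file is scanned in order.
    """
    matches = []
    records_examined = 0
    for chunk in flat_text.split("|")[1:]:  # skip the record of field names
        if not chunk.strip():
            continue  # empty chunk after the final delimiter
        records_examined += 1
        fields = [field.strip() for field in chunk.split(",")]
        if fields[1] == author:  # the second field is the author
            matches.append(fields[0])  # the first field is the title
    return matches, records_examined
```

With four records the scan is instant, but the work grows in step with the size of the file, which is why the approach only suits small data sets.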
Consider this tabular approach ...
(same records, same fields, but in a different format)
Title                           Author          Publisher                     Price  InStock
------------------------------  --------------  ----------------------------  -----  -------
Good Bye Mr. Bhola              Altaf Khan      BholiBooks                    1000   Y
The Terrible Twins              Bhola Champion  BholiBooks                    199    Y
Calculus & Analytical Geometry  Smith Sahib     Good Publishers               325    N
Accounting Secrets              Zamin Geoffry   Sangg-e-Kilometer Publishers  29     Y
Tabular Storage: Features & Possibilities:
Similar items of data form a column
Fields placed in a particular row - same as a flat-file record - are strongly interrelated
One can sort the table w.r.t. any column
That makes searching - e.g., for all the books written by a certain author - straightforward
Similarly, searching for the 10 cheapest/most expensive books can be easily accomplished through a
sort
Effort required for adding a new field to all the records of a flat-file is much greater than adding a new
column to the table
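These table operations can be sketched by holding each row as a Python dict whose keys are the column names. The names BOOKS, sort_by, cheapest and add_column, and the sample 'Discount' column, are assumptions made for the example; the rows are the ones from the table in this Lesson, with the price stored as a number.

```python
BOOKS = [
    {"Title": "Good Bye Mr. Bhola", "Author": "Altaf Khan",
     "Publisher": "BholiBooks", "Price": 1000, "InStock": "Y"},
    {"Title": "The Terrible Twins", "Author": "Bhola Champion",
     "Publisher": "BholiBooks", "Price": 199, "InStock": "Y"},
    {"Title": "Calculus & Analytical Geometry", "Author": "Smith Sahib",
     "Publisher": "Good Publishers", "Price": 325, "InStock": "N"},
    {"Title": "Accounting Secrets", "Author": "Zamin Geoffry",
     "Publisher": "Sangg-e-Kilometer Publishers", "Price": 29, "InStock": "Y"},
]


def sort_by(table, column):
    """Return a new list of rows sorted w.r.t. the given column."""
    return sorted(table, key=lambda row: row[column])


def cheapest(table, n):
    """Return the n cheapest books: sort by price, then keep the first n rows."""
    return sort_by(table, "Price")[:n]


def add_column(table, name, default):
    """Return a new table in which every row has one extra column."""
    return [{**row, name: default} for row in table]
```

For instance, cheapest(BOOKS, 2) gives the rows for 'Accounting Secrets' and 'The Terrible Twins', and add_column(BOOKS, "Discount", 0) extends every row in a single step, without touching the text of each record as a flat file would require.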
CONCLUSION: Tabular storage is better than flat-file storage
We will continue on this theme next time
Today's Summary: (Data Management)
First of a two-Lesson sequence
Today we became familiar with the issues and problems related to data-intensive computing
We also found out about flat-file and tabular storage
Next Lecture: (Database SW)
Next time, in our 4th Lesson on productivity SW, we will continue our discussion on data management
We will find out about relational databases
We will also implement a simple relational database