ZeePedia buy college essays online


Data Warehousing

<<< Previous Web Warehousing: Issues, Time-contiguous Log Entries, Transient Cookies, SSL, session ID Ping-pong, Persistent Cookies Next >>>
 
img
Slide 1
Virtual University of Pakistan
Data Transfer Service (DTS)
Introduction
Lab lec:1
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro -Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com
DWHAhsan Abdullah
-
1
Slide 2
Data Transformation Services
· DTS Overview
· SQL Server Enterprise Manager
· DTS Basics
­ DTS Packages
­ DTS Tasks
­ DTS Transformations
­ DTS Connections
­ Package Workflow
DWH Ahsan Abdullah
-
2
Microsoft® SQL ServerTM 2000 Data Transformation Services (DTS) is a set of graphical tools
and programmable objects that allow you extract, transform, and consolidate data from disparate
sources into single or multiple destinations. SQL Server Enterprise Manager provides an easy
access to the tools of DTS.
The purpose of this lecture is to get an understanding of DTS basics, which is necessary to learn
the use of DTS tools. These DTS basics describe the capabilities of DTS and summarize the
business problems it addresses.
350
img
Slide 3
DTS Overview
Extract Data
Single or
Disparate
Sources  Transform Data  Multiple
Destinations
of data
Consolidate Data
DTS, Graphical tools & Programmable objects
DWH -Ahsan Abdullah
3
Many organizations need to centralize data to improve corporate decision -making. However, their
data may be stored in a variety of formats and in different locations. Data Transformation
Services (DTS) address this vital business need by providing a set of tools that let you extract,
transform, and consolidate data from disparate sources into single or multiple destinations
supported by DTS connectivity.
DTS allows us to connect through any data source or destination that is supported by OLE DB.
This wide range of connectivity that is provided by DTS allows us to extract data from wide
range of legacy systems. Heterogeneous source systems store data with their local formats and
conventions. While consolidating data from variety of sources we need to transform names,
addresses, dates etc into a standard format. For example consider a student record management
system of a university having four campuses. A campus say `A' follows convention to store city
codes "LHR" for Lahore. An other campus say `B' stores names of cities "Lahore", campus `C'
stores city names in block letters `LAHORE', and the last campus `D' store city names as
`lahore'. When the data from all the four campuses is combined as it is and query is run "How
many students belong to `Lahore'?" We get the answer only from campus B because no other
convention for Lahore matches to the one in query.
To combine data from heterogeneous sources with the purpose of some useful analysis requires
transformation of data. Transformation brings data in some standard format.
Microsoft SQL Server provides graphical tools to build DTS packages. These tools provide good
support for transformations. Complex transformations are achieved through VB Script or Java
Script that is loaded in DTS package. Package can also be programmed by using DTS object
model instead of using graphical tools but DTS programming is rather complicated.
351
img
Slide 4
DTS Overview: Concept
DTS
Centralized
Data
DTS
DWH-Ahsan Abdullah
4
The slide shows the heterogeneous sources of data. Position of DTS while consolidating the data
into a single source is also clear from the slide. In legacy systems we may come across the text
files as a source of data. Microsoft Access is a database management system, maintains dat a in
tables, and columns validate the input to the system. We often find legal values are stored in these
sort of data management systems. But when we deal with text files no validation mechanism for
input is there. Therefore we may come across illegal and rubbish values in text files. This makes
the process of transformation further complicated.
Slide 5
DTS Overview: Example
MUHAMMED ANWAR
8-JUN-1978
CH MUHMD. ASLAM
23-Nov-1980
DTS
MOHAMMAD ANWAR
08/06/1978
CHOUDHARY MOHAMMAD ASLAM
23/11/1980
AHMAD JAHANZEB
05/08/1980
MOHAMMAD FARRUKH
08/11/1979
DTS
AHMED JAHANZEB
8-5-80
M. FARRUKH
11-8-79
DWH-Ahsan Abdullah
5
In the this slide we may see three data management systems. Data is extracted from two systems,
top and bottom, and is loaded into the standardized system shown in the middle. We may see two
transformations over here. First one is name transformation and the other one is date
transformation. In the database management system shown at the top we have two names
Muhammed Anwer and Choudhary Mohammed Aslam. Whereas in the system shown at the
bottom we have two different names Ahmed Jahanzeb and Muhammed Farrukh. Out of four
names three names contain Muhammed but with different spellings. Computer can not identify
that the word `Mu hammed' is intended at all the three locations. So while consolidating data
352
img
names are transformed to standard spellings of names. Similarly Date formats are different in
both source systems and it is standardized in destination system (middle one).
Slide 6
DTS Overview: Operations
· A set of tools for
­
Providing connectivity to different databases
­
Building query graphically
­
Extracting data from disparate databases
­
Transforming data
­
Copying database objects
­
Providing support of different scripting
languages( by default VB-Script and J-Script)
DWH-Ahsan Abdullah
40
DTS contains a set of tools that provides a very easy approach to build a package and execute it.
Writing or building a package through programming is a complex task but DTS tools like DTS
Designer and Import/Export Wizard do this entire complex task for user just through a single
click of button. Not only package building but query building has also very sophisticated support
in DTS tools.
Slide 7
DTS Overview: Tools
· DTS includes
­ Data Import/Export Wizard
­ DTS Designer
­ DTS Query Designer
­ Package Execution Utilities
· DTS Tools can be accessed
through "SQL Server Enterprise
Manager"
DWH-Ahsan Abdullah
41
Package execution utilities are used to run or execute a package, no matter package is designed
through the tools provided by DTS or any external tool like Visual Basic. All these tools can be
accessed through the SQL Server Enterprise Manager.
Open the node Data Transfer Services in SQL Server Enterprise Manager. Choose the option in
which any finished package is saved. Right click the package and get option to execute it.
353
img
Slide 8
DTS Overview:
SQL Server Enterprise Manager
Path:  Start >> Programs >> Microsoft
SQL Server >> Enterprise Manager
DWH-Ahsan Abdullah
6
The Data Transformation Services (DTS) node of the SQL Server Enterprise M anager console
tree provides facilities for accessing DTS tools, manipulating DTS packages, and accessing
package information. You can use these facilities to:
·
Open a new package in the DTS Import/Export Wizard or DTS Designer. In DTS
Designer, you can select and edit an existing package saved to SQL Server, SQL Server
2000 Meta Data Services, or to a structured storage file.
Action >> New Package
Connect to and import meta data from a data source, and display the meta data in the
·
Meta Data node of SQL Server Enterprise Manager.
Right click Meta Data Services Package and select option import Meta Data
Open a package template in DTS Designer.
·
Right click the package that is required to be opened in DTS Designer and select the option
open in DTS Designer.
Display the version history of a package, edit a specific package version in DTS
·
Designer, and delete package versions.
Right click the package and select the option Versions.
Display and manipulate package log information.
·
Right click the node "Local packages"/"Meta data services packages" in the tree and
select the option view logs.
Set the properties of DTS Designer
·
354
img
Right-click the Data Transformation Services node and click Properties.
Execute a package.
·
Right click the package and click execute
·
Schedule a package.
Right click the package and click schedule
Slide 9
DTS Overview: Market
·
SQL Market
·
Diverse Usage
·
Courses and certifications
·
High costs
DWH-Ahsan Abdullah
7
SQL Server is becoming one of Microsoft's biggest businesses, as being used
by people from wide spectrum of domains. The scope of the SQL Server is so
diverse that whole course can be offered and actually such courses are being
offered in developed countries. Microsoft also offers training courses and
certifications  are  expensive  enough.  For  example,  "Designing  and
Implementing OLAP Solutions with MS SQL Server 2000" is a course that
provides students with the knowledge and skills necessary to design,
implement, and deploy OLAP solutions by using Microsoft SQL Server 2000TM
Analysis Services. The importance of the course is well depicted by its cost
i.e. $2500+GST
355
img
Slide 10
DTS Basics
·
DTS Packages
·
DTS Tasks
·
DTS Transformations
·
DTS Package Workflows
·
DTS Tools
·
Meta Data
DWH-Ahsan Abdullah
7
Before learning to use DTS some basic concepts like DTS packages, DTS tasks, transformations
and workflows are important to understand.
When we want to use computers to perf orm some particular task through programming, what we
do? We write a program in some programming language. Program is a sequence of logical
statements that collectively achieve the purpose of the programmer. This analogy is useful in
understanding the concept of package and tasks in DTS. DTS package is exactly like a computer
program. Like a computer program DTS package is also prepared to achieve some goal.
Computer program contains set of instructions whereas DTS package contains set of tasks. Tasks
are logically related to each other. When a computer program is run, some instructions are
executed in sequence and some in parallel. Likewise when a DTS package is run some tasks are
performed in sequence and some in parallel. The intended goal of a computer program is
achieved when all instructions are successfully executed. Similarly the intended goal of a package
is achieved when all tasks are successfully accomplished.
DTS task is a unit of work in a package. Tasks can be establishment of connection to s ource and
destination databases, extraction of data from the source, transformation of data, loading of data
to the destination, generation of error messages and emails etc.
In real world systems when we talk about heterogeneous sources of data there arise a lot of
complicated issues. Heterogeneous systems contain data with different storage conventions,
different storage formats, different technologies, and different designs etc. Power of DTS lies in
extracting the data from these heterogeneous sources, transforming to some standard format and
convention, and finally load data to some different system with totally different parameters like
technology, design etc. Microsoft SQL Server provides user -friendly tools to develop DTS
Packages. Through graphical editor/ designer or wizards we can put together set of tasks in a
package. Order or sequence in which the tasks are required to be performed can be set through
conditions like "On success of task A task B should be performed otherwise task C should be
performed." This order or sequence of execution is called Workflow of a package.
356
img
In this lecture we will see these concepts in detail and in subsequent lectures we will develop
packages and practically get into the use of DTS functionalities.
Slide 11
DTS Packages
(Cont.)
DWH-Ahsan Abdullah
9
Slide shows how a package looks like.  We can only view package as a form of
graphical objects as shown in the slide. Here two connections are established. "Microsoft
OLEDB Driver" and "Microsoft Excel 97" are connections.  Blac k link between two
connections is transformation task. "Execute SQL" and "Copy SQL Server" both are
tasks. Green and blue links are workflows. Green link shows `On the Success of' i.e.
on the success of Connection establishment execute task execute SQL.Blue link shows
`On the Failure of' on the failure of the previous task execute another task Copy SQL
Server objects.
Slide 12
DTS Package: Contents
· DTS Package is an organized
collection of
­ Connections
­ DTS tasks
­ DTS transformations
­ Workflows
DWH-Ahsan Abdullah
8
A DTS package is an organized collection of connections, DTS tasks, DTS transformations, and
workflow constraints assembled either with a DTS tool or programmatically and saved to
357
img
Microsoft® SQL ServerTM, SQL Server 2000 Meta Data Services, a structured storage file, or a
Microsoft Visual Basic® file.
Each package contains one or more steps that are executed sequentially or in parallel when the
package is run. When executed, the package connects to the correct data sources, copies data and
database objects, transforms data, and notifies other users or processes of events.
Slide 13
DTS Package: Execution
· When a package is run
­ It connects to data sources
­ Copies data and database objects
­ Transforms data
­ Notifies other users and processes of
events
DWH-Ahsan Abdullah
10
When we run a Data Transformation Services (DTS) package, all of its connections, tasks,
transformations, and scripting code are executed in the sequence described by the package
workflow.
We can execute a package from:
Within a DTS tool.
·
SQL Server Enterprise Manager.
·
Package execution utilities.
·
Slide1 4
DTS Package: Creating
· Package can be created by one
of the following three methods:
­Import/Export wizard
­DTS Designer
­Programming DTS applications
DWH-Ahsan Abdullah
11
358
img
Microsoft SQL Server provides a good support for the tools that are helpful in building a
package. Import/Export Wizard and DTS Designer both are the graphical methods of building a
package. Both tools provide support to run the package also. Building a package means putting
all the tasks that are supposed to be performed in a particular package together and setting their
order of execution or defining workflow. Whereas when we actually run a package all the tasks
are actually performed.
Programming DTS applications without the help of these user-friendly tools is a difficult task.
Packages can be programmed using some external tool like Visual Studio in VC++ or VB. Such a
programming requires deep understanding of DTS object model.
Slide 15
DTS Package: Creating
Import/Export
1. Expand tree node
mentioning `Data
Transformation
Services' and
select the option
for available
location to save
package
2. Tool>Data Transfer
Service>
Import/Export
DWH-Ahsan Abdullah
12
Data Import and Export wizard can be accessed through a number of ways.
1. Start > Programs> Microsoft SQL Server> Data Import/Export Wizard
2. Through SQL Server Enterprise Manager
a.  In SQL Server Enterprise Manager we can see a Tree view of SQL Server
objects and services. Expand Tree, select the node "Data Transformation
Services". We can see two options (discussed earlier) to store Package. Select
any one of them (local OR SQL Server Meta Data Services). Then Click Tools >
Data Transformation Services > Import / Export Data.
b. After Expanding Data Transformation Services node we can click
on tool
bar to launch the wizard
359
img
Slide 16
DTS Package: Creating
Data Import/Export Wizard
DWH-Ahsan Abdullah
13
This is how wizard looks like. Just press Next and start working with a user friendly wizard.
An easy -to-use tool that guides you, a step at a time, through the process of creating a DTS
package. It is recommended for simple data transformation or data movement solutions (for
example, importing tabular data into a SQL Server 2000 database). It provides limited support for
transformations.
Slide 17
DTS Package: Creating
DTS Designer
· DTS Designer
­Graphical objects
­For complex workflows
­It supports more complex
transformations as compared to
wizard
DWH-Ahsan Abdullah
17
DTS designer is an applicat ion that uses graphical objects to help you build packages containing
complex workflows. DTS Designer includes a set of model DTS Package Templates, each
designed for a specific solution that you can copy and customize for your own installation. It is
recommended for sophisticated data transformation solutions requiring multiple connections,
complex workflows, and event-driven logic. DTS package templates are geared toward new users
who are learning about DTS Designer or more experienced users who want assistance setting up
specific DTS functionalities (for example, data driven queries).
360
img
Slide 18
DTS Package: Creating
DTS Designer Console
1. Expand tree
node
mentioning
`Data
Transformation
Services ' and
select the
option for
available
location to save
package
2. Action>New
DWH-Ahsan Abdullah
15
Package
DTS Designer can also be accessed through multiple ways.
1. Whenever a saved package is opened by double click or through right click, it is opened
in DTS Designer
2. SQL Server Enterprise Manager can also be used to access DTS Designer
a. After Expanding Data Transformation Services node select Action > New
Package
b. After Expanding Data Transformation Services node select
on toolbar to
access DTS Designer
Slide 19
DTS Package: Creating
DTS Designer Environment
DWH-Ahsan Abdullah
16
The slide shows environment of DTS Designer. In designer we can see four windows
A. Connection toolbar
B. Task toolbar
C. General toolbar
D. Design Area
361
img
A. Connection toolbar
Connection toolbar shows all available connections in the form of icons or symbols. All OLE
DB supported connections are available. To establis h a new connection just click the correct
icon and drag to design area. Then set properties to your connection. In case of any difficulty
in identifying the connection icon, click on Connection on Menu bar just above the
connection toolbar.
B. Task Toolbar
Tasks toolbar shows icons for all tasks that are supported by DTS. For example
is used
to set transformation task. This also works as drag and drop. DTS Designer is very friendly to
use as it guides user about what to do after picking a certain option. For new users who do not
recognize the tasks through icons, in the top menu bar `Task' is available.
C. General Toolbar
This toolbar provides general functionality like saving a package, executing a package.
is
used to execute a package
D. Design Area
Design Area is used to design a package through the objects available in the tool bars.
Slide 20
DTS Package: Creating
Programming
· Programming DTS applications
­ Complicated & technical way
­ For experienced developers and
programmers only
­ Requires Visual C++ or Visual
Basic for programming
DWH-Ahsan Abdullah
18
Programming applications that you can use to write and compile a DTS package either in
Microsoft Visual Basic® or Microsoft Visual C++®. It is recommended for developers who want
to access the DTS object model directly and exert a fine degree of control over package
operations. Packages created programmatically can be opened and further customized in DTS
Designer. In addition, packages created in the DTS Import/Export Wizard or DTS Designer can
be saved as a Visual Basic program and then opened and further customized in a development
environment such as Microsoft Visual Studio®.
362
img
Slide 19
Saving a DTS Package
· DTS Package can be saved to
­ Microsoft SQL Server
­ SQL Server 2000 Meta Data services
­ Structured storage files
­ A Microsoft Visual Basic file
DWH-Ahsan Abdullah
19
When you save a Data Transformation Services (DTS) package, you save all DTS connections,
DTS tasks, DTS transformations, and workflow steps and preserve the graphical layout of these
objects on the DTS Designer design
While saving a package we get different options as destination location for the package. Package
can be saved to Microsoft SQL server. Another option to save a package is SQL Server Meta
Data Services. The advantage which we get when we store our package to SQL Server 2000 Meta
Data Services is that we may maintain meta data information of the databases involved in the
packages and we may keep version information of each package. Furthermore package can be
stored in a structured file and Microsoft visual basic file.
Slide 20
Saving a DTS Package:
Illustration
DWH-Ahsan Abdullah
20
This slide illustrate the package saving process.
363
img
Slide 21
Saving a DTS Package:
SQL Server
Contains
Packages that
are  saved  to
this particular
instance
of
SQL Server
DWH-Ahsan Abdullah
21
Data Transformation Services node of SQL Server Enterprise Manager contains three options to
locate the package saved earlier. The first option is local Packages. These are the packages that
are saved to this particular instance of SQL Server. Microsoft SQL Server may have multiple
instances on each machine or over a Network. Local packages are those that are saved to this
particular instance of SQL Server.
Slide 22
Saving a DTS Package:
Meta Data Services
Contains Packages
that are saved to
Meta Data Services
of this instance of
SQL Server. It
maintains version
information of each
package saved to
it.
DWH-Ahsan Abdullah
22
As it has been discussed earlier, to maintain Metadata information for a package or version
information of a package, it may be stored to Meta Data Services. This node contains all those
packages that are saved to SQL Server 2000 Meta Data Services.
364
img
Slide 23
Saving a DTS Package
Metadata
It is a repository
of
metadata
information
of
databases
scanned to Meta
Data
Services
Packages. It also
provides version
tracking  facility
of Packages.
DWH-Ahsan Abdullah
23
If a package is saved to SQL Server 2000 Metadata Services and is scanned for metadata than its
meta data information is maintained in a repository " Meta Data ", that can be found as third
option under node Data Transformation Services.
Slide 24
DTS Packages:
4 Operations
· Packages can be:
1.
Edited
2.
Password protected
3.
Scheduled for execution
4.
Retrieved by version
DWH-Ahsan Abdullah
24
Packages that are required for very complicated tasks are not trivial to build. To develop such
packages, DTS Designer or programming tools are used. Once such packages are built they are
saved for further use. We may edit these packages later on. For editing purposes either DTS
designer or programming is used. After edit a package we may keep both packages that is
package before editing and package after editing as two different versions of same package. To
maintain version information packages are saved in "SQL Server 2000 Meta Data Services".
Tasks in the package require access to database objects, when package is executed. Packages can
be protected through passwords. When a package is built it is not necessary to execute it
immediately. We may schedule package to be executed any time later on. We may prepare a
package that is executed after definite intervals. For example we want to update our dataware
house every night at 12:00 o' clock, what will we do? We will write a package to update dataware
house and schedule it to run at 12:00 o' clock every night.
365
img
Slide 25
DTS Package Operations:
Editing
DWH-Ahsan Abdullah
25
Double click a package to open in designer. Drag and drop objects to edit a package. Designer is
the easiest way of editing a package. Even the packages that are created through wizards and are
saved, can be edited through designer.
Slide 26
DTS Package Operations:
Protection
· Save dialog box allows to set
passwords
· Owner password puts limits on
both editing and execution of the
package
DWH-Ahsan Abdullah
26
Enter an Owner password. Assigning an Owner password puts limits on who can both edit and
run the package.
Enter a User password. Assigning a User password puts limits only on who can edit the package.
If you create a User password, you must also create an Owner password.
366
img
Slide 27
DTS Package Operations:
Scheduling/Execution
· Right click any saved package to
schedule or execute it
DWH-Ahsan Abdullah
27
To schedule a package or to execute a package, first right click the package and then select the
required option.
Slide 28
DTS Package Operations:
Versioning
· Right click any saved package to view
its version information
DWH-Ahsan Abdullah
28
If we want to get version information of a package we can see it by right clicking the package and
selecting version information. First column contains creation date and the other column contains
the description about changes if it is saved with the package.
367
img
Slide 29
DTS Tasks
· DTS Package contains one or more
tasks
· Task defines single work item
­ Establishing connections
­ Importing and exporting data
­ Transforming data
­ Copying database objects
­ etc
DWH-Ahsan Abdullah
29
DTSPackages contain a sequence of tasks. When a package is executed these tasks are performed
in sequence or in parallel. These tasks are the single work item in a package. Tasks can be
establishing connections, extraction of data from sources, transformations applied on data,
loading data to destination, generation of automated email messages to administrator in case of
some problem during the package execution..
Slide 30
DTS Tasks: Menu
· Wizard collects the tasks that are
invisible to users
· Designer allows to view and arrange
tasks manually
Set of all possible tasks in
designer, drag the required
task in design area and set its
properties
DWH-Ahsan Abdullah
30
A DTS task is a discrete set of functionality, executed as a single step in a package. Each task
defines a work item to be performed as part of the data movement and data transformation
process, or as a job to be executed.
DTS supplies a number of tasks that are part of the DTS object model and can be accessed
graphically, through DTS Designer, or programmatically. These tasks, which can be configured
individually, cover a wide variety of data copying, data transformation, and notification
situations. For example:
Importing and exporting data. DTS can import data from a text file or an OLE DB data source
(for example, a Microsoft Access 2000 database) into SQL Server. Alternatively, data can be
368
img
exported from SQL Server to an OLE DB data destination (for example, a Microsoft Excel 2000
spreadsheet). DTS als o allows high-speed data loading from text files into SQL Server tables.
Transforming data. DTS Designer includes a Transform Data task that allows you to select data
from a data source connection, map the columns of data to a set of transformations, and send the
transformed data to a destination connection. DTS Designer also includes a Data Driven Query
task that allows you to map data to parameterized queries.
Copying database objects. With DTS, you can transfer indexes, views, logins, stored procedures ,
triggers, rules, defaults, constraints, and user-defined data types in addition to the data.
In addition, you can generate the scripts to copy the database objects. Sending and receiving
messages to and from other users and packages. DTS includes a Send Mail task that allows you to
send an e-mail if a package step succeeds or fails. DTS also includes an Execute Package task
that allows one package to run another as a package step, and a Message Queue task that allows
you to use Message Queuing to send and receive messages between packages.
Executing a set of Transact -SQL statements or Microsoft ActiveX® scripts against a data
source. The Execute SQL and ActiveX Script tasks allow you to write your own SQL statements
and scripting code and execute them as a step in a package workflow.
Slide 31
DTS Transformations
· After extraction from source data can
be transformed
­ Using available DTS transformations
­ Using customized transformations
DWHAhsan Abdullah
-
31
While transferring data from source to destination that may be a single source of truth, data may
require to be transformed. Power of DTS tools lies in the support of data transformations. Some
transformations are already available with DTS tools and customized transformations can be
performed through VB Script or Java Script.
369
img
Slide 32
Available Transformations: Available
· Available transformations are:
­ Copy column transformation
­ ActiveX Script transformations
­ Date time string transformations
­ Uppercase and lowercase string
transformations
­ Middle of string transformations
­ Read and write file transformations
DWH-Ahsan Abdullah
32
The slide shows the list of transformations that are alre ady available with DTS tools i.e. DTS
Designer and DTS import/export wizard. Wizard has a support of two transformations out of six
shown over here:
·  Copy column transformation
·  Active-X script transformation
The rest four are accessed through DTS designer and scripts.
Copy Column Transformation: Describes the transformation used to copy source data to the
destination.
ActiveX Script Transformation: Explains how to use Microsoft ActiveX® scripts to define
column -level transformations.
Date Time String Transformation: Describes the transformation used to convert a source date
into a new destination format.
Uppercase String Transformation: Describes the transformation used to convert a string into
uppercase characters.
Lowercase String Transformation: Describes the transformation used to convert a string into
lowercase characters.
Middle of String Transformation: Describes the transformation used to extract a substring from
a source and optionally change its case or trim white space before placing the result i the
n
destination.
Trim String Transformation: Describes the transformation used to remove leading, trailing, or
embedded white space from a source string and place the (optionally case-shifted) result in the
destination.
Read File Transformation: Describes the transformation used to copy the contents of a file
specified by a source column to a destination column.
Write File Transformation: Describes the transformation that creates a new data file for each
file named in a source column and initializes the contents of each file from data in a second
source column.
370
img
Slide 33
DTS Transformations: Customized
· Each available transformation has ActiveX
Script working at its back
· To customize an available transformation one
needs to modify the ActiveX Script according
to one's need
DWH-Ahsan Abdullah
33
In SQL Server, transformations are applied through running ActiveX Scripts. When we apply an
available transformation, tools in SQL Server generate ActiveX Script automatically. This auto
generated script can be modified or customized according to our needs.  Customized
transformations are those in which we customize the auto generated ActiveX Scripts to fulfill our
particular need.
For Example:
An available transformation can transform Saad Munir Rao to SAAD MUNIR RAO but if we
want to transform it as S. M. Rao then we need to customize the transformation.
Slide 34
DTS Transformations: ActiveX
· ActiveX Script
DWH-Ahsan Abdullah
34
Designer provides such an interface to write/customize ActiveX Scripts. To access it we will see
it later on.
371
img
Slide 35
DTS Connections
· An important and prior most task is
the establishment of valid connection
· DTS allows following varieties of
connections:
­ Data source connection
­ File Connection
­ Data link connection
DWH-Ahsan Abdullah
35
To successfully execute Data Transformation Services (DTS) tasks that copy and transform data,
a DTS package must establish valid connections to its source and destination data and to any
additional data sources (for example, lookup tables). Because of its OLE DB architecture, DTS
allows connections to data stored in a wide variety of OLE DB-compliant formats. In addition,
DTS packages usually can connect to data in custom or nonstandard formats if OLE DB
providers are available for those data sources and if you use Microsoft® Data Link files to
configure those connections
Slide 36
DTS Connections
Data Source, File Connection, Data Link
· Data source connection
­ All OLE DB supported databases
· MS SQL Server
· Oracle
· MS Access 2000
· File Connection
­ Text files
· Data link connection
­ Intermediate files containing connection strings
DWH-Ahsan Abdullah
36
DTS allows the following varieties of connections:
A data source connection. These are connections to: standard databases such as Microsoft SQL
ServerTM 2000, Microsoft Access 2000, Oracle, dBase, Paradox; OLE DB connections to ODBC
data sources; Microsoft Excel 2000 spreadsheet data; HTML sources; and other OLE DB
providers.
A file connection. DTS provides additional support for text files. When specifying a text file
connection, you specify the format of the file. For example:
·  Whether a text file is in delimited or fixed field format.
Whether the text file is in a Unicode or an ANSI format.
·  The row delimiter and column delimiter if the text file is in fixed field format.
372
img
·
The text qualifier.
Whether the first row contains column names.
A data link connection. These are connections in which an intermediate file outside of SQL
Server stores the connection string.
Slide 37
DTS Connection:Menu
Set of all possible connections
in designer, drag the required
connection in design area and
set its properties
Data source connection
File connection
Data link connection
DWH-Ahsan Abdullah
37
Slide 38
Package Workflow
· Package usually includes more than
one tasks
· To maintain order of execution of
tasks, workflow is required to be
defined
ss
Task D
cce
n
On  pletio
Su
Task A
On
om
c
Task C
On
Fai
Task B
lur
e
Task E
DWH-Ahsan Abdullah
38
Precedence constraints sequentially link tasks in a package. In DTS, you can use three types
of precedence constraints, which can be accessed either through DTS Designer or
programmatically:
Unconditional. If you want Task 2 to wait until Task 1 completes, regardless of the outcome,
link Task 1 to Task 2 with an unconditional precedence constraint.
On Success. If you want Task 2 to wait until Task 1 has successfully completed, link Task 1 to
Task 2 with an On Success precedence constraint.
373
img
On Failure. If you want Task 2 to begin execution only if Task 1 fails to execute successfully,
link Task 1 to Task 2 with an On Failure precedence constraint. If you want to run an alternative
branch of the workflow when an error is encountered, use this constraint.
Slide 39
Package Workflow: Designer
DWH-Ahsan Abdullah
39
This slide shows the process of making workflows using the designer. It provides a graphical
interface making the workflow management very easy
374
Table of Contents:
  1. Need of Data Warehousing
  2. Why a DWH, Warehousing
  3. The Basic Concept of Data Warehousing
  4. Classical SDLC and DWH SDLC, CLDS, Online Transaction Processing
  5. Types of Data Warehouses: Financial, Telecommunication, Insurance, Human Resource
  6. Normalization: Anomalies, 1NF, 2NF, INSERT, UPDATE, DELETE
  7. De-Normalization: Balance between Normalization and De-Normalization
  8. DeNormalization Techniques: Splitting Tables, Horizontal splitting, Vertical Splitting, Pre-Joining Tables, Adding Redundant Columns, Derived Attributes
  9. Issues of De-Normalization: Storage, Performance, Maintenance, Ease-of-use
  10. Online Analytical Processing OLAP: DWH and OLAP, OLTP
  11. OLAP Implementations: MOLAP, ROLAP, HOLAP, DOLAP
  12. ROLAP: Relational Database, ROLAP cube, Issues
  13. Dimensional Modeling DM: ER modeling, The Paradox, ER vs. DM,
  14. Process of Dimensional Modeling: Four Step: Choose Business Process, Grain, Facts, Dimensions
  15. Issues of Dimensional Modeling: Additive vs Non-Additive facts, Classification of Aggregation Functions
  16. Extract Transform Load ETL: ETL Cycle, Processing, Data Extraction, Data Transformation
  17. Issues of ETL: Diversity in source systems and platforms
  18. Issues of ETL: legacy data, Web scrapping, data quality, ETL vs ELT
  19. ETL Detail: Data Cleansing: data scrubbing, Dirty Data, Lexical Errors, Irregularities, Integrity Constraint Violation, Duplication
  20. Data Duplication Elimination and BSN Method: Record linkage, Merge, purge, Entity reconciliation, List washing and data cleansing
  21. Introduction to Data Quality Management: Intrinsic, Realistic, Orr’s Laws of Data Quality, TQM
  22. DQM: Quantifying Data Quality: Free-of-error, Completeness, Consistency, Ratios
  23. Total DQM: TDQM in a DWH, Data Quality Management Process
  24. Need for Speed: Parallelism: Scalability, Terminology, Parallelization OLTP Vs DSS
  25. Need for Speed: Hardware Techniques: Data Parallelism Concept
  26. Conventional Indexing Techniques: Concept, Goals, Dense Index, Sparse Index
  27. Special Indexing Techniques: Inverted, Bit map, Cluster, Join indexes
  28. Join Techniques: Nested loop, Sort Merge, Hash based join
  29. Data mining (DM): Knowledge Discovery in Databases KDD
  30. Data Mining: CLASSIFICATION, ESTIMATION, PREDICTION, CLUSTERING,
  31. Data Structures, types of Data Mining, Min-Max Distance, One-way, K-Means Clustering
  32. DWH Lifecycle: Data-Driven, Goal-Driven, User-Driven Methodologies
  33. DWH Implementation: Goal Driven Approach
  34. DWH Implementation: Goal Driven Approach
  35. DWH Life Cycle: Pitfalls, Mistakes, Tips
  36. Course Project
  37. Contents of Project Reports
  38. Case Study: Agri-Data Warehouse
  39. Web Warehousing: Drawbacks of traditional web sear ches, web search, Web traffic record: Log files
  40. Web Warehousing: Issues, Time-contiguous Log Entries, Transient Cookies, SSL, session ID Ping-pong, Persistent Cookies
  41. Data Transfer Service (DTS)
  42. Lab Data Set: Multi -Campus University
  43. Extracting Data Using Wizard
  44. Data Profiling