ZeePedia buy college essays online

Data Warehousing

<<< Previous DWH Implementation: Goal Driven Approach Next >>>
Lecture No. 33
DWH Implementation: Goal Driven Approach
(Lecture based on "The data warehousing toolkit by Ralph Kimball and Margy Ross, 2nd Edition)
Business Dimensional Lifecycle: The Road Map Ralph Kimball's Approach
Implementing a data warehouse requires tightly integrated activities. As we discussed earlier,
there are differe nt DWH implementation strategies, we will be following Kimball's Approach.
Kimball is considered as an authority in the DWH field, and his goal driven approach is a result
of decades of practical experience. This presentation is a an overview of a data warehouse project
lifecycle, based on this approach, from inception through ongoing maintenance, identifying best
practices at each step, as well as potential vulnerabilities. It is believed that everyone on the
project team, including the business analyst, architect, database designer, data stager, and analytic
application developer, needs a high-level understanding of the complete lifecycle of a data
Selection &
Data Staging
Design &
Project Management
Figure -33.1: Business Dimensional Lifecycle (Kimball's Approach)
The business dimensional lifecycle framework, as shown in Figure 33.1, is depicted as a road
map, that is extremely useful if we're about to embark on the unfamiliar journey of data
warehousing. The Kimball's iterative data warehouse development approa ch drew on decades of
experience to develop the business dimensional lifecycle. The name was because it reinforced
several of key tenets for successful data warehousing. First and foremost, data warehouse projects
must focus on the needs of the business. Second, the data presented to the business users must be
dimensional. Finally while data warehousing is an ongoing process, each implementation project
should have a finite cycle with a specific beginning and end. Ongoing project management serves
as a foundation to keep the remainder of the lifecycle on track.
DWH Lifecycle: Key steps
1. Project Planning
2.  Business Requirements Definition
3.  Parallel Tracks
3.1 Lifecycle Technology Track
3.1.1 Technical Architecture
3.1.2 Product Selection
3.2 Lifecycle Data Track
3.2.1 Dimensional Modeling
3.2.2 Physical Design
3.2.3 Data Staging design and development
3.3 Lifecycle Analytic Applications Track
3.3.1 Analytic application specification
3.3.2 Analytic application development
Lifecycle Key Steps
Lifecycle begins with project planning during which we assess the organization's readiness for a
data warehouse initiative, establish the preliminary scope and justification, obtain resources, and
launch the project.
The second major task focuses on business requirements definition. The two -way arrow between
project planning and business requirements definition (as shown in Figure 33.1) shows the much
interplay between these two activities. Data warehouse designers must understand the needs of
the business and translate them into design considerations. Business users and their requirements
have an impact on almost every design and implementation decision made during the course of a
warehouse project. In road map, this is reflected by the three parallel tracks that follow.
The top track deals with technology. Technical architecture design establishes the overall
framework to support the integration of multiple technologies. Using the capabilities identified in
the architecture des ign as a shopping list, we then evaluate and select specific products.
The middle track emanating from business requirements definition focuses on data. We begin by
translating the requirements into a dimensional model which is then transformed into a physical
structure. Physical design activities focus on performance tuning strategies, such as aggregation,
indexing, and partitioning. Last but not least, data staging Extract -Transform-Load (ETL)
processes are designed and developed.
The final set of tasks spawned by the business requirements definition is the design and
development of analytic applications. The data warehouse project isn't done when we deliver
data. Analytic applications, in the form of parameter-driven templates and analyses, will satisfy a
large percentage of the analytic needs of business users.
1. Equally sized boxes (as shown in Figure 33.1) don't represent equally sized efforts, there
is a vast difference in the time and effort required for each major activity
2. Data warehousing is an ongoing process, each implementation project should have a
cycle with a specific beginning and an end.
DWH Lifecycle- Step 1: Project Planning
Assessing Readiness
Business sponsor
Business motivation
Business/IT relationship
The DWH lifecycle begins with the project planning phase. It consists of multiple activities that
must be performed before proceeding ahead in the lifecycle. Let's discuss the planning phase in
Readiness and risk assessment: Before proceeding ahead with significant data warehouse
expenditures, it is prudent to assess the organization's readiness to proceed. Five factors have
been identified as leading indicators of data warehouse success; any shortfalls represent ris ks or
vulnerabilities. Brief description in rank order of importance follows.
Business Sponsor: It is the most critical factor for successful data warehousing. Business
sponsors should have a clear vision for the potential impact of a data warehouse o the
organization. They should be passionate and personally convinced of the project's value while
realistic at the same time. Optimally, the business sponsor has a track record of success with other
internal initiatives. He or she should be a politically astute leader who can convince his or her
peers to support the warehouse.
Business motivation: The second readiness factor is having a strong, compelling business
motivation for building a data warehouse. This factor often goes hand in hand with sponsorship.
A data warehouse project can't merely deliver a nice-to-have capability; it needs to solve critical
business problems in order to garner the resources required for a successful launch and healthy
Feasibility: There are several aspects of feasibility, such as technical or resource feasibility, but
data feasibility is the most crucial. Are we collecting real data in real operational source systems
to support the business requirements? Data feasibility is a major concern because there is no
short -term fix if we're not already collecting reasonably clean source data at the right granularity.
Business/IT relationship: The fourth factor focuses on the relationship between the busi ess and
IT organizations. In your company, does the IT organization understand and respect the business?
Conversely, does the business understand and respect the IT organization? The inability to
honestly answer yes to these questions doesn't mean that you can't proceed. Rather, the data
warehouse initiative can be an opportunity to mend the fence between these organizations,
assuming that both deliver.
Culture: The final aspect of readiness is the current analytic culture within your company. Do
business analysts make decisions based on facts and figures, or are their decisions based on
intuition, anecdotal evidence, expert judgment or gut feeling?
Scoping: Requires the joint input of both the IT organization and business management. The
scope should be both meaningful in terms of its value to the organization and manageable. When
you are first getting started, you should focus on data from a single business process. Save the
more challenging, cross-process projects for a later phase. Sometimes project teams feel that the
delivery schedule is cast in concrete before project planning is even initiated. The prioritization
process can be used to convince IT and business management that adjustments are required.
Finally, remember to avoid the law of too when scoping -too firm of a commitment to too brief of
a timeline involving too many source systems and too many users in too many locations with too
diverse analytic requirements.
DWH Lifecycle- Step 1: Project Planning
Justification (cost vs. benefit)
Team development
Project Plan
§  Identifying all tasks.
§  User acceptance, milestones and deliverables.
§  Making and following a communication matrix.
§  Avoiding scope creep.
§  Partnership with business user.
Keys to project planning & Management
§  Business sponsor
§  Scope
§  Best team
§  Excellent project manager
Justificati on requires an estimation of the benefits and costs associated with a data warehouse.
The anticipated benefits grossly outweigh the costs. IT usually is responsible for deriving the
expenses. You need to determine approximate costs for the requisite hardwa re and software. Data
warehouses tend to expand rapidly, so be sure the estimates allow some room for short-term
Team Development (Staffing): Data warehouse projects require the integration of a cross
functional team with resources from both the business and IT communities. From the business
side of the house, the roles needed are business sponsor, business driver, business lead and
business users. Several other positions are staffed from either the business or IT organizations.
These are business system analyst, business subject matter expert, analytic application developer
and data warehouse educator. The roles typically staffed from the IT organization (or an external
consulting firm) are project manager, technical architect, technical support specialists, data
modeler, database administrator, metadata coordinator, data steward, data staging designer, data
staging developer, and data warehouse support.
Establishing the core team
Hardest to find talent.: The talent you need on the core team is the hardest to find
1. Part-time: Data modeler and database analyst (DBA)
2. Full-time: Experienced & educated people with least one successful implementation.
3. Full-time: (i) DSS Data analyst (ii) 4GL programmer with information center or end -user
support background (iii) experienced data­centric developer.
Data engineers to get started and create potential.
Data analysts or data usage specialists to stay in business.
So, who should be on this small core team? Typically, the firs t talent hired is a data modeler and
the second is a database analyst or DBA. While you need these data specialists, you don't need
them as fulltime, permanent participants of the core team. The best individuals at the core of your
team are people who intimately understand the broad issues of a complete data warehouse
program. Ideally, they studied the field extensively and participated in at least one successful
project. These folks are hard to come by.
Your next choice should be those with experience of packaging data for consumer use: a data
analyst with DSS experience, a 4GL programmer with information center or end -user support
background, a data ­centric developer with real experience using spiral or interactive methods.
You need data engineers to get started and to add new data resources to the warehouse. They
create the potential. You will need data analysts or data usage specialists to stay in business. They
deliver results.
Project Plan: Developing the data warehouse project plan involves identification of all the tasks
necessary to implement the data warehouse. The project plan should identify a user acceptance
checkpoint after every major milestone and deliverable to ensure that the project is still on track
and that the business is still intimately involved. The data warehouse project demands broad
communication. During the project planning phase a communication matrix is helpful to make
certain that the communication strategy is executed. Data warehouse projects are vulnerable to
scope creep largely due to our strong desire to satisfy users' requirements. We need to be most
watchful about the accumulation of minor changes that snowball. The most important thing to
remember about scope changes is that they shouldn't be made in an IT vacuum. The right answer
depends on the situation. Now is the time to leverage your partnership with the business to arrive
at an answer with which everyone can live.
The keys to data warehouse project planning and management include:
1. Having a solid business sponsor
2. Balancing high value and doability to define the scope
3. Working with the best team possible to develop a detailed project plan
4. Being an excellent project manager by motivating, managing, and communicating up,
down, and across the organization .
DWH Lifecycle- Step: 2 Requirements Definition
Requirements preplanning
Requirements collection
Post collection
The second phase in the DWH lifecycle is requirements definition. Performing a requirements
analysis is critical to the success of any project. W ithout a clear goal in mind, success is dubious.
Establishing a broad view of the business' requirements should always be the first step. The
understanding gained will guide everything that follows, and the details can be filled in for each
phase in turn. The double headed arrow between the planning and requirements definition phase
indicates that user requirements drives not only the succeeding phases but the preceding phase as
wee. The changes in user requirements may effect the project plan. This phase is accomplished in
three steps
Requirements preplanning: This phase consists of activities like choosing the forum,
identifying and preparing the requirements team and finally selecting, scheduling and preparing
the business representatives.
Business requirements collection: The requirements collection process flows from an
introduction through structured questioning to a final wrap-up. The major activities involved are,
launching, determining interview flow, wrapping up and conducting data centric interviews.
Post collection: The phase consists of steps like debriefing, documentation, prioritization and
Each of the phases is discussed in detail in the following slides
DWH Lifecycle- Step: 2 Requirements Definition
Requirements Preplanning
§  Interviews
§  Facilitated sessions
§  Hybrid
Requirements team
Business representatives
§  Selecting
§  Scheduling
§  Preparing
Requirements Preplanning
Before sitting down with the business community to gather requirements, it is suggested to set
you up for a productive session by considering the following:
Choose the Forum: There are two primary techniques for gathering requirements i.e. interviews
or facilitated sessions. Both have their advantages and disadvantages. Interviews encourage lots
of individual participation. They are also easier to schedule. Facilitated sessions may reduce the
elapsed time to gather requirements, although they require more time commitment from each
participant. Kimball prefers using a hybrid approach with interviews to gather the gory details
and then facilitation to bring the group to consensus. However, the forum choice depends on the
team's skills, the organization's culture, and what you've already subjected your users to. This is a
case in which one size definitely does not fit all.
Identify and Prepare the Requirements Team: Regardless of the approach, you need to identify
and prepare the project team members who are involved. If you're doing interviews, you need to
identify a lead interviewer whose primary responsibility is to ask the great open -ended questions.
Meanwhile, the interview scribe takes copious notes. Before you sit down with users, you need to
make sure you're approaching the sessions with the right mindset. Since the key to getting the
right answers is asking the right questions, we recommend that questionnaires be formulated
before user meetings. It is a tool to organize your thoughts and serve as a fallback device in case
your mind goes blank during the interview session.
Business Representatives Soliciting: If this is first foray into data warehousing, talk to business
people that represent horizontal breadth across the organization. This coverage is critical to
formulating the data warehouse bus matrix blueprint. Within the target user c mmunity, one
should cover the organization vertically.
Scheduling: Schedule representatives nicely. The scheduler needs to allow ½ hour between
meetings for debriefing and other necessities. Interviewing is extremely taxing because you must
be completely focused for the duration of the session. Consequently, it is recommended to
schedule three to four sessions in a day because the interviewers get very tired after that, and
productivity goes down.
Preparing: The optimal approach is to conduct a project launch meeting with the users. The
launch meeting disseminates a consistent message about the project. The interview team must
prepare the interviewees by highlighting the topics to be covered in the upcoming session. It is
advised that do not include a copy of the questionnaire, which is not intended for public
dissemination. One can ask the interviewees to bring copies of their key reports and analyses.
DWH Lifecycle- Step : 2 Requirements Definition
Requirements Collection
Interview flow
§  Rules for effective interviewing
§  Role
§  Flow
§  Consistency
§  Vocabulary
Wrap up
Conducting data centric interviews
Launch: Responsibility for introducing the interview should be established prior to gathering in a
conference room. The designated kicko ff person should script the primary points to be conveyed
in the first couple of minutes. The introduction should convey a crisp, business-centric message
rather than rambling on about the hardware, software, and other technical jargon.
Interview Flow: The objective of an interview is to get business users to talk about what they do,
and why they do it. Determining how they track progress and success translates directly into the
dimensional model. If we're meeting with a person who has more hands-on data experience, we
indirectly probe to better understand the dimensionality of the business, along with hierarchical
roll -ups. If the interviewee is more analytic, we ask about the types of analyses currently being
performed. If we are meeting with business executives , we usually don't delve into the details,
and just ask about their vision for better leveraging information in the organization.
Rules for Effective interviewing include: (i) Remember your interview role and listen and absorb
like a sponge. (ii) Strive for a conversational flow, don't dive too quickly. (iii) Verify
communication and capture terminology precisely because mostly its inconsistent (iv) establish a
peer basis with the interviewee; use his or her vocabulary.
Wrap-up: At the conclusion of interview, ask each interviewee about his success criteria for the
project which should be measurable. At this point, the interviewees must understand that just
because we discussed a capability in the meeting doesn't guarantee that it'll be included in the first
phase of the project. We thank interviewees for their brilliant insights and let them know what's
happening next and what their involvement will be.
Conducting Data -Centric Interviews: While we're focused on understanding the requirements o f
the business, it is helpful to intersperse sessions with the source system data gurus or subject
matter experts to evaluate the feasibility of supporting the business needs. These data-focused
interviews are quite different from the ones just described. The goal is to assess that the necessary
core data exists before momentum builds behind the requirements. A more complete data audit
will occur during the dimensional modeling process. We're trying to learn enough at this point to
manage the organization's expectations appropriately.
DWH Lifecycle- Step : 2 Requirements Definition
Post Collection
§  Same "frequency"
Fill-in gaps
Review reports/material
§  Meaningful write-up of each individual interview.
§  Identify key business process
§  Requirements of key processes
§  Process x group/user matrix
Debriefing: Immediately following the interview, the interview team should debrief. You want to
ensure that you're on the same page about what was learned, as well as being prepared fo r any
surprises or inconsistencies. It is also helpful to review your notes quickly to fill in any gaps
while the interview is still fresh in your memory. Likewise, you should examine the reports
gathered to gain further offline insight into the dimensionality that must be supported in the data
Documentation: While documentation is everyone's least favorite activity, it is critical for both
user validation and project team reference materials. There are two levels of documentation that
typically result from the requirements process. The first is to write up each individual interview.
This activity can be quite time -consuming because the write-up should not be merely a stream -of-
consciousness transcript but should make sense to someone who wasn't in the interview. The
second level of documentation is a consolidated findings document. Organize the document by
first identifying the key business processes. Consequently, it is logical to organize the
requirements of the business into the same buckets that will, in turn, become implementation
efforts. Sometimes the processes are brought together in a matrix to convey the opportunities
across the organization. The rows of the opportunity matrix identify the business processes while
the columns identify th organizational groups or functions.
Prioritization and Consensus: Four Cell Quadrant Technique
The requirements findings document serves as the basis for presentations back to senior
management representatives. The requirements wrap-up presentation is positioned as a findings
review and prioritization meeting. Once the findings have been reviewed, it is time to prioritize.
The four-cell quadrant technique (shown in Figure 33.2) is an effective tool for reaching consen-
sus on a data warehouse developme nt plan that focuses on the right initial opportunities. The
quadrant's vertical axis refers to the potential impact or value to the business. The horizontal axis
conveys feasibility of each of the findings.
Figure -33.2: The quadrant method
So the proc ess having higher feasibility and impact is given higher priority over the process
having lower feasibility and impact. In example of Figure 33.2, process A has highest priority
while the process D has lowest priority.
Table of Contents:
  1. Need of Data Warehousing
  2. Why a DWH, Warehousing
  3. The Basic Concept of Data Warehousing
  4. Classical SDLC and DWH SDLC, CLDS, Online Transaction Processing
  5. Types of Data Warehouses: Financial, Telecommunication, Insurance, Human Resource
  6. Normalization: Anomalies, 1NF, 2NF, INSERT, UPDATE, DELETE
  7. De-Normalization: Balance between Normalization and De-Normalization
  8. DeNormalization Techniques: Splitting Tables, Horizontal splitting, Vertical Splitting, Pre-Joining Tables, Adding Redundant Columns, Derived Attributes
  9. Issues of De-Normalization: Storage, Performance, Maintenance, Ease-of-use
  10. Online Analytical Processing OLAP: DWH and OLAP, OLTP
  11. OLAP Implementations: MOLAP, ROLAP, HOLAP, DOLAP
  12. ROLAP: Relational Database, ROLAP cube, Issues
  13. Dimensional Modeling DM: ER modeling, The Paradox, ER vs. DM,
  14. Process of Dimensional Modeling: Four Step: Choose Business Process, Grain, Facts, Dimensions
  15. Issues of Dimensional Modeling: Additive vs Non-Additive facts, Classification of Aggregation Functions
  16. Extract Transform Load ETL: ETL Cycle, Processing, Data Extraction, Data Transformation
  17. Issues of ETL: Diversity in source systems and platforms
  18. Issues of ETL: legacy data, Web scrapping, data quality, ETL vs ELT
  19. ETL Detail: Data Cleansing: data scrubbing, Dirty Data, Lexical Errors, Irregularities, Integrity Constraint Violation, Duplication
  20. Data Duplication Elimination and BSN Method: Record linkage, Merge, purge, Entity reconciliation, List washing and data cleansing
  21. Introduction to Data Quality Management: Intrinsic, Realistic, Orr’s Laws of Data Quality, TQM
  22. DQM: Quantifying Data Quality: Free-of-error, Completeness, Consistency, Ratios
  23. Total DQM: TDQM in a DWH, Data Quality Management Process
  24. Need for Speed: Parallelism: Scalability, Terminology, Parallelization OLTP Vs DSS
  25. Need for Speed: Hardware Techniques: Data Parallelism Concept
  26. Conventional Indexing Techniques: Concept, Goals, Dense Index, Sparse Index
  27. Special Indexing Techniques: Inverted, Bit map, Cluster, Join indexes
  28. Join Techniques: Nested loop, Sort Merge, Hash based join
  29. Data mining (DM): Knowledge Discovery in Databases KDD
  31. Data Structures, types of Data Mining, Min-Max Distance, One-way, K-Means Clustering
  32. DWH Lifecycle: Data-Driven, Goal-Driven, User-Driven Methodologies
  33. DWH Implementation: Goal Driven Approach
  34. DWH Implementation: Goal Driven Approach
  35. DWH Life Cycle: Pitfalls, Mistakes, Tips
  36. Course Project
  37. Contents of Project Reports
  38. Case Study: Agri-Data Warehouse
  39. Web Warehousing: Drawbacks of traditional web sear ches, web search, Web traffic record: Log files
  40. Web Warehousing: Issues, Time-contiguous Log Entries, Transient Cookies, SSL, session ID Ping-pong, Persistent Cookies
  41. Data Transfer Service (DTS)
  42. Lab Data Set: Multi -Campus University
  43. Extracting Data Using Wizard
  44. Data Profiling