Data Warehousing
Lecture No. 34
DWH Implementation: Goal Driven Approach
(Lecture based on "The Data Warehouse Toolkit" by Ralph Kimball and Margy Ross, 2nd Edition)
Figure 34.1: Business Dimensional Lifecycle (Kimball's Approach)
We have already discussed that the DWH development lifecycle (Kimball's Approach) has three
parallel tracks emanating from requirements definition: the technology track, the data track,
and the analytic applications track. The focus here will be on the technology track and the analytic
applications track, as we have discussed data thoroughly in previous lectures. We begin with the
technology track.
Technical Architecture Design
Constructing a kitchen involves a mason, plumber (water, gas), carpenter, electrician, ironsmith,
painter, interior designer AND a Takhedar (contractor).
A common document is needed to link the work of all, i.e. a blueprint or map; hence the NEED for an architect.
The technical architecture design is a similar document.
It helps catch problems on paper and minimize surprises.
As shown in Figure 34.1, the first task in the technology track is the technical architecture design.
What is meant by architecture? Is it really needed? These questions can be answered by considering
the following analogy. Before constructing a home, we must have the complete home designed by
a good architect. To simplify further, let's narrow the example down to a single unit of
the home, such as the kitchen. Building a kitchen involves people like a mason, a plumber for both gas and
water, a carpenter, an electrician, and the list goes on. Each has their own expertise and specific tasks to
perform which, when integrated, result in a fully functional kitchen. But how do these roles coordinate?
Where should the sink be, where should the entrance be, where should the window be (if there is one),
where should the cupboards be, and so on? All of these are specified by the architect, the person who
develops and provides the overall construction plan, i.e. the architecture of the kitchen, keeping in view
the overall home.
The architecture plan thus provides an overall picture and a blueprint of all facilities combined.
It is the document used by the contractor (implementer) for scheduling the different construction
activities.
The architecture plan, in both cases, allows us to catch problems on paper (such as having the
dishwasher too far from the sink) and minimize mid-project surprises. It supports the coordination
of parallel efforts while speeding development through the reuse of modular components.
Technical Architecture Design: More
Identify components needed now versus those required at a later stage, e.g. sink vs. gas light.
Organizing framework for integration of technologies
Supports communication regarding a consistent set of technical requirements:
· Within the team
· Upward to management, and
· Outward to vendors.
Since the architecture is a document that gives an overall picture, the major components are easily
identifiable. Similarly, we can identify the components required immediately versus those which
will be incorporated at a later stage. In our kitchen analogy the sink is a must-have component; a
kitchen without a sink is not a kitchen. A gas light in the kitchen, by contrast, is a nice-to-have
component. It is better if a kitchen has a gas light for days of electric load shedding, but a kitchen
without one will still be a kitchen.
Thus much like a blue print for a new home, the technical architecture is the blueprint for the
warehouse's technical services and elements. The architecture plan serves as an organizing
framework to support the integration of technologies.
Most importantly, the architecture plan serves as a communication tool. Kitchen construction
blueprints allow the architect, general contractor, subcontractors, and homeowner to
communicate from a common document. The plumber knows that the electrician has power in
place for the garbage disposal. Likewise, the data warehouse technical architecture supports
communication regarding a consistent set of technical requirements within the team, upward to
management, and outward to vendors. It provides a common ground for all stakeholders for
mutual and consistent understanding of the overall system.
Note: All points discussed or to be discussed may not be completely applicable in our local
environment
DWH Lifecycle- Step 3.1:Technology Track
8-Step Process
1. Establish an Architecture Task Force (2-3 people)
2. Collect Architecture-Related Requirements
·  Business needs => HW not other way round
·  Architectural implications of business needs
·  Timing, performance and availability needs
·  Talk to IT people for current standards, directions and boundaries.
·  Lessons learned.
3. Document Architecture Requirements
·  Make a matrix (row business process & column architectural implication)
·  Global sales coverage => 24/7 availability, data mirroring, adequate network
bandwidth etc.
8 Step Process
Data warehouse teams approach the technical architecture design process from opposite ends of
the spectrum. Some teams are so focused on data warehouse delivery that the architecture feels
like a distraction and impediment to progress and eventually, these teams often end up rebuilding.
At the other extreme, some teams want to invest two years designing the architecture while
forgetting that the primary purpose of a data warehouse is to solve business problems, not address
any plausible (and not so plausible) technical challenge. Neither end of the architecture spectrum
is healthy; the most appropriate response lies somewhere in the middle. Kimball suggests an
eight-step process for building technical architecture. All steps will be discussed in detail one by
one.
1. Establish an Architecture Task Force
It is most useful to have a small task force of two to three people focus on architecture design.
Typically these are the technical architect, the data staging designer, and the analytic application
developer. This group needs to establish its charter and deliverables timeline. It also needs to
educate the rest of the team (and perhaps others in the IT organization) about the importance of
architecture.
2. Collect Architecture-Related Requirements:
Defining the technical architecture is not the first box in the lifecycle diagram, as shown in Figure
34.1. This implies that the architecture is created to support high-value business needs; it's not
meant to be an excuse to purchase the latest, greatest products.
The key input into the design process should come from the business requirements definition
findings with a slightly different filter to drive the architecture design. The focus is to uncover the
architectural implications associated with the business's critical needs, such as any timing,
availability, and performance needs.
In addition to leveraging the business requirements definition process, additional interviews
within the IT organization are also conducted. These are purely technology-focused sessions to
understand current standards, planned technical directions, and nonnegotiable boundaries.
These sessions can also uncover lessons learned from prior information delivery projects, as well as
the organization's willingness to accommodate operational change on behalf of the warehouse,
such as identifying updated transactions in the source system.
3. Document Architecture Requirements
Once the business requirements definition process is leveraged and supplemental IT interviews
conducted, the findings need to be documented. A simple matrix can be used for this
purpose. The rows of the matrix list each business requirement that has an impact on the
architecture, while matrix columns contain the list of architectural implications.
As an example, suppose that a business is spread globally and there is a need to deliver global
sales performance data on a nightly basis. The technical implications might include 24/7
worldwide availability, data mirroring for loads, robust metadata to support global access,
adequate network bandwidth, sufficient staging horsepower to handle the complex integration of
operational data, and so on.
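The requirements matrix described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed format: the first row follows the global sales example, the second row is hypothetical, and the names `requirements`, `build_matrix`, and `print_matrix` are invented for this sketch.

```python
# Each business requirement maps to the architectural implications it drives.
# The first entry follows the global sales example in the text; the second
# is a hypothetical row added only to show how implications are shared.
requirements = {
    "Nightly global sales performance data": [
        "24/7 worldwide availability",
        "Data mirroring for loads",
        "Robust metadata for global access",
        "Adequate network bandwidth",
        "Sufficient staging horsepower",
    ],
    "Regional managers query during local office hours (hypothetical)": [
        "24/7 worldwide availability",
        "Adequate network bandwidth",
    ],
}


def build_matrix(reqs):
    """Return (columns, rows): one column per distinct implication, in
    first-seen order, and one row of True/False flags per requirement."""
    columns = []
    for implications in reqs.values():
        for implication in implications:
            if implication not in columns:
                columns.append(implication)
    rows = {
        name: [col in implications for col in columns]
        for name, implications in reqs.items()
    }
    return columns, rows


def print_matrix(columns, rows):
    """Print a column legend, then an X wherever a requirement drives an implication."""
    for index, col in enumerate(columns, start=1):
        print(f"C{index}: {col}")
    width = max(len(name) for name in rows)
    header = " ".join(f"C{i}" for i in range(1, len(columns) + 1))
    print(f"{'Requirement':<{width}} {header}")
    for name, flags in rows.items():
        marks = " ".join(" X" if flag else " ." for flag in flags)
        print(f"{name:<{width}} {marks}")


columns, rows = build_matrix(requirements)
print_matrix(columns, rows)
```

Reading down a column shows which business requirements justify a given architectural capability, which helps when the capabilities are later prioritized.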
DWH Lifecycle- Step 3.1:Technology Track
4. Develop a high-level Arch. Model
·  Several days of heavy thinking in conference room.
·  Grouping of requirements (data staging, data access, metadata)
·  High-level refinement (up-front view, nuts and bolts hidden) of major systems.
5. Design and Specify the Subsystems
·  For each subsystem (e.g. data staging), detailed list of capabilities.
·  Do research (Internet, peers etc.); more graphic models generated.
·  Also consider security, physical infrastructure, and configuration.
·  Sometimes infrastructure i.e. HW and SW pre-determined.
·  Determine Architecture Implementation Phases.
·  For more than 1TB DWH, revisit infrastructure.
After the architecture requirements have been documented, models are formulated to support the
identified needs. The architecture task force often sequesters itself in a conference room for several
days of heavy thinking. The team groups the architecture requirements into major components,
such as data staging, data access, metadata, and infrastructure. From there the team drafts and
refines the high-level architectural model. This drawing is similar to the front elevation page on
housing blueprints. It illustrates what the warehouse architecture will look like from the street, but
it is dangerously simplistic because significant details are embedded in the pages that follow.
It is time now to do a detailed design of the subsystems. For each component, such as data staging
services, the task force will document a laundry list of requisite capabilities. The more specific,
the better, because what's important to your data warehouse is not necessarily critical to mine.
This effort often requires preliminary research to better understand the market. Fortunately, there
is no shortage of information and resources available on the Internet, as well as from networking
with peers. The subsystem specification results in additional detailed graphic models. In addition
to documenting the capabilities of the primary subsystems, we also must consider our security
requirements, as well as the physical infrastructure and configuration needs. Often, we can
leverage enterprise-level resources to assist with the security strategy. In some cases the
infrastructure choices, such as the server hardware and database software, are predetermined. At
this point we should have an idea of the implementation steps/phases
that will be used for the DWH implementation. However, if building a large data warehouse, over
1 TB in size, we should revisit these infrastructure platform decisions to ensure that they can
scale as required. Size, scalability, performance, and flexibility are also key factors to consider
when determining the role of OLAP cubes in the overall technical architecture.
DWH Lifecycle- Step 3.1:Technology Track
6. Determine Architectural implementation phases
·  Can't implement everything simultaneously.
·  Some are nonnegotiable mandatory, others are nice-to-haves for later.
·  Business requirements set the priority.
·  Priorities assigned by looking at all the requirements.
7. Document the Technical Architecture
§  Document phases decided.
§  Material for those not present in the conference room.
§  Adequate details for skilled professional (carpenter in kitchen)
8. Review and finalize the technical architecture
§  Educate organization, manage expectations.
§  Communicate at varying levels of detail to different levels of the team
§  Subsequently, put to use immediately for product selection
As in the kitchen analogy, we likely can't implement all aspects of the technical architecture at
once. Some are nonnegotiable mandatory capabilities, whereas others are nice-to-haves that can
be deferred until a later date. Again, we refer back to the business requirements to establish
architecture priorities. Business requirements drive the architecture and not the other way round.
We must provide sufficient elements of the architecture to support the end-to-end requirements of
the initial project iteration. It would be ineffective to focus solely on data staging services while
ignoring the capabilities required for metadata and access services.
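One way to picture this phasing is the small Python sketch below. It is an assumption-laden illustration rather than Kimball's procedure: the `Capability` class, the `plan_phases` function, the priority numbers, and the three nice-to-have capabilities are all hypothetical. Only the three mandatory services come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    name: str
    mandatory: bool          # nonnegotiable for the initial project iteration?
    business_priority: int   # 1 = highest; derived from business requirements


def plan_phases(capabilities, later_phase_size=2):
    """Phase 1 holds every mandatory capability; nice-to-haves are deferred
    to later phases in business-priority order."""
    mandatory = [c for c in capabilities if c.mandatory]
    deferred = sorted(
        (c for c in capabilities if not c.mandatory),
        key=lambda c: c.business_priority,
    )
    phases = [[c.name for c in mandatory]]
    for start in range(0, len(deferred), later_phase_size):
        chunk = deferred[start:start + later_phase_size]
        phases.append([c.name for c in chunk])
    return phases


capabilities = [
    Capability("Data staging services", True, 1),
    Capability("Data access services", True, 1),
    Capability("Metadata services", True, 2),
    Capability("Data mining tools", False, 5),    # hypothetical nice-to-have
    Capability("OLAP cubes", False, 3),           # hypothetical nice-to-have
    Capability("Information portal", False, 4),   # hypothetical nice-to-have
]

phases = plan_phases(capabilities)
for number, names in enumerate(phases, start=1):
    print(f"Phase {number}: {', '.join(names)}")
```

Putting all mandatory capabilities in the first phase reflects the point that staging, metadata, and access services must all be present for the initial iteration to work end to end.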
We need to document the technical architecture, including the planned implementation phases,
for those who were not sequestered in the conference room. The technical architecture plan
document should include adequate detail so that skilled professionals can proceed with
construction of the framework, much like carpenters frame a house based on the blueprint.
Eventually the architecture building process (Technology Track) comes to an end.
With a draft plan in hand, the architecture task force is back to educating the organization and
managing expectations. The architecture plan should be communicated, at varying levels of
detail, to the project team, IT colleagues, business sponsors, and business leads. Following the
review, documentation should be updated and put to use immediately in the product selection
process.
DWH Lifecycle- Step 3.1:Technology Track
3.1.2 Product selection and Installation
·  Understand corporate purchasing process
·  Product evaluation matrix
   o Not too vague/generic
   o Not too specific
·  Market research (own ugly son)
   o Understand players and offerings
   o Internet, colleagues, exhibitions etc.
   o RFP is an option, but time consuming and a beauty contest
·  Narrow options, perform detailed evaluations
   o Few vendors can meet tech. & functional requirements
   o Involve business reps.
   o You drive the process, not the vendors.
   o Centered around needs, not bells and whistles.
   o Talk to references of similar size installations.
Understand the corporate purchasing process: The first step before selecting new products is
to understand the internal hardware and software purchase approval processes, whether we like
them or not. Perhaps expenditures need to be approved by the capital appropriations committee.
Or you may be asked to provide a bank guarantee against the funds released to buy hardware.
Develop a product evaluation matrix: Using the architecture plan as a starting point, we
develop a spreadsheet-based evaluation matrix that identifies the evaluation criteria, along with
weighting factors to indicate importance. The criteria should be specific: if they
are too vague or generic, every vendor will say it can satisfy our needs. On the other hand, if the
criteria are too specific, everyone will shout favoritism.
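A weighted evaluation matrix of this kind can be sketched in Python instead of a spreadsheet. Everything concrete below is a hypothetical assumption made for illustration: the criteria, the weights, the vendor names, the 1-to-5 scores, and the function names `weighted_score` and `short_list`.

```python
# Evaluation criteria with weighting factors (higher = more important).
# Criteria, weights, vendor names, and scores are all hypothetical.
weights = {
    "Handles nightly load within batch window": 5,
    "Supports existing database platform": 4,
    "Metadata integration": 3,
    "Vendor support quality": 2,
}

# Scores on a 1-5 scale, one per criterion, for each candidate product.
scores = {
    "Vendor A": {
        "Handles nightly load within batch window": 4,
        "Supports existing database platform": 5,
        "Metadata integration": 3,
        "Vendor support quality": 2,
    },
    "Vendor B": {
        "Handles nightly load within batch window": 5,
        "Supports existing database platform": 3,
        "Metadata integration": 4,
        "Vendor support quality": 4,
    },
    "Vendor C": {
        "Handles nightly load within batch window": 2,
        "Supports existing database platform": 4,
        "Metadata integration": 2,
        "Vendor support quality": 5,
    },
}


def weighted_score(vendor_scores, weights):
    """Weighted average of a vendor's scores, on the same 1-5 scale."""
    total_weight = sum(weights.values())
    return sum(weights[c] * vendor_scores[c] for c in weights) / total_weight


def short_list(scores, weights, keep=2):
    """Rank vendors by weighted score and keep only the top few."""
    ranked = sorted(
        scores,
        key=lambda vendor: weighted_score(scores[vendor], weights),
        reverse=True,
    )
    return ranked[:keep]


for vendor in scores:
    print(f"{vendor}: {weighted_score(scores[vendor], weights):.2f}")
print("Short list:", short_list(scores, weights))
```

The weights make the trade-off explicit: Vendor C scores best on support, but that criterion carries the least weight, so it does not survive the cut.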
Conduct market research: We must be informed buyers when selecting products, which means
more extensive market research to better understand the players and their offerings. We must not
place the ball in the vendor's court, because a vendor will never bring forth the limitations of his own tool. It is like the story in which
a Badshah Salamat (king) once asked his Wazir (minister) to bring the most beautiful child in his kingdom. The Wazir
returned thrice with the same boy, who was ugly. The Badshah warned his Wazir of severe
consequences and gave him yet another chance to search. The Wazir returned with the same boy
again. The Badshah, astonished and angry at the same time, asked his Wazir why he kept
bringing the same ugly boy again and again although he was not up to the standard. The Wazir
replied that he had walked all over the town but couldn't find anyone as beautiful as that
boy, who was his son. Thus we must not rely on vendors and must make our own efforts to gain as
much insight into the tools as possible. For this purpose, we can use potential research sources
including the Internet, industry publications, colleagues, conferences, vendors, exhibitions and
analysts (although be aware that analyst opinions may not be as objective as we're led to
believe).
Narrow options to a short list and perform detailed evaluations. Despite the plethora of
products available in the market, usually only a small number of vendors can meet both our
functionality and technical requirements. By comparing preliminary scores from the evaluation
matrix, we should focus on a narrow list of vendors about whom we are serious and disqualify the
rest. Once we're dealing with a limited number of vendors, we can begin the detailed evaluations.
Business representatives should be involved in this process if we're evaluating data access tools.
As evaluators, we should drive the process rather than allow the vendors to do the driving. We
share relevant information from the architecture plan so that the sessions focus on our needs
rather than on product bells and whistles. Be sure to talk with vendor references, both those
provided formally and those elicited from your informal network. If possible, the references
should represent similarly sized installations.
DWH Lifecycle- Step 3.1:Technology Track
3.1.2 Product selection and Installation
§  Conduct prototype, if necessary
   §  If one clear winner bubbles up, it is good.
   §  Winner due to experience, relationship, commitment
   §  Prototype with no more than two products
   §  Demonstrate using a limited, yet realistic application using flat text file.
§  Keep the competition "hot"
   §  Even if single winner, keep at least two in
   §  Use virtual competition to bargain with the winner
§  Select product, install on trial, and negotiate
   §  Make private not public commitment.
   §  Don't let the vendor know you are completely sold.
   §  During trial period, put to real use.
   §  Near the end of trial, negotiate.
Conduct prototype, if necessary: After performing the detailed evaluations, sometimes a clear
winner bubbles to the top, often based on the team's prior experience or relationships. In other
cases, the leader emerges due to existing corporate commitments. In either case, when a sole
candidate emerges as the winner, we can bypass the prototype step. If no vendor is the apparent
winner, we conduct a prototype with no more than two products.
Keep the competition "hot": Even if a single winner is left, it is good advice to
always keep at least two vendors in the running. What if you keep only one? The sole vendor may take advantage of being
the only player and steer the situation in his favor. He might get the upper hand in
the bargaining process and mold things to his own convenience and benefit. To avoid such a
situation, keep a competitor on the list too, even if a single vendor is the winner. This creates a
competitive environment which may ultimately turn in your favor.
Select product, install on trial, and negotiate: It is time to select a product. Rather than
immediately signing on the dotted line, preserve your negotiating power by making a private, not
public, commitment to a single vendor. Embark on a trial period where you have the opportunity
to put the product to real use in your environment. As the trial draws to a close, you have the
opportunity to negotiate a purchase that's beneficial to all parties involved.
DWH Lifecycle- Step 3.3: Analytic Applications Track
§  Overview
   §  Design and develop applications for analysis.
   §  It is really the "fun part".
   §  Technology used to help the business.
   §  Strengthen relationship between IT and business user.
   §  The DWH "face" to the business user.
   §  Querying NOT completely ad-hoc.
   §  Parameter-driven querying satisfies a large % of needs.
   §  Develop a consistent analytic framework instead of shades of Excel macros.
The final set of parallel activities following the business requirements definition in Figure 34.1 is
the analytic application track, where we design and develop the applications that address a
portion of the users' analytic requirements. As a well-respected application developer once told us,
"Remember, this is the fun part!" We're finally using the investment in technology and data to
help users make better decisions. The applications provide a key mechanism for strengthening the
relationship between the project team and the business community. They serve to present the data
warehouse's face to its business users, and they bring the business needs back into the team of
application developers.
While some may feel that the data warehouse should be a completely ad hoc query environment,
delivering parameter-driven analytic applications will satisfy a large percentage of the business
community's needs. There's no sense making every user start from scratch. Constructing a set of
analytic applications establishes a consistent analytic framework for the organization rather than
allowing each Excel macro to tell a slightly different story. Analytic applications also serve to
encapsulate the analytic expertise of the organization, providing a jump-start for the less
analytically inclined.
DWH Lifecycle- Step 3.3: Analytic Applications Track
§  3.3.1 Analytic applications specification
§  Starter set of 10-15 applications.
§  Prioritize and narrow to critical capabilities.
§  A single template can yield many analyses by changing variables.
§  Set standards: menus, O/P, look and feel.
§  From standards: template, layout, I/P variables, calculations.
§  Common understanding between business & IT users.
Following the business requirements definition, we need to review the findings and collected
sample reports to identify a starter set of approximately 10 to 15 analytic applications. We want
to narrow our initial focus to the most critical capabilities so that we can manage expectations and
ensure on-time delivery. Business community input will be critical to this prioritization process.
While 15 applications may not sound like much, the number of specific analyses that can be
created from a single template merely by changing variables will surprise you.
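The idea of one template yielding many analyses can be illustrated with a parameter-driven query in Python using the standard `sqlite3` module. This is a minimal sketch under assumptions: the `sales_fact` table, its columns, the sample rows, and the `sales_summary` function are all hypothetical and stand in for whatever the data access tool would provide.

```python
import sqlite3

# Hypothetical fact table: table name, columns, and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (region TEXT, product TEXT, year INTEGER, amount REAL);
    INSERT INTO sales_fact VALUES
        ('North', 'Widget', 2004, 100.0),
        ('North', 'Gadget', 2004, 150.0),
        ('South', 'Widget', 2004, 80.0),
        ('South', 'Widget', 2005, 120.0);
""")

# Only these columns may be used for grouping. A whitelist is needed
# because column names cannot be bound as SQL parameters.
ALLOWED_GROUPINGS = {"region", "product", "year"}


def sales_summary(conn, group_by, year):
    """One template: total sales for a given year, grouped by a chosen column."""
    if group_by not in ALLOWED_GROUPINGS:
        raise ValueError(f"unsupported grouping: {group_by}")
    sql = (
        f"SELECT {group_by}, SUM(amount) FROM sales_fact "
        f"WHERE year = ? GROUP BY {group_by} ORDER BY {group_by}"
    )
    return conn.execute(sql, (year,)).fetchall()


# The same template answers several different business questions.
print(sales_summary(conn, "region", 2004))
print(sales_summary(conn, "product", 2004))
print(sales_summary(conn, "region", 2005))
```

With three grouping columns and any number of years, this single template already covers many specific analyses, and every user sees the same calculation of total sales.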
Before we start designing the initial applications, it's helpful to establish standards for the
applications, such as common pull-down menus and consistent output look and feel. Using the
standards, we specify each application template, capturing sufficient information about the layout,
input variables, calculations, and breaks so that both the application developer and business
representatives share a common understanding.
During the application specification activity, we also must give consideration to the organization
of the applications. We need to identify structured navigational paths to access the applications,
reflecting the way users think about their business. Leveraging the Web and customizable
information portals are the dominant strategies for disseminating application access.
DWH Lifecycle- Step 3.3: Analytic Applications Track
§  3.3.2 Analytic applications development
§  Standards: naming, coding, libraries etc.
§  Coding begins AFTER DB design complete, data access tools installed, subset of
historical data loaded.
§  Tools: Product-specific high-performance tricks, invest in tool-specific education.
§  Benefits: Quality problems will be found with tool usage => staging.
§  Actual performance and time gauged.
When we move into the development phase for the analytic applications, we again need to focus
on standards. Standards for naming conventions, calculations, libraries, and coding should be
established to minimize future rework. The application development activity can begin once the
database design is complete, the data access tools and metadata are installed, and a subset of
historical data has been loaded. The application template specifications should be revisited to
account for the inevitable changes to the data model since the specifications were completed.
Each tool on the market has product-specific tricks that can cause it to literally walk on its head
with eyes closed. Therefore, rather than trying to learn the techniques via trial and error, you
should invest in appropriate tool-specific education or supplemental resources for the
development team.
While the applications are being developed, several ancillary benefits result. Application
developers, armed with a robust data access tool, quickly will find needling problems in the data
haystack despite the quality assurance performed by the staging application. This is one reason
why we prefer to get started on the application development activity prior to the supposed
completion of staging. Of course, we need to allow time in the schedule to address any flaws
identified by the analytic applications. The developers also will be the first to realistically test
query response times. Now is the time to begin reviewing our performance-tuning strategies.
The application development quality-assurance activities cannot be completed until the data is
stabilized. We need to make sure that there is adequate time in the schedule beyond the final
data staging cutoff to allow for an orderly wrap-up of the application development tasks.
DW Lifecycle- Step 4: Deployment
·  The three tracks converge at deployment.
·  Not natural; requires substantial pre-planning, courage, willpower and honesty.
·  Something like waiting for the arrival of the Baraat (groom's wedding procession), as only then can meals be served.
·  The Dulha (groom) is the key, like data.
·  Either serve uncooked data, i.e. a stand-in Dulha under a sehra (not possible now), or wait and
miss the deadline.
The technology, data, and analytic application tracks converge at deployment. Deployment is
similar to serving the meal at a large wedding gathering while the Baraat (the groom's procession) is late. The meal
will be served only once the Baraat arrives. The organizers from the Dulhan's (bride's) side are in a fix. If
they serve the meal without the Dulha (groom), there can be a big showdown, and the Dulha may decide to go back.
If they don't serve the meal, the guests get angrier and hungrier by the minute. It can be
difficult to predict exactly how long it will take to serve the food, as it is linked with the arrival of
the Baraat. In the old days, when Dulhas used to wear sehras (veils of flowers covering the face), it was quite possible to seat
anybody wearing a sehra and serve the food instead of waiting and missing the deadline, but that is no
longer the norm. In the case of data warehouse deployment, the data is the main entree,
analogous to the Dulha. Unfortunately, in data warehousing, even if the data isn't fully available
or ready, some people still proceed with deployment because the warehouse
users have been told that they would be served on a specific date and time (i.e. when the Dulha and his relatives will
arrive).
DW Lifecycle- Step 4: Deployment
§  Other than data readiness, education and support are critical.
§  Educate on complete warehouse deliverable:
   §  Data
   §  Analytic applications
   §  Data access tools
   §  Education tools (1 to 2 days of development for each hour of education)
§  For effective education:
   §  Understand the audience, don't overwhelm.
   §  Train after delivery of data and analytic applications
   §  Postpone education, if DWH not ready.
   §  "No education, no access" policy.
Readiness assessment: Perhaps more important, a successful deployment demands the courage
and willpower to honestly assess the project's preparedness to deploy. Deployment is similar to
serving the meal to friends and relatives at a wedding, as already discussed: it can be
difficult to predict exactly how long it will take for the Baraat to arrive.
Education: Since the user community must accept the warehouse for it to be deemed successful,
education is critical. The education program needs to focus on the complete warehouse
deliverable: data, analytic applications, and the data access tool (as appropriate). Consider the
following for an effective education program: (i) Understand your target audience, don't
overwhelm, (ii) Don't train the business community before the data and
analytic applications are available, (iii) No release, no education: Postpone the education (and deployment) if
the data warehouse is not ready to be released, (iv) Gain the sponsor's commitment to a "no
education, no access" policy.
DW Lifecycle- Step 5: Maintenance and Growth
§  Support
§  Education
§  Technical support
§  Program support
§  Growth
We've made it through deployment, so now we're ready to kick back and relax. Not so quickly!
Our job is far from complete once we've deployed. We need to continue to invest resources in the
listed areas. Each is discussed in detail in the following slides.
DW Lifecycle- Step 5: Maintenance and Growth
Support
§  Critical to hook the user.
§  For first several weeks, work with users.
§  No news is NOT good news.
§  Relocate to the users if needed.
§  If problems uncovered, be honest, immediate action to fix.
§  If deliverable is not of high quality, rework could be substantial.
User support
User support is crucial immediately following the deployment in order to ensure that the business community
gets hooked. For the first several weeks following user education, the support team should be
working proactively with the users. Relocate (at least temporarily) to the business community so
that the users have easy access to support resources. If problems with the data or applications are
uncovered, be honest with the business to build credibility while taking immediate action to
correct the problems. If your warehouse deliverable is not of high quality, the unanticipated
support demands for data reconciliation and application rework can be overwhelming.
DW Lifecycle- Step 5: Maintenance and Growth
Education
§  Continuing education program.
§  Formal refresher, as well as advanced courses and repeat introductory course.
§  Informal education for developers and power users for exchange of ideas.
Education
We need to provide a continuing education program for the data warehouse. The curriculum
should include formal refresher and advanced courses, as well as repeat introductory courses.
More informal education can be offered to the developers and power users to encourage the
interchange of ideas.
DW Lifecycle- Step 5: Maintenance and Growth
Technical support
§  No longer nice-to-have, but to be treated as a production environment.
§  Performance to be monitored proactively and usage trends noted.
§  Business users not responsible to tell that the performance has degraded.
Technical support
The data warehouse is no longer a nice-to-have but needs to be treated as a production
environment, complete with service level agreements. Of course, technical support should
proactively monitor performance and system capacity trends. We don't want to rely on the business
community to tell us that performance has degraded.
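Proactive monitoring of this kind can be as simple as comparing recent query response times with a baseline and with the service level agreement. The Python sketch below is a hypothetical illustration, not a feature of any particular tool: the `check_performance` function, the 1.5x degradation factor, the 5-second SLA, and the sample timings are all assumptions.

```python
from statistics import median


def check_performance(baseline_secs, recent_secs, sla_secs, degradation_factor=1.5):
    """Compare recent query response times against the SLA and a baseline.

    Returns a list of alert messages; an empty list means nothing to report.
    """
    alerts = []
    baseline_median = median(baseline_secs)
    recent_median = median(recent_secs)
    if recent_median > sla_secs:
        alerts.append(
            f"SLA breached: median {recent_median:.1f}s exceeds {sla_secs:.1f}s"
        )
    if recent_median > degradation_factor * baseline_median:
        alerts.append(
            f"Performance degraded: median {recent_median:.1f}s "
            f"vs baseline {baseline_median:.1f}s"
        )
    return alerts


# Hypothetical response times (seconds) for the same analytic application.
baseline = [2.0, 2.5, 3.0, 2.2, 2.8]   # shortly after deployment
recent = [4.0, 4.5, 3.9, 5.0, 4.2]     # the most recent week

for alert in check_performance(baseline, recent, sla_secs=5.0):
    print(alert)
```

In this example the SLA is not yet breached, but the degradation check fires, so the support team learns of the trend before the business users have reason to complain.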
DW Lifecycle- Step 5: Maintenance and Growth
Program support
§  A DWH phase may wind down, but the DWH program lives.
§  Market your success.
§  Ensure implementation stays on track and addresses business needs.
§  Ongoing checkpoints to be implemented.
§  Don't lose focus, else failure.
Program support
We need to continue monitoring progress against the agreed-on success criteria. We need to
market our success. We also need to ensure that the existing implementations remain on track and
continue to address the needs of the business. Ongoing checkpoint reviews are a key tool to
assess and identify opportunities for improvement with prior deliverables. Data warehouses most
often fall off track when they lose their focus on serving the information needs of the business
users.