Types of Data Warehouses: Financial, Telecommunication, Insurance, Human Resource

<< Classical SDLC and DWH SDLC, CLDS, Online Transaction Processing

Normalization: Anomalies, 1NF, 2NF, INSERT, UPDATE, DELETE >>

Lecture Handout

Data Warehousing

Lecture No. 05

Types of Data Warehouses

�

Financial

�

Telecommunication

�

Insurance

�

Human Resource

Financial DWH

�

First data warehouse that an organization builds. This is appealing because:

�

Nerve center, easy to get attention.

�

In most organizations, smallest data set.

�

Touches all aspects of an organization, with a common denomination i.e. money.

�

Inherent structure of data directly influenced by the day-to-day activities of

financial processing.

�

Financial data warehouses suffer from an anomaly of inability to match balanced

accounts, due to many legitimate reasons.

Financial data warehouses are often the first data warehouse that an organization builds. This is

appealing because:

Financial data is ALWAYS at the nerve center of the organization. Therefore, getting attention

drawn to a well-built financial data warehouse is a very easy thing to do. In most organizations

(but not all), financial data represents one of the smallest volumes of data that exist. Finance cuts

across all of the aspects of the corporate data and has a common denominator i.e. money.

Financial data inherently has a structure of data directly influenced by the day -to-day activities of

financial processing and the list goes on. For these reasons, finance becomes a very good target

for beginning to build the corporate data warehouse.

Financial data warehouses have some severe drawbacks that are not found elsewhere. When

purists use a financial data warehouse, they are overjoyed to find that it is almost impossible to

reconcile down to the rupee. Again and again they will report that they saw a report yesterday that

stated the balance to be Rs. 12,345,678 but when did the same report today the using our

accounting MIS system the balance was Rs. 13,245,678. Hence one can not trust the new

systemJ.

Note that it is unlikely that the financial data warehouse would balance to the last rupee, such a

notion is completely wrong for a number of reasons. The accounting periods may be different in

different operational systems, but in a data warehouse, it is the corporate calendar. The

classifications of regions may change. In the operational system Northern Pakis tan may consist of

Murree, Abbotabad etc. but in the national data warehouse, Northern Pakistan consists of every

major city north of Lahore. In the operational system (for example) sale or oranges is recorded in

dozen in one part of Pakistan, while in other they are sold by weight and in another part by crates.

However, in the national data warehouse, they are accounted for based on dozen and the list goes

on.

Telecommunication

Dominated by sheer volume of data.

Many ways to accommodate call level detail:

�

Only a few months of call level detail,

�

Storing lots of call level detail scattered over different storage media,

�

Storing only selective call level detail, etc.

�

Unfortunately, for many kinds of processing, working at an aggregate level is

s imply not possible.

Telecommunications Data Warehouse

The telecommunications data warehouse is dominated by the sheer volume of data generated at

the call level subject area. There are many other types of data in this environment as well. No

other data warehouse environment is dominated by the size of one of the subject areas as call

level detail dominates a telecommunications data warehouse.

There are many ways that call level detail can be accommodated:

�

Only a few months of call level detail,

�

Stori ng lots of call level detail scattered over different storage media,

�

Summarizing or aggregating call level detail,

�

Storing only selective call level detail, and so forth.

Unfortunately, for many kinds of processing, the only way it can be accommodated is by working

at the call level detail. Working at a summary level or an aggregate level simply is not a

possibility.

Insurance

Insurance data warehouses are similar to other data warehouses BUT with a few exceptions.

�

Stored data that is very, very old, used for actuarial processing.

�

Typical business may change dramatically over last 40-50 years, but not insurance.

Therefore, if someone says that the insurance business has not changed much over the

last 40-50 years would be telling the truth.

�

In retailing or telecomm there are a few important dates, but in the insurance environment

there are many dates of many kinds. In the retail sales environment there are a few dates

such as: sale date, stocking date, and perhaps manufacturing date. In the banking

environment there is the transaction date. However, in the insurance environment there

are dates everywhere of every kind.

�

Long operational business cycles, in years. Thus the operating speed is different.

Consider a bank, you insert the ATM card and draw the money instantly, or deposit a

check and it is cashed at most in a week. You go to store and purchase an item, and

instantly make a telephone call. However, in an insurance environment a claim is made

and it may take several years before it is settled.

�

Transactions are not gathered and processed, but are in kind of "frozen" state.

�

Thus a very unique approach of design & implementation.

Differences between different types of data warehouses

�

Financial data warehouses, often the first a corporation builds. However, it will not

reconcile down to the last rupee with the existing operational financial environment.

�

Insurance data warehouses are similar to other data warehouses with a few exceptions:

such as the length of time that insurance dat a warehouses exists, in terms of the dates

found in the business, and in terms of the operational business cycle.

�

The human resources data warehouse is different from others because this without doubt

has only one major subject area (yes you guessed it right!).

�

The telecommunications data warehouse is dominated by the sheer volume of data

generated at one subject area.

Indeed there are many types of data warehouses, each with their own peculiarities and all of them

can not be discussed here.

Typical Applications of DWH

There are, and there can be many many applications of a data warehouse. It is not possible to

discuss all of them. Some representative applications are listed to be discussed as follows:

�

Fraud detection.

�

Profitability analysis.

�

Direct mail/database marketing.

�

Credit risk prediction.

�

Customer retention modeling.

�

Yield management.

�

Inventory management.

ROI on any one of these applications can justify hardware/software and consultancy costs in most

organizations.

Fraud detection

�

By observing data usage patterns.

�

People have typical purchase patterns.

�

Deviation from patterns.

�

Certain cities notorious for fraud.

�

Certain items bought by stolen cards.

�

Similar behavior for stolen phone cards.

You can look at patterns of data usage, and behavior of customer and detect fraud very easily.

Lets suppose you are a Credit Card Company. You know that Danyal as a customer has certain

behaviors. He tends to buy airplane tickets, make hotel reservations, rents cars using his credit

card. When he goes shopping, it is usually only on a weekend and only on certain stores that he

goes shopping and buys sports goods and action clothing. And then all of sudden you see very

different behaviors coming in, such as credit card transactions in Karachi. Karachi by the way has

very high stolen credit card rate. And the patterns of buying behavior are stereos, and CDs and

TVs i.e. electronic goods that are typically the things that people buy on stolen credit cards. You

can notice and say "look, this is not airplane or hotel reservations, this does not look like Danyal".

It is not in the city where Danyal lives or goes to very often. Although he was there a week ago,

so maybe this is how you can tell that when and where his credit card was stolen. And the

purchases were made after he left the city. Hence "high probability of fraud".

So I can predict based on the behavior of the individuals, when things out of the typical routine

happen. Then I can stop the fraud happening even before it has spread. That's the example in

credit card but you can do the same thing with telephone cards. I am sure you have used these

telephone cards. You punch in your number and you get access to telephone network. You find

people at airports or railway stations who make there living by stealing the telephone card codes.

They stay near the phones at the air port, they act like they are talking on the phone, but they are

really looking over your shoulder when you are punching in the code. And then they memorize

the code and then go sell it on the street to people who make long distance phone calls to all their

friends in Brazil and Australia all these expensive places. You can look at the calling patterns and

say look "Danyal makes regular calls to areas like say Rawalpindi and to friends in Lahore" and

all of sudden we see all these phone calls to Toba Taik Singh or Bhawalpur, meaning there is

some thing wrong here.

Profitability Analysis

�

Banks know if they are profitable or not.

�

Don't know which customers are profitable.

�

Typically more than 50% are NOT profitable.

�

Don't know which one?

�

Balance is not enough, transactional behavior is the key.

�

Restructure products and pricing strategies.

�

Life-time profitability models (next 3-5 years).

It is another example. If you go to an average bank, to Pakistan or any where in the world, the

bank will know if they are profitable or not. But they don't know which customers are profitable

or not, and at most banks this is true, especially true for retail banks. These banks do business

with the consumers, such that more then 50 % of the customers are unprofitable. In other word

the bank is loosing money on 50% of their customers. But they don't know which 50%? That's

the problem. So the idea behind profitability analysis is to do the analysis to figure out which

customers are profitable and which customers are not profitable. And then based on that they can

restructure their product offering and there pricing strategies to do more profitable business. I

used a banking example but same is true for Telecommunications Company or any other

consumer oriented business that requires access to detail data. It is not sufficient to just

know

the account balance, but also transactional behavior and so on to do profitability analysis. And

once I know profitability retrospectively, I can also build predictive models to understand what

it's going to be prospectively. So I can build what's called lifetime value models to using the

historical data to predict what the future profitability will be for the next 3 or 5 years.

Direct mail marketing

�

Targeted marketing.

�

Offering high bandwidth package NOT to all users.

�

Know from call detail records of web surfing.

�

Saves marketing expense, saving pennies.

�

Knowing your customers better.

Assume you are a telecommunications company, and you have an offering that you can make

such as to give value added Internet access through mobile phones. You can send a letter or an

SMS to every customer in the country and tell them about this value added Internet access

service. But 95% of that mail will be completely wasted. Because a small percentage of

customers will be interested in such as service. So the question here is how you your message to

the right people in a meaningful way. If you have the call detailed records, you can tell very very

easily which of your customer's access the Internet and which of them just make missed calls.

Because the patters of the call detail records of Internet access are completely different from

making voice calls and making missed calls. They have different durations; they have different

data bandwidth requirement and so on. So by having access to detail data, you can target those

people that are better to get a second account, or better to get a value added connection/service.

Credit risk prediction

�

W ho should get a loan?

�

Customer segregation i.e. stable vs. rolling.

�

Qualitative decision making NOT subjective.

�

Different interest rates for different customers.

�

Do not subsidize bad customer on the basis of good.

Credit risk prediction is another area where data warehouse is used quite frequently. Let's say I

and my friend both apply for home loan to construct a house. How would the bank determine

which one of those should get a loan? So typically they will say, ok I have been employed for 5

years, have good salary, married, have children etc etc and then based on the context of

application then they decide yes we think you are a low risk we will give you the loan. Or say no

you are a high risk because you had a job only for one year, your salary is not high, and you have

lots of change of jobs recently, so you know we don't feel so good about the risk we are taking,

so we won't give you the loan.

So typically today it is based on what is called qualitative decision -making. Basically in the

offices these are just references say OK Saeed is a nice person or is related to me or is my

neighbor so we give him the loan. And I don't really know Waleed over here so we don't give

him a loan. This is not the best way to give fair loan to people and is not the best way to manage

risk in the business. So with credit risk, the idea is to build quantitative models, which predict risk

based on historical data. And not only I use the historical data to predict the risk, if I get more

sophisticated, what I can do is called risk based pricing. So today lets say you offer me (Saeed) a

loan, and also to the other applicant i.e. Waleed also loan, and both are given the loan at the same

interest rates. However, if Waleed is a lower risk then Saeed then why should both pay the same

interest rates? If Waleed is a low risk, then he should get a lower interest rate other wise the bank

is subsidizing the higher risk loan. So what's happening is that you are using very precise credit

prediction models in order to adjust the interest to give lower interest to people who are at lower

risk of default. And again it should use some data mining algorithms to predict the risk, which is

called the risk index on every individual consumer. So that every consumer gets a loan at an

interest that is a ppropriate to their risk level.

Yield Management

�

Works for fixed inventory businesses.

�

The price of item suddenly goes to zero.

�

Item prices vary for varying customers.

�

Example: Air Lines, Hotels, Car rentals etc.

�

Price of (say) Air Ticket depends on:

� How much in advance ticket was bought?

� How many vacant seats were present?

� How profitable is the customer?

� Ticket is one-way or return?

Yield management works, for example for sophisticated airlines which use a variable pricing

mechanism for the seats. Say you are flying on a plane; 90% of the time the person sitting next to

you pays a different price for the ticket as compared to yours. The seats are basically same, yet

you pay a different price for the ticket, why is it that? Because the airline used a sophisticated

method to see how much in advance the person bought the ticket, and how many seats were

available on the plane at the time the person bought the ticket. Or did they buy a round ticket or

one-way ticket; there are all kinds of sophisticated ways to do the pricing. Yield management

helps decide what should be the price at any time for that airline seat.

Yield management is important for almost any type of fixed inventory business. For example,

once the plane leaves, the price of the seat if it is empty or full means nothing any more. So the

goal of the airline is to fill every seat with the maximum possible price for that seat. The same is

true for hotels. For example a three star hotel had an empty room last night. If they can get Rs.

1500 for that night it is better then having nothing at all. So someone who booked the room for

example as a group, would have paid lower price per room as compared to someone who is

booking only one room. The price is different, because the hotels use yield manageme nt to fill out

their hotel rooms as profitably as they can. This is applicable to car rentals also.

Agriculture Systems

�

Agri and related data collected for decades.

�

Metrological data consists of 50+ attributes.

�

Decision making based on expert judgment.

�

Lack of integration results in underutilization.

�

What is required, in which amount and when?

Each year different government departments and agencies in Pakistan create tens of thousands of

digital and non digital files from thousands of pest-scouting surveys, yield surveys, metrological

data collection, river flows etc. This data collection has been going on for decades. The data

collected has never been compiled, standardized and integrated to give a complete picture. Thus

the lack of data integration and standardization contributes to an under -utilization of historical

data, and inevitably results in an inability to perform any scientific predictive analysis for

effective decision support and policy making. An Agriculture data warehouse is the answer, as

processing 100+ variables by experts for large historical data is not possible. This has repeatedly

resulted in tragic outcomes.

Although the benefits are many, but are not without some major issues or problems, some of them

are as follows:

Major Issues

� Unavailability of data and in an illegible format and form.

�

Most of the detail data is not digitized; therefore it can not be readily used by computers.

�

Dispersion of data across multiple and geographically displaced organizations.

�

Even though the said organizations are required to share the data with common citizens,

no official data sharing mechanism is currently in place.

Until the requisite training is provided, government employees and farmers are not equipped with

enough technical and literate skills to use latest information technology tools to use the data

warehouse

Table of Contents: