Research Methods STA630
USE OF SECONDARY DATA
Prior to the discussion of secondary data, let us look at the advantages and disadvantages of the use of
content analysis that was covered in the last lecture. In a way content analysis is also the study of
documents through which the writers try to communicate, though some of the documents (like
population census) may simply contain figures.
Advantages of Content Analysis
1. Access to inaccessible subjects: One of the basic advantages of content analysis is that it allows
research on subjects to which the researcher does not have physical access. These could be people of
old civilizations (say, their marriage patterns). These could also be documents from the archives,
speeches of the past leaders (Quaid-e-Azam) who are not alive, the suicide notes, old films, dramas,
2. Non-reactivity: Document study shares with certain types of observation (e.g., indirect observation
or non-participant observation through a one-way mirror) the advantage of little or no reactivity,
particularly when the document was written for some other purpose. The method is unobtrusive: the
creator of the document, who may no longer be alive, and for that matter the characters in the
document, are not in contact with the researcher.
3. Can do longitudinal analysis: Like observation and unlike experiments and surveys, document
study is especially well suited to study over a long period of time. Often the objective of the
research is to determine a trend. One could pick different periods in the past, make
comparisons, and figure out the changes (e.g., in the status of women) that may have occurred over time.
For example, take two martial law periods in Pakistan, study the newspapers, and compare the crime
reported in the press.
4. Use sampling: The researcher can use random sampling. One could decide on the population,
develop a sampling frame, and draw a simple random sample by following the appropriate procedure. For
example, to study how women are portrayed in weekly English news magazines, one could pick the
magazines, make a listing of the articles that have appeared in them (the sampling
frame), and draw a simple random sample.
5. Can use large sample size: The larger the sample, the closer the results are to the population values. In
experimentation as well as in survey research there can be limitations due to the availability of
subjects or of resources, but in document analysis the researcher can increase the sample and
have more confidence in generalization. If a researcher is studying the matrimonial
advertisements in the newspapers over a long period of time, there should be no problem in drawing a
sample as large as several thousand or more.
6. Spontaneity: The spontaneous actions or feelings can be recorded when they occurred rather than at
a time specified by the researcher. If the respondent was keeping a diary, he or she may have been
recording spontaneous feelings about a subject whenever he or she was inspired to do so. The contents
of such personal recording could be analyzed later on.
7. Confessions: A person may be more likely to confess in a document, particularly one to be read only
after his or her death, than in an interview or mailed questionnaire study. Thus a study of documents
such as diaries, posthumously published autobiographies, and suicide notes may be the only way to
obtain such information.
8. Relatively low cost: Although the cost of documentary analysis can vary widely depending on the
type of document analyzed, how widely documents are dispersed, and how far one must travel to gain
access to them, documentary analysis can be inexpensive compared to large-scale surveys. Often
documents are gathered together in a centralized location such as a library, where the researcher can
study them for only the cost of travel to the repository.
9. High quality: Although documents vary tremendously in quality, many documents, such as
newspaper columns, are written by skilled commentators and may be more valuable than, for example,
poorly written responses to mailed questionnaires.
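The sampling procedure described in item 4 (list the articles to form a sampling frame, then draw a simple random sample) can be sketched in a few lines of Python. This is only an illustrative sketch: the frame of 1,200 article identifiers, the sample size of 100, and the function name draw_simple_random_sample are assumptions made for the example, not part of the lecture.

```python
import random

# Hypothetical sampling frame: one identifier per article that appeared
# in the magazines under study (1,200 articles is an assumed figure).
sampling_frame = [f"article_{i:04d}" for i in range(1, 1201)]


def draw_simple_random_sample(frame, sample_size, seed=None):
    """Draw a simple random sample, without replacement, from the frame.

    Every article in the frame has the same chance of being selected.
    A seed can be given so that the draw is reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)


sample = draw_simple_random_sample(sampling_frame, sample_size=100, seed=42)
print(len(sample))   # number of articles selected for coding
print(sample[:5])    # first few selected article identifiers
```

Fixing the seed is a design choice that lets another researcher reproduce exactly the same sample; omit it for a fresh random draw.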
Disadvantages of Content Analysis
1. Bias: Many documents used in research were not originally intended for research purposes. The
various goals and purposes for which documents are written can bias them in various ways. For
example, personal documents such as confessional articles or autobiographies are often written by
famous people or people who had some unusual experience, such as having been a witness to a specific
event. While often providing unique and valuable research data, these documents usually are written
for the purpose of making money. Thus they tend to exaggerate and even fabricate to make a good story.
They also tend to include those events that make the author look good and exclude those that cast him or
her in a negative light.
2. Selective survival: Since documents are usually written on paper, they do not withstand the elements
well unless care is taken to preserve them. Thus while documents written by famous people are likely
to be preserved, day-to-day documents such as letters and diaries written by common people tend either
to be destroyed or to be placed in storage and thus become inaccessible. It is relatively rare for common
documents that are not about some event of immediate interest to the researcher (e.g., suicide), and not
about a famous occurrence or by some famous person, to be gathered together in a public repository that is
accessible to researchers.
3. Incompleteness: Many documents provide an incomplete account to the researcher who has had no
prior experience with or knowledge of the events or behavior discussed. A problem with many personal
documents such as letters and diaries is that they were not written for research purposes but were
designed to be private or even secret. Both these kinds of documents often assume specific knowledge
that a researcher unfamiliar with certain events will not possess. Diaries are probably the worst in this
respect, since they are usually written to be read only by the author and can consist more of "soul
searching" and confession than of description. Letters tend to be a little more complete, since they are
addressed to a second person. Even so, many letters assume a great amount of prior information on the
part of the reader.
4. Lack of availability of documents: In addition to the bias, incompleteness, and selective survival of
documents, there are many areas of study for which no documents are available. In many cases
information simply was never recorded. In other cases it was recorded, but the documents remain secret
or classified, or have been destroyed.
5. Sampling bias: One of the problems of bias occurs because persons of lower educational or income
levels are less likely to be represented in the sampling frames. The problem of sampling bias by
educational level is more acute for document study than for survey research. It is a safe generalization
that poorly educated people are much less likely than well-educated people to write documents.
6. Limited to verbal behavior: By definition, documents provide information only about the respondent's
verbal behavior, and provide no direct information on nonverbal behavior, either that
of the document's author or that of other characters in the document.
7. Lack of standardized format: Documents differ quite widely in regard to their standardization of
format. Some documents such as newspapers appear frequently in a standard format. Large dailies
always contain such standard components as editorial page, business page, sports page, and weather
report. Standardization facilitates comparison across time for the same newspapers and comparison
across different newspapers at one point in time. However, many other documents, particularly
personal documents, have no standard format. Comparison is difficult or impossible, since valuable
information contained in the document at one point in time may be entirely lacking in an earlier or later
8. Coding difficulties: For a number of reasons, including differences in purpose for which the
documents were written, differences in content or subject matter, lack of standardization, and
differences in length and format, coding is one of the most difficult tasks facing the content analyst.
Documents generally consist of words rather than numbers and are therefore quite difficult to quantify. Thus
analysis of documents is similar to analysis of open-ended survey questions.
9. Data must be adjusted for comparability over time: One of the advantages of document
study is that comparisons may be made over a long period of time. However, external events can cause changes
so drastic that, even if a common unit of measure is used for the entire period, the value of this unit may
have changed so much over time that comparisons are misleading unless corrections are made. Look at
the changes in the measurement of distance, temperature, currency, and even literacy in Pakistan.
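The kind of correction mentioned in item 9 can be illustrated for currency, where amounts from different years are re-expressed in the prices of one base year before they are compared. The sketch below is only one possible adjustment. The price index values and income figures are invented for illustration and are not real Pakistani data, and the function name to_constant_prices is hypothetical.

```python
def to_constant_prices(nominal_value, index_in_year, index_in_base_year):
    """Re-express a nominal amount in base-year prices using a price index."""
    return nominal_value * index_in_base_year / index_in_year


# Hypothetical figures for illustration only (not real data).
price_index = {1980: 25.0, 2000: 100.0}        # assumed price index by year
nominal_income = {1980: 500.0, 2000: 1500.0}   # assumed incomes in rupees

base_year = 2000
real_income = {
    year: to_constant_prices(value, price_index[year], price_index[base_year])
    for year, value in nominal_income.items()
}

for year in sorted(real_income):
    print(year, nominal_income[year], real_income[year])
```

With these assumed numbers the nominal income triples between the two years, yet in constant prices it falls, which is exactly the sort of misleading comparison the correction is meant to prevent.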
Use of Secondary Data: Existing Statistics/Documents
Secondary data refer to information gathered by someone other than the researcher conducting the
present study. Secondary data are usually historical, already assembled, and do not require access to
respondents or subjects. Many types of information about the social and behavioral world have been
collected and are available to the researcher. Some information is in the form of statistical documents
(books, reports) that contain numerical information. Other information is in the form of published
compilations available in a library or on computerized records. In either case the researcher can search
through collections of information with a research question and variables in mind, and then reassemble
the information in new ways to address the research question.
Secondary data may be collected by large bureaucratic organizations like the Bureau of Statistics or other
government or private agencies. These data may have been collected for policy decisions or as part of
The data may be a time-bound collection of information (population census) or may be spread over long
periods of time (unemployment trends, crime rate). Secondary data are used for making comparisons
over time in the country (population trends in the country) as well as across the countries (world
Selecting Topic for Secondary Analysis
Search through the collections of information with research question and variables in mind, and then
reassemble the information in new ways to address the research question.
It is difficult to specify topics that are appropriate for existing statistics research because they are so
varied. Any topic on which information has been collected and is publicly available can be studied. In
fact, existing statistics projects may not neatly fit into a deductive model of research design. Rather,
researchers creatively reorganize the existing information into the variables for a research question after
first finding what data are available.
Experiments are best for topics where the researcher controls a situation and manipulates an
independent variable. Survey research is best for topics where the researcher asks questions and learns
about reported attitudes and behavior. Content analysis is for topics that involve the content of
messages in cultural communication.
Existing statistics research is best for topics that involve information collected by large bureaucratic
organizations. Public or private organizations systematically gather many types of information. Such
information is collected for policy decisions or as a public service. It is rarely collected for purposes
directly related to a specific research question. Thus existing statistics research is appropriate when a
researcher wants to test hypotheses involving variables that are also in official reports of social,
economic and political conditions. These include descriptions of organizations or people in them.
Often, such information is collected over long periods. For example, existing statistics can be used by
a researcher who wants to see whether unemployment and crime rates are associated in 100 cities across a
20-year period.
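One simple way to check whether two variables taken from official reports are associated is to compute a correlation coefficient across the units of analysis. The sketch below uses the Pearson correlation. This is a choice made for the example, since the lecture does not name a particular statistic. The five city values are invented for illustration, and a real study would use the published figures for each city and year.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical values for five cities in a single year (illustration only).
unemployment_rate = [4.0, 5.5, 6.0, 7.5, 9.0]   # percent of labor force
crime_rate = [30.0, 34.0, 33.0, 41.0, 45.0]     # reported crimes per 1,000 people

r = pearson_r(unemployment_rate, crime_rate)
print(round(r, 2))
```

A coefficient near +1 or -1 indicates a strong association and a value near 0 indicates little linear association. An association found this way does not by itself establish that one variable causes the other.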
As part of the trends, say in development, researchers try to develop social indicators for measuring the
well-being of the people. A social indicator is any measure of well-being used in policy. There are many
specific indicators that are operationalizations of well-being. It is hoped that information about social
well-being could be combined with widely used indicators of economic performance (e.g., gross
national product) to better inform government and other policy-making officials.
The main sources of existing statistics are government or international agencies and private sources. An
enormous volume and variety of information exists. If you plan to conduct existing statistics research, it
is wise to discuss your interests with an information professional (in this case, a reference librarian)
who can point you in the direction of possible sources.
Many existing documents are "free" (that is, publicly available at libraries), but the time and effort it
takes to search for specific information can be substantial. Researchers who conduct existing statistics
research spend many hours in libraries or on the internet.
There are many sources of existing statistics, such as UN publications, UNESCO Statistical Yearbook,
UN Statistical Yearbook, Demographic Yearbook, Labor Force Survey of Pakistan, and Population
Secondary Survey Data
Secondary analysis is a special case of existing statistics; it is reanalysis of previously collected survey
or other data that was originally gathered by others. As opposed to primary research (e.g., experiments,
surveys, and content analysis), the focus is on analyzing rather than collecting data.
Secondary analysis is increasingly used by researchers. It is relatively inexpensive; it permits
comparisons across groups, nations, or time; it facilitates replication; and it permits asking about issues
not thought of by the original researchers. There are several questions the researcher interested in
secondary research should ask: Are the secondary data appropriate for the research question? What
theory and hypothesis can a researcher use with the data? Is the researcher already familiar with the
substantive area? Does the researcher understand how the data were originally gathered and coded?
Large-scale data collection is expensive and difficult. The cost and time required for major national
surveys that use rigorous techniques are prohibitive for most researchers. Fortunately, the
organization, preservation, and dissemination of major survey data sets have improved. Today, there
are archives of past surveys open to researchers (e.g., data on Population Census of Pakistan,
Demographic Survey of Pakistan).
Reliability and Validity
Existing statistics and secondary data are not trouble free just because a government agency or other
source gathered the original data. Researchers must be concerned with validity and reliability, as well as
with some problems unique to this research technique.
A common error is the fallacy of misplaced concreteness. It occurs when someone gives a false
impression of accuracy by quoting statistics in greater detail than is warranted by how the statistics were
collected, overloading the audience with detail. For example, in order to impress an audience, a politician might
say that 3,010,534 persons, instead of saying 3 million persons, are being added every year to
the population of Pakistan.
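One way to guard against misplaced concreteness is to report a figure only to the number of significant digits that the data collection can support. The helper below is a minimal sketch of that idea. The function name round_to_significant is hypothetical, and keeping a single significant digit for the politician's figure is an assumption made to match the "3 million" of the example.

```python
from math import floor, log10


def round_to_significant(value, digits):
    """Round a number to the given count of significant digits."""
    if value == 0:
        return 0
    # Position of the leading digit, e.g. 6 for a seven-digit number.
    magnitude = floor(log10(abs(value)))
    return round(value, digits - 1 - magnitude)


quoted_figure = 3010534  # the over-precise figure from the example above
reported_figure = round_to_significant(quoted_figure, 1)
print(reported_figure)
```

How many digits are justified depends on how the original statistics were gathered, which the researcher has to judge from the source's documentation.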
Validity: Validity problems occur when the researcher's theoretical definition does not match that of
the government agency or organization that collected the information. Official policies and procedures
specify definitions for official statistics. For example, a researcher defines a work injury as including
minor cuts, bruises, and sprains that occur on the job, but the official definition in government reports
only includes injuries that require a visit to a physician or hospital. Many work injuries as defined by
the researcher will not be in the official statistics. Another example occurs when a researcher defines
people as unemployed if they would work if a good job were available, if they have to work part-time when
they want full-time work, and if they have given up looking for work. The official definition, however,
includes only those who are now actively seeking work (full or part-time) as unemployed. The official
statistics exclude those who stopped looking, who work part-time out of necessity, or who do not look
because they believe no work is available. In both cases the researcher's definition differs from that
in official statistics.
Another validity problem arises when official statistics are a proxy for a construct in which the
researcher is really interested. This is necessary because the researcher cannot collect original data. For
example, the researcher wants to know how many people have been robbed, so he or she uses police
statistics on robbery arrests as a proxy. But the measure is not entirely valid because many robberies are
not reported to the police, and reported robberies do not always result in an arrest.
Another validity problem arises because the researcher lacks control over how information is collected.
All information, even that in official government reports, is originally gathered by people in
bureaucracies as part of their job. A researcher depends on them for collecting, organizing, reporting,
and publishing data accurately. Systematic errors in collecting the initial information (e.g., census
takers who avoid poor neighborhoods and make up information, or people who put a false age on their
ID card), errors in organizing and reporting information (e.g., a police department that is sloppy about
filing crime reports and loses some), and errors in publishing information (e.g., a typographical error in a
table) all reduce measurement validity.
Reliability: Stability reliability problems develop when the official definition or the method of collecting
information changes over time. Official definitions of work injury, disability, unemployment, literacy,
poverty, and the like change periodically. Even if the researcher learns of such changes, consistent
measurement over time may be impossible.
Equivalence reliability can also be a problem. For example, studies of police departments suggest that
political pressures are closely related to the number of arrests. Political pressures in one city
may increase arrests (e.g., a crackdown on crime), whereas pressures in
another city may decrease arrests (e.g., to show a drop in crime shortly before an election in order to make
officials look better).
Researchers often use official statistics for international comparisons, but national governments collect
data differently and the quality of data collection varies.
Inferences from Non-Reactive Data
A researcher's ability to infer causality or to test a theory on the basis of non-reactive data is limited. It
is difficult to use unobtrusive measures to establish temporal order and eliminate alternative
explanations. In content analysis, a researcher cannot generalize from the content to its effects on those
who read the text, but can only use the correlation logic of survey research to show an association
among variables. Unlike the case of survey research, a researcher does not ask respondents direct
questions to measure variables, but relies on the information available in the text.