Lesson 29
DATA ANALYSIS
Once the data begins to flow in, attention turns to data analysis. If the project has been done correctly,
the analysis planning is already done. Back at the research design stage or at least by the completion of
the proposal or the pilot test, decisions should have been made about how to analyze the data.
During the analysis stage several interrelated procedures are performed to summarize and rearrange the
data. The goal of most research is to provide information. There is a difference between raw data and
information.
Information refers to a body of facts that are in a format suitable for decision making, whereas data are simply recorded measures of certain phenomena. The raw data collected in the field must be transformed into information that will answer the sponsor's (e.g., the manager's) questions. The conversion of raw data into information requires that the data be edited and coded so that the data may be transferred to a computer or other data storage medium.
If the database is large, there are many advantages to using a computer. Assuming a large database, entering the data into the computer follows the coding procedure.
Editing
Occasionally, a fieldworker makes a mistake and records an improbable answer (e.g., birth year: 1843)
or interviews an ineligible respondent (e.g., someone too young to qualify). Seemingly contradictory
answers, such as "no" to automobile ownership but "yes" to an expenditure on automobile insurance,
may appear on a questionnaire. There are many problems like these that must be dealt with before the
data can be coded. Editing procedures are conducted to make the data ready for coding and transfer to
data storage.
Editing is the process of checking and adjusting the data for omissions, legibility, and consistency.
Editing may be differentiated from coding, which is the assignment of numerical scales or classifying
symbols to previously edited data.
The purpose of editing is to ensure the completeness, consistency, and readability of the data to be
transferred to data storage. The editor's task is to check for errors and omissions on the questionnaires
or other data collection forms.
The editor may have to reconstruct some data. For instance, a respondent may indicate weekly income
rather than monthly income, as requested on the questionnaire. The editor must convert the information
to monthly data without adding any extraneous information. The editor "should bring to light all hidden
values and extract all possible information from a questionnaire, while adding nothing extraneous."
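As a rough sketch of how such a reconstruction rule might be applied (the function name, the period labels, and the 52/12 weeks-per-month factor are illustrative assumptions, not part of the questionnaire):

# Illustrative editing rule: convert an income figure reported per week into a
# monthly figure, adding nothing extraneous. Names and the conversion factor
# are assumptions for this sketch only.
WEEKS_PER_MONTH = 52 / 12  # roughly 4.33

def to_monthly_income(amount, reported_period):
    if reported_period == "weekly":
        return round(amount * WEEKS_PER_MONTH, 2)
    return amount  # already monthly, leave untouched

print(to_monthly_income(5000, "weekly"))    # 21666.67
print(to_monthly_income(20000, "monthly"))  # 20000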
Field Editing
In large projects, field supervisors are often responsible for conducting preliminary field edits. The
purpose of field editing the same day as the interview is to catch technical omissions (such as a blank
page), check legibility of the handwriting, and clarify responses that are logically or conceptually
inconsistent. If daily field editing is conducted, a supervisor who edits completed questionnaires will frequently be able to question the interviewers, who may be able to recall the interview well enough to correct any problems. The number of "no answers" or incomplete answers can be reduced with the rapid follow-up stimulated by a field edit. The daily edit also allows fieldworkers to re-contact the respondent
to fill in omissions before the situation has changed. The field edit may also indicate the need for
further training of interviewers.
In-House Editing
Although almost simultaneous editing in the field is highly desirable, in many situations (particularly
with mail questionnaires), early reviewing of the data is not possible. In-house editing rigorously
investigates the results of data collection.
Editing for Consistency:
The in-house editor's task is to ensure that inconsistent or contradictory responses are adjusted and that
answers will not be a problem for coders and keyboard punchers. Consider the situation in which a telephone interviewer has been instructed to interview only registered voters, who must be at least 18 years old. If the editor's review of a questionnaire indicates that the respondent was only 17 years of age, the editor's task is to eliminate this obviously incorrect sampling unit. Thus, in this example, the editor's job is to make sure that the sampling unit is consistent with the objectives of the study.
Editing requires checking for logically consistent responses. The in-house editor must determine if the
answers given by a respondent to one question are consistent with those given to other, related
questions. Many surveys utilize filter questions or skip questions that direct the sequence of questions, depending upon the respondent's answer. In some cases the respondent will have answered a sequence of questions that should not have been asked. The editor should adjust these answers, usually to "no answer" or "inapplicable," so that the responses will be consistent.
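A minimal sketch of such consistency edits, using hypothetical field names (age, owns_car, car_insurance_expense) and the 18-year eligibility rule from the example above:

# Hypothetical consistency-editing pass over one questionnaire, stored as a dict.
INAPPLICABLE = None  # assumed marker for "inapplicable"

def edit_for_consistency(resp):
    problems = []
    # Eliminate an obviously incorrect sampling unit (under-age respondent).
    if resp.get("age", 0) < 18:
        problems.append("ineligible: respondent under 18")
    # Flag logically contradictory answers and adjust the one that should
    # not have been collected.
    if resp.get("owns_car") == "no" and resp.get("car_insurance_expense", 0):
        problems.append("contradiction: insurance expense but no automobile")
        resp["car_insurance_expense"] = INAPPLICABLE
    return resp, problems

print(edit_for_consistency({"age": 17, "owns_car": "no", "car_insurance_expense": 400}))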
Editing for Completeness: In some cases the respondent may have answered only the second portion
of a two-part question. An in-house editor may have to adjust the answers to the following question for
completeness.
Does your organization have more than one Internet Web site? Yes ____ No ____
If a respondent checked neither "Yes" nor "No" but indicated three Internet Web sites, the editor may check "Yes" so that this answer is not missing from the questionnaire.
Item Non-response: It is a technical term for an unanswered question on an otherwise complete
questionnaire. Specific decision rules for handling this problem should be meticulously outlined in the
editorial instructions. In many situations the decision rule will be to do nothing with the unanswered question: the editor merely indicates an item non-response by writing a message instructing the coder to record a "missing value" or blank as the response. However, if a response is necessary, the editor uses a plug value. The decision rule may be to "plug in" an average or neutral value in each case of missing data. For a blank response to an interval-scale item that has a midpoint, the rule may be to assign the midpoint of the scale as the response to that particular item. Another way is to assign to the item the mean value of the responses of all those who have responded to that particular item. Another choice is to give the item the mean of this particular respondent's responses to all other questions measuring the same variable. Yet another decision rule may be to alternate the choice of the response categories used as plug values (e.g., "yes" the first time, "no" the second time, "yes" the third time, and so on).
The editor must also decide whether or not an entire questionnaire is "usable." When a questionnaire
has too many (say 25%) answers missing, it may not be suitable for the planned data analysis. In such a
situation the editor simply records the fact that a particular incomplete questionnaire has been dropped
from the sample.
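That usability rule can be stated as a one-line check, assuming missing answers are stored as None and using the 25 percent figure from the example:

# Drop a questionnaire when more than 25% of its answers are missing.
def is_usable(responses, max_missing_share=0.25):
    missing = sum(1 for value in responses.values() if value is None)
    return missing / len(responses) <= max_missing_share

print(is_usable({"q1": 3, "q2": None, "q3": 4, "q4": 5}))        # True
print(is_usable({"q1": 3, "q2": None, "q3": None, "q4": None}))  # False (75% missing)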
Editing Questions Answered out of Order: Another situation an editor may face is the need to rearrange the answer to an open-ended response question. For example, a respondent may have provided the answer to a subsequent question in his answer to an earlier open-ended response question. Because the respondent had already clearly given his answer, the interviewer may have avoided asking the subsequent question. The interviewer may have wanted to avoid hearing "I have already answered that earlier," and to maintain rapport with the respondent, and therefore skipped the question. To make the responses appear in the same order as on other questionnaires, the editor may move the out-of-order answer to the section related to the skipped question.
Coding
Coding involves assigning numbers or other symbols to answers so that the responses can be grouped into a limited number of classes or categories. The classifying of data into limited categories sacrifices some data detail but is necessary for efficient analysis. Nevertheless, it is recommended that you try to keep the data in raw form as far as possible. Once the data have been entered into the computer, you can always ask the computer to group and regroup the categories. If the data have been entered into the computer in grouped form, it will not be possible to disaggregate them.
Although codes are generally considered to be numerical symbols, they are more broadly defined as the
rules for interpreting, classifying, and recording data. Codes allow data to be processed in a computer.
Researchers organize data into fields, records, and files. A field is a collection of characters (a character is a single number, letter of the alphabet, or special symbol such as a question mark) that represents a single type of data. A record is a collection of related fields. A file is a collection of related records. Files, records, and fields are stored on magnetic tapes, floppy disks, or hard drives.
Researchers use a coding procedure and codebook. A coding procedure is a set of rules stating that
certain numbers are assigned to variable attributes. For example, a researcher codes males as 1 and females as 2. Each category of a variable, and missing information, needs a code. A codebook is a
document (i.e. one or more pages) describing the coding procedure and the location of data for variables
in a format that computers can use.
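A minimal sketch of a coding procedure and one codebook entry, using the male = 1 / female = 2 example from the text (the code 9 for a missing answer is an assumption; the column location follows the sample codebook given later in this lesson):

# Coding procedure: rules assigning numbers to the attributes of a variable.
SEX_CODES = {"male": 1, "female": 2}
NO_ANSWER = 9  # assumed missing-value code

# One codebook entry: description, codes, and location of the data.
CODEBOOK_ENTRY = {
    "variable": "sex",
    "description": "Sex of respondent",
    "codes": SEX_CODES,
    "column": 10,
}

def code_sex(raw_answer):
    return SEX_CODES.get(raw_answer.strip().lower(), NO_ANSWER)

print(code_sex("Female"))  # 2
print(code_sex(""))        # 9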
When you code data, it is very important to create a well-organized, detailed codebook and make
multiple copies of it. If you do not write down the details of the coding procedure, or if you misplace the codebook, you have lost the key to the data and may have to recode the data again.
Researchers begin thinking about a coding procedure and a codebook before they collect data. For
example, a survey researcher pre-codes a questionnaire before collecting the data. Pre-coding means
placing the code categories (e.g. 1 for male, 2 for female) on the questionnaire. Sometimes to reduce
dependence on codebooks, researchers also place the location in the computer format on the
questionnaire.
If the researcher does not pre-code, his or her first step after collecting and editing the data is to create a
codebook. He or she also gives each case an identification number to keep track of the cases. Next, the
researcher transfers the information from each questionnaire into a format that computers can read.
Code Construction
When the question has a fixed-alternative (closed ended) format, the number of categories requiring
codes is determined during the questionnaire design stage. The codes 8 and 9 are conventionally given
to "don't know" (DK) and "no answer" (NA) respectively. However, many computer program fields
recognize a blank field or a certain character symbol, such as a period (.), as indicating a missing value
(no answer).
There are two basic rules for code construction. First, the coding categories should be exhaustive; that is, coding categories should be provided for all subjects, objects, or responses. With a categorical
variable such as sex, making categories exhaustive is not a problem. However, when the response
represents a small number of subjects or when the responses might be categorized in a class not
typically found, there may be a problem.
Second, the coding categories should also be mutually exclusive and independent. This means that
there should be no overlap between the categories, to ensure that a subject or response can be placed in
only one category. This frequently requires that an "other" code category be included, so that the
categories are all-inclusive and mutually exclusive. For example, managerial span of control might be coded 1, 2, 3, 4, and "5 or more." The "5 or more" category ensures that everyone has a place in a category.
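A sketch of one such exhaustive, mutually exclusive coding scheme for span of control, following the "5 or more" example (the code numbers and the NA code are assumptions):

# Every possible value falls in exactly one category, so the scheme is both
# exhaustive and mutually exclusive.
def code_span_of_control(subordinates):
    if subordinates is None:
        return 9              # "no answer", following the NA convention above
    if subordinates >= 5:
        return 5              # "5 or more" keeps the coding exhaustive
    return subordinates       # 1 to 4 are coded as themselves

print([code_span_of_control(n) for n in (1, 4, 5, 12, None)])  # [1, 4, 5, 5, 9]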
When a questionnaire is highly structured, pre-coding of the categories typically occurs before the data
are collected. In many cases, such as when researchers are using open-ended response questions, a
framework for classifying responses to questions cannot be established before data collection. This
situation requires some careful thought concerning the determination of categories after the editing process has been completed. This is called post-coding, or simply coding. The purpose of coding open-ended response questions is to reduce the large number of individual responses to a few general categories of answers that can be assigned numerical scores. Code construction in these situations necessarily must reflect the judgment of the researcher. A major objective in the code-building process is to accurately transfer the meaning from written answers to numeric codes.
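As a very rough illustration of post-coding, the sketch below collapses written answers into a few numbered categories using simple keyword rules; the categories, keywords, and residual "other" code are invented stand-ins for the researcher's judgment described above:

# Hypothetical post-coding of open-ended answers into numeric codes.
CATEGORY_KEYWORDS = {
    1: ("price", "cost", "expensive"),  # price-related answers
    2: ("quality", "durable"),          # quality-related answers
}
OTHER = 5  # assumed residual category so the scheme stays exhaustive

def post_code(answer_text):
    text = answer_text.lower()
    for code, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return code
    return OTHER

print(post_code("It was far too expensive for us"))  # 1
print(post_code("Friendly staff"))                   # 5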
Code Book
A book identifying each variable in a study and its position in the data matrix. The book is used to identify a variable's description, code name, and field. Here is a sample:
Q/V No.      Field/Col. No.   Code values
--           1-5              Study number
--           6                City: 1 = Lahore, 2 = Rawalpindi, 3 = Karachi
--           7-9              Interview No.
Sex          10               1 = Male, 2 = Female
Age          11-12            Actual
Education    13               1 = Non-literate, 2 = Literate
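Given the sample codebook above, here is a sketch of how its column positions might drive the reading of one fixed-column data line (the data line itself is fabricated purely for illustration):

# Field positions taken from the sample codebook (1-indexed, inclusive).
CODEBOOK = {
    "study_no":     (1, 5),
    "city":         (6, 6),
    "interview_no": (7, 9),
    "sex":          (10, 10),
    "age":          (11, 12),
    "education":    (13, 13),
}

def read_record(line):
    # Slice a fixed-column record according to the codebook positions.
    return {name: line[start - 1:end] for name, (start, end) in CODEBOOK.items()}

record = "63001" + "1" + "017" + "2" + "34" + "1"  # one made-up 13-column line
print(read_record(record))
# {'study_no': '63001', 'city': '1', 'interview_no': '017', 'sex': '2', 'age': '34', 'education': '1'}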
Production Coding
Transferring the data from the questionnaire or data collection form after the data have been collected is
called production coding. Depending upon the nature of the data collection form, codes may be written
directly on the instrument or on a special coding sheet.
Data Entry
The use of scanner sheets for data collection may facilitate the entry of responses directly into the computer without manually keying in the data. In studies involving highly structured paper questionnaires, an optical scanning system may be used to read material directly into the computer's memory. Optical scanners process the mark-sensed questionnaires and store the answers in a file.
Cleaning Data
The final stage in the coding process is the error checking and verification, or "data cleaning" stage,
which is a check to make sure that all codes are legitimate. Accuracy is extremely important when
coding data. Errors made when coding or entering data into a computer threaten the validity of
measures and cause misleading results. A researcher who has a perfect sample, perfect measures, and no
errors in gathering data, but who makes errors in the coding process or in entering data into a computer,
can ruin a whole research project.
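A small sketch of such a cleaning check, validating each coded value against the legitimate codes listed in the sample codebook (variable names follow that sample; the record is made up):

# Data cleaning: flag any variable whose code is not a legitimate value.
LEGITIMATE_CODES = {
    "city": {"1", "2", "3"},
    "sex": {"1", "2"},
    "education": {"1", "2"},
}

def find_illegitimate_codes(record):
    return [name for name, valid in LEGITIMATE_CODES.items()
            if record.get(name) not in valid]

print(find_illegitimate_codes({"city": "7", "sex": "2", "education": "1"}))  # ['city']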