Research Methods STA630
TYPES OF PROBABILITY SAMPLING
Probability samples that rely on random processes require more work than nonrandom ones. A
researcher must identify specific sampling elements (e.g., persons) to include in the sample. For
example, when conducting a telephone survey, the researcher needs to try to reach the specific sampled
person, calling back several times if necessary, to get an accurate sample.
Random samples are most likely to yield a sample that truly represents the population. In addition,
random sampling lets a researcher statistically calculate the relationship between the sample and the
population, that is, the size of the sampling error. A non-statistical definition of sampling error is the
deviation between a sample result and a population parameter that is due to the random selection process.
Simple Random Sample
The simple random sample is both the easiest random sample to understand and the one on which other
types are modeled. In simple random sampling, a researcher develops an accurate sampling frame, selects
elements from the sampling frame according to a mathematically random procedure, then locates the exact
element that was selected for inclusion in the sample.
After numbering all elements in a sampling frame, the researcher uses a list of random numbers to
decide which elements to select. He or she needs as many random numbers as there are elements to be
sampled: for example, for a sample of 100, 100 random numbers are needed. The researcher can get
random numbers from a random number table, a table of numbers chosen in a mathematically random
way. Random-number tables are available in most statistics and research methods books. The numbers
are generated by a pure random process so that any number has an equal probability of appearing in any
position. Computer programs can also produce lists of random numbers.
When using a random-number table, the researcher should begin at a randomly selected starting point in
the table.
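As a minimal sketch of the selection step, the Python function below uses a computer-generated
pseudo-random sequence (Python's standard random module) in place of a printed random-number table.
The function name simple_random_sample, its seed parameter, and the 900-person frame are hypothetical
illustrations, not part of any standard procedure.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of n elements without replacement.

    Each element's serial number is its position in the frame (1..N).
    """
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the frame size")
    rng = random.Random(seed)  # pseudo-random generator; a seed makes it reproducible
    # Draw n distinct serial numbers from 1..N, like reading a random-number
    # table and skipping any number already used.
    numbers = rng.sample(range(1, len(frame) + 1), n)
    # Locate the exact element that each selected number points to.
    return [frame[k - 1] for k in sorted(numbers)]


# Hypothetical sampling frame of 900 names; sample of 100.
frame = [f"person_{i}" for i in range(1, 901)]
sample = simple_random_sample(frame, 100, seed=42)
print(len(sample), sample[:3])
```

Fixing the seed lets the same sample be reproduced later, much as recording the starting point in a
random-number table would.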
Random sampling does not guarantee that every random sample perfectly represents the population.
Instead, it means that most random samples will be close to the population most of the time, and that
one can calculate the probability of a particular sample being inaccurate. A researcher estimates the
chance that a particular sample is off or unrepresentative by using information from the sample to
estimate the sampling distribution. The sampling distribution is the key idea that lets a researcher
calculate sampling error and confidence intervals.
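To show how information from a single sample is used to estimate the sampling distribution, the sketch
below computes the estimated standard error of a mean and an approximate 95 percent confidence interval.
It assumes a simple random sample, the normal approximation (z = 1.96), and a population large enough
that the finite population correction can be ignored. The function name and the data values are
hypothetical, and the tiny sample only illustrates the arithmetic.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Estimate a population mean with an approximate confidence interval.

    Assumes a simple random sample, the normal approximation, and a
    population large enough to ignore the finite population correction.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # The sample standard deviation stands in for the unknown population
    # spread; dividing by sqrt(n) estimates the standard error, i.e. the
    # spread of the sampling distribution of the mean.
    std_error = statistics.stdev(sample) / math.sqrt(n)
    margin = z * std_error  # z = 1.96 corresponds to about 95 percent confidence
    return mean, std_error, (mean - margin, mean + margin)


# Hypothetical sample values (e.g. hours of study per week).
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean, se, (low, high) = mean_confidence_interval(data)
print(f"mean={mean:.2f}, standard error={se:.3f}, 95% CI=({low:.2f}, {high:.2f})")
```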
Systematic Random Sample
Systematic random sampling is simple random sampling with a shortcut for random selection. Again,
the first step is to number each element in the sampling frame. Instead of using a list of random
numbers, the researcher calculates a sampling interval, and the interval becomes his or her own quasi-
random selection method. The sampling interval (i.e., 1 in K, where K is some number) tells the
researcher how to select elements from a sampling frame by skipping elements in the frame before
selecting one for the sample.
Sampling intervals are easy to compute: all that is needed is the sample size and the population size.
You can think of the sampling interval as the inverse of the sampling ratio. The sampling ratio for 300
names out of 900 is 300/900 = .333 = 33.3 percent, so the sampling interval is 900/300 = 3.
Begin with a random start. The easiest way to do this is to point blindly at one of the first K numbers
(those that fall within the first sampling interval), and then select every Kth element after it.
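The interval calculation and random start can be sketched as follows. This is an illustration under
assumptions: elements are numbered 1 to N by their position in a Python list, and when N is not an exact
multiple of n the interval is rounded down and the selection is cut off after n elements. The function
name systematic_sample is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic random sample: random start, then every K-th element.

    Elements are numbered 1..N by position. If N is not a multiple of n,
    K is rounded down and the selection is cut off after n elements.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    k = N // n  # sampling interval K: the inverse of the sampling ratio n/N
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random start within the first interval
    numbers = range(start, N + 1, k)  # start, start + K, start + 2K, ...
    return [frame[i - 1] for i in numbers][:n]


# The 300-out-of-900 example: the interval is 900 // 300 = 3.
frame = list(range(1, 901))  # serial numbers stand in for 900 names
sample = systematic_sample(frame, 300, seed=1)
print(len(sample), sample[:5])
```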
When the elements in the frame are organized in a cycle or pattern that coincides with the sampling
interval, systematic sampling will not give a representative sample.
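A small hypothetical example shows how such a pattern defeats systematic sampling. Suppose a payroll
frame lists 50 work crews of 10, with each crew's supervisor first, followed by nine workers. With a
sampling interval of 10, the random start alone decides whether the sample contains only supervisors
or no supervisors at all. The frame and the helper names are invented for illustration.

```python
def build_crew_frame(n_crews=50, crew_size=10):
    """Hypothetical frame: each crew is listed as supervisor first, then workers."""
    frame = []
    for crew in range(n_crews):
        frame.append(("supervisor", crew))
        frame.extend(("worker", crew) for _ in range(crew_size - 1))
    return frame


def roles_selected(frame, start, k):
    """Roles appearing in a systematic sample with the given random start and interval."""
    return {role for role, _ in frame[start - 1::k]}


frame = build_crew_frame()
# The interval K = 10 matches the cycle length of the list.
for start in range(1, 11):
    print(start, roles_selected(frame, start, k=10))
```

Supervisors make up 10 percent of the frame, yet every possible systematic sample with this interval
contains either 100 percent or 0 percent supervisors.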
Stratified Random Sample
When the population is heterogeneous, a simple random sample may not be representative: some of the
bigger strata may be overrepresented, while some of the small ones may be left out entirely. Look at the
variables that are likely to affect the results, and stratify the population in such a way that each
stratum is a homogeneous group within itself. Then draw the required sample from each stratum by using
the table of random numbers. Hence, in stratified random sampling a subsample is drawn by simple random
sampling within each stratum. (Randomization is not done for the selection of strata; every stratum is
included.)
There are three reasons why a researcher chooses a stratified random sample: (1) to increase a sample's
statistical efficiency, (2) to provide adequate data for analyzing the various subpopulations, and (3) to
enable different research methods and procedures to be used in different strata.
1. Stratification is usually more efficient statistically than simple random sampling, and at worst it
is equally efficient. With ideal stratification, each stratum is internally homogeneous and
heterogeneous with respect to the other strata. This might occur in a sample that includes members of
several distinct ethnic groups. In this instance, stratification makes a pronounced improvement in
statistical efficiency.
Stratified random sampling provides the assurance that the sample will accurately reflect the
population on the basis of the criterion or criteria used for stratification. This is a concern because
simple random sampling occasionally yields a disproportionate number of one group or another, and the
sample then ends up being less representative than it could be.
Random sampling error will be reduced with the use of stratified random sampling, because each group
is internally homogeneous but there are comparative differences between groups. More technically, a
smaller standard error may result from stratified sampling because the groups are adequately
represented when the strata are combined.
2. Stratification is appropriate when the researcher wants to study the characteristics of certain
population subgroups. Thus, if one wishes to draw conclusions about activities in different classes
of a student body, stratified sampling would be used.
3. Stratified sampling is also called for when different methods of data collection are applied in
different parts of the population. This might occur when we survey company employees at the
home office with one method but must use a different approach with employees scattered over the
country.
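The first reason above, statistical efficiency, can be checked numerically under simplifying
assumptions. The sketch below compares the variance of the estimated mean under simple random sampling
with that under proportionate stratified sampling, using population variances and ignoring the finite
population correction. Because the total variance splits into a within-strata part and a between-strata
part, proportionate stratification removes the between-strata part, which is why it is at worst equal to
simple random sampling. The function names and the three strata are hypothetical.

```python
from statistics import pvariance


def var_mean_srs(strata, n):
    """Variance of the sample mean under simple random sampling (no fpc)."""
    population = [x for stratum in strata for x in stratum]
    return pvariance(population) / n


def var_mean_stratified(strata, n):
    """Variance of the estimated mean under proportionate stratified sampling.

    With weights W_h = N_h / N and n_h = n * W_h, the variance
    sum(W_h**2 * S_h**2 / n_h) simplifies to sum(W_h * S_h**2) / n (no fpc).
    """
    N = sum(len(stratum) for stratum in strata)
    return sum(len(stratum) / N * pvariance(stratum) for stratum in strata) / n


# Hypothetical strata that are homogeneous inside and differ between groups.
strata = [[20, 22, 24, 26], [40, 42, 44, 46], [70, 72, 74, 76]]
srs = var_mean_srs(strata, n=6)
strat = var_mean_stratified(strata, n=6)
print(f"SRS: {srs:.3f}  stratified: {strat:.3f}")
```

Here the strata means differ widely while each stratum is tight, so almost all of the variance is
between strata and the stratified design is far more efficient.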
The ideal stratification would be based on the primary variable (the dependent variable) under study.
In practice, the criterion for stratification should be a characteristic of the population elements that
is known to be related to the dependent variable or other variables of interest; such a criterion
provides an efficient basis for stratification. The variable chosen should increase homogeneity within
each stratum and heterogeneity between strata.
Next, for each separate subgroup or stratum, a list of population elements must be obtained. Serially
number the elements within each stratum. Using a table of random numbers or some other device, take a
separate simple random sample within each stratum. Of course, the researcher must determine how large a
sample must be drawn from each stratum.
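The procedure just described, a separate simple random sample within each serially numbered stratum,
might be sketched as below. The strata (class years of a hypothetical student body), the function name,
and the per-stratum sample sizes are assumptions; how those sizes are chosen is a separate decision.

```python
import random


def stratified_random_sample(strata, sizes, seed=None):
    """Draw a separate simple random sample within each stratum.

    strata: dict mapping stratum name -> list of population elements
    sizes:  dict mapping stratum name -> number of elements to draw
    """
    rng = random.Random(seed)
    sample = {}
    for name, elements in strata.items():
        # Serially number the elements 1..N_h, then draw n_h distinct numbers.
        numbers = rng.sample(range(1, len(elements) + 1), sizes[name])
        sample[name] = [elements[k - 1] for k in sorted(numbers)]
    return sample


strata = {  # hypothetical strata of a student body of 1,000
    "freshman": [f"F{i}" for i in range(1, 401)],
    "sophomore": [f"S{i}" for i in range(1, 301)],
    "junior": [f"J{i}" for i in range(1, 201)],
    "senior": [f"R{i}" for i in range(1, 101)],
}
sizes = {"freshman": 40, "sophomore": 30, "junior": 20, "senior": 10}
sample = stratified_random_sample(strata, sizes, seed=7)
print({name: len(members) for name, members in sample.items()})
```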
Proportionate versus Disproportionate
If the number of sampling units drawn from each stratum is in proportion to the relative population size
of the stratum, the sample is a proportionate stratified sample. Sometimes, however, a disproportionate
stratified sample will be selected to ensure an adequate number of sampling units in every stratum.
In a disproportionate stratified sample, the sample size for each stratum is not allocated in proportion
to the population size but is dictated by analytical considerations.
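The two approaches can be contrasted with a short sketch. Proportionate allocation sets each stratum's
sample size n_h proportional to its population size N_h; here any units left over from rounding go to
the strata with the largest fractional parts (the largest-remainder rule, an assumption rather than a
rule from the text). Equal allocation is shown only as one example of a disproportionate design; in
practice the sizes are set by the analytical needs of the study. Stratum names and sizes are
hypothetical.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate n so that each n_h is proportional to the stratum size N_h.

    Rounding uses the largest-remainder rule so the sizes add up to n.
    """
    N = sum(stratum_sizes.values())
    quotas = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(q) for h, q in quotas.items()}  # round each quota down
    leftover = n - sum(alloc.values())
    # Hand out the remaining units to the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda h: quotas[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def equal_allocation(stratum_sizes, n):
    """One possible disproportionate design: (nearly) the same n_h in every stratum.

    Assumes every stratum has at least n_h elements.
    """
    base, extra = divmod(n, len(stratum_sizes))
    return {h: base + (1 if i < extra else 0) for i, h in enumerate(stratum_sizes)}


# Hypothetical student body of 1,000 and a total sample of 100.
sizes = {"freshman": 400, "sophomore": 300, "junior": 200, "senior": 100}
print(proportionate_allocation(sizes, 100))
print(equal_allocation(sizes, 100))
```

Under proportionate allocation the smallest stratum (seniors) receives only 10 units; the equal
allocation gives it 25, which may be needed if seniors are to be analyzed as a separate subgroup.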
Cluster Sampling
The purpose of cluster sampling is to sample economically while retaining the characteristics of a
probability sample. Groups or chunks of elements that, ideally, would have heterogeneity among the
members within each group are chosen for study in cluster sampling. This is in contrast to choosing
some elements from the population as in simple random sampling, or stratifying and then choosing
members from the strata, or choosing every nth case in the population in systematic sampling. When
several groups with intra-group heterogeneity and inter-group homogeneity are found, then a random
sampling of the clusters or groups can ideally be done and information gathered from each of the
members in the randomly chosen clusters.
Cluster samples thus offer more heterogeneity within groups and more homogeneity among groups, the
reverse of stratified random sampling, which seeks homogeneity within each group and heterogeneity
across groups.
Cluster sampling addresses two problems: researchers lack a good sampling frame for a dispersed
population, and the cost to reach a sampled element is very high. A cluster is a unit that contains final
sampling elements but can be treated temporarily as a sampling element itself. The researcher first
samples clusters, each of which contains elements, then draws a second sample from within the clusters
selected in the first stage of sampling. In other words, the researcher randomly samples clusters and
then randomly samples elements from within the selected clusters. He or she can create a good
sampling frame of clusters even if it is impossible to create one for sampling elements. Once the
researcher has a sample of clusters, creating a sampling frame for elements within each cluster becomes
more manageable. A second advantage for geographically dispersed populations is that elements within
each cluster are physically closer to each other. This may produce savings in locating or reaching each
element.
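A minimal sketch of the two-stage procedure follows, assuming the clusters are held in a Python dict that
maps each cluster to its list of elements. In a real study the element lists would only need to be built
for the clusters chosen in the first stage; the dict here stands in for that. The function name and the
village and household data are hypothetical.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: random sample of clusters. Stage 2: random sample within each.

    clusters maps a cluster id to the list of elements it contains.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: sample clusters
    # Stage 2: element lists are consulted only for the selected clusters.
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }


# Hypothetical frame: 20 villages, each with 30 households.
villages = {
    f"village_{v:02d}": [f"v{v:02d}_hh{h:02d}" for h in range(1, 31)]
    for v in range(1, 21)
}
sample = two_stage_cluster_sample(villages, n_clusters=4, n_per_cluster=5, seed=11)
for village, households in sample.items():
    print(village, households)
```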
A researcher draws several samples in stages in cluster sampling. In a three-stage sample, stage 1 is
random sampling of big clusters; stage 2 is random sampling of small clusters within each selected big
cluster; and the last stage is sampling of elements from within the sampled small clusters. For
example, one first randomly samples city blocks, then households within the sampled blocks, then
individuals within the sampled households. This is also an example of multistage area sampling.
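The same idea extends to any number of stages. The hypothetical sketch below walks a nested frame of
city blocks, households, and individuals, drawing a simple random sample at each level; the nested-dict
layout, the function name multistage_sample, and the per-stage sample sizes are assumptions for
illustration.

```python
import random


def multistage_sample(frame, sizes, seed=None):
    """Draw a simple random sample at every stage of a nested frame.

    frame: nested dicts for the cluster levels and a list of elements at
           the last level, e.g. {block: {household: [person, ...]}}.
    sizes: number of units to draw at each stage, e.g. (3, 2, 1).
    Returns the sampled final elements.
    """
    rng = random.Random(seed)

    def draw(node, stage):
        units = sorted(node) if isinstance(node, dict) else list(node)
        picked = rng.sample(units, min(sizes[stage], len(units)))
        if isinstance(node, dict):  # a cluster level: go one stage deeper
            return [x for unit in picked for x in draw(node[unit], stage + 1)]
        return picked  # the last stage: these are the sampled elements

    return draw(frame, 0)


# Hypothetical city: 10 blocks, 8 households per block, 3 people per household.
city = {
    f"b{b}": {
        f"b{b}-h{h}": [f"b{b}-h{h}-p{p}" for p in range(1, 4)] for h in range(1, 9)
    }
    for b in range(1, 11)
}
people = multistage_sample(city, sizes=(3, 2, 1), seed=5)
print(people)
```

With sizes (3, 2, 1) the sample contains 3 blocks, 2 households in each block, and 1 person in each
household, for 6 people in all.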
The unit costs of cluster sampling are much lower than those of other probability sampling designs.
However, cluster sampling is exposed to greater bias, since error can enter at each stage of sampling.
Double Sampling
This plan is adopted when further information is needed from a subset of the group from which some
information has already been collected for the same study. A sampling design where initially a sample
is used in a study to collect some preliminary information of interest, and later a sub-sample of this
primary sample is used to examine the matter in more detail, is called double sampling.
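Double sampling can be sketched in the same style. Here, as an assumption, the second-phase subsample is
a simple random sample of the first-phase sample; in practice it might instead be chosen using the
preliminary information collected in the first phase. The function name and the sizes are hypothetical.

```python
import random


def double_sample(frame, n_first, n_second, seed=None):
    """Phase 1: random sample for preliminary information.
    Phase 2: random subsample of the phase-1 sample for detailed study."""
    if not 0 <= n_second <= n_first <= len(frame):
        raise ValueError("need 0 <= n_second <= n_first <= len(frame)")
    rng = random.Random(seed)
    first_phase = rng.sample(frame, n_first)
    second_phase = rng.sample(first_phase, n_second)  # drawn only from phase 1
    return first_phase, second_phase


# Hypothetical: 1,000 customers, 200 surveyed briefly, 20 interviewed in depth.
customers = [f"customer_{i}" for i in range(1, 1001)]
first, second = double_sample(customers, 200, 20, seed=3)
print(len(first), len(second))
```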
What is the Appropriate Sample Design?
A researcher who must make a decision concerning the most appropriate sample design for a specific
project will identify a number of sampling criteria and evaluate the relative importance of each criterion
before selecting a sample design. The most common criteria are discussed below.
Degree of Accuracy
Selecting a representative sample is, of course, important to all researchers. However, the degree of
accuracy required may vary from project to project, especially when cost savings or another benefit may
be traded off against a reduction in accuracy.
Resources
The costs associated with the different sampling techniques vary tremendously. If the researcher's
financial and human resources are restricted, this limitation of resources will eliminate certain methods.
For a graduate student working on a master's thesis, conducting a national survey is almost always out
of the question because of limited resources. Managers who weigh the cost of research against the
value of information will often opt to save money by using a non-probability sampling design rather than
decide to conduct no research at all.
Advance Knowledge of the Population
Advance knowledge of population characteristics, such as the availability of lists of population
members, is an important criterion. The lack of an adequate list may automatically rule out any type of
probability sampling.
National versus Local Project
Geographic proximity of population elements will influence sample design. When population elements
are unequally distributed geographically, cluster sampling may become more attractive.
Need for Statistical Analysis
The need for statistical projections based on the sample is often a criterion. Non-probability sampling
techniques do not allow the researcher to use statistical analysis to project the data beyond the sample.