Clinical Psychology (PSY401)
THE OBSERVATIONAL ASSESSMENT AND ITS TYPES
Observation is a visual method of gathering information on activities: of what happens, what your
object of study does or how it behaves.
In the study of products you may be interested in activities because some products are essentially
activity with little or no tangible essence, like computer programs, courses of education, dramas and
other presentations on stage or on TV. There are also activities related to "static" artifacts, notably their
manufacture and use, that you may want to study.
To assess and understand behavior, one must first know what one is dealing with. It comes as no surprise,
then, that behavioral assessment employs observation as a primary technique. A clinician can try to
understand a phobic's fear of heights, a student's avoidance of evaluation settings, or anyone's tendency to
overeat. These people could be interviewed or assessed with self-report inventories. But many
clinicians would argue that unless those people are directly observed in their natural environments, true
understanding will be incomplete. To determine the frequency, strength, and pervasiveness of the problem
behavior or the factors that are maintaining it, behavioral clinicians advocate direct observation.
Of course, all this is easier said than done. Practically speaking, it is difficult and expensive to maintain
trained observers and have them available. This is especially true in the case of adults who are
being treated on an outpatient basis. It is relatively easier to accomplish with children or those with
cognitive limitations. It is likewise easier to make observations in a sheltered or institutional setting.
In some cases, it is possible to use observers who are characteristically part of the person's
environment (such as spouse, parent, teacher, friend, or nurse). In certain instances, it is even possible to
have the client do some self-observation. Of course, there is the ever-present question of ethics. Clinical
psychologists must take pains to make sure that people are not observed without their knowledge or that
friends and associates of the client are not unwittingly drawn into the observational net in a way that
compromises their dignity and right to privacy.
For all these reasons, naturalistic observation has never been used in clinical practice as much as it might
be. Indeed, observation is still more prominent in research than in clinical practice. However, one
need not be a diehard proponent of the behavioral approach to concede the importance of
observational data. It is not unlikely that clinicians of many different persuasions have arrived at
incomplete pictures of their clients. After all, they may never see them except during the 50-minute
therapy hour or through the prism of objective or projective test data. But because of the cumbersome
nature of many observational procedures, for years most clinicians opted for the simpler and seemingly
more efficient methods of traditional assessment.
Naturalistic observation is hardly a new idea. McReynolds (1975) traced the roots of naturalistic
observation to the ancient civilizations of Greece and China. About 50 years ago, Barker and Wright
(1951) described their systematic and detailed recordings of the behavior of a 7-year-old over one day (a
major effort that took an entire book). Beyond this, all of us recognize instantly that our own informal
assessments of friends and associates are heavily influenced by observations of their naturally occurring
behavior. But observation, like testing, is useful only when steps are taken to ensure its reliability and validity.
Example of Naturalistic Observation
Over the years, many forms of naturalistic observation have been used for specific settings. These settings
have included classrooms, playgrounds, general and psychiatric hospitals, home environments,
institutions for those with mental retardation, and therapy sessions in outpatient clinics. Again, it is
important to note that many of the systems employed in these settings have been most widely used
for research purposes. But most of them are adaptable for clinical use.
Because experiences in the family or home have such pervasive effects on adjustment, it is not
surprising that a number of assessment procedures have been developed for behaviors occurring in
this setting. One of the best known systems for home observation is the Behavioral Coding System
(BCS) developed by Patterson (1977) and his colleagues (Jones, Reid, & Patterson, 1979). This
observational system was designed for use in the homes of pre-delinquent boys who exhibit
problems in the areas of aggressiveness and noncompliance. Trained observers spend one or two
hours in the homes of such boys, observing and recording family interactions. Usually the
observations are made immediately before or during dinner. Observers are not allowed to interact with
family members (although occasionally they may talk with them before or after the observations to gain
better acceptance of the procedure). Each family member is observed for two 5-minute periods during each
observational occasion. Observations are made of behaviors in 28 categories, and every 6 seconds
during the period a given family member is being observed, the observer notes whether these
behaviors have or have not occurred.
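The sampling arithmetic just described can be made concrete with a short sketch. The function and variable names below are illustrative assumptions and are not part of the BCS; only the figures (two 5-minute periods, one record every 6 seconds, 28 categories) come from the text.

```python
# Illustrative sketch only: names are hypothetical, figures are from the text.
PERIOD_MINUTES = 5        # length of one observation period per family member
PERIODS_PER_OCCASION = 2  # each member is observed for two such periods
INTERVAL_SECONDS = 6      # the observer records once every 6 seconds
CATEGORIES = 28           # number of behavior categories


def intervals_per_period(period_minutes: int, interval_seconds: int) -> int:
    """Number of recording intervals that fit in one observation period."""
    return (period_minutes * 60) // interval_seconds


per_period = intervals_per_period(PERIOD_MINUTES, INTERVAL_SECONDS)
per_occasion = per_period * PERIODS_PER_OCCASION
# Upper bound on occurred / did-not-occur decisions per member per occasion.
decisions = per_occasion * CATEGORIES

print(per_period, per_occasion, decisions)
```

Under these figures each family member yields 50 recording intervals per period and 100 per occasion, which suggests why trained observers are needed for this kind of coding.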
In a recent study, Patterson and Forgatch (1995) reported observational data (in this case, the sum of
multiple categories of aversive behavior, such as yelling, humiliating, and destructiveness) coded from
home interactions between 67 children and their respective families. All these children had been
referred for treatment because of antisocial behavior problems. Interestingly, Patterson and Forgatch
(1995) found that children's aversive behavior scores at treatment termination significantly
predicted future arrests over the two-year follow-up period. In contrast, no teacher, mother, or father rating
of the children at termination significantly predicted arrests. Thus, in this study, the predictive
value of naturalistic observation (over more traditional ratings by parents or teachers) was demonstrated.
Clinical child psychologists must often deal with behavior problems that take place in the school
setting; some children are disruptive in class, overly aggressive on the playground, generally fearful, cling
to the teacher, will not concentrate, and so on. Although the verbal reports of parents and teachers are
useful, the most direct assessment procedure is actually to observe the problem behavior in its natural habitat.
Several coding systems have been developed over the years for use in school observation.
An example of a behavioral observation system used in school settings is Achenbach's (1994) Direct
Observation Form (DOF) of the Child Behavior Checklist. The DOF is used to assess problem behaviors
that may be observed in school classrooms or other settings (Achenbach, 1994). It consists of 96 problem
items, as well as an open-ended item that allows assessors to indicate problem behaviors not covered by
these items. Assessors are instructed to rate each item according to its frequency, duration, and intensity
within a 10-minute observation period. It is recommended that three to six 10-minute observation periods be
completed so that scores can be averaged across occasions (Achenbach, 1994). In this way, a more reliable
and stable estimate of the child's level of behavior problems in the classroom can be obtained.
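Averaging across occasions can be sketched in a few lines. This is a minimal illustration, not the DOF scoring procedure: the function name and the scores are invented, and only the recommendation of three to six observation periods comes from the text.

```python
from statistics import mean

# Hypothetical total problem scores from four 10-minute observation periods.
# The values are invented for illustration; they are not DOF norms or data.
period_scores = [12.0, 9.0, 15.0, 10.0]


def average_score(scores: list[float]) -> float:
    """Average scores across occasions, enforcing the recommended 3 to 6 periods."""
    if not 3 <= len(scores) <= 6:
        raise ValueError("expected three to six observation periods")
    return mean(scores)


print(average_score(period_scores))
```

A single unusually good or bad period moves the average less as more periods are included, which is the sense in which the estimate becomes more stable.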
Observation techniques have long been used in such settings as psychiatric hospitals and institutions for
those with mental retardation. The sheltered characteristics of these settings have made careful observation
of behavior much more feasible than in more open, uncontrolled environments.
An example of a hospital observation device is the Time Sample Behavioral Checklist (TSBC)
developed by Gordon Paul and his associates (Mariotto & Paul, 1974). It is a time-sample behavioral
checklist that can be used with chronic psychiatric patients. By time-sample is meant that observations
are made at regular intervals for a given patient. Observers make a single 2-second observation of the
patient once every waking hour. Thus, a daily behavioral profile can be constructed on each patient.
Interobserver reliability for this checklist has typically been quite high, and such scales as the TSBC are
helpful in providing a comprehensive behavioral picture of the patient. For example, using the TSBC, Menditto
et al. (1996) documented how a combination of a relatively new antipsychotic medication (clozapine) and
a structured social learning program (Paul & Lentz, 1977) helped significantly decrease the
frequency of inappropriate behaviors and aggressive acts over a 6-month period in a sample of
chronically mentally ill patients on an inpatient unit.
Naturalistic observation has a great deal of intuitive appeal. It provides a picture of how
individuals actually behave that is unfiltered by self-reports, inferences, or other potentially contaminating
variables. However, this is easier said than done. Sometimes the specific kind of behavior in which
clinicians are interested does not occur naturally very often. Much time and resources can be wasted
waiting for the right behavior or situation to happen. The assessment of responsibility taking, for
example, may require day after day of expensive observation before the right situation arises. Then, just as
the clinician is about to start recording, some unexpected "other" figure in the environment may step in
to spoil the situation by subtly changing its whole character. Furthermore, in free-flowing, spontaneous
situations, the client may move away so that conversations cannot be overheard, or the entire scene
may move down the hall too quickly to be followed. In short, naturalistic settings often put clinicians
at the mercy of events that can sometimes overwhelm opportunities for careful, objective assessment. As
a way of handling these problems, clinicians sometimes use controlled observation.
For many years, researchers have used techniques to elicit controlled samples of behavior (Lanyon &
Goodstein, 1982). These are really situational tests that put individuals in situations more or less
similar to those of real life. Direct observations are then made of how the individuals react. In a sense, this
is a kind of work-sample approach in which the behavioral test situation and the criterion behavior to
be predicted are quite similar. This should reduce errors in prediction, as contrasted, for example, to
psychological tests whose stimuli are far removed from the predictive situations.
STUDIES IN HONESTY AND DECEIT
Early arrivals on this scene were the studies of Hartshorne and May and their associates (1928, 1929,
1930). Although Hartshorne and May were oriented principally toward research, the approaches
they used have found direct application in the assessment field. Because Hartshorne and May viewed
personality or character in habit-response terms, they attempted to measure it by directly sampling
behavior. For example, if one wants to assess children's honesty, why not do so by confronting them
with situations where cheating is possible and then observe their responses? This is exactly what
Hartshorne and May did in assessing such behaviors as cheating, lying, and stealing. Using a series of
ingenious natural settings, they were able to execute their research under disguised yet highly
controlled conditions. Of particular interest were data that suggested that children's deceitful behavior
was highly situation-specific and should not be construed as reflecting a generalized trait.
RESPONSE TO STRESS
During World War II, the urgent demand for highly trained and resourceful military intelligence
personnel led to the development of a series of situational stress tests. Instead of using personality tests
to assess the manner in which the individual might handle disruptive or emotionally stressful
situations, the U.S. Office of Strategic Services used assigned tasks (OSS Assessment Staff, 1948).
Through both objective records and qualitative observation by trained staff, the assessment of reaction
to stress was undertaken. Although the demands of war did not provide many good opportunities for the
strict validation of OSS assessment techniques, they did provide an excellent model of what is possible
in assessment. A sample OSS task is the following:
A large cube had to be constructed out of pegs, poles, and blocks. Since the job could not be done by one
person alone, two helpers were provided, but the task had to be completed in 10 minutes. The helpers
were actually stooges who interfered, were passive, made impractical suggestions, and the like. They
ridiculed the candidate and generally frustrated him terribly. In fact, no candidate was ever successful in
assembling the cube.
Somewhat related techniques were used in selecting candidates for the British Civil Service
(Vernon, 1950). Although stress was not incorporated into the British procedures, the tasks on which
candidates worked prior to their selection were based on careful job analyses. L. V. Gordon (1967) has
evaluated several work-sample approaches to assessment used in the prediction of the performance of
Peace Corps trainees.
PARENT-ADOLESCENT CONFLICT
In order to more accurately assess the nature and degree of parent-adolescent conflict, Prinz and Kent
(1978) developed the Interaction Behavior Code (IBC) system. Using the IBC, several raters review and
rate audiotaped discussions of families attempting to resolve a problem about which they disagree. Items
are rated separately for each family member according to the behavior's presence or absence during the
discussion (or for some items, the degree to which they are present). Summary scores are
calculated by averaging scores (across raters) for negative behaviors and positive behaviors.
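The averaging step can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout, the rater labels, and the numbers are invented, and the way item ratings are combined into each rater's negative and positive totals is not specified in the text.

```python
from statistics import mean

# Hypothetical per-rater totals for one family member. The numbers and the
# dictionary layout are invented for illustration; they are not IBC data.
ratings = {
    "rater_a": {"negative": 4.0, "positive": 7.0},
    "rater_b": {"negative": 6.0, "positive": 5.0},
    "rater_c": {"negative": 5.0, "positive": 6.0},
}


def summary_scores(by_rater: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each behavior class (negative, positive) across raters."""
    return {
        kind: mean(r[kind] for r in by_rater.values())
        for kind in ("negative", "positive")
    }


print(summary_scores(ratings))
```

Averaging across raters dampens the influence of any one rater's idiosyncratic judgments on the family member's summary scores.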
For the strict behaviorist, of course, the preceding techniques represent a mixture of observation and
inference. When ratings of leadership, stress level, or ingenuity are made, what is really happening is that
observers are inferring something from behavior. They are not just compiling lists of behaviors or
checking off occurrences.
CONTROLLED PERFORMANCE TECHNIQUES
As seen in the OSS assessment studies, controlled situations allow one to observe behavior under conditions
that offer potential for control and standardization. A more exotic example is the case in which A. A.
Lazarus (1961) assessed claustrophobic behavior by placing a patient in a closed room that was made
progressively smaller by moving a screen. Similarly, Bandura (1969) has used films to expose people to a
graduated series of anxiety-provoking stimuli.
A series of assessment procedures using controlled performance techniques to study chronic snake
phobias illustrates several approaches to this kind of measurement (Bandura, Adams, & Beyer, 1977).
The test of avoidance behavior consisted of a series of 29 performance tasks requiring increasingly more
threatening interactions with a red-tailed boa constrictor. Subjects were instructed to approach a glass cage
containing the snake, to look down at it, to touch and hold the snake with gloved and then bare hands, to let it
loose in the room and then return it to the cage, to hold it within 12 cm of their faces, and finally to
tolerate the snake crawling in their laps while they held their hands passively at their sides.... Those
who could not enter the room containing the snake received a score of 0; subjects who did enter were
asked to perform the various tasks in the graded series. To control for any possible influence of
expressive cues from the tester, she stood behind the subject and read aloud the tasks to be
performed.... The avoidance score was the number of snake-interaction tasks the subject performed.
In the previous discussion of naturalistic observation, the observational procedures were designed for use by
trained staff: clinicians, research assistants, teachers, nurses, ward attendants, and others. But such
procedures are often expensive in both time and money. Furthermore, it is necessary in most cases
to rely on time-sampling or otherwise limit the extent of the observations. When dealing with
individual clients, it is often impractical or too expensive to observe them as they move freely about in
their daily activities. Therefore, clinicians have been relying increasingly on self-monitoring, in which
individuals observe and record their own behaviors, thoughts, and emotions.
In effect, clients are asked to maintain behavioral logs or diaries over some predetermined time period.
Such a log can provide a running record of the frequency, intensity, and duration of certain
target behaviors, along with the stimulus conditions that accompanied them and the consequences that
followed. Such data are especially useful in telling both clinician and client how often the behavior in
question occurs. In addition, it can provide an index of change as a result of therapy (for example, by
comparing baseline frequency with frequency after six weeks of therapy). Also, it can help focus the
client's attention on undesirable behavior and thus aid in reducing it. Finally, clients can come to
realize the connections between environmental stimuli, the consequences of their behavior, and the
behavior itself.
Of course, there are problems with self-monitoring. Some clients may be inaccurate or may purposely
distort their observations or recordings for various reasons. Others may simply resist the whole
procedure. Despite these obvious difficulties, self-monitoring has become a useful and efficient
technique. It can provide a great deal of information at very low cost. However, self-monitoring is
usually effective as a change agent only in conjunction with a larger program of therapeutic
intervention.
A variety of monitoring aids has been developed. Some clients are provided with small counters or
stopwatches, depending upon what is to be monitored. Small file-sized or wallet-sized cards have been
developed upon which clients can quickly and unobtrusively record their data. At a more informal level,
some clients are simply encouraged to make entries in a diary. Such aids are especially useful when
assessing or treating such problems as obesity, smoking, lack of assertiveness, and alcoholism. These aids
can help reinforce the notion that one's problems can be reduced to specific behaviors. Thus, a client
who started with global complaints of an ephemeral nature can begin to see that "not feeling good
about myself" really involves inability to stand up for one's rights in specific circumstances, speaking
without thinking, or whatever.
The dysfunctional thought record (DTR) is completed by the client and provides the client and
therapist with a record of the client's automatic thoughts that are related to dysphoria or depression (J.
S. Beck, 1995). This DTR can help the therapist and client target certain thoughts and reactions for
change in a cognitive-behavioral treatment for depression. The client is instructed to complete the
DTR when she or he notices a change in mood. The situation, automatic thought(s), and associated
emotions are specified. The final two columns of the DTR can be filled out in the therapy session and
serve as a therapeutic intervention. In this way, clients are taught to recognize, evaluate, and modify
these automatic dysfunctional thoughts.
VARIABLES AFFECTING RELIABILITY OF OBSERVATIONS
Whether their data come from interviewing, testing, or observation, clinicians must be assured that
the data are reliable. In the case of observation, clinicians must have confidence that different
observers will produce basically the same ratings and scores. For example, when an observer of
interactions in the home returns ratings of a spouse's behavior as "low in empathy," what
assurance does the clinician have that someone else rating the same behavior in the same
circumstances would have made the same report? Many factors can affect the reliability of
observations. The following is a good sample of these factors.
COMPLEXITY OF TARGET BEHAVIOR
Obviously, the more complex the behavior to be observed, the greater the opportunity for unreliability.
Behavioral assessment typically focuses on less complex, lower-level behaviors (Haynes, 1998).
Observations about what a person eats for breakfast (lower-level behavior) are likely to be more reliable than
those centering on interpersonal behavior (higher-level, more complex behavior). This applies to self-
monitoring as well. Unless specific agreed-upon behaviors are designated, the observer has an
enormous range of behavior upon which to concentrate. Thus, to identify an instance of interpersonal
aggression, one observer might react to sarcasm while another would fail to include it and focus
instead on clear, physical acts.
There is no substitute for the careful and systematic training of observers. For example, observers
who are sent into psychiatric hospitals to study patient behaviors and then make diagnostic ratings
must be carefully prepared in advance. It is necessary to brief them extensively on just what the
definition of, say, depression is, what specific behaviors represent depression, and so on. Their goal
should not be to "please" their supervisor by coming up (consciously or unconsciously) with data
"helpful" to the project. Nor should they protect one another by talking over their ratings and then
"agreeing to agree."
Occasionally there are instances of observer drift, in which observers who work closely together
subtly, without awareness, begin to drift away from other observers in their ratings. Although
reliability among the drifting observers may be acceptable, it is only so because, over time, they
have begun to shift their definitions of target behaviors. Occasionally, too, observers are not as careful
in their observations when they feel they are on their own as when they expect to be monitored
or checked (Reid, 1970). To guard against observer drift, regularly scheduled reliability checks (by an
independent rater) should be conducted and feedback provided to raters.
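A reliability check of this kind can be sketched numerically. The text does not name a specific index, so the two used here, simple percent agreement and Cohen's kappa (agreement corrected for chance), are assumptions chosen because they are commonly used for this purpose. The observer codes are invented for illustration.

```python
from collections import Counter


def percent_agreement(a: list[int], b: list[int]) -> float:
    """Proportion of intervals on which two observers gave the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("both observers must code the same, non-empty intervals")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each observer's marginal rates, per code.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes (1 = behavior occurred, 0 = did not) for ten intervals.
observer_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
observer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]

print(percent_agreement(observer_1, observer_2))
print(round(cohens_kappa(observer_1, observer_2), 2))
```

In this invented example the observers agree on 8 of 10 intervals, yet kappa is only about 0.58, because two observers who each code the behavior as absent most of the time would agree fairly often by chance alone. Comparing each regular observer against an independent rater, rather than against a close co-worker, is what allows drift to show up in such a check.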
VARIABLES AFFECTING VALIDITY OF OBSERVATIONS
At this point, it seems unnecessary to reiterate the importance of validity. We have encountered the
concept before in our discussions of both interviewing and testing; it is no less critical in the case of
observation. But here, issues of validity can be deceptive. It seems obvious in interviewing that what
patients tell the interviewer may not correspond to their actual behavior in noninterview settings.
Or in the case of projective tests, there may be validity questions about inferring aggression from
Rorschach responses that involve vicious animals, blood, or large teeth. After all, percepts are not the same
as "real" behavior. But in the case of observation, things seem much clearer. When a child is observed to
bully his peers unmercifully and these observations are corroborated by reports from teachers, there
would seem to be little question of the validity of the observers' data. Aggression is aggression. However,
things are not always so simple, as the following discussion will illustrate.
A behavioral observation schema should include the behaviors that are deemed important for the
research or clinical purposes at hand. Usually the investigator or clinician who develops the system also
determines whether or not the system shows content validity. But this process is almost circular, in the
sense that a system is valid if the clinician decides that it is valid. In developing the Behavioral
Coding System (BCS), Jones et al. (1975) circumvented this problem by organizing several categories of
noxious behaviors in children and then submitting them for ratings. By using mothers' ratings, they
were able to confirm their own a priori clinical judgments as to whether or not certain deviant behaviors
were in fact noxious or aversive.
Another way to approach the validity of observations is to ask whether one's obtained observational
ratings correspond to what others (such as teachers, spouse, and friends) are observing in the same time
frame. For example, do observational ratings of children's aggression on the playground made by
trained observers agree with the ratings made by the children's peers? In short, do the children
perceive each other's aggression in the same way that observers do?
Observational systems are usually derived from some implicit or explicit theoretical framework. For
example, the BCS of Jones et al. (1975) was derived from a social learning framework that sees
aggression as the result of learning in the family. When the rewards for aggression are substantial,
aggression will occur. When such rewards are no longer contingent on the behavior, aggression should
subside. Therefore, the construct validity of the BCS could be demonstrated by showing that
children's aggressive behavior declines from a baseline point after clinical treatment, with clinical
treatment defined as rearranging the social contingencies in the family in a way that ought to reduce the
incidence of observed aggression.
MECHANICS OF RATINGS
It is important that a unit of analysis be specified. A unit of analysis is the length of time observations will
be made, along with the type and number of responses to be considered. For example, it might be decided
that every physical movement or gesture will be recorded for 1 minute every 4 minutes. The total
observational time might consist of a 20-minute recess period for kindergarten children. This means that
every 4 minutes the child would be observed for 1 minute and all physical movements recorded.
These movements would then be coded or rated for the variable under study (such as aggression, problem
solving, or dependency).
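The recess example can be laid out as an explicit schedule. This is a minimal sketch with a hypothetical function name. It assumes the first 1-minute window opens at the start of recess, which the text does not state.

```python
def observation_windows(
    total_minutes: int, cycle_minutes: int, observe_minutes: int
) -> list[tuple[int, int]]:
    """Return (start, end) minutes of each observation window in the session."""
    if observe_minutes > cycle_minutes:
        raise ValueError("the observation window cannot exceed the cycle length")
    windows = []
    start = 0  # assumption: observation begins when the session begins
    while start + observe_minutes <= total_minutes:
        windows.append((start, start + observe_minutes))
        start += cycle_minutes
    return windows


# Figures from the text: 20-minute recess, 1 minute observed every 4 minutes.
windows = observation_windows(total_minutes=20, cycle_minutes=4, observe_minutes=1)
observed_minutes = sum(end - start for start, end in windows)

print(windows)
print(observed_minutes)
```

Under that assumption the 20-minute recess yields five 1-minute windows, so the child is actually observed for 5 of the 20 minutes.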
In addition to the units of analysis chosen, the specific form that the ratings will take must also be
decided. One could decide to record behaviors along a dimension of intensity: How strong was the
aggressive behavior? One might also include a duration record: How long did the behavior last? Or one
might use a simple frequency count: How many times in a designated period did the behavior under
study occur?
Beyond this, a scoring procedure must be developed. Such procedures can range from making check
marks on a sheet of paper attached to a clipboard to the use of counters, stopwatches, timers, and even
laptop computers. All raters, of course, will employ the same procedure.
Another factor affecting the validity of observations is called reactivity. Patients or study participants
sometimes react to the fact that they are being observed by changing the way they behave. The talkative
person suddenly becomes quiet. The complaining spouse suddenly becomes the epitome of self-
sacrifice. Sometimes an individual may even feel the need to apologize for the dog by saying, "He never
does that when he is alone with us." In any case, reactivity can severely hamper the validity of
observations because it makes the observed behavior unrepresentative of what normally occurs. The real
danger of reactivity is that the observer may not recognize its presence. If observed behavior is not a true
sample, this affects the extent to which one can generalize from this instance of behavior. Then, too,
observers may unwittingly interfere with or influence the very behavior they are sent to observe. In the case
of sexual dysfunction, for example, Conte (1986) has noted that behavioral ratings are so intrusive that
clinicians usually have to rely on self-report methods.
SUGGESTIONS FOR IMPROVING RELIABILITY AND VALIDITY OF OBSERVATIONS
1) Decide on target behaviors that are both relevant and comprehensive.
2) Work from an explicit theoretical framework that will help define the behaviors of interest.
3) Employ trained observers.
4) Make sure that the observational format is strictly specified.
5) Be aware of such potential sources of error as bias and fluctuations in concentration.
6) Consider the possibility of reactivity.
7) Give careful consideration to how representative the observations really are.