THE OBSERVATIONAL ASSESSMENT AND ITS TYPES: Home Observation

Clinical Psychology (PSY401)

LESSON 21

THE OBSERVATIONAL ASSESSMENT AND ITS TYPES

Observation is a visual method of gathering information on activities: of what happens, what your

object of study does or how it behaves.

In the study of products you may be interested in activities because some products are essentially

activity with little or no tangible essence, like computer programs, courses of education, dramas and

other presentations on stage or on TV. There are also activities related to "static" artifacts, notably their

manufacture and use that you perhaps will want to study.

OBSERVATION METHODS

To assess and understand behavior, one must first know what one is dealing with. It comes as no surprise,

then, that behavioral assessment employs observation as a primary technique. A clinician can try to

understand a phobic's fear of heights, a student's avoidance of evaluation settings, or anyone's tendency to

overeat. These people could be interviewed or assessed with self-report inventories. But many

clinicians would argue that unless those people are directly observed in their natural environments, true

understanding will be incomplete. To determine the frequency, strength, and pervasiveness of the problem

behavior or the factors that are maintaining it, behavioral clinicians advocate direct observation.

Of course, all this is easier said than done. Practically speaking, it is difficult and expensive to maintain

trained observers and have them available. This is especially true in the case of adults who are

being treated on an outpatient basis. It is relatively easier to accomplish with children or those with

cognitive limitations. It is likewise easier to make observations in a sheltered or institutional setting.

In some cases, it is possible to use observers who are characteristically part of the person's

environment (such as spouse, parent, teacher, friend, or nurse). In certain instances, it is even possible to

have the client do some self-observation. Of course, there is the ever-present question of ethics. Clinical

psychologists must take pains to make sure that people are not observed without their knowledge or that

friends and associates of the client are not unwittingly drawn into the observational net in a way that

compromises their dignity and right to privacy.

For all these reasons, naturalistic observation has never been used in clinical practice as much as it might

be. Indeed, observation is still more prominent in research than in clinical practice. However, one

need not be a diehard proponent of the behavioral approach to concede the importance of

observational data. It is not unlikely that clinicians of many different persuasions have arrived at

incomplete pictures of their clients. After all, they may never see them except during the 50-minute

therapy hour or through the prism of objective or projective test data. But because of the cumbersome

nature of many observational procedures, for years most clinicians opted for the simpler and seemingly

more efficient methods of traditional assessment.

NATURALISTIC OBSERVATION

Naturalistic observation is hardly a new idea. McReynolds (1975) traced the roots of naturalistic

observation to the ancient civilizations of Greece and China. About 50 years ago, Barker and Wright

(1951) described their systematic and detailed recordings of the behavior of a 7-year-old over one day (a

major effort that took an entire book). Beyond this, all of us recognize instantly that our own informal

assessments of friends and associates are heavily influenced by observations of their naturally occurring

behavior. But observation, like testing, is useful only when steps are taken to ensure its reliability and

validity.

Example of Naturalistic Observation

Over the years, many forms of naturalistic observation have been used for specific settings. These settings

have included classrooms, playgrounds, general and psychiatric hospitals, home environments,

institutions for those with mental retardation, and therapy sessions in outpatient clinics. Again, it is

important to note that many of the systems employed in these settings have been most widely used

for research purposes. But most of them are adaptable for clinical use.

Home Observation

Because experiences in the family or home have such pervasive effects on adjustment, it is not

surprising that a number of assessment procedures have been developed for behaviors occurring in

this setting. One of the best known systems for home observation is the Behavioral Coding System

(BCS) developed by Patterson (1977) and his colleagues (Jones, Reid, & Patterson, 1979). This

observational system was designed for use in the homes of pre-delinquent boys who exhibit

problems in the areas of aggressiveness and noncompliance. Trained observers spend one or two

hours in the homes of such boys, observing and recording family interactions. Usually the

observations are made immediately before or during dinner. Observers are not allowed to interact with

family members (although occasionally they may talk with them before or after the observations to gain

better acceptance of the procedure). Each family member is observed for two 5-minute periods during each

observational occasion. Observations are made of behaviors in 28 categories, and every 6 seconds

during the period a given family member is being observed, the observer notes whether these

behaviors have or have not occurred.
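
The interval schedule just described can be made concrete with a little arithmetic: a 5-minute period divided into 6-second intervals gives 50 intervals, so two periods give 100 intervals per family member per occasion. The Python sketch below is only an illustration of this kind of interval recording. The category names and the helper functions are hypothetical and are not taken from the BCS itself.

```python
# Illustrative sketch of interval recording. The category names below are
# hypothetical stand-ins, not the actual 28 BCS categories.
PERIOD_SECONDS = 5 * 60      # each observation period lasts 5 minutes
INTERVAL_SECONDS = 6         # a note is made every 6 seconds
PERIODS_PER_OCCASION = 2     # two periods per family member per occasion

intervals_per_period = PERIOD_SECONDS // INTERVAL_SECONDS
intervals_per_occasion = intervals_per_period * PERIODS_PER_OCCASION


def new_record(categories):
    """Create an empty record: one occurred/not-occurred flag per interval."""
    return {c: [False] * intervals_per_period for c in categories}


def record_interval(record, interval_index, observed):
    """Mark the categories that were observed during one 6-second interval."""
    for category in observed:
        record[category][interval_index] = True


def proportion_of_intervals(record, category):
    """Share of intervals in which the category occurred."""
    flags = record[category]
    return sum(flags) / len(flags)


record = new_record(["yell", "comply", "tease"])
record_interval(record, 0, ["yell"])
record_interval(record, 1, ["yell", "tease"])
record_interval(record, 2, ["comply"])

print(intervals_per_period, intervals_per_occasion)   # 50 100
print(proportion_of_intervals(record, "yell"))         # 0.04
```

Storing one flag per interval, rather than a running count, mirrors the procedure of noting only whether a behavior has or has not occurred in each interval.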

In a recent study, Patterson and Forgatch (1995) reported observational data (in this case, the sum of

multiple categories of aversive behavior, such as yelling, humiliating, and destructiveness) coded from

home interactions between 67 children and their respective families. All these children had been

referred for treatment because of antisocial behavior problems. Interestingly, Patterson and Forgatch

(1995) found that children's aversive behavior scores at treatment termination significantly

predicted future arrests over the two-year follow-up period. In contrast, no teacher, mother, or father rating

of the children at termination significantly predicted arrests. Thus, in this study, the predictive

value of naturalistic observation (over more traditional ratings by parents or teachers) was demonstrated.

School Observation

Clinical child psychologists must often deal with behavior problems that take place in the school

setting; some children are disruptive in class, overly aggressive on the playground, generally fearful, cling

to the teacher, will not concentrate, and so on. Although the verbal reports of parents and teachers are

useful, the most direct assessment procedure is actually to observe the problem behavior in its natural habitat.

Several coding systems have been developed over the years for use in school observation.

An example of a behavioral observation system used in school settings is Achenbach's (1994) Direct

Observation Form (DOF) of the Child Behavior Checklist. The DOF is used to assess problem behaviors

that may be observed in school classrooms or other settings (Achenbach, 1994). It consists of 96 problem

items, as well as an open-ended item that allows assessors to indicate problem behaviors not covered by

these items. Assessors are instructed to rate each item according to its frequency, duration, and intensity

within a 10-minute observation period. It is recommended that three to six 10-minute observation periods be

completed so that scores can be averaged across occasions (Achenbach, 1994). In this way, a more reliable

and stable estimate of the child's level of behavior problems in the classroom can be obtained.
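
The averaging step can be sketched in a few lines of Python. The scores below are invented for illustration and do not come from the DOF manual, and the range check simply mirrors the recommendation of three to six observation periods; it is an assumption of this sketch, not a scoring rule of the instrument.

```python
from statistics import mean

# Hypothetical total problem scores from four 10-minute observation periods.
period_scores = [12, 9, 15, 10]


def average_across_occasions(scores, minimum=3, maximum=6):
    """Average period scores after checking the number of periods."""
    if not minimum <= len(scores) <= maximum:
        raise ValueError("expected between 3 and 6 observation periods")
    return mean(scores)


print(average_across_occasions(period_scores))   # 11.5
```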

Hospital Observation

Observation techniques have long been used in such settings as psychiatric hospitals and institutions for

those with mental retardation. The sheltered characteristics of these settings have made careful observation

of behavior much more feasible than in more open, uncontrolled environments.

An example of a hospital observation device is the Time-Sample Behavioral Checklist (TSBC)

developed by Gordon Paul and his associates (Mariotto & Paul, 1974). It is a time-sample behavioral

checklist that can be used with chronic psychiatric patients. By time-sample is meant that observations

are made at regular intervals for a given patient. Observers make a single 2-second observation of the

patient once every waking hour. Thus, a daily behavioral profile can be constructed on each patient.

Interobserver reliability for this checklist has typically been quite high, and such scales as the TSBC are

helpful in providing a comprehensive behavioral picture of the patient. For example, using the TSBC, Menditto

et al. (1996) documented how a combination of a relatively new antipsychotic medication (clozapine) and

a structured social learning program (Paul & Lentz, 1977) helped significantly decrease the

frequency of inappropriate behaviors and aggressive acts over a 6-month period in a sample of

chronically mentally ill patients on an inpatient unit.

CONTROLLED OBSERVATION

Naturalistic observation has a great deal of intuitive appeal. It provides a picture of how

individuals actually behave that is unfiltered by self-reports, inferences, or other potentially contaminating

variables. However, this is easier said than done. Sometimes the specific kind of behavior in which

clinicians are interested does not occur naturally very often. Much time and resources can be wasted

waiting for the right behavior or situation to happen. The assessment of responsibility taking, for

example, may require day after day of expensive observation before the right situation arises. Then, just as

the clinician is about to start recording, some unexpected "other" figure in the environment may step in

to spoil the situation by subtly changing its whole character. Furthermore, in free-flowing, spontaneous

situations, the client may move away so that conversations cannot be overheard, or the entire scene

may move down the hall too quickly to be followed. In short, naturalistic settings often put clinicians

at the mercy of events that can sometimes overwhelm opportunities for careful, objective assessment. As

a way of handling these problems, clinicians sometimes use controlled observation.

For many years, researchers have used techniques to elicit controlled samples of behavior (Lanyon &

Goodstein, 1982). These are really situational tests that put individuals in situations more or less

similar to those of real life. Direct observations are then made of how the individuals react. In a sense, this

is a kind of work-sample approach in which the behavioral test situation and the criterion behavior to

be predicted are quite similar. This should reduce errors in prediction, as contrasted, for example, to

psychological tests whose stimuli are far removed from the predictive situations.

STUDIES IN HONESTY AND DECEIT

Early arrivals on this scene were the studies of Hartshorne and May and their associates (1928, 1929,

1930). Although Hartshorne and May were oriented principally toward research, the approaches

they used have found direct application in the assessment field. Because Hartshorne and May viewed

personality or character in habit-response terms, they attempted to measure it by directly sampling

behavior. For example, if one wants to assess children's honesty, why not do so by confronting them

with situations where cheating is possible and then observe their responses? This is exactly what

Hartshorne and May did in assessing such behaviors as cheating, lying, and stealing. Using a series of

ingenious natural settings, they were able to execute their research under disguised yet highly

controlled conditions. Of particular interest were data that suggested that children's deceitful behavior

was highly situation-specific and should not be construed as reflecting a generalized trait.

RESPONSE TO STRESS

During World War II, the urgent demand for highly trained and resourceful military intelligence

personnel led to the development of a series of situational stress tests. Instead of using personality tests

to assess the manner in which the individual might handle disruptive or emotionally stressful

situations, the U.S. Office of Strategic Services used assigned tasks (OSS Assessment Staff, 1948).

Through both objective records and qualitative observation by trained staff, the assessment of reaction

to stress was undertaken. Although the demands of war did not provide many good opportunities for the

strict validation of OSS assessment techniques, they did provide an excellent model of what is possible

in assessment. A sample OSS task is the following:

A large cube had to be constructed out of pegs, poles, and blocks. Since the job could not be done by one

person alone, two helpers were provided, but the task had to be completed in 10 minutes. The helpers

were actually stooges who interfered, were passive, made impractical suggestions, and the like. They

ridiculed the candidate and generally frustrated him terribly. In fact, no candidate was ever successful in

assembling the cube.

Somewhat related techniques were used in selecting candidates for the British Civil Service

(Vernon, 1950). Although stress was not incorporated into the British procedures, the tasks on which

candidates worked prior to their selection were based on careful job analyses. L. V. Gordon (1967) has

evaluated several work-sample approaches to assessment used in the prediction of the performance of

Peace Corps trainees.

PARENT-ADOLESCENT CONFLICT

In order to more accurately assess the nature and degree of parent-adolescent conflict, Prinz and Kent

(1978) developed the Interaction Behavior Code (IBC) system. Using the IBC, several raters review and

rate audiotaped discussions of families attempting to resolve a problem about which they disagree. Items

are rated separately for each family member according to the behavior's presence or absence during the

discussion (or for some items, the degree to which they are present). Summary scores are

calculated by averaging scores (across raters) for negative behaviors and positive behaviors.
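
A minimal Python sketch of this kind of summary scoring follows. It assumes presence/absence items only and assumes that each rater's items are averaged before averaging across raters; the item names are hypothetical and the actual IBC item list and scoring rules are not reproduced here.

```python
from statistics import mean

# Hypothetical items; these are illustrative, not the real IBC items.
NEGATIVE_ITEMS = ["interrupts", "criticizes", "ignores"]
POSITIVE_ITEMS = ["compromises", "praises"]

# One dict per rater for a single family member: 1 = present, 0 = absent.
ratings = [
    {"interrupts": 1, "criticizes": 1, "ignores": 0, "compromises": 0, "praises": 1},
    {"interrupts": 1, "criticizes": 0, "ignores": 0, "compromises": 0, "praises": 1},
    {"interrupts": 1, "criticizes": 1, "ignores": 0, "compromises": 1, "praises": 1},
]


def summary_score(all_ratings, items):
    """Average, across raters, of each rater's mean rating on the items."""
    per_rater = [mean(r[item] for item in items) for r in all_ratings]
    return mean(per_rater)


negative = summary_score(ratings, NEGATIVE_ITEMS)
positive = summary_score(ratings, POSITIVE_ITEMS)
print(round(negative, 3), round(positive, 3))   # 0.556 0.667
```

Averaging across raters dampens the idiosyncrasies of any single rater, which is the reason for using several raters in the first place.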

For the strict behaviorist, of course, the preceding techniques represent a mixture of observation and

inference. When ratings of leadership, stress level, or ingenuity are made, what is really happening is that

observers are inferring something from behavior. They are not just compiling lists of behaviors or

checking off occurrences.

CONTROLLED PERFORMANCE TECHNIQUES

As seen in the OSS assessment studies, controlled situations allow one to observe behavior under conditions

that offer potential for control and standardization. A more exotic example is the case in which A. A.

Lazarus (1961) assessed claustrophobic behavior by placing a patient in a closed room that was made

progressively smaller by moving a screen. Similarly, Bandura (1969) has used films to expose people to a

graduated series of anxiety-provoking stimuli.

A series of assessment procedures using controlled performance techniques to study chronic snake

phobias illustrates several approaches to this kind of measurement (Bandura, Adams, & Beyer, 1977).

BEHAVIORAL AVOIDANCE

The test of avoidance behavior consisted of a series of 29 performance tasks requiring increasingly more

threatening interactions with a red-tailed boa constrictor. Subjects were instructed to approach a glass cage

containing the snake, to look down at it, to touch and hold the snake with gloved and then bare hands, to let it

loose in the room and then return it to the cage, to hold it within 12 cm of their faces, and finally to

tolerate the snake crawling in their laps while they held their hands passively at their sides.... Those

who could not enter the room containing the snake received a score of 0; subjects who did enter were

asked to perform the various tasks in the graded series. To control for any possible influence of

expressive cues from the tester, she stood behind the subject and read aloud the tasks to be

performed.... The avoidance score was the number of snake-interaction tasks the subject performed

successfully.

SELF MONITORING

In the previous discussion of naturalistic observation, the observational procedures were designed for use by

trained staff: clinicians, research assistants, teachers, nurses, ward attendants, and others. But such

procedures are often expensive in both time and money. Furthermore, it is necessary in most cases

to rely on time-sampling or otherwise limit the extent of the observations. When dealing with

individual clients, it is often impractical or too expensive to observe them as they move freely about in

their daily activities. Therefore, clinicians have been relying increasingly on self-monitoring, in which

individuals observe and record their own behaviors, thoughts, and emotions.

In effect, clients are asked to maintain behavioral logs or diaries over some predetermined time period.

Such a log can provide a running record of the frequency, intensity, and duration of certain

target behaviors, along with the stimulus conditions that accompanied them and the consequences that

followed. Such data are especially useful in telling both clinician and client how often the behavior in

question occurs. In addition, it can provide an index of change as a result of therapy (for example, by

comparing baseline frequency with frequency after six weeks of therapy). Also, it can help focus the

client's attention on undesirable behavior and thus aid in reducing it. Finally, clients can come to

realize the connections between environmental stimuli, the consequences of their behavior, and the

behavior itself.
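
A behavioral log of this kind can be pictured as a small data structure. The Python sketch below is a hypothetical layout, not a standard clinical form: the field names, the 1-10 intensity scale, and the sample entries are all assumptions made for illustration. It records each episode with its antecedent and consequence and then compares baseline frequency with frequency in week six.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    week: int            # week of monitoring (1 = baseline week)
    antecedent: str      # stimulus conditions that accompanied the behavior
    intensity: int       # client's own rating, here on an assumed 1-10 scale
    duration_min: float  # how long the episode lasted, in minutes
    consequence: str     # what followed the behavior


def weekly_frequency(log, week):
    """Count how many episodes the client recorded in a given week."""
    return sum(1 for entry in log if entry.week == week)


log = [
    LogEntry(1, "argument at work", 7, 15, "felt guilty"),
    LogEntry(1, "alone in evening", 5, 10, "felt calmer"),
    LogEntry(1, "alone in evening", 6, 20, "felt guilty"),
    LogEntry(6, "argument at work", 4, 5, "talked to friend"),
]

baseline = weekly_frequency(log, week=1)
after_therapy = weekly_frequency(log, week=6)
print(baseline, after_therapy, baseline - after_therapy)   # 3 1 2
```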

Of course, there are problems with self-monitoring. Some clients may be inaccurate or may purposely

distort their observations or recordings for various reasons. Others may simply resist the whole

procedure. Despite these obvious difficulties, self-monitoring has become a useful and efficient

technique. It can provide a great deal of information at very low cost. However, self-monitoring is

usually effective as a change agent only in conjunction with a larger program of therapeutic

intervention.

A variety of monitoring aids has been developed. Some clients are provided with small counters or

stopwatches, depending upon what is to be monitored. Small file-sized or wallet-sized cards have been

developed upon which clients can quickly and unobtrusively record their data. At a more informal level,

some clients are simply encouraged to make entries in a diary. Such aids are especially useful when

assessing or treating such problems as obesity, smoking, lack of assertiveness, and alcoholism. These aids

can help reinforce the notion that one's problems can be reduced to specific behaviors. Thus, a client

who started with global complaints of an ephemeral nature can begin to see that "not feeling good

about myself" really involves inability to stand up for one's rights in specific circumstances, speaking

without thinking, or the like.

The dysfunctional thought record (DTR) is completed by the client and provides the client and

therapist with a record of the client's automatic thoughts that are related to dysphoria or depression (J.

S. Beck, 1995). This DTR can help the therapist and client target certain thoughts and reactions for

change in a cognitive-behavioral treatment for depression. The client is instructed to complete the

DTR when she or he notices a change in mood. The situation, automatic thought(s), and associated

emotions are specified. The final two columns of the DTR can be filled out in the therapy session and

serve as a therapeutic intervention. In this way, clients are taught to recognize, evaluate, and modify

these automatic dysfunctional thoughts.

VARIABLES AFFECTING RELIABILITY OF OBSERVATIONS

Whether their data come from interviewing, testing, or observation, clinicians must be assured that

the data are reliable. In the case of observation, clinicians must have confidence that different

observers will produce basically the same ratings and scores. For example, when an observer of

interactions in the home returns ratings of a spouse's behavior as "low in empathy," what

assurance does the clinician have that someone else rating the same behavior in the same

circumstances would have made the same report? Many factors can affect the reliability of

observations. The following is a good sample of these factors.

COMPLEXITY OF TARGET BEHAVIOR

Obviously, the more complex the behavior to be observed, the greater the opportunity for unreliability.

Behavioral assessment typically focuses on less complex, lower-level behaviors (Haynes, 1998).

Observations about what a person eats for breakfast (lower-level behavior) are likely to be more reliable than

those centering on interpersonal behavior (higher-level, more complex behavior). This applies to self-

monitoring as well. Unless specific agreed-upon behaviors are designated, the observer has an

enormous range of behavior upon which to concentrate. Thus, to identify an instance of interpersonal

aggression, one observer might react to sarcasm while another would fail to include it and focus

instead on clear, physical acts.

TRAINING OBSERVERS

There is no substitute for the careful and systematic training of observers. For example, observers

who are sent into psychiatric hospitals to study patient behaviors and then make diagnostic ratings

must be carefully prepared in advance. It is necessary to brief them extensively on just what the

definition of, say, depression is, what specific behaviors represent depression, and so on. Their goal

should not be to "please" their supervisor by coming up (consciously or unconsciously) with data

"helpful" to the project. Nor should they protect one another by talking over their ratings and then

"agreeing to agree."

Occasionally there are instances of observer drift, in which observers who work closely together

subtly, and without awareness, begin to drift away from other observers in their ratings. Although

reliability among the drifting observers may be acceptable, it is only so because, over time, they

have begun to shift their definitions of target behaviors. Occasionally, too, observers are not as careful

in their observations when they feel they are on their own as when they expect to be monitored

or checked (Reid, 1970). To guard against observer drift, regularly scheduled reliability checks (by an

independent rater) should be conducted and feedback provided to raters.
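
A reliability check of this kind is often summarized with an agreement index. The text does not name a particular statistic, so the Python sketch below makes an assumption: it computes simple percent agreement and Cohen's kappa, two commonly used indices, for a regular observer and an independent checker who coded the same intervals. The codes and data are invented for illustration.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of intervals on which two observers gave the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty code lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both observers used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for eight intervals ("agg" = aggression observed).
observer = ["agg", "none", "none", "agg", "none", "none", "agg", "none"]
checker = ["agg", "none", "agg", "agg", "none", "none", "none", "none"]

print(percent_agreement(observer, checker))        # 0.75
print(round(cohens_kappa(observer, checker), 3))   # 0.467
```

Repeating such a check on a regular schedule, and watching for a decline in agreement with the independent rater, is one way drift could be detected.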

VARIABLES AFFECTING VALIDITY OF OBSERVATIONS

At this point, it seems unnecessary to reiterate the importance of validity. We have encountered the

concept before in our discussions of both interviewing and testing; it is no less critical in the case of

observation. But here, issues of validity can be deceptive. It seems obvious in interviewing that what

patients tell the interviewer may not correspond to their actual behavior in noninterview settings.

Or in the case of projective tests, there may be validity questions about inferring aggression from

Rorschach responses that involve vicious animals, blood, or large teeth. After all, percepts are not the same

as "real" behavior. But in the case of observation, things seem much clearer. When a child is observed to

bully his peers unmercifully and these observations are corroborated by reports from teachers, there

would seem to be little question of the validity of the observers' data. Aggression is aggression. However,

things are not always so simple, as the following discussion will illustrate.

CONTENT VALIDITY

A behavioral observation schema should include the behaviors that are deemed important for the

research or clinical purposes at hand. Usually the investigator or clinician who develops the system also

determines whether or not the system shows content validity. But this process is almost circular, in the

sense that a system is valid if the clinician decides that it is valid. In developing the Behavioral

Coding System (BCS), Jones et al. (1975) circumvented this problem by organizing several categories of

noxious behaviors in children and then submitting them for ratings. By using mothers' ratings, they

were able to confirm their own a priori clinical judgments as to whether or not certain deviant behaviors

were in fact noxious or aversive.

CONCURRENT VALIDITY

Another way to approach the validity of observations is to ask whether one's obtained observational

ratings correspond to what others (such as teachers, spouse, and friends) are observing in the same time

frame. For example, do observational ratings of children's aggression on the playground made by

trained observers agree with the ratings made by the children's peers? In short, do the children

perceive each other's aggression in the same way that observers do?

CONSTRUCT VALIDITY

Observational systems are usually derived from some implicit or explicit theoretical framework. For

example, the BCS of Jones et al. (1975) was derived from a social learning framework that sees

aggression as the result of learning in the family. When the rewards for aggression are substantial,

aggression will occur. When such rewards are no longer contingent on the behavior, aggression should

subside. Therefore, the construct validity of the BCS could be demonstrated by showing that

children's aggressive behavior declines from a baseline point after clinical treatment, with clinical

treatment defined as rearranging the social contingencies in the family in a way that ought to reduce the

incidence of observed aggression.

MECHANICS OF RATINGS

It is important that a unit of analysis be specified. A unit of analysis is the length of time observations will

be made, along with the type and number of responses to be considered. For example, it might be decided

that every physical movement or gesture will be recorded for 1 minute every 4 minutes. The total

observational time might consist of a 20-minute recess period for kindergarten children. This means that

every 4 minutes the child would be observed for 1 minute and all physical movements recorded.

These movements would then be coded or rated for the variable under study (such as aggression, problem

solving, or dependency).
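
The sampling schedule in this example can be worked out explicitly. The Python sketch below assumes that each 1-minute observation window opens at the start of a 4-minute cycle; the text does not say where in the cycle the window falls, so that placement is an assumption. Under it, a 20-minute recess yields five windows, or 5 minutes of observed behavior.

```python
def observation_windows(total_min, cycle_min, observe_min):
    """Return the (start, end) minute of each observation window."""
    windows = []
    start = 0
    # Open a window at the start of each cycle while it still fits.
    while start + observe_min <= total_min:
        windows.append((start, start + observe_min))
        start += cycle_min
    return windows


windows = observation_windows(total_min=20, cycle_min=4, observe_min=1)
observed_minutes = sum(end - start for start, end in windows)

print(windows)            # [(0, 1), (4, 5), (8, 9), (12, 13), (16, 17)]
print(observed_minutes)   # 5
```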

In addition to the units of analysis chosen, the specific form that the ratings will take must also be

decided. One could decide to record behaviors along a dimension of intensity: How strong was the

aggressive behavior? One might also include a duration record: How long did the behavior last? Or one

might use a simple frequency count: How many times in a designated period did the behavior under

study occur?
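
These three forms of rating can be computed from the same raw record. The Python sketch below uses invented episodes of a single target behavior and an assumed 1-5 intensity scale; it is meant only to show how frequency, duration, and intensity differ as summaries.

```python
from statistics import mean

# Hypothetical episodes of one target behavior within a designated period:
# (start_second, end_second, intensity on an assumed 1-5 scale).
episodes = [(10, 25, 3), (70, 75, 5), (130, 160, 2)]

# Frequency count: how many times did the behavior occur?
frequency = len(episodes)

# Duration record: how long did the behavior last in total (seconds)?
total_duration = sum(end - start for start, end, _ in episodes)

# Intensity dimension: how strong was the behavior on average?
mean_intensity = mean(intensity for _, _, intensity in episodes)

print(frequency, total_duration, round(mean_intensity, 2))   # 3 50 3.33
```

The same behavior can look quite different depending on the form chosen: here it is infrequent and brief, yet one episode was rated at the top of the intensity scale.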

Beyond this, a scoring procedure must be developed. Such procedures can range from making check

marks on a sheet of paper attached to a clipboard to the use of counters, stopwatches, timers, and even

laptop computers. All raters, of course, will employ the same procedure.

REACTIVITY

Another factor affecting the validity of observations is called reactivity. Patients or study participants

sometimes react to the fact that they are being observed by changing the way they behave. The talkative

person suddenly becomes quiet. The complaining spouse suddenly becomes the epitome of

self-sacrifice. Sometimes an individual may even feel the need to apologize for the dog by saying, "He never

does that when he is alone with us." In any case, reactivity can severely hamper the validity of

observations because it makes the observed behavior unrepresentative of what normally occurs. The real

danger of reactivity is that the observer may not recognize its presence. If observed behavior is not a true

sample, this affects the extent to which one can generalize from this instance of behavior. Then, too,

observers may unwittingly interfere with or influence the very behavior they are sent to observe. In the case

of sexual dysfunction, for example, Conte (1986) has noted that behavioral ratings are so intrusive that

clinicians usually have to rely on self-report methods.

SUGGESTIONS FOR IMPROVING RELIABILITY AND VALIDITY OF OBSERVATIONS

1) Decide on target behaviors that are both relevant and comprehensive.

2) Work from an explicit theoretical framework that will help define the behaviors of interest.

3) Employ trained observers.

4) Make sure that the observational format is strictly specified.

5) Be aware of such potential sources of error as bias and fluctuations in concentration.

6) Consider the possibility of reactivity.

7) Give careful consideration to how representative the observations really are.
