DECIDE: A FRAMEWORK TO GUIDE EVALUATION


Human Computer Interaction (CS408)


Lecture 30. Evaluation Part II

Learning Goals

The aim of this lecture is to introduce you to the study of Human Computer Interaction,

so that after studying this you will be able to:

· Understand the DECIDE evaluation framework

30.1 DECIDE: A framework to guide evaluation

Well-planned evaluations are driven by clear goals and appropriate questions (Basili

et al., 1994). To guide our evaluations we use the DECIDE framework, which

provides the following checklist to help novice evaluators:

1. Determine the overall goals that the evaluation addresses.

2. Explore the specific questions to be answered.

3. Choose the evaluation paradigm and techniques to answer the questions.

4. Identify the practical issues that must be addressed, such as selecting participants.

5. Decide how to deal with the ethical issues.

6. Evaluate, interpret, and present the data.
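
As a study aid, the checklist can be kept as a small planning record that shows which steps still lack notes. The following is a minimal sketch only: the class name EvaluationPlan and its methods are hypothetical and are not part of the DECIDE framework itself.

```python
from dataclasses import dataclass, field

# The six DECIDE steps, in the order given in the checklist.
DECIDE_STEPS = [
    "Determine the goals",
    "Explore the questions",
    "Choose the evaluation paradigm and techniques",
    "Identify the practical issues",
    "Decide how to deal with the ethical issues",
    "Evaluate, interpret, and present the data",
]


@dataclass
class EvaluationPlan:
    """Hypothetical record of the notes taken against each DECIDE step."""
    notes: dict = field(default_factory=dict)

    def record(self, step, note):
        # Reject anything that is not one of the six checklist steps.
        if step not in DECIDE_STEPS:
            raise ValueError(f"Unknown DECIDE step: {step}")
        self.notes.setdefault(step, []).append(note)

    def missing_steps(self):
        """Return the steps that have no notes yet, in checklist order."""
        return [s for s in DECIDE_STEPS if not self.notes.get(s)]


plan = EvaluationPlan()
plan.record("Determine the goals", "Identify how an existing interface could be improved")
plan.record("Explore the questions", "Is the system difficult to navigate?")

# Steps that still need attention before the evaluation starts.
for step in plan.missing_steps():
    print("TODO:", step)
```

The record does not decide anything for the evaluator; it only makes it harder to skip a step, such as the ethical issues, when planning under time pressure.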

Determine the goals

What are the high-level goals of the evaluation? Who wants it and why? An

evaluation to help clarify user needs has different goals from an evaluation to

determine the best metaphor for a conceptual design, or to fine-tune an interface, or to

examine how technology changes working practices, or to inform how the next

version of a product should be changed.

Goals should guide an evaluation, so determining what these goals are is the first step

in planning an evaluation. For example, we can restate the general goal statements

just mentioned more clearly as:

· Check that the evaluators have understood the users' needs.

· Identify the metaphor on which to base the design.

· Check to ensure that the final interface is consistent.

· Investigate the degree to which technology influences working practices.

· Identify how the interface of an existing product could be engineered to

improve its usability.

These goals influence the evaluation approach, that is, which evaluation paradigm

guides the study. For example, engineering a user interface involves a quantitative

engineering style of working in which measurements are used to judge the quality of

the interface. Hence usability testing would be appropriate. Exploring how children

talk together in order to see if an innovative new groupware product would help them

to be more engaged would probably be better informed by a field study.


Explore the questions

In order to make goals operational, questions that must be answered to satisfy them

have to be identified. For example, the goal of finding out why many customers prefer

to purchase paper airline tickets over the counter rather than e-tickets can be broken

down into a number of relevant questions for investigation. What are customers'

attitudes to these new tickets? Perhaps they don't trust the system and are not sure that

they will actually get on the flight without a ticket in their hand. Do customers have

adequate access to computers to make bookings? Are they concerned about security?

Does this electronic system have a bad reputation? Is the user interface to the ticketing

system so poor that they can't use it? Maybe very few people managed to complete

the transaction.

Questions can be broken down into very specific sub-questions to make the evaluation

even more specific. For example, what does it mean to ask, "Is the user interface

poor?": Is the system difficult to navigate? Is the terminology confusing because it is

inconsistent? Is response time too slow? Is the feedback confusing or maybe

insufficient? Sub-questions can, in turn, be further decomposed into even finer-

grained questions, and so on.
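
The decomposition of a goal into questions and sub-questions forms a tree, which can be sketched as below. This is an illustrative sketch under stated assumptions: the Question class and its leaves method are hypothetical names, and the question wording is paraphrased from the airline ticket example.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A question that may be decomposed into finer-grained sub-questions."""
    text: str
    sub_questions: list = field(default_factory=list)

    def leaves(self):
        """Return the most specific questions, i.e. those with no sub-questions."""
        if not self.sub_questions:
            return [self.text]
        result = []
        for q in self.sub_questions:
            result.extend(q.leaves())
        return result


# "Is the user interface poor?" broken down as in the text.
ui_poor = Question("Is the user interface poor?", [
    Question("Is the system difficult to navigate?"),
    Question("Is the terminology confusing because it is inconsistent?"),
    Question("Is response time too slow?"),
    Question("Is the feedback confusing or insufficient?"),
])

# The top-level goal from the airline ticket example.
goal = Question("Why do many customers prefer paper tickets to e-tickets?", [
    Question("What are customers' attitudes to e-tickets?"),
    Question("Do customers have adequate access to computers?"),
    Question("Are customers concerned about security?"),
    Question("Does the electronic system have a bad reputation?"),
    ui_poor,
])

# The leaves are the questions the evaluation has to answer directly.
for q in goal.leaves():
    print("-", q)
```

Only the leaf questions are specific enough to be answered by a single technique; the inner nodes record why each leaf is being asked.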

Choose the evaluation paradigm and techniques

Having identified the goals and main questions, the next step is to choose the

evaluation paradigm and techniques. As discussed in the previous section, the evaluation

paradigm determines the kinds of techniques that are used. Practical and ethical issues

(discussed next) must also be considered and trade-offs made. For example, what

seems to be the most appropriate set of techniques may be too expensive, or may take

too long, or may require equipment or expertise that is not available, so compromises

are needed.

Identify the practical issues

There are many practical issues to consider when doing any kind of evaluation and it

is important to identify them before starting. Some issues that should be considered

include users, facilities and equipment, schedules and budgets, and evaluators'

expertise. Depending on the availability of resources, compromises may involve

adapting or substituting techniques.

Users

It goes without saying that a key aspect of an evaluation is involving appropriate

users. For laboratory studies, users must be found and screened to ensure that they

represent the user population to which the product is targeted. For example, usability

tests often need to involve users with a particular level of experience, e.g., novices or

experts, or users with a range of expertise. The number of men and women within a

particular age range, cultural diversity, educational experience, and personality

differences may also need to be taken into account, depending on the kind of product

being evaluated. In usability tests participants are typically screened to ensure that

they meet some predetermined characteristic. For example, they might be tested to

ensure that they have attained a certain skill level or fall within a particular

demographic range. Questionnaire surveys require large numbers of participants so

ways of identifying and reaching a representative sample of participants are needed.


For field studies to be successful, an appropriate and accessible site must be found

where the evaluator can work with the users in their natural setting.
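
Screening participants against predetermined characteristics can be sketched as a simple filter. The fields (age, months of experience), the thresholds, and the candidate data below are assumptions chosen for illustration; a real study would take its criteria from the target user population.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str               # participant code, not a real name
    age: int
    months_experience: int  # experience with this kind of product


def screen(candidates, min_age, max_age, min_months, max_months):
    """Keep candidates whose age and experience fall within the target ranges."""
    return [
        c for c in candidates
        if min_age <= c.age <= max_age
        and min_months <= c.months_experience <= max_months
    ]


# Made-up candidate pool for illustration.
pool = [
    Candidate("P01", 19, 0),
    Candidate("P02", 34, 60),
    Candidate("P03", 27, 2),
    Candidate("P04", 71, 1),
]

# Hypothetical criteria: novices (at most 3 months of experience) aged 18 to 65.
novices = screen(pool, min_age=18, max_age=65, min_months=0, max_months=3)
print([c.code for c in novices])
```

With these made-up criteria, P02 is excluded for being too experienced and P04 for falling outside the age range.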

Another issue to consider is how the users will be involved. The tasks used in a

laboratory study should be representative of those for which the product is designed.

However, there are no written rules about the length of time that a user should be

expected to spend on an evaluation task. Ten minutes is too short for most tasks and

two hours is a long time, but what is reasonable? Task times will vary according to

the type of evaluation, but when tasks go on for more than 20 minutes, consider

offering breaks. It is accepted that people using computers should stop, move around,

and change their position after every 20 minutes spent at the keyboard to

avoid repetitive strain injury. Evaluators also need to put users at ease so they are not

anxious and will perform normally. Even when users are paid to participate, it is

important to treat them courteously. At no time should users be treated

condescendingly or made to feel uncomfortable when they make mistakes. Greeting

users, explaining that it is the system that is being tested and not them, and planning

an activity to familiarize them with the system before starting the task all help to put

users at ease.

Facilities and equipment

There are many practical issues concerned with using equipment in an evaluation. For

example, when using video you need to think about how you will do the recording:

how many cameras and where do you put them? Some people are disturbed by having

a camera pointed at them and will not perform normally, so how can you avoid

making them feel uncomfortable? Spare film and batteries may also be needed.

Schedule and budget constraints

Time and budget constraints are important considerations to keep in mind. It might

seem ideal to have 20 users test your interface, but if you need to pay them, then it

could get costly. Planning evaluations that can be completed on schedule is also

important, particularly in commercial settings. There is never enough time to do

evaluations as you would ideally like, so you have to compromise and plan to do a

good job with the resources and time available.
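
The trade-off between the number of users and the budget is simple arithmetic, sketched below. All figures (payment per user, evaluator rate, session length, budget) are made-up assumptions, and the cost model deliberately ignores equipment and analysis time.

```python
def evaluation_cost(num_users, session_hours, payment_per_user, evaluator_hourly_rate):
    """Estimate the cost of running one session per user (simplified model)."""
    participant_cost = num_users * payment_per_user
    evaluator_cost = num_users * session_hours * evaluator_hourly_rate
    return participant_cost + evaluator_cost


def max_users_within(budget, session_hours, payment_per_user, evaluator_hourly_rate):
    """Largest number of users that can be tested without exceeding the budget."""
    cost_per_user = payment_per_user + session_hours * evaluator_hourly_rate
    return int(budget // cost_per_user)


# Hypothetical figures: 1-hour sessions, 25 paid to each user, evaluator costs 40 per hour.
ideal = evaluation_cost(num_users=20, session_hours=1,
                        payment_per_user=25, evaluator_hourly_rate=40)
affordable = max_users_within(budget=500, session_hours=1,
                              payment_per_user=25, evaluator_hourly_rate=40)

print("Cost of testing 20 users:", ideal)
print("Users affordable on a budget of 500:", affordable)
```

Under these assumptions the ideal of 20 users costs well over twice the budget, so the plan has to be cut back to a smaller group or a cheaper technique.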

Expertise

Does the evaluation team have the expertise needed to do the evaluation? For

example, if no one has used models to evaluate systems before, then basing an

evaluation on this approach is not sensible. It is no use planning to use experts to review

an interface if none are available. Similarly, running usability tests requires expertise.

Analyzing video can take many hours, so someone with appropriate expertise and

equipment must be available to do it. If statistics are to be used, then a statistician

should be consulted before starting the evaluation and then again later for analysis, if

appropriate.

Decide how to deal with the ethical issues

The Association for Computing Machinery (ACM) and many other professional

organizations provide ethical codes that they expect their members to uphold,

particularly if their activities involve other human beings. For example, people's

privacy should be protected, which means that their name should not be associated


with data collected about them or disclosed in written reports (unless they give

permission). Personal records containing details about health, employment, education,

financial status, and where participants live should be confidential. Similarly, it

should not be possible to identify individuals from comments written in reports. For

example, if a focus group involves nine men and one woman, the pronoun "she"

should not be used in the report because it will be obvious to whom it refers.

Most professional societies, universities, government and other research offices

require researchers to provide information about activities in which human

participants will be involved. This documentation is reviewed by a panel and the

researchers are notified whether their plan of work, particularly the details about how

human participants will be treated, is acceptable.

People give their time and their trust when they agree to participate in an evaluation

study and both should be respected. But what does it mean to be respectful to users?

What should participants be told about the evaluation? What are participants' rights?

Many institutions and project managers require participants to read and sign an

informed consent form. This form explains the aim of the tests or research and promises

participants that their personal details and performance will not be made public and

will be used only for the purpose stated. It is an agreement between the evaluator and

the evaluation participants that helps to confirm the professional relationship that

exists between them. If your university or organization does not provide such a form

it is advisable to develop one, partly to protect yourself in the unhappy event of

litigation and partly because the act of constructing it will remind you what you

should consider.

The following guidelines will help ensure that evaluations are done ethically and that

adequate steps to protect users' rights have been taken.

· Tell participants the goals of the study and exactly what they should expect if

they participate. The information given to them should include an outline of the

process, the approximate amount of time the study will take, the kind of data

that will be collected, and how that data will be analyzed. The form of the

final report should be described and, if possible, a copy offered to them. Any

payment offered should also be clearly stated.

· Be sure to explain that demographic, financial, health, or other sensitive

information that users disclose or that is discovered from the tests is confidential. A

coding system should be used to record each user and, if a user must be

identified for a follow-up interview, the code and the person's demographic details

should be stored separately from the data. Anonymity should also be promised

if audio and video are used.

· Make sure users know that they are free to stop the evaluation at any time if

they feel uncomfortable with the procedure.

· Pay users when possible because this creates a formal relationship in which

mutual commitment and responsibility are expected.

· Avoid including quotes or descriptions that inadvertently reveal a person's

identity, as in the example mentioned above of avoiding use of the pronoun

"she" in the focus group. If quotes need to be reported, e.g., to justify

conclusions, then it is convention to replace words that would reveal the source

with representative words, in square brackets. Ask users' permission in

advance to quote them, promise them anonymity, and offer to show them a

copy of the report before it is distributed.
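
The coding system mentioned in the guidelines can be sketched as follows: each participant receives a code, the evaluation data is recorded against the code only, and the table linking codes to identities is kept apart from the data. This is a minimal sketch under assumptions: the function name assign_codes, the code format, and the participant names are all made up for illustration.

```python
def assign_codes(names):
    """Return a key table mapping participant codes (P01, P02, ...) to names."""
    return {f"P{i:02d}": name for i, name in enumerate(names, start=1)}


# Made-up participants. The key table is the only place names appear and
# should be stored separately from the evaluation data.
names = ["Alice Example", "Bob Example"]
key_table = assign_codes(names)

# Reverse lookup, used only while recoding raw notes.
code_for = {name: code for code, name in key_table.items()}

# Raw observations as first noted: (name, task, completion time in seconds).
raw_observations = [
    ("Alice Example", "task 1", 42.0),
    ("Bob Example", "task 1", 57.5),
]

# The data that is analyzed and reported carries codes only.
coded_data = [(code_for[name], task, seconds)
              for name, task, seconds in raw_observations]

for row in coded_data:
    print(row)
```

If a follow-up interview is needed, the evaluator consults the key table; anyone who sees only the coded data cannot tell who took part.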


The general rule to remember when doing evaluations is do unto others only what you

would not mind being done to you.

The recent explosion in Internet and web usage has resulted in more research on how

people use these technologies and their effects on everyday life. Consequently, there

are many projects in which developers and researchers are logging users' interactions,

analyzing web traffic, or examining conversations in chat rooms, bulletin boards, or

on email. Unlike most previous evaluations in human-computer interaction, these

studies can be done without users knowing that they are being studied. This raises

ethical concerns, chief among which are issues of privacy, confidentiality, informed

consent, and appropriation of others' personal stories (Sharf, 1999). People often say

things online that they would not say face to face. Furthermore, many people are

unaware that personal information they share online can be read by someone with

technical know-how years later, even after they have deleted it from their personal

mailbox (Erickson et al., 1999).

Evaluate, interpret, and present the data

Choosing the evaluation paradigm and techniques to answer the questions that satisfy

the evaluation goal is an important step. So is identifying the practical and ethical

issues to be resolved. However, decisions are also needed about what data to

collect, how to analyze it, and how to present the findings to the development team.

To a great extent the technique used determines the type of data collected, but there

are still some choices. For example, should the data be treated statistically? If

qualitative data is collected, how should it be analyzed and represented? Some general

questions also need to be asked (Preece et al., 1994): Is the technique reliable? Will

the approach measure what is intended, i.e., what is its validity? Are biases creeping

in that will distort the results? Are the results generalizable, i.e., what is their scope?

Is the evaluation ecologically valid or is the fundamental nature of the process being

changed by studying it?

Reliability

The reliability or consistency of a technique is how well it produces the same results

on separate occasions under the same circumstances. Different evaluation processes

have different degrees of reliability. For example, a carefully controlled experiment

will have high reliability. Another evaluator or researcher who follows exactly the

same procedure should get similar results. In contrast, an informal, unstructured

interview will have low reliability: it would be difficult if not impossible to repeat

exactly the same discussion.
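
One common way to put a number on reliability, not prescribed by this lecture, is to repeat the same procedure and correlate the two sets of results (test-retest correlation). The sketch below computes a Pearson correlation by hand; the task times are made-up data for five users measured on two occasions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up task completion times (seconds) for the same five users,
# measured on two occasions under the same controlled conditions.
session_1 = [30.0, 45.0, 52.0, 38.0, 60.0]
session_2 = [32.0, 44.0, 55.0, 36.0, 61.0]

r = pearson(session_1, session_2)
print("Test-retest correlation:", round(r, 2))
```

A value close to 1 suggests the procedure gives consistent results on repetition. An unstructured interview has no comparable pair of measurements, which is one way of seeing why its reliability is low.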

Validity

Validity is concerned with whether the evaluation technique measures what it is

supposed to measure. This encompasses both the technique itself and the way it is

performed. If, for example, the goal of an evaluation is to find out how users use a new

product in their homes, then it is not appropriate to plan a laboratory experiment. An

ethnographic study in users' homes would be more appropriate. If the goal is to find

average performance times for completing a task, then counting only the number of

user errors would be invalid.


Biases

Bias occurs when the results are distorted. For example, expert evaluators performing

a heuristic evaluation may be much more sensitive to certain kinds of design flaws

than others. Evaluators collecting observational data may consistently fail to notice

certain types of behavior because they do not deem them important.

Put another way, they may selectively gather data that they think is important.

Interviewers may unconsciously influence responses from interviewees by their tone

of voice, their facial expressions, or the way questions are phrased, so it is important

to be sensitive to the possibility of biases.

Scope

The scope of an evaluation study refers to how much its findings can be generalized.

For example, some modeling techniques, like the keystroke model, have a narrow,

precise scope. The model predicts expert, error-free behavior so, for example, the

results cannot be used to describe novices learning to use the system.
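
The narrow scope of the keystroke model can be seen in a small sketch of how it produces a prediction. The operator times below are approximate values commonly cited for the keystroke-level model and are an assumption here, not figures from this lecture; the task sequence is hypothetical.

```python
# Approximate operator times in seconds (assumed, commonly cited values):
# K = press a key or button, P = point with the mouse,
# H = move hand between keyboard and mouse, M = mental preparation.
OPERATOR_SECONDS = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35}


def predict_time(operators):
    """Sum operator times for an expert performing the sequence without errors."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Hypothetical task: think, reach for the mouse, point at a field, click,
# return to the keyboard, and type four characters.
sequence = "MHPKHKKKK"

print("Predicted expert time (s):", round(predict_time(sequence), 2))
```

The calculation has no term for errors, hesitation, or learning, which is why its prediction applies only to practiced, error-free performance and says nothing about novices.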

Ecological validity

Ecological validity concerns how the environment in which an evaluation is

conducted influences or even distorts the results. For example, laboratory experiments

are strongly controlled and are quite different from workplace, home, or leisure

environments. Laboratory experiments therefore have low ecological validity because

the results are unlikely to represent what happens in the real world. In contrast,

ethnographic studies do not impact the environment, so they have high ecological

validity.

Ecological validity is also affected when participants are aware of being studied. This

is sometimes called the Hawthorne effect after a series of experiments at the Western

Electric Company's Hawthorne factory in the US in the 1920s and 1930s. The studies

investigated changes in length of working day, heating, lighting etc., but eventually it

was discovered that the workers were reacting positively to being given special

treatment rather than just to the experimental conditions.
