3.1.1 What Is Think-Aloud Usability Testing?
- An empirical technique for assessing the usability of a prototype of an interface
- "May be the single most valuable usability engineering method" (Nielsen, 1993)
- How to do it:
  - Ask a user to think aloud while performing a task on your system
  - Watch silently and learn
    - how the user thinks about the task
    - where the user has problems using your system

Think-Aloud Protocol Analysis
- Cognitive psychology is interested in understanding how people solve problems
- It aims to discover and understand the details of
  - what information people pay attention to
  - how they represent that information
  - how they bring prior knowledge to bear
  - what transformations they make to information in the course of solving a puzzle or performing a task

Think-Aloud Protocol Analysis
- Think-aloud protocol analysis has been in use for 40 years
- It is a method of understanding these details of thought
- In cognitive psychology research, the method has two parts
  - collect think-aloud data (protocols)
  - analyze the data by building a model of it (usually a computer simulation)

Two parts to this method (1)
- Collecting think-aloud data
  - has proven extremely useful in understanding the usability of computer systems
  - cognitive psychologists have studied it quite thoroughly, and UI design can learn a lot from their studies

Two parts to this method (2)
- Making a formal model of the data and processes
  - has not been as useful in most UI design
  - although there have been some dramatic successes
- In UI design, the analysis step for think-aloud data uses the critical incident technique

Understanding Verbalization
- Three types of verbalization of thoughts
  - Talk-Aloud (Type 1)
  - Think-Aloud (Type 2)
  - Mediated processes (Type 3)
- These three types can be understood by thinking about what is in working memory (WM)

Understanding Working Memory

Understanding Working Memory
- WM stores several types of things
  - the results of perception, once they have been understood by the person (e.g., the picture of the lion on the previous slide)
  - the information brought in from long-term memory (LTM) to solve a problem
  - the intermediate states in a problem solution: information that is figured out along the way to the solution

Understanding Working Memory
- WM holds a lot of clues as to what a person was thinking about as they solved a problem or performed a task
- On the other hand, WM does NOT hold the processes that operate on the information it holds to generate those intermediate pieces of information
- Those processes are used by the cognitive processor but are not explicitly represented in WM

Theory behind Think-Aloud Protocols
- People can verbalize the linguistic contents of their WM
- A lot of information in WM is already in linguistic form (expressed in words)
- Type 1 protocol:
  - ask people to "talk aloud": to let these pieces of information come "out of their mouth" right after they enter WM

Type 1 protocol

a simple addition problem: 2+4 =?

Cognitive psychologists have shown that
- for the most part, asking a person to "talk aloud" as they work on a task does not
  - change their thinking strategies, or
  - slow them down in their thinking

However, much of the information is not linguistic in nature
- With modern GUI computer systems, WM may include information about space, color, time, or other things that are not naturally spoken about in words
- This is not to say that such information cannot be expressed in words, but people have to learn
  - a vocabulary
  - how to translate the perceptual information they are getting into that vocabulary
  - how to express those new things

An example: wine tasting
- If you are not familiar with the skill, you will not be able to translate the sensations on your tongue into words
- You will have sensations on your tongue when tasting the wine (non-linguistic information)
- You could learn the translation into words given enough time and exposure to the "language"

Type 2 protocol

Another example
- Do a jigsaw puzzle picturing a beach
- Thinking aloud as you do the jigsaw puzzle would be as shown in the figure on the next slide

Another example

What do cognitive psychologists tell us?
- Asking people to translate everything they are thinking into words ("think aloud") does not change the way people think about problems
- It does slow them down
  - although, according to the book Usability Engineering, it sometimes speeds them up
- This is the type of protocol most often used in usability research
- It can yield a lot of information about the quality of the UI

The Type 3 protocol
- The Type 3 protocol is when you ask someone to verbalize but also demand that they add more processing to the information
- This type of protocol is not recommended for UI design because psychologists have shown that it
  - changes the way people think as they solve problems
  - slows them down as well

the Type 3 protocol

asking a person doing the jigsaw puzzle to explain how he found each piece.

Conclusion of Type 3
- In Type 3:
  - the information state reported is one that the person would never have passed through without the instruction to "explain"
  - the explanation of the process is not what the person actually did

Conclusion of Type 3
- For these reasons, Type 3 instructions should be avoided
- If you notice that a person seems to be explaining rather than just reporting what they think, you need to put them back on the right track

Critical Incident Analysis
- Collecting the data
  - by videotape
  - by action-and-voice capture software
- The analyst's role in designing the computer system
  - analyze this data to decide how to improve the computer system design

Critical Incident Analysis
- History: developed during World War II
  - to develop procedures for the selection and classification of aircrew
- This method also has two parts, involving several different people in different roles
  - collecting the data
  - analyzing it
- UI design draws on both parts of this technique

Original critical incident technique
- Observers report critical incidents that they witness in the course of performing a task
  - e.g., combat veterans reported actions of officers they observed during combat missions
- These observations are usually gathered after the task is performed, through
  - interviews or
  - questionnaires

Original critical incident technique
- Concurrent recording of these observations is recommended in the original papers, but it was acknowledged to be impractical in many of the situations where the critical incident technique was used
  - e.g., during combat missions
- The observations gathered are then categorized and interpreted by analysts and put into a final report that summarizes the findings

The general procedure for the critical incident technique
1. Someone performs a task in the real world (e.g., combat missions).
2. An observer (who may or may not be the "someone" performing the task) reports critical incidents after the fact in interviews or questionnaires administered by an analyst.
3. The analyst categorizes and interprets the observations.
4. The analyst writes a summarizing report of the data and interpretations.

Usability studies in UI design use a modified procedure
1. A user thinks aloud as he or she performs a task using a prototype of the computer system being evaluated, usually in a laboratory setting and usually recorded on videotape or with screen-and-voice capture software.
2. An analyst (who is not the user performing the task) looks at the recording of the think-aloud protocol session and reports critical incidents using the UAR format.
3. The analyst categorizes and interprets the observations.
4. The analyst writes a summarizing report of the data and interpretations.
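
Steps 2 to 4 above can be sketched in code. The following is a minimal illustration only: the `Incident` record and its field names are hypothetical and are not the UAR format itself, which is not reproduced in these slides. It shows one way an analyst might tally categorized incidents for the summarizing report.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """One critical incident noted by the analyst (hypothetical fields)."""
    participant: str  # code number such as "U1", never a real name
    time_sec: float   # position in the recording, in seconds
    kind: str         # "problem" or "good_feature"
    category: str     # label assigned by the analyst
    note: str         # what was observed


def summarize(incidents):
    """Count incidents per (kind, category) pair for the final report."""
    return Counter((i.kind, i.category) for i in incidents)


incidents = [
    Incident("U1", 42.0, "problem", "time zone map", "clicked the wrong region"),
    Incident("U2", 17.5, "problem", "time zone map", "did not notice the map"),
    Incident("U2", 80.0, "good_feature", "clock", "set the time at first try"),
]
summary = summarize(incidents)
```

Counting by category is only one possible summary; an analyst would still interpret each incident by hand.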

Differences between the two procedures
- Users take the role of the "someone performing the task"
- The usability analyst takes the roles of both the observer and the analyst in the original critical incident technique
- Watching the videotape and writing up UARs takes the place of after-the-fact interviews or questionnaires

Definition of the term "critical incident"
- The definitions used in the 1954 Psychological Bulletin paper are as follows:
- "By incident is meant any observable human activity that is sufficiently complete in itself to permit inferences and predictions to be made about the person performing the act."

Definition of the term "critical incident"
- "To be critical, an incident must occur in a situation where the purpose or intent of the act seems fairly clear to the observer and where its consequences are sufficiently definite to leave little doubt concerning its effects."
- "Such incidents are defined as extreme behavior, either outstandingly effective or ineffective with respect to attaining the general aims of the activity."

The three important concepts mapped into UAR format
- The three important concepts in these definitions can be mapped into our format for UARs
- Map table

What does a usability study include?
- The procedure for doing a usability study combines the best of
  - think-aloud protocols
  - the critical incident technique
- It gets at users' thoughts
  - what they pay attention to
  - what information they miss
  - what prior knowledge they bring to the task
  - what is puzzling or clear to them
- It also provides a tractable way to
  - record important data in critical incident UARs
  - summarize the results in a final report

You Are Testing the Interface, Not the Participant
- Your attitude: you are testing the interface, not the participant
- This attitude covers
  - everything you do with the participants in your study
  - everything you do with the data they generate
- It comes up in every aspect of
  - the study ethics
  - procedures
  - data analysis

You Are Testing the Interface, Not the Participant
- Mantra: The user is not like me.
- You are the system designer
  - You can never think like a typical user
  - You know too much about the system
  - You cannot prevent that knowledge from helping you use the system

Getting information about how real users will deal with the system
- To get information about how real users will deal with your system, you must do empirical testing with people who know as much or as little about computers and your specific system as you expect your real users to know
- This leads to the inescapable conclusion that participants in your study are extremely important

The Participant
- Participants are always giving "good data," whatever they say or do. Why?
- Because whatever they say or do
  - comes from a basis of knowledge like that of your eventual actual users
  - gives you a valid indication of what your actual users will think about your system

You Are Testing the Interface, Not the Participant
- All kinds of people will use the software
- Participants will not always pay attention
- What do we do?
  - No matter what the participants do, always ask yourself what about your system led them in that direction, rather than blaming the participant for not knowing enough, not paying attention, or not reading
- You are testing the interface, not the participant

Voluntary Participation
- Participation must be voluntary
- Do not put any pressure on people to continue in a study once they have started
  - otherwise you may get useless data
- It is unlikely that participants will find something objectionable about participating; if they do, let them choose whether to quit
- It is important to create an atmosphere in the testing situation in which participants feel free to stop
  - Tell the participants that they are allowed to stop at any time
  - Pay participants a bonus

Watch carefully and stop at the proper time
- Be sensitive to the many ways people express a desire to stop a session
- Some people would never say something as direct as "I want to stop now."
- But they may express a lot of negative emotion
  - "I'm so stupid I'll never be able to finish this."
  - "This seems like it will go on forever, stupid &%@#$* computer!!!"

Watch carefully and stop at the proper time
- What you must do
  - Gently ask, "Would you like to stop now?"
  - Do not wait for tears or for the user to throw the equipment across the room!
- When participants are in a highly emotional state, you will not get any additional usable data from them anyway; it is better for both of you if you diplomatically terminate the session
- Remember to ask yourself later what in your system brought this user to this emotional state

Maintain Anonymity
- It is your responsibility to maintain the anonymity of your participants. Do as follows:
  - Store data under a code number
  - Keep a table that maps each code number to a participant name
  - Do not record participants' faces
  - Do not show videotapes without the participants' consent

Mapping between consenting participants' names and numbers

Name         Number
Jane Doe     U1
Fran Diaz    U2
Chris Smith  U3
...
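
The code-number scheme above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the record layout (`name`, `task`, `seconds`) are hypothetical, and in practice the name-to-code table would be stored apart from the study data.

```python
def assign_codes(names, prefix="U"):
    """Map each participant name to a code such as U1, U2, ... in order."""
    return {name: f"{prefix}{i}" for i, name in enumerate(names, start=1)}


def anonymize(record, codes):
    """Return a copy of the record with the name replaced by its code."""
    cleaned = {key: value for key, value in record.items() if key != "name"}
    cleaned["participant"] = codes[record["name"]]
    return cleaned


codes = assign_codes(["Jane Doe", "Fran Diaz", "Chris Smith"])
stored = anonymize({"name": "Fran Diaz", "task": "set date", "seconds": 95}, codes)
```

Only `stored` would go into the data files; the `codes` table is the single place where names appear.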

Informed Consent
- Informed consent is fundamental to every kind of experiment using human participants
- You are ethically obligated to tell the participant
  - what the experiment is about
  - what procedures will be used
  - what compensation they will receive
  - what they can do if they object to something in the study
  - that, because their participation is voluntary, they are free to stop and leave at any time

Informed Consent
- Written consent: to be sure that you have informed each participant of these things, give them a written consent form to read and sign
  - build enough time into your testing schedule to allow them to read it at a leisurely pace
  - let them ask questions if anything is unclear
- Parents or guardians must also be informed and sign a consent form when participants are
  - children
  - others who cannot legally give consent (e.g., mental patients)
  - people under severe stress (e.g., severely ill patients, incarcerated prisoners)

Example of Informed Consent

Laws
- The laws that govern the use of humans in empirical studies consist of federal regulations. They tell you
  - when they apply
  - what you have to do to comply with them
  - what types of observations are exempt from the regulations
- Institutional Review Board (IRB)
  - The regulations dictate when an organization must form an Institutional Review Board to oversee all observations of people
  - All universities that receive federal funding must have IRBs

Laws
- The most recent version of the regulations for the "Protection of Human Research Subjects" is available on the website
  - http://www.access.gpo.gov/nara/cfrretrieve.html#page1
- In the USA, it is your and your organization's responsibility to understand how these laws apply to your work
- In other countries, it is your responsibility to determine what the relevant laws are and apply them
- All the steps, such as maintaining anonymity and obtaining informed consent, have to be followed even if you are exempt from government regulations

3.2.1 Defining the Study's Framework
- What should our development team do?
  - Come to a consensus as to
    - the purpose of the system
    - the usability observation
- Why? It will influence choices in the later, more concrete steps
- List questions (on the next slide)
  - They will have been asked and answered several times earlier in development
  - Right before usability tests are conducted is a good time to ask them again
  - Summarize the answers

Questions to ask yourself
- W1: What problem is the system trying to solve? What work is it trying to support?
  - The answer helps in choosing the test suite of tasks
- W2: What level of support will the user have?
  - training to use this system
  - documentation
  - online help
  - other resources
  - The answer helps in developing materials, choosing tasks, and setting up a realistic situation

Questions to ask yourself
- W3: What type of usage should be evaluated?
  - Walk-up-and-use (first-time use)?
  - Users skilled in the task domain using this particular system for the first time?
  - Exploratory use with no time pressure, or goal-directed use under time pressure?
  - Any other aspects?
- The answer to this question will influence the choice of
  - tasks
  - participants
  - the definition of what a problem is in this system

Questions to ask yourself
- W4: What are the usability goals for this system?
  - That 90% of users can accomplish a simple task within 3 minutes with no training?
  - That half of the people trained in using this system will perform with less than one recoverable error per task?
- The answer helps in choosing tasks and defining the criteria for identifying problems and good features of the system
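
A usability goal of the first kind can be checked mechanically once task times have been collected. The sketch below is an illustration only; the function name, its parameters, and the sample times are assumptions, not part of the course material.

```python
def meets_completion_goal(durations_sec, limit_sec=180.0, required_fraction=0.9):
    """Return True if enough participants finished within the time limit.

    durations_sec has one entry per participant: the completion time in
    seconds, or None if the participant never completed the task.
    """
    if not durations_sec:
        raise ValueError("need at least one participant")
    on_time = sum(1 for d in durations_sec if d is not None and d <= limit_sec)
    return on_time / len(durations_sec) >= required_fraction


# Ten hypothetical participants; one took 200 s, over the 3-minute limit.
times = [100, 150, 170, 120, 90, 60, 175, 110, 130, 200]
goal_met = meets_completion_goal(times)
```

With a handful of participants such a percentage is only a rough indication, so it complements rather than replaces the critical incidents found in the recordings.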

Example: the test of the Date/Time control panel
- The Date/Time control panel: study framework
  - It supports setting a computer's date, time, and time zone
  - It is particularly useful to people traveling with laptops
  - The control panel should require no training or online tutorial: all owners of computers should be able to use it intuitively (a walk-up-and-use situation)

Example: the test of the Date/Time control panel
- The Date/Time control panel: study framework
  - Every user should be able to complete the tasks of setting the date, time, and time zone
  - It is not critical that there be no errors committed in performing the task, but no complete task should take longer than 3 minutes

3.2.2 Choosing What To Observe
- The first thing: choose the tasks you want to give participants
- Note: the choice of tasks is influenced by several things
  - the content of the tasks
  - the need for training to do the tasks
  - the duration of the tasks
  - the integration of the tasks
- E.g., choosing tasks to test the Date/Time control panel

The Content of the Tasks
- Pick tasks that reflect actual or expected use of the system
- If a similar system is already in place (even a paper-based system), actual tasks can be determined by observing what people already do with the existing system

The Content of the Tasks
- When you identify many real tasks (through observation, interviews, or sometimes data collected for other purposes), pick several of them and include them in the test suite
- Include the most frequent tasks in the test
  - to get data on how your system supports the tasks users will do on the most regular basis

The Content of the Tasks
- Include tasks that cover the range of functionality of the system
  - exercise every part of it
  - if you do not do this, then the entire system is not tested, only the subset represented in the tasks you chose
- Make sure you include not only tasks that create things from scratch but also tasks that delete or modify existing things
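
A quick coverage check can reveal parts of the system that no task exercises. This is a minimal sketch; the function name and the feature labels are hypothetical, chosen here to match the Date/Time control panel example.

```python
def untested_features(all_features, task_features):
    """Return the features that no task in the suite exercises.

    task_features maps a task name to the list of features it touches.
    """
    covered = set()
    for features in task_features.values():
        covered.update(features)
    return sorted(set(all_features) - covered)


features = ["set date", "set time", "set time zone"]
suite = {
    "adjust zone before flight": ["set time zone"],
    "reset clock on arrival": ["set time"],
}
missing = untested_features(features, suite)
```

Here `missing` names the one feature the two-task suite leaves untested, which signals that another task is needed.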

The Content of the Tasks
- Include error-recovery tasks
  - e.g., put the person in the situation they would be in had they made an error, and ask them to recover from it
- Include in the test suite very important tasks, even if they are infrequent
  - e.g., emergency procedures are used infrequently but should be tested because they are safety-critical

The Need for Training to Do the Tasks
- If the tasks require training to accomplish, the training must be counted as part of the test suite of tasks
- The task set becomes

Rather than       Should be
1. "Task A."      1. "Learn to do Task A."
                  2. "Task A."

The Need for Training to Do the Tasks
- This allows you to test training materials as well as the system
- Training materials to test include
  - paper-based reference materials
  - Web pages
  - lectures
  - online tutorials

The Duration of the Tasks
- Users cannot do tasks for a long time, e.g.,
  - more than an hour at a time, or
  - more than two hours in a single day
- More than this is usually too tiring for participants
- Design your test suite to allow for
  - hourly breaks
  - a break for the day after two hours

The Duration of the Tasks
- This duration includes the training to do the task
- If the training is on one day and the task performance is on the next, you will have to review the training on the second day to refresh the user's memory
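
The one-hour and two-hour limits can be turned into a simple planning aid. The sketch below greedily packs tasks, in order, into blocks of at most 60 minutes and days of at most 120 minutes; the function name, the greedy strategy, and the sample durations are assumptions made for illustration. Training time counts as a task like any other.

```python
def plan_sessions(task_minutes, block_limit=60, day_limit=120):
    """Pack tasks in order into blocks (break after each) and days.

    Returns a list of days; each day is a list of blocks; each block is a
    list of task durations in minutes. Assumes block_limit <= day_limit.
    """
    days, day, block = [], [], []
    for minutes in task_minutes:
        if minutes > block_limit:
            raise ValueError("a single task exceeds the block limit")
        if sum(block) + minutes > block_limit:
            day.append(block)  # hourly break: close the current block
            block = []
        day_total = sum(sum(b) for b in day) + sum(block)
        if day_total + minutes > day_limit:
            if block:
                day.append(block)
                block = []
            days.append(day)   # break for the day
            day = []
        block.append(minutes)
    if block:
        day.append(block)
    if day:
        days.append(day)
    return days


plan = plan_sessions([30, 30, 30, 30, 30])
```

If a task lands on a later day than its training, the plan would also need a short review of the training at the start of that day.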

The Integration of Small Tasks
- Include tasks that require the integration of several system features
- E.g., do not just have small tasks that create a new object; instead
  - start the user out with a system full of objects already created
  - make them navigate through those objects
  - modify them
  - create new ones

The Integration of Small Tasks
- Include tasks that test the integration of your system with other systems users might use. E.g.,
  - How might the users include the results of something your system did in a report written in a word processor?
  - How might they include those results in email? On a Web page?
  - How might someone send them information through email that needs to be used in your system?

Example: Choosing Tasks to Test the Date/Time Control Panel
- The Date/Time control panel can support only a very limited set of tasks
  - setting the time
  - setting the date
  - setting the time zone of a person's computer
- The usability target in this case does not require user training, and the tasks will last no more than a few minutes
- These facts make it reasonable to require participants to carry out all tasks in a single test session
- For the Date/Time control panel, no tasks would integrate with other applications

Example: Choosing Tasks to Test the Date/Time Control Panel
- The suite of tasks to use in testing the usability of the Date/Time control panel would include the following:
  - Set the time
  - Set the date
  - Set the time zone

Setting Up a Realistic Situation for Data Collection
- The situation will differ depending on the system you are testing
- If the system is a desktop system for a PC, an office-like set-up is fine
  - this is the traditional set-up for usability labs in large corporations such as Microsoft and Sun

Setting Up a Realistic Situation for Data Collection
- For many other types of computer systems, realistic set-ups take on new characteristics
  - Testing the interface to a microwave oven is best done in a kitchen-like setting
  - Testing a PDA can be done almost anywhere, but you need to pay attention to the lighting conditions that will be found in offices, in cars, outdoors, and on public transportation
  - Navigation devices to be placed in cars need to be tested while driving (or in driving simulators)

Setting Up a Realistic Situation for Data Collection
- The variety is endless today with the ubiquity of computer systems
- In general, think carefully about how your system will eventually be used and set up the situations to match those uses as closely as possible

Setting Up a Realistic Situation for Data Collection
- Different situations call for different data-collection processes
- Traditionally, the actions and voice of the users are captured with
  - fixed video cameras
  - microphones
- In more mobile situations, more dynamic data-capturing techniques must be employed
  - wireless microphones
  - an operator with a camcorder who follows the user around as they work
- Some new software captures and plays back the user's actions and voice on a portable computer

Setting Up a Realistic Situation for Data Collection
- Each situation has its own requirements, so the solutions will vary. Capture at least
  - what users do with your system (including observations about what is being displayed to your users)
  - what users say as they think aloud, synchronizing what they say with what they do
- When you record the user's actions, you will also need to record the time during the session
  - some video recorders can put a timestamp on the signal
  - with screen-capture software, keep a system clock within the area of the screen that the software is capturing
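
When actions and speech are logged separately, the shared session time is what lets you line them up afterwards. The following is a minimal sketch, assuming each stream is already a time-sorted list of `(seconds, text)` pairs; the function name and the sample events are hypothetical.

```python
import heapq


def merge_streams(actions, utterances):
    """Interleave (seconds, text) events from both streams in time order.

    Both inputs must already be sorted by time. Each output item is a
    (seconds, source, text) tuple, where source is "action" or "speech".
    """
    tagged_actions = ((t, "action", text) for t, text in actions)
    tagged_speech = ((t, "speech", text) for t, text in utterances)
    return list(heapq.merge(tagged_actions, tagged_speech))


actions = [(1.0, "opens control panel"), (5.0, "clicks time zone tab")]
speech = [(2.0, "I am looking for the time zone")]
timeline = merge_streams(actions, speech)
```

The merged timeline shows what the user said between one action and the next, which is where many critical incidents become visible.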

Writing Up the Task Scenarios
- Write statements of the tasks for the participants to perform. How?
  - Each task is written on a separate sheet of paper
  - The tasks are given to the user one at a time
- Depending on the task, these descriptions are either purely in words, or they include pictures or diagrams to help describe the task
- In addition to being described individually, a series of tasks is usually accompanied by a cover story that links the tasks into a meaningful narrative

Practicing the Session
- Do not underestimate the difficulty of making a session run smoothly
- A saying among experimental psychologists applies to usability testing: "Subjects are like pancakes; you always have to throw the first few away."

Practicing the Session
- Take time to get everything to work in concert
  - The hardware and software must perform normally
  - The written instructions must be clear, or you will find problems with comprehending the instructions instead of problems with the system you are testing
  - The delivery of verbal instructions must be smooth; otherwise, you will confuse the participant
  - The data-capture equipment must work
- The solution to all these problems is practice, PRACTICE

Practicing the Session
- Practice first by yourself, which can at least debug the procedures:
  - what software must be running when
  - what windows must be open, etc.
- Walk through the entire session
  - reading every piece of information
  - doing every task
- Write everything down in a script so you can reproduce the session the next time you run it

Practicing the Session
- Practice first by yourself
  - Make sure your script includes reminders for things you should do, e.g., to make sure there is a tape in the video camera or that the sound level is adequate
  - If you find problems with your procedure when you walk through it yourself, fix those problems before bringing in anyone else to practice on

Practicing the Session
- Practice next with a friend: different eyes will see things you did not catch yourself in your instructions, tasks, or equipment set-up
  - If your friend has enough time to spare, again walk through everything in the entire session
  - Fix any problems your friend finds before the next practice session

Practicing the Session
- Finally, practice with someone who has a similar background to the users who will take part in your study
- By this third session, you might expect all the software and hardware to work, but don't be surprised if yet another set of hands and eyes finds something else wrong!
- This is an opportunity for you
  - to practice your delivery of the instructions
  - to determine whether the instructions themselves are comprehensible to someone who is much like your users

Practicing the Session
- The purpose of conducting these tests is to find problems with your computer system
- If you uncover any problems while running these tests of your procedures, write them up in UARs
- It is not "cheating" to discover system problems during this preparatory phase of the testing
- Don't throw away valuable information just because you weren't expecting to get any during the preparation phase

Recruiting Users
- Just as different systems require different situations in order to provide a realistic test, different systems require different types of users to provide data
- The most important consideration is that the participants in your usability tests have the same background knowledge as your eventual users will have

Recruiting Users
- E.g., if your system is for airline pilots to program the destinations of their aircraft, you need to use airline pilots as participants in the usability tests
  - if current airline pilots are not available, retired airline pilots would be the next best thing
- Educational software should be tested by students in the grade where the software fits into the curriculum
- Medical systems should be tested by doctors or nurses
- The Disney Company tests its new virtual reality games with visitors to its existing theme parks
- Some users are more difficult to obtain than others, but it is crucially important to get at least a handful of participants who represent the actual user group

Example: Preparing for a Think-Aloud Usability Test of the Date/Time Control Panel
- We chose a screen-capture program to record both the user's interactions with the system and their voice
- We made sure that the system clock appeared on the bottom menu bar of the screen, so we would always have a record of the time of the interaction
- We wrote up two of the tasks, one on each page

Imagine the following scenario
- You are a new reporter for the Pittsburgh Post Gazette. There has been some recent unrest in the Philippines, and you have volunteered to go to Manila to cover the story. You are waiting to board your flight at Pittsburgh International Airport.
- You have a few minutes to spare before your flight. Using the Date-Time control panel on your laptop computer, adjust the time zone to the correct one for your destination.
- When you arrive, you discover that the battery in your laptop is dead. Reset the time on your computer to 2:35 PM, which is the current local time.

What to do in this example
- We composed a script for the test, which you will see and hear in the next section
- Since we were using these recordings for course materials, as well as using them to find usability problems in the Date/Time control panel, our script had to be a little different from what you would normally use: we stated specifically that the recordings would be used in a class
- We also composed a consent form specific to this testing session, which also mentioned the classroom use of the recordings
- Finally, the analyst rehearsed the script six times before collecting any actual data

What to do in this example
- Any potential owner of a computer could be a participant in our study
- We asked participants about their computer background
  - Questions about computer use were written into the script
- We ran the observations at the airport, with travelers who had quite some time to wait before their next flights

Sample consent form

3.2.4 Introducing the Participant to the Procedure
- Describe the purpose of the study in general terms
- Train the user to "think aloud"
- Explain the rules of the observation
- Example: Introducing Participants to a Think-Aloud Usability Test of the Date/Time Control Panel
- Recordings of participant briefings

When you first meet your test participant
- Give the participant enough information for him or her to feel comfortable doing the think-aloud
  - describe the purpose of the study
  - explain how to think aloud
  - provide time to practice the technique
  - explain the rules of the observation

Describe the Purpose of the Study
- Give a brief introduction to the entire study with the following elements
  - Introduce yourself (name and title)
  - Introduce your organization and its purpose
- Tell the user
  - the goal of the particular study
  - that you are testing the computer system, not the user
  - that participation is purely voluntary
  - that it is OK if they want to stop at any time (check the computer system at this time)
- Have the participant sign the consent form
- Show the equipment and explain how to use it
- Show the participant how you are going to record their voice

When done, ask the participant if it is OK to start recording
- If so, start the recording device
- If the answer is "no," ask what else they need to know and answer any questions
- If they persist in not wanting to be recorded, tell them that recording their actions and voice is a requirement of the study
- Excuse them if they continue to refuse

Train the User to "Think Aloud"
- Thinking aloud is a bit awkward for users at first
  - give them practice on sample tasks
- Instructions adapted from Ericsson and Simon's Protocol Analysis have been used successfully to elicit good think-aloud behavior from users
- Adapt these instructions to your task situation, script them, and read them almost verbatim (word for word) to your participants

Train the User to "Think Aloud"
- How to do the observation
  - Ask the user to perform a sample think-aloud
  - Ask the user to verbalize the things he or she is searching for and seeing

Train the User to "Think Aloud"
- If the user stops talking, ask him or her to talk again
  - If, during the practice session, the participant stops talking for 5 to 10 seconds, say "Please keep talking."
  - Don't say "What are you thinking?" or "Please explain what you are doing," because these phrases tend to encourage people to switch to Type 3 verbalizations (explaining or filtering what they are thinking)
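
The 5-to-10-second rule can also be applied after the fact, to see where in a recording a prompt was due. The sketch below is an illustration under stated assumptions: utterances are given as time-sorted `(start, end)` spans in seconds, and the function name and 5-second default are choices made here, not part of the course material.

```python
def silence_prompts(utterance_spans, session_end, threshold=5.0):
    """Return the times at which a "Please keep talking" prompt was due.

    utterance_spans is a time-sorted list of (start, end) pairs in seconds.
    A prompt is due once the participant has been silent for `threshold`
    seconds; at most one prompt is reported per silent stretch.
    """
    prompts = []
    last_end = 0.0
    for start, end in utterance_spans:
        if start - last_end >= threshold:
            prompts.append(last_end + threshold)
        last_end = max(last_end, end)
    if session_end - last_end >= threshold:
        prompts.append(last_end + threshold)
    return prompts


# The participant talks during 0-3 s, 4-6 s, and 14-15 s of a 16 s practice task.
due = silence_prompts([(0, 3), (4, 6), (14, 15)], session_end=16)
```

Comparing these times with the moments the analyst actually prompted is one way to critique a practice session.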

Explain the Rules of the Observation
- Before moving on to the observation phase of the usability test, explain the "rules" of the observation:
  - You cannot answer questions during the observation.
  - The participant can ask questions, but you will record them and answer them after the observation.
  - You will prompt the participant to continue if he or she falls quiet.
- After explaining, ask the participant whether he or she has any questions about the think-aloud procedure or anything else so far.

Example: Introducing Participants to a Think-Aloud Usability Test of the Date/Time Control Panel
- To prepare for the test of the Date/Time control panel, we incorporated all this information into a script. The analyst follows the script when testing the software.
- Following the script are several recordings that capture our delivery of this introductory material to a typical user.
- As with all informal observations, the analyst may not have adhered perfectly to the script; deviations from the script and omissions can be found in the recordings.
- Typically, experienced analysts are able to follow a script closely without appearing to be unnatural or to lack spontaneity.

Recordings of Participant Briefings
1. The first recording is of the analyst describing the purpose of the study in general terms. Refer to that subsection of this module, the script above, and the consent form as you view it. Think about what the analyst is doing according to the procedures presented in this module and what she is doing that deviates from those procedures.
   1. Click Introduction_&_Consent to view the recording.
   2. After you have viewed the recording, click Intro_Critique for our critique of the recording.
2. The second recording is of the analyst introducing the think-aloud technique. Refer to the relevant previous subsection, and to the script above, as you view it. Think about how the analyst is following the procedures presented in this section and what she is doing that deviates from those procedures.
   1. Click How-to-TA to view the recording.
   2. After you have viewed the recording, click TA_Critique for our critique of the recording.

Recordings of Participant Briefings
3. The third recording is of the analyst demonstrating the think-aloud technique and giving the participant a chance to practice. Refer to that subsection of this section and the script above as you view it. Think about where the analyst follows and deviates from the procedures presented in this section.
   1. Click Practice to view the recording.
   2. After you have viewed the recording, click Practice_Critique for our critique of the recording.
4. The last recording is of the analyst explaining the rules of the observation and giving the participant the task. Refer to that subsection of this section and the script above as you view the recording. Think about where the analyst follows and deviates from the procedures presented in this section.
   1. Click Rules_& Tasks to view the recording.
   2. After you have viewed the recording, click Rules-&-Tasks_Critique for our critique of the recording.

3. 2. 5 Conducting the Observation
- You are ready to conduct the observation:
  - after you have introduced the participant to the general purpose and procedures of the study
  - after you have trained him or her to think aloud
- Contents
  - Introduce the observation phase
  - Begin the observation
  - Conclude the observation
  - Example: Conducting an observation of the Date/Time control panel

Introduce the Observation Phase
- Describe the system.
  - Tell the participant as much about the system as you expect real users will know.
  - If they will only have seen it through trade-magazine advertising or TV commercials, give them that level of overview.
  - If they will have gone through an hour's training, this is the time to do that training.

Introduce the Observation Phase
- Describe the tasks you wish him or her to do.
  - If there is more than one task, it is best to give them one at a time rather than describe all of them and then set the user free to do them all on their own.
  - Have the tasks written on a sheet of paper, give it to the participant, and also describe them verbally so the participant gets the information twice.
  - If diagrams or pictures might make the task easier to understand, prepare them ahead of time and walk through them with the user.

Introduce the Observation Phase
- Ask the participant whether she has any questions about the goals of the study, the procedures, the product, or the task.
- Answer any questions that clarify what the user has to do, but not any that solve problems they might encounter in doing the task. E.g.,
  - Don't answer questions like "How do I do that task?"
  - If asked, reply "That's exactly the sort of thing I need to observe. I would like to see how the system helps you figure that out."

Begin the Observation
- Check the recording devices.
- Keep away from the user.
  - Monitor their progress from another room.
  - If running the test in the field, move out of the user's line of sight and sit quietly.
- Note down their questions.
  - Tell the user you will not answer questions about the product while they are working.
  - They should speak their questions aloud.
  - Jot down any questions and answer them later.

Begin the Observation
- Prompt if they fall quiet.
  - Just say "Please keep talking."
  - Don't say "What are you thinking?" or "Please explain what you are doing."
- Be sensitive to their willingness to quit.
- Answer their questions once the observation is over.
- Note their opinions and suggestions.

Conclude the Observation
- S1: When finished, answer any questions you jotted down in your notes.
- S2: Ask whether they have any more questions about the system, the study, or the organization.
- S3: Answer these questions.
  - If you can, answer right then.
  - If you cannot, put the user in touch with someone who can.
- Although the test has already yielded reliable data, it is a good idea to ask the user for opinions and suggestions, because:
  - (1) it gives them a chance to express their opinion, which people often want to do;
  - (2) users sometimes have great ideas that you can only collect by asking.

Conclude the Observation
- S4: Ask for any opinions about the product they just tested.
- S5: Ask for any suggestions to improve the product.
- S6: Thank them for participating.
  - Reiterate that their participation will help you identify problems with the system.
  - Give the participant whatever compensation you promised when they were recruited, or arrange for the compensation to be sent to them (e.g., fill out any necessary paperwork to pay them, or get their address to send them a copy of the final report).
  - Thank the participant again when they leave.

Example: Conducting an Observation With Date/Time
- Click the link and view the recording. In it, the analyst should:
  - Introduce herself
  - Introduce the task
  - Get the user's consent
  - Explain the task
  - Ask the user to begin, and prompt if he stops

Example: Conducting an Observation With Date/Time
- In the observation, we decided not to give any training or introduction other than the name of the panel.
  - This was to simulate how real users would be introduced to this control panel:
  - it would just come installed on their machine, ostensibly without help or documentation.
- We gave the descriptions of the tasks to the participant, each task on a separate sheet of paper.
- As you view the recording, think about how the analyst follows and deviates from the procedures presented in the course material.

Example: Conducting an Observation With Date/Time
- At one point, the participant says, "I give up," but then he continues to look for an answer, so the analyst correctly lets him continue for some time.
- He tries some ideas, summarizes the problem, tries some more, and finally pauses for a length of time and gives the analyst an imploring look (which is not visible on this recording).
- The analyst is sensitive to the participant's frustration and decides to suggest that the participant move on to the next task.
- He gratefully accepts and continues to think aloud very well.

Another Recording of a Different Participant
- The script for the analyst to conduct and conclude the observation.
- The Conclusion recording. Refer to the previous material in this section as well as to this script as you view the recording. Think about where the analyst is following and diverging from the procedures presented in the course material. When done, read the critique.
- The recordings represent quite good Type 2 verbalizations. For contrast, we have included an example of a think-aloud session where the participant explains her thoughts rather than merely reporting them. The analyst should have interrupted the participant to ask her not to explain (as the script earlier requires), but failed to do so. When you view this recording, notice how much slower and less natural the process seems.
- Click the following link to view the Date. Time 2 recording.

3. 2. 6 Analyzing the Observation
- Having collected think-aloud usability data from several participants, you must now analyze the data to find:
  - good features that you want to preserve in future versions of your system
  - problems with the system that you want to fix
  - possible solutions to those problems

Establish Criteria for Critical Incidents
- What is a problem?
- It is important to think hard about which behaviors should be considered critical incidents.
- Which observable behaviors indicate that a feature is so well designed that it should be preserved in future redesigns?

Establish Criteria for Critical Incidents
- What is a critical incident?
  - Extreme behavior, either outstandingly effective or ineffective with respect to attaining the general aims of the activity.
  - Not everything a user does will be "critical"; not everything is worth considering for fixing or preserving in the next version of the software.
- Redefining "critical incident"
  - Different design situations will generate different criteria for criticality.

Establish Criteria for Critical Incidents
- Before viewing the recorded data, list criteria for:
  - problems
  - good features
- List them in a table and use it as a reference as you review the data.
- Use no more than about 10 criteria; keeping more than that in mind as you review the recordings can get unmanageable.
- Some criteria are useful for systems that are easily prototyped in Visual Basic.

View the Recorded Behavior and Write UARs
- View the recording.
- Identify a critical incident.
- Give it a unique name and mark it as a good feature or a bad feature.
- Find relationships with other UARs.

Evidence for the Aspect
- The evidence should state the actual behavior, including:
  - what the users said
  - what they did (typed words, clicked buttons)
  - what was on the screen at the time of the incident
    - that something was on the screen does NOT in itself mean that the user actually noticed it
- Evidence is just "facts," NOT an interpretation.
  - There may be more than one interpretation of the same facts.
  - Interpretation appears in the "explanation" slot, not in the "evidence" slot.

Evidence for the Aspect
- Include a pointer so the incident can be replayed at any time later:
  - the tape counter for a videotape
  - a time stamp on the videotape
  - the time shown on a clock displayed on the computer screen
- Purpose and consequence should be fairly clear; include enough context to permit inferences about:
  - what the users knew
  - what they paid attention to
  - how they approached a problem

Evidence for the Aspect
- Evidence should start with a statement of the goal.
  - It is much easier to understand how the interface supported or failed to support a user when the goal is known.
  - Users will often tell you what they are trying to accomplish by saying something like:
    - "I wanna find…"
    - "Now, let's try to…"
    - "Gotta go do…"
- When the user does not explicitly voice the goal, other actions are evidence. E.g.,
  - the user's goal is to do the part of the task she just read from the task description;
  - the act of reading the task description would be evidence for that goal.
- Sometimes the system sets the user's goal. E.g.,
  - the system presents a modal dialog box (in VB, MsgBox);
  - therefore, the appearance of a dialog box and the fact that it is modal should be recorded as evidence for the user's goal.

Evidence for the Aspect
- Include the effects of the user's actions in the evidence slot. These are usually:
  - screen shots
  - descriptions of what happened on the screen and with the system
- Some side effects have no visible effect on the screen.
  - Include a factual description of those side effects in the evidence.
- Sometimes the consequence of a user's actions cannot be seen until much later in the recording.
  - Include the evidence for this effect, even if it is not contiguous with the incident itself.
  - The evidence you include in the report must be complete enough to describe both the goal and the effects, no matter how separated they are in time.

On-line UARs vs. Paper UARs
- On-line UARs
  - It is easy to include actual clips of the recording.
  - The multimedia content can be played back.
- Paper UARs
  - They continue to be needed for many reasons.
  - They are easier to carry, and copies can be handed out at a meeting.
  - Even if you include dynamic (on-line playback) evidence, consider using it as a back-up to static (paper-based) evidence.

Explanation of the Aspect
- The explanation is your hypothesis about what the user was seeing, interpreting, understanding, or guessing while performing the actions.
- In the UAR, clearly state:
  - your understanding of the user's background
  - your understanding of how the system actually works

Explanation of the Aspect
- Sometimes there will be more than one plausible explanation for the evidence.
  - Record all the possible explanations.
  - Different explanations point to different solutions.
  - Look for more evidence that confirms or disconfirms each explanation.
- Sometimes you have to run more users on a shorter, more focused task to gather evidence that distinguishes between the explanations.

Severity of the Problem or Benefit of the Good Feature
- Severity is related to the criteria used to identify the incident as critical.
- E.g., a user giving up on a task is more severe than a user expressing distressed surprise.

Severity of the Problem or Benefit of the Good Feature
- There are no standard criteria for identifying critical incidents.
  - What counts as critical varies from one design situation to the next.
  - What counts as severe also varies with the design situation.
- Establish criteria for measuring severity as each design situation dictates.

Possible Solution, Including Trade-Offs
- It is important to find evidence for the user's goals: a solution to a usability problem comes from supporting the user's goals directly.
- Questions can lead to inspiration for a solution. Ask:
  - Is the goal the user cares about supported by the system? If not, why not?
  - Is it a reasonable goal? If so, how can you support it?
  - If it isn't a reasonable goal, what was it about the system that guided the user to form an unreasonable goal?

Possible Solution, Including Trade-Offs
- Record solutions generated by the users themselves.
  - Sometimes users will generate solutions themselves.
  - Record these suggestions and give them as much consideration as those from the development team.
  - Do not give them more importance just because they come from a user; users do not always know what features would actually serve them in the long run.
  - Any suggestion arising in a usability test is likely to be a reaction to a local difficulty and may not take into account enough understanding of the whole system to be truly insightful.
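The UAR slots described above (name, evidence, explanation, severity, possible solutions, relationships) can be kept as a simple record. The following is a minimal sketch, not the course's official UAR template: the field names, the identifier "TA1", and the severity label are assumptions made for illustration, while the evidence and explanation text are taken from the Date/Time example discussed in this module.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UAR:
    """One usability aspect report; field names are illustrative."""
    identifier: str            # unique name, e.g. "TA1" (hypothetical numbering)
    name: str
    is_problem: bool           # False marks a good feature
    evidence: List[str]        # facts only, each with a replay pointer
    explanations: List[str]    # hypotheses; record every plausible one
    severity: str              # judged against criteria set for this design
    solutions: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)

    def missing_slots(self) -> List[str]:
        """Names of required slots that are still empty."""
        required = {
            "evidence": self.evidence,
            "explanations": self.explanations,
            "severity": self.severity,
        }
        if self.is_problem:
            # A problem report should also propose at least one solution.
            required["solutions"] = self.solutions
        return [slot for slot, value in required.items() if not value]


map_uar = UAR(
    identifier="TA1",
    name="Map looks clickable but is not",
    is_problem=True,
    evidence=[
        '16:28:02 moves the mouse pointer over the map and says "...but I can\'t."',
        "16:30:02 clicks on several places in the map",
    ],
    explanations=["He thought he could set the time zone by clicking the map."],
    severity="high",  # hypothetical label
)
print(map_uar.missing_slots())
```

Keeping evidence and explanations in separate fields mirrors the rule that interpretation belongs in the explanation slot, never in the evidence slot.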

Example: Analyze the Observation of Date/Time
- Notice that TA UARs seem bigger than HE UARs.
- The critical incident technique requires incidents to be "complete":
  - they include the setting of a goal, the problem-solving required to accomplish that goal, and its final resolution (whether it is success, abandonment, or a change of goals).
- HE UARs can identify problems in the small steps towards a goal (e.g., understanding a label, or searching through a cluttered screen).
- TA UARs require a complete chain of events towards a goal.
- There is often evidence supporting the HE UARs within the TA UARs, but they will not be identical in scope.

3. 2. 7 Finding Possible Re-Designs
- After writing up UARs from several usability tests, step back and generate ideas about how to redesign the system to fix any problems you have found.
- Contents
  - Relating different usability aspects
  - Determining possible solutions
  - Example: Looking for possible redesigns of the Date/Time control panel

Relate Different Usability Aspects
- Relate repeated actions on different objects.
  - Relate UARs in which the same goal arose for different users or under different circumstances.
  - Relate UARs in which slightly different goals arose for the same user.
  - "Similar" goals usually share an action or an object.
- Look for the same action repeated on different objects.
  - E.g., deleting a file and deleting a folder are both expressed as an action (delete) on an object (file or folder).

Relate Different Usability Aspects
- A string of different actions on the same object may relate several UARs.
  - E.g., copying file 1 to another folder, then deleting file 1 (which has the effect of moving the file).
- Such strings of related goals often show a larger goal that is not being supported well in the system.
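Once each UAR's goal is written as an action on an object, finding shared actions and shared objects is a simple grouping step. The following is a minimal sketch under that assumption; the function name `relate_uars` and the UAR identifiers are hypothetical, and the sample data reuses the delete and copy examples above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def relate_uars(
    uars: List[Tuple[str, str, str]]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Group UARs that share an action or an object.

    Each UAR is given as (identifier, action, object). Returns two mappings:
    action -> UAR ids and object -> UAR ids, keeping only groups that
    contain more than one UAR.
    """
    by_action: Dict[str, List[str]] = defaultdict(list)
    by_object: Dict[str, List[str]] = defaultdict(list)
    for uar_id, action, obj in uars:
        by_action[action].append(uar_id)
        by_object[obj].append(uar_id)
    shared_action = {a: ids for a, ids in by_action.items() if len(ids) > 1}
    shared_object = {o: ids for o, ids in by_object.items() if len(ids) > 1}
    return shared_action, shared_object


# Hypothetical UARs: two deletes on different objects, two actions on file 1.
uars = [
    ("TA1", "delete", "file 1"),
    ("TA2", "delete", "folder"),
    ("TA3", "copy", "file 1"),
]
same_action, same_object = relate_uars(uars)
print(same_action)
print(same_object)
```

A group under one object with several different actions (here, copy and delete on file 1) is a hint to look for a larger, unsupported goal such as moving the file.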

Relate Different Usability Aspects
- When many problem UARs are about one system feature, relate them.
  - E.g., if there are many problems with the spelling checker, perhaps the operation of that entire feature should be rethought.
- Examine UARs that relate to other parts of the computer system outside the system you are designing.
  - This integration with other applications is often a source of usability problems.

Relate Different Usability Aspects
- The ways to relate UARs depend heavily on the particular system you are designing.
- Find commonalities.
- Always think about how to relate the UARs; these patterns have the potential to inspire better designs.
- But don't get bogged down in trying to impose patterns where they do not jump to mind (e.g., don't spend more than a day thinking about this).
- Sometimes there really are no big patterns, and the right thing to do is just to go about fixing the little things.

Determine Possible Solutions
- Write a solution that supports the users' goals directly.
- If the UARs suggest that a larger user goal is problematic, ask yourself:
  - Is this larger goal intended to be supported by the system, and, if not, why not?
  - Is it a reasonable goal, and, if so, how can you support it?
  - If it isn't a reasonable goal, why did the system guide the user to form an unreasonable goal?

Determine Possible Solutions
- Write a solution that supports the users' goals directly.
- If a set of UARs is about a specific system feature, check:
  - What user goal was the feature designed to support?
  - Is this a reasonable goal?
  - Was there any direct evidence that users would form that goal? Did anyone articulate it?
  - If not, what other goals did the users express in place of that goal?
  - If they didn't voice the goal the feature was designed to support, what goals were they working on when they did use that feature?

Determine Possible Solutions
- Write a solution that supports the users' goals directly.
- If the UARs concern integration with other applications on the computer, make sure you list the goals.
  - These goals are only a subset of what people will actually want to do with your system.
  - Use this list as a starting point, and generate other possible integration requirements.

Determine Possible Solutions
- Make sure your solution does not violate any heuristics.
  - Besides the general idea that the system redesign should support users' goals more directly, there are no hard and fast rules about how to generate new designs.
  - Use the UARs and their relationships as inspiration for a redesign.
  - Check your ideas with a quick Heuristic Evaluation to make sure you are not violating heuristics.
  - Prototype the new design and iteratively user-test it again.

Example: Possible Re-Design of Date/Time
- Check the complete set of Date/Time UARs.
- Follow the material given under this topic.

Summarizing Report
- Communicate the results of the usability analysis.
- The write-up of the severe problems should not exceed three pages.
- Rank the smaller problems.
- The report should cover the usability aspects to be preserved along with the usability problems to be fixed.
- If needed, produce a "highlights" videotape that supports your report and summarizes the problems mentioned in the UARs.

Example: A Summarizing Report for the Date/Time Control Panel
- If the think-aloud data were all you had to report on, a summarizing report would not be necessary, because the three UARs themselves would be sufficiently short for everyone to read.
- For finding possible redesigns, include the 28 UARs from the Heuristic Evaluation.
- Sample. Of. Summarizing. Report. doc

3. 3. 1 Comparing Think-Aloud Usability Testing With Heuristic Evaluation
- In this section, we will discuss:
  - Many usability aspects identified in HE are confirmed in think-aloud usability tests
  - When HE predictions are not confirmed by think-aloud tests
  - "False alarms" vs. true problems
  - Think-aloud usability tests can show things HE can't show

TA & HE
- Two techniques for evaluating interface designs:
  - Heuristic evaluation (HE): an analytic technique that you can use quickly at a very early stage in design (e.g., with rough paper sketches of ideas, without the exact procedures worked out).
  - Think-aloud usability testing (TA): requires a more detailed design and is easily conducted on prototypes you can build in VB.

TA & HE
- Why teach both techniques?
  - Many usability aspects predicted by HE are confirmed in usability tests.
  - The two techniques will sometimes give conflicting information.
  - HE can uncover usability aspects that think-aloud tests cannot.
  - Think-aloud usability tests can address usability aspects that HE cannot.
  - They do not overlap completely and are best used together.

Many Usability Aspects Identified in HE Are Confirmed in Think-Aloud Usability Tests
- Heuristics are general principles distilled from many years of design experience.
- Heuristics can predict the usability problems that are revealed in think-aloud usability tests.
- E.g., in the Date. Time 3 recording of the Date/Time control panel, problems uncovered by heuristic evaluation are confirmed by the think-aloud usability test, though not completely.

Example
- UAR HE 2 predicted the problem:
  - the map is so large that it looks as if it supports interaction, but it does not.
- In the think-aloud recording:
  - when the participant brings up the Time Zone tab, he moves the mouse pointer over the map and says (time 16:28:02): "If my knowledge of geography were better, I would instantly be able to locate it [i.e., the Philippines] on the map… [moving the mouse pointer around the Pacific Ocean] but I can't."

Example Again
- The interpretation of this behavior is that he thought he could do something with the map: if he could have found the Philippines on the map, he would have wanted to click on it.
- Later, when he gives up searching through the time zone list, if you listen carefully, you can hear him click on several places in the map (time 16:30:02).

Example of the List of Time Zones
- Several HE-based UARs note that the list of time zones:
  - HE 18: provides too much irrelevant information.
  - HE 21: is inefficient, since a visual search through such a long list takes too long.
- TA: "...of course, it isn't in any kind of order, so we just have to sit here and scan through lots and lots of cities (in an exasperated voice)… and then we realize we went in the wrong direction, we have to go alllll the way back up (the user sighs)."
  - This is evidence of the user's perception of inefficiency.

And …
- HE 27: the very incompleteness of the list of cities would cause users confusion.
- TA: he fails to find either Manila or the Philippines and finally gives up on the task (time 16:30:00).
- All this evidence supports the problem identified in HE 28: the ListBox is a bad control for setting the time zone.

Example of Support for Good Features
- HE 5 and HE 16
  - TA: The user has no difficulty using the OK button in several instances (times 16:30:10 and 16:30:50).
- HE 22: seeing the time zone setting on the Date/Time tab
  - TA: "of course, we are still in the wrong time zone"

When HE Predictions Are Not Confirmed by Think-Aloud Usability Tests
- This happens in two ways:
  - Think-aloud testing contradicts an HE prediction: HE predicts a good feature, but participants in TA tests have problems.
  - Think-aloud testing yields no evidence to support an HE problem prediction: HE predicts a usability problem, but TA gives no evidence of users actually having that problem.

When HE Predictions Are Not Confirmed by Think-Aloud Usability Tests
- When HE predicts a good feature but users have problems in TA tests: believe the think-aloud data!
- The exception is when some aspect of the testing situation makes it undeniably anomalous. E.g.,
  - the participant became ill and could not continue with the test
  - the participant had noticeable difficulty concentrating during the test's last 15 minutes

"False Alarms" vs. True Problems n There is a debate in the human-computer interaction community n whether HE-predicted problems not be shown up in TA tests are n “false alarms“ or n true problems undetected in the tests. n fixing "false alarms, " will waste time—or worse, decreases the usability of the system! n true problems undetected by the think-aloud testing, fixing them is a good use of effort.

"False Alarms" vs. True Problems n To the heuristics, some situations unlikely to be covered in TA tests. E. g. , n most TA tests cover new users' experience of the system, not skilled users' experience. n TA tests use people who have never seen the system before, give these people a minimum of training, and record their initial interactions with the system. n This approach yields valuable information about how new users think as they use the system—but not about how experienced users might think.

"False Alarms" vs. True Problems n several heuristics give advice on how to handle errors and about help and documentation n errors that participants in think-aloud tests may or may not make n test participants may or may not choose to use help and documentation n We suggest two rules of thumb to apply to usability problems identified in HE but not confirmed in TA tests

Two Rules of Thumb - 1
- If an HE problem does not show up in TA, review the TA data to see whether the situation the HE refers to did indeed arise.
- If it arose, then it is possible that either:
  - 1) the HE UAR reflects a false alarm, or
  - 2) other users in other circumstances will indeed have the trouble the HE predicted.
- After all, you tested only a few users on a few tasks; you cannot possibly test all potential users, and the next user may have the problem.
- If more than one participant encountered the situation and none of them had problems, trust the data and assign the problem a low priority on the list of things to fix.

Two Rules of Thumb - 1 (continued)
- If the situation did not arise, or if only one user encountered it, you have no reliable data to contradict the HE-predicted problem.
- Then judge the problem on the basis of:
  - its severity
  - its relationship to other UARs
  - the difficulty involved in fixing it

Two Rules of Thumb - 2
- For a system that is used often (e.g., Word, Excel), take seriously the problems predicted by the Flexibility and Efficiency of Use heuristic, even if they do not show up in the TA tests.
- That heuristic predicts usability for skilled users, not new users.
- TA tests typically will not provide data for that type of user.
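The two rules of thumb amount to a small decision procedure for an HE-predicted problem that no TA participant actually exhibited. The following is a minimal sketch; the function name, the return labels, and the choice to check the second rule before the first are assumptions made for illustration.

```python
FLEX_EFF = "Flexibility and Efficiency of Use"


def triage_unconfirmed(encountered: int, used_often: bool, heuristic: str) -> str:
    """Triage an HE-predicted problem that did not show up in TA tests.

    `encountered` is the number of participants who met the situation the
    HE UAR refers to (none of whom had the predicted problem).
    """
    # Rule 2: TA tests with new users say nothing about skilled-user
    # efficiency in a frequently used system, so keep the prediction.
    if used_often and heuristic == FLEX_EFF:
        return "keep_for_skilled_users"
    # Rule 1, first clause: several participants met the situation and had
    # no trouble, so trust the data and give the problem low priority.
    if encountered > 1:
        return "low_priority"
    # Rule 1, second clause: no reliable data contradicts the prediction;
    # judge it on severity, related UARs, and difficulty of the fix.
    return "judge_on_merits"


# Date/Time example: rarely used, and the one participant never needed
# to apply or cancel, so the situation did not arise.
print(triage_unconfirmed(encountered=0, used_often=False, heuristic="other"))
```

The sketch only covers predictions that were not confirmed; when TA data contradicts a predicted good feature, the material says to believe the think-aloud data.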

Example: Applying These Rules of Thumb to the Date/Time Control Panel
- The Date/Time control panel is not used very often, so it falls under the first rule.
- Since we had only one participant in our think-aloud test, all our HE predictions fall under the first rule's second clause, where we are advised to trust HE predictions.

Example: Applying These Rules of Thumb to the Date/Time Control Panel
- UARs HE 6, HE 8, HE 12, and HE 14 all predict problems with the standard OK, Apply, and Cancel buttons, but the user seemed to have no problems with these.
- Upon closer examination:
  - We discover that the user only used the OK button when he was finished using the control panel.
  - He had no opportunity to encounter problems with these buttons because he never had the goals to apply or cancel, only to accept the changes he had made and close the window.
  - Therefore, we have no evidence for or against those HE predictions.

Example: Applying These Rules of Thumb to the Date/Time Control Panel
- There is no evidence in the recording for or against the two UARs concerning help (HE 23 and HE 24), because the participant never used the help facility.
- In such circumstances, you would probably decide to fix these problems predicted by heuristic evaluation, based on their relative severity and the cost of applying the fixes.

Think-Aloud Usability Tests Can Show Things HEs Can't Show
- Some problems heuristic evaluation cannot show in principle; others it does not show in practice.
- When an HE is done using a paper prototype, it cannot identify problems with the dynamics of the system.
  - E.g., it cannot indicate where system speed will be a problem, where users may become annoyed or impatient because the system seems slow.
  - HE will not detect situations where the feedback is so delayed that the user becomes confused as to what is happening with the system.

Think-Aloud Usability Tests Can Show Things HEs Can't Show
- When users interact with a running prototype during usability testing, these problems show up.
- Only interaction with a real system will uncover problems in the code that cause the system to crash.
- HE assumes the system will work as sketched or prototyped.

Think-Aloud Usability Tests Can Show Things HEs Can't Show
- If the code in the deployed system is different from the code in the prototype, TA tests with a prototype will not uncover this type of problem.
- So extensive testing with actual users must be done before any major release of software.

Practice Tells Us …
- HEs are done without framing the analysis in terms of a real-world task.
- TA tests typically try to simulate real task situations.
- This task-oriented framework often reveals usability aspects of the interaction:
  - between disparate features in the system
  - between the system and other software in the users' environment
- E.g., when copying something from the system into an e-mail message or a slide presentation, difficulties often arise.
- In practice, these usability aspects tend to appear more often during TA than during HE.

Date/Time Panel Sample
- The prototype responds almost instantaneously to the user's actions, so there is no occasion to complain about the speed of the interface.
- The prototype is also sufficiently robust that it does not crash during the tests.
- Both of these are dynamic artifacts of the system.
- It is isolated enough that it is unlikely to be used in integrative tasks.

Remember!
- Continued observation of users, whether with think-aloud usability tests or informally as they use the product in the field, is an integral part of iterative design and the product life cycle.
- It is an endless source of challenge to designers of user interfaces.