How Reliable Are Self-Reported Task Completion Rates?

by Jeff Sauro | December 8, 2015

Unmoderated testing provides many benefits, the most notable of which is the ability to collect metrics from a large and geographically diverse sample of participants quickly.

A common metric collected in usability tests is the task completion rate. It’s often called the fundamental usability metric because if users can’t complete a task, not much else matters.

But when you can’t see what the participant is doing, as you can in moderated (remote or in-person) testing, you need other means to assess task completion.

One of the most-used methods for assessing task completion is a validation question, usually a multiple-choice question with a correct answer and some plausible wrong answers.

After users complete a task, such as finding a product (or renting a car as shown in the example above), they answer the validation question (e.g. what was the price of the blender?). The idea is that participants only know the answer—the price, brand name, or size—if they completed the task correctly.

Another method to assess task completion is by URL. You can verify task completion by checking that a participant visited a particular page on a website (called validation by URL).
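As a rough illustration of validation by URL, the sketch below checks whether any page in a participant's clickstream matches a success page. The function name, the example.com URLs, and the choice to compare only the URL path (ignoring query strings and trailing slashes) are all assumptions made for this example, not a description of how any particular testing platform works.

```python
from urllib.parse import urlsplit

def visited_success_page(visited_urls, success_path):
    """Return True if any visited URL's path matches the success path.

    Query strings and trailing slashes are ignored (an assumption made here
    so that tracking parameters don't cause false failures).
    """
    target = success_path.rstrip("/")
    for url in visited_urls:
        path = urlsplit(url).path.rstrip("/")
        if path == target:
            return True
    return False

# Hypothetical clickstream for one participant (not data from the studies)
clickstream = [
    "https://www.example.com/",
    "https://www.example.com/blenders?sort=price",
    "https://www.example.com/blenders/model-x/?ref=search",
]
completed = visited_success_page(clickstream, "/blenders/model-x")
```

Matching on the path alone is one way to soften the problem of constantly changing URLs, but it still breaks if the site restructures its pages.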

Validation by question and validation by URL both have some shortcomings (e.g. guessing and constantly changing URLs), but they are generally the best methods for assessing task completion in unmoderated studies. Still, there are many times when it’s difficult to validate by question or URL. For example, when participants go through a web application and all users are presented with the same information, or when there are many possible correct answers and pages, it’s difficult to know whether the participant actually completed the task.

An alternative way to assess task completion is to simply ask participants whether they completed the task successfully (self-reported task completion). There’s some evidence that users can reliably self-report usability problems, so perhaps self-reported task-completion rates are another viable metric. We have some reason to be skeptical though. In earlier research, we’ve seen that participants are overconfident in their ability to complete tasks. So can we rely on self-reported task-completion rates?

Methods

To get some idea about the viability of self-reported task-completion rates, we looked at four unmoderated competitive benchmark studies conducted over the last two years in which we asked participants to self-report task completion and had a method to verify the completion rates.

Two of the studies were the same study completed one year apart with different participants but with tasks that were almost identical. The four studies examined were of web applications and a consumer-focused information website and included data from 838 participants.

We used three verification methods:

Recordings: The first verification method involved watching a recording of users’ screens, a very handy feature that MeasuringU, UserTesting, and TryMyUI provide. A researcher then watched each session and coded the activity as success or failure based on the task-success criteria and checked it against the self-reported completion rate.

Validation question: The second method used in one study was a validation question where participants were asked to provide the address of a store they were tasked to find (a common verification question for unmoderated studies). This was asked after participants indicated whether they completed the task successfully. A researcher manually checked the accuracy of the addresses against the self-reported metric.

Email artifact: The third method involved having participants send information to a designated email account. A researcher then verified the information sent to the email account and compared it to the self-reported completion rate.

Results

Not surprisingly, self-reported task-completion rates were much higher than verified task-completion rates. On average, the self-reported task-completion rate was 93% as shown in Figure 1 below (min 63%, max 100%).

In contrast, the verified task-completion rate was 33% (min 4%, max 88%). That’s a difference of almost 3 to 1! The correlation between the verified and perceived completion rates was a rather low r = .24. Squaring that correlation shows that self-reported task-completion rates explain only around 6% of the variance in verified completion rates in this dataset.
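The arithmetic behind those two figures can be sketched in a few lines. The three inputs are the values reported above; the variable names are just labels chosen for this example.

```python
# Figures reported in the article
self_reported = 0.93  # average self-reported completion rate
verified = 0.33       # average verified completion rate
r = 0.24              # correlation between verified and self-reported rates

# How many times higher the self-reported rate is than the verified rate
ratio = self_reported / verified

# Squaring a correlation gives the share of variance explained
r_squared = r ** 2

print(f"ratio: {ratio:.2f} to 1")
print(f"variance explained: {r_squared:.1%}")
```

The ratio comes out a little above 2.8 to 1 and the squared correlation just under 6%, which is where "almost 3 to 1" and "around 6%" come from.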

Figure 2 shows 34 paired bars; each pair represents a task across the four studies (A, B1, B2, and C). The large gaps between the red and blue bars show the gulf between what participants reported and what the researchers observed.

Study C in Figure 2 has noticeably higher verified completion rates than the other studies. In this study, verification came from the validation question, which has a looser success criterion and may explain the smaller gap.

Because these are all competitive studies, we also wanted to see how well the self-reported task-completion rates preserve the relative standing of the competitors across the tasks. The news gets a little better here. We ranked the competitors for each task for self-reported and verified task completion. The highest self-reported task-completion rate of the competitors received a 1, the second a 2, and so forth.

The Spearman’s rank correlation (rho) was .43. In other words, the self-reported rankings explain around 18% of the variance in the verified rankings. Of the ten tasks, the self-reported task leader is the same as the verified task leader half the time (5/10), which is not terribly compelling.
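For readers who want to run the same kind of comparison on their own data, here is a minimal sketch of a Spearman rank correlation for a single task. The completion rates are hypothetical, the function names are ours, and giving tied values their average rank is an assumption; the analysis above does not describe how ties were handled or how ranks were pooled across tasks.

```python
from statistics import mean

def average_ranks(values):
    """Rank values so the highest gets 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical rates for three competitors on one task (not from the studies)
self_reported = [0.98, 0.95, 0.90]
verified = [0.20, 0.45, 0.10]
rho = spearman_rho(self_reported, verified)
```

In this made-up example the self-reported leader is not the verified leader, and rho works out to .5. Squaring the rho of .43 reported above gives about .18, the source of the 18% figure.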

Discussion

The gulf between actual and reported behavior is the topic of many studies in the behavioral sciences and in user research, and that gulf shows up here too. It’s no wonder it’s a cliche to “watch what users do and not what they say.” However, in many cases, participants may feel like they’ve completed enough of the task even if the researcher determined it was only partially completed, which may explain much of the gap for some tasks.

So is there any value in using self-reported task-completion rates? The data from this modest sample of tasks and studies suggests self-reported task-completion rates by themselves aren’t a terribly accurate measure of task success. However, they aren’t useless. In general, we found most of the self-reported completion rates to be very high—28 of the 34 tasks (82%) were above 90%. So if you find yourself needing to use self-reported task completion and the rates fall below 80%, you likely have a more difficult task than average.
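That rule of thumb is easy to apply in code. The sketch below flags tasks whose self-reported completion rate falls below 80%; the task names and rates are hypothetical, and the 0.80 default simply encodes the threshold suggested above.

```python
def flag_difficult_tasks(self_reported_rates, threshold=0.80):
    """Return the tasks whose self-reported completion rate is below the threshold."""
    return [task for task, rate in self_reported_rates.items() if rate < threshold]

# Share of tasks in this analysis with self-reported rates above 90%
share_above_90 = 28 / 34

# Hypothetical self-reported rates for one study (not data from the article)
rates = {"find store": 0.97, "update profile": 0.74, "compare plans": 0.63}
flagged = flag_difficult_tasks(rates)
```

Here the two tasks below 80% would be the ones worth a closer look, even without a way to verify completion.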

And while the task-completion rate metric may be difficult to measure, other data from the tasks likely makes the task worth the effort. Other task measures like perceived difficulty, task time, and task confidence are still valuable even if you are unable to verify task completion.

Future research we’re conducting is examining how unmoderated usability task-completion rates compare to moderated task-completion rates. We’re also exploring how different verification methods (e.g. validation question versus watching videos) affect the gap between verified and self-reported completion rates. Finally, a larger issue altogether is how effective the staples of unmoderated studies (verification questions and URLs) are at assessing task completion. But that’s a topic for a future article.

Summary

As expected, self-reported task-completion rates are much higher than verified task-completion rates. This analysis found:

Self-reported task completion rates are almost three times higher than verified completion rates.