05/02/2010

Analysis of usability data is the hardest step in the process because it is easy to jump to the wrong conclusions.

Over the last decade, I have sat with UI teams as we observed thousands of hours of user behaviors with websites and software. Countless times I have overheard product managers, UI designers, marketers, information architects, and even other usability consultants – some of them very well known and considered experienced – drawing conclusions about the UI implications of user behaviors that are just plain wrong.

The problem has to do with what I mean by “wrong.” As mentioned above in the entry about testing with more than 5 users, there is much misunderstanding in the industry about methods. Years ago, Jakob Nielsen and Tom Landauer published a paper presenting a mathematical model, derived under controlled conditions, suggesting that testing with a small number of users can yield insights as valid as those from much larger, carefully controlled quantitative tests.

The methods Jakob made famous, testing with very small numbers of users, were adopted by the broader industry of design agencies, technical consultants, information architects, and UI designers in organizations large and small. Overall, the results have been positive. More organizations are doing usability testing, and they have become convinced that rapid tests with few users can still help them improve their products -- without spending tons of money. As Jakob claimed in that original 1993 paper, the real insight was that testing with zero users results in zero insights (Nielsen, Jakob, and Landauer, Thomas K.: "A mathematical model of the finding of usability problems," Proceedings of ACM INTERCHI '93 Conference, Amsterdam, The Netherlands, 24-29 April 1993, pp. 206-213). He was right then, and he’s right today.
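The model from that 1993 paper can be made concrete in a few lines. A minimal sketch, assuming the often-cited average problem-discovery rate of about 31% per user (a figure Nielsen reported for his own data sets; the rate for any particular product will vary):

```python
# Sketch of the Nielsen-Landauer problem-discovery model:
#   found(n) = 1 - (1 - lam)**n
# where lam is the probability that a single test user exposes
# a given usability problem. lam = 0.31 is the commonly cited
# average, not a universal constant.

def proportion_found(n_users, lam=0.31):
    """Expected share of existing usability problems observed with n users."""
    return 1 - (1 - lam) ** n_users

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users -> {proportion_found(n):.0%} of problems")
```

With λ = 0.31, five users surface roughly 84% of the problems -- the arithmetic behind the famous “five users is enough” heuristic, and also a reminder that the curve flattens quickly rather than reaching certainty.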

However, one problem with these rapid 5-10 user tests is that such a small N places much higher responsibility on the investigator to make judgment calls and put observed user behaviors into a larger context before jumping to conclusions about what to do with the interface.

Watching a mere 5-10 users interacting with a UI is very much a qualitative study. You know a behavior or problem occurred, but you don’t have any objective, quantitative measurement of how bad the problem is.

If you had observed, say, 500 users, with a broad range of computer experience, literacy levels, education, domain knowledge, gender, age, and ethno-cultural background attempt to use the interface -- and had determined in advance clear definitions for task success, time-on-task, error rate, and satisfaction measures – you could do a much more accurate job of measuring “how bad” an individual UI problem is.
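The statistical gap between 5 users and 500 is easy to make concrete. A minimal sketch using the adjusted-Wald (Agresti-Coull) confidence interval, a common choice for small-sample task-success data; the specific numbers here are hypothetical, not from any study mentioned in this entry:

```python
import math

def adjusted_wald(successes, n, z=1.96):
    """Adjusted-Wald (Agresti-Coull) ~95% confidence interval for a
    task success rate. Well-suited to small usability samples."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

# 3 of 5 users succeed vs. 300 of 500 users succeed:
# same 60% point estimate, wildly different precision.
print(adjusted_wald(3, 5))      # interval spans most of 0..1
print(adjusted_wald(300, 500))  # interval only a few points wide
```

With 5 users, the interval is so wide that “60% success” tells you almost nothing quantitative -- which is exactly why the small-N investigator has to lean on judgment instead of measurement.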

But in today’s world of tight budgets, when we do these tiny tests, investigators walk away with two high-stakes judgment calls to make:

what is a “real” behavior and its corresponding UI problem, versus a minor issue that can be safely ignored, and

what is the right recommendation that will fix the problem?

The critical success factors for usability testing outlined in entries above -- watching real users, observing real user-directed tasks, testing at the right point in the process, and even communicating results effectively -- can all be accomplished by following careful checklists, or relying on the instruction of others.

However, there is no textbook, classroom presentation, degree, or even experience that will make someone able to do effective usability analysis, or help them formulate effective recommendations.

Good usability testing is about how you interpret your data in order to fix problems and improve websites. And lots of the most important interpretation of data is subjective.

I like how Jeffrey Rubin described it in his original (1994) “Handbook of Usability Testing.” He made a comparison between usability professionals and medical doctors. Any doctor can take a temperature, or measure heart rate and blood pressure. But as he explained: “interpreting those numbers and recommending the appropriate course of action for a specific patient is the true value of the physician.” And in the case of healing a sick UI, the process is only half science. It is also an art.

I first learned this lesson back in the late ’90s in New York City when I was Director of User Research at Scient. I hired some university researchers to consult at the height of the dot-com boom. Some of them had extremely sophisticated methods for analyzing UIs. However, much of what they produced and concluded was of limited practical usefulness to our UI design teams. Their reports were pages and pages of measurements and calculations – but as I found out while Scient was re-designing the Major League Baseball website – calculations didn’t help advanced baseball fans figure out how to use the website to do custom pitcher matchups, or create a custom jersey without getting confused.

And that was because some of those researchers had no feel for information design, and their instincts for how to prioritize observed behaviors or solve real usability problems were just “off.”

Much of the usability profession gets mileage out of talking about how its purpose is to check “bad” designers. And I’ve certainly seen plenty of design that shows minimal understanding of how a broad consumer audience searches for information on a page, or attempts a task.

However, as much as I’m concerned about designers being “bad” at usability, I’m just as concerned about usability geeks being “bad” at design.

Because after all, when we watch real users doing real, unscripted tasks – the leaps of imagination and insight that we use to 1) decide what the real problem is, and 2) propose a solution – have as much to do with design, as they do with usability.

Usability is an outcome.

Creation. Design. These are some of the actions we take to obtain that outcome.

Which brings us full circle to the issue I raised in the beginning of this entry – watching user behaviors with a group of colleagues, and hearing them make conclusions that are “wrong.”

Jared Spool has complained (Spool, 2007, Journal of Usability Studies, Vol. 2, Issue 4, August, pp 155-161) about how subjective and lacking in standards professional User Experience practice has become – going so far as to suggest that any supposed “methodology” we claim is akin to Aesop’s fable of stone soup.

Spool is partially right. So how then does one define what is right or wrong? Or how does a usability professional differentiate herself as more capable at analysis and recommendations than her peers? Or to reverse it, how does someone looking to hire a usability team pick among the many, many boutique or large consulting firms out there claiming they can analyze user behavior, suggest changes, and ultimately improve the usability of their products?

Throughout this entry I’ve used words or phrases like “having a ‘feel’ for information design,” “instinct,” “leaps of imagination,” or making recommendations that are just “off.” These aren’t exactly things one measures or evaluates objectively.

However, one way to gauge whether a usability consultant has the right “instincts” for interpreting data and making effective recommendations that heal a sick UI is to look at verifiable outcomes.

My former business partner Kathryn Summers and I once handed over a re-designed website to a group of researchers at the University of Baltimore so they could conduct a double-blind performance comparison between our new version and the live site. We had re-designed the site by observing user behaviors, deciding each time “what was real,” and then making leaps of imagination about what specific changes to the UI might improve outcomes.

Across eight key tasks selected by the site owners, the new site measured a 52% improvement in success rate, a 164% reduction in time-on-task, and a 23% improvement in subjective satisfaction (Summers, 2005). The “proof” wasn’t only that iterative usability testing is an effective method, it was also that the usability consultants involved in the project had the “right stuff” when it came to observing user problems in the lab, and translating that observation into valid and helpful revisions to the interface.
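A side note on reading such figures: improvements are usually reported relative to the baseline, but a “reduction” greater than 100% only makes sense when measured against the new value (nothing can shrink by more than 100% of itself). A small sketch with hypothetical task times -- these are illustration only, not the actual study data -- shows both conventions:

```python
def change_vs_baseline(old, new):
    """Percent change measured against the old (baseline) value.
    A decrease can never exceed -100% under this convention."""
    return 100.0 * (new - old) / old

def change_vs_new(old, new):
    """Percent difference measured against the new value; the convention
    under which a time 'reduction' can exceed 100%."""
    return 100.0 * (old - new) / new

# Hypothetical task times in seconds (illustration only):
old_time, new_time = 264.0, 100.0
print(change_vs_baseline(old_time, new_time))  # about -62.1 (time down ~62%)
print(change_vs_new(old_time, new_time))       # 164.0 (old time 164% longer)
```

Worth keeping in mind whenever you compare improvement claims across studies: the two conventions can describe the same result with very different-sounding numbers.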

My current role conducting usability in an e-commerce retail environment has given me what most usability pros crave – a clear connection between our recommendations and measurable outcomes.

We recently worked with a major health performance retailer who had launched a new UI that was resulting in reduced sales. We observed user behaviors in the lab, and then made some recommendations. The recommendations resulted in improved sales – a verifiable outcome.

Last year, we did a usability study of a major fashion denim retailer. Again, we observed some user behaviors in the lab, and then made some recommendations. After adopting those recommendations the jean-maker found that they could measure verifiable improvements in outcomes.

Finally, we did a usability study of a major fashion design house’s retail website. They had redesigned their checkout process and were not seeing the completion rates they wanted. We brought the site into the lab, observed user behaviors, and made some recommendations. After adopting those recommendations the fashion brand measured fewer users abandoning their carts and more completing their purchases.

If you want these kinds of results – one way to improve the quality of your usability analysis is to get some experience creating UI. Usability professionals are more like editors than writers. But spending some time at the writing desk will help when it comes time to edit someone else’s work.

Whether as a UI designer, an information architect, or a visual designer, getting some experience in the hot seat of having to create will help you when you’re doing usability analysis and trying to come up with a recommendation that will solve a problem.

Another way to improve the quality of usability analysis is to watch many, many hours of user behavior, observe the recommendations and changes made to a UI that are intended to overcome problems, and see the outcomes. If you can watch others conduct iterative rounds of user observation, UI modification, and outcome measurement -- you can develop much improved instincts for what really works.

As they say in the investment business, past performance is no guarantee of future results. However, I’ve found that those who have real-world experience, either experimenting themselves, or watching others go through the process of observing, modifying, measuring – over and over – are far more likely to make useful and valid conclusions about usability in the future.

If you are just starting out, and don’t yet have the opportunity to watch the whole process over and over, or haven’t developed that much experience – there are a few strategies that can help you do better usability analysis.

Some practitioners only take notes “real-time” and then walk away with some written text against which they try to think up hasty recommendations. But during testing, there are often many things competing for a usability researcher’s attention, such as clients, other observers, and multiple portions of an interface. Like it or not, when we’re in the moment, we have a “filter” turned on in our brain that is constantly choosing what to pay attention to, and what to tune out.

However, video capture technologies such as TechSmith’s Morae or Tobii Studio don’t have the same kind of filter as your brain. Their greatest weakness is that they can’t analyze a UI or make recommendations. But their greatest strength is that they are capture devices. They get it all.

Sometimes ideas about how to solve a UI problem can be sparked by watching a particular user interaction a second, or even a third time. Stepping away from the lab, sitting in a quiet place, and giving your brain a chance to watch what users do and think carefully about what you’re seeing can result in ideas that are more informed and nuanced than the obvious stuff your brain has time to come up with “live” in the lab.

Admittedly, it can be tough in today’s hectic UI development environments – with budget and time pressures – to have the time to review footage from testing. But even a partial review of especially important or perplexing interactions will result in 1) better understanding of what is wrong, and 2) more creative ideas about how to fix it.

Whether you are trying to do usability analysis yourself, or you are hiring someone to do it for you – don’t skimp. In today’s world of small sample sizes, good analysis is subjective. Use these guidelines to help:

Be independent. Make sure the person doing the analysis is objective. They should have no relationship, financial or otherwise, with those who created the designs.

Experience matters. The person doing the analysis should have watched many hundreds of hours of user behaviors, not only on UI similar to what is tested, but on a wide range of interaction designs with a wide range of users including seniors, children, and those whose first language isn’t English.

Watch the tapes. Or ask if the person you’re hiring watches the tapes as part of their analysis process.

Be a “creator” not just an editor of someone else’s creations. Spend time designing, laying out pages, doing information design, technical writing, information architecture, etc. Or make sure the person you hire has done more than edit the work of others.

Usability analysis is hard. The wrong conclusions and recommendations can result in the UI being worse than it was when you started. But experience, careful footage review, independence, and creativity can result in the kinds of user performance improvements referenced above, and ultimately deliver on the promise of usability testing – a more usable UI.