UserResearch.com
by Michael Summers, michael@UserResearch.com

Usability Testing: Critical Success Factor #3, Effective Usability Analysis

Analysis of usability data is the hardest step in the process because it is easy to jump to the wrong conclusions.

Over the last decade, I have sat with UI teams as we observed thousands of hours of user behaviors with websites and software. Countless times I have overheard product managers, UI designers, marketers, information architects, and even other usability consultants – some of them very well known and considered experienced – drawing conclusions about the UI implications of user behaviors that are just plain wrong.

The problem has to do with what I mean by “wrong.” As mentioned above in the entry about testing with more than 5 users, there is much misunderstanding in the industry about methods. Years ago, Jakob Nielsen and Tom Landauer published a paper outlining a mathematical model suggesting that, under controlled conditions, testing with a small number of users can yield insights as valid as those from much larger, carefully controlled quantitative tests.

The methods Jakob made famous, testing with very small numbers of users, were adopted by the broader industry of design agencies, technical consultants, information architects, and UI designers in organizations large and small. Overall, the results have been positive. More organizations are doing usability testing, and they have become convinced that rapid tests with few users can still help them improve their products -- without spending tons of money. As Jakob claimed in that original 1993 paper, the real insight was that testing with zero users results in zero insights (Nielsen, Jakob, and Landauer, Thomas K.: "A mathematical model of the finding of usability problems," Proceedings of ACM INTERCHI '93 Conference, Amsterdam, The Netherlands, 24-29 April 1993, pp. 206-213). He was right then, and he’s right today.
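For readers who want to see what that paper actually models: the expected share of problems found by n users grows as 1-(1-λ)^n. A minimal sketch in Python, assuming Nielsen's oft-quoted average per-user discovery rate of λ = 0.31 (the paper derives λ empirically per project; the value here is purely illustrative):

```python
# Nielsen & Landauer (1993): the expected share of usability problems
# found by n test users is 1 - (1 - L)**n, where L is the chance
# that a single user exposes any given problem.
L = 0.31  # assumed average discovery rate; illustrative only

for n in range(0, 16):
    found = 1 - (1 - L) ** n
    print(f"{n:2d} users -> {found:5.1%} of problems found")
```

With λ = 0.31 the curve starts at 0% for zero users and passes roughly 85% at five, which is where the famous "5 users is enough" shorthand comes from -- and why the assumptions baked into λ matter so much.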

However, one problem with these rapid 5-10 user tests is that such a small N places much greater responsibility on the investigator to make judgment calls and put observed user behaviors into a larger context before jumping to conclusions about what to do with the interface.

Watching a mere 5-10 users interacting with a UI is very much a qualitative study. You know a behavior or problem occurred, but you don’t have any objective, quantitative measurement of how bad the problem is.

If you had observed, say, 500 users, with a broad range of computer experience, literacy levels, education, domain knowledge, gender, age, and ethno-cultural background attempt to use the interface -- and had determined in advance clear definitions for task success, time-on-task, error rate, and satisfaction measures – you could do a much more accurate job of measuring “how bad” an individual UI problem is.
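To put a number on that contrast, consider the uncertainty around a simple task-success rate. A minimal sketch using the Wilson score interval for a binomial proportion -- the success counts and sample sizes below are hypothetical, chosen only to show how the interval shrinks:

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical data: 2 of 5 users succeed vs. 200 of 500 users succeed.
for k, n in [(2, 5), (200, 500)]:
    lo, hi = wilson_ci(k, n)
    print(f"{k}/{n} succeeded: 95% CI {lo:.0%} to {hi:.0%}")
```

With five users, "two failed" is statistically consistent with anything from a roughly 12% to a 77% success rate -- a niche annoyance or a showstopper. With 500 users the same 40% estimate is pinned down to within a few points. That gap is exactly what the investigator's judgment has to bridge.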

But in today’s world of tight budgets, when we do these tiny tests, investigators walk away with two high-stakes judgment calls to make:

what is a “real” behavior and corresponding UI problem, versus a minor issue that can be safely ignored, and

what is the right recommendation that will fix the problem?

The critical success factors for usability testing outlined in entries above -- watching real users, observing real user-directed tasks, testing at the right point in the process, and even communicating results effectively -- can all be accomplished by following careful checklists, or relying on the instruction of others.

However, there is no textbook, classroom presentation, degree, or even experience that will make someone able to do effective usability analysis, or help them formulate effective recommendations.

Good usability testing is about how you interpret your data in order to fix problems and improve websites. And lots of the most important interpretation of data is subjective.

I like how Jeffrey Rubin described it in his original (1994) “Handbook of Usability Testing.” He made a comparison between usability professionals and medical doctors. Any doctor can take a temperature, measure heart rate, blood pressure, etc. But as he explained: “interpreting those numbers and recommending the appropriate course of action for a specific patient is the true value of the physician.” And in the case of healing a sick UI, the process is only half science. It is also an art.

I first learned this lesson back in the late 90’s in New York City when I was Director of User Research at Scient. I hired some university researchers to consult at the height of the dot-com boom. Some of them had extremely sophisticated methods for analyzing UI. However, much of what they produced and concluded was of limited practical usefulness to our UI design teams. Their reports were pages and pages of measurements and calculations – but as I found out while Scient was re-designing the Major League Baseball website – calculations didn’t help advanced baseball fans figure out how to use the website to do custom pitcher matchups, or create a custom jersey without getting confused.

And that was because some of those researchers had no feel for information design, and their instincts for how to prioritize observed behaviors or solve real usability problems were just “off.”

Much of the usability profession gets mileage out of talking about how its purpose is to keep “bad” designers in check. And I’ve certainly seen plenty of design that shows minimal understanding of how a broad consumer audience searches for information on a page, or attempts a task.

However, as much as I’m concerned about designers being “bad” at usability, I’m just as concerned about usability geeks being “bad” at design.

Because after all, when we watch real users doing real, unscripted tasks, the leaps of imagination and insight that we use to 1) decide what the real problem is and 2) propose a solution have as much to do with design as they do with usability.

Usability is an outcome.

Creation. Design. These are some of the actions we take to obtain that outcome.

Which brings us full circle to the issue I raised in the beginning of this entry – watching user behaviors with a group of colleagues, and hearing them make conclusions that are “wrong.”

Jared Spool has complained (Spool, 2007, Journal of Usability Studies, Vol. 2, Issue 4, August, pp. 155-161) about how subjective and lacking in standards professional User Experience practice has become – going so far as to suggest that any supposed “methodology” we claim is akin to the folk tale of stone soup.

Spool is partially right. So how, then, does one define what is right or wrong? How does a usability professional differentiate herself as more capable at analysis and recommendations than her peers? Or, to reverse it, how does someone looking to hire a usability team pick among the many, many boutique or large consulting firms out there claiming they can analyze user behavior, suggest changes, and ultimately improve the usability of their products?

Throughout this entry I’ve used words or phrases like “having a ‘feel’ for information design,” “instinct,” “leaps of imagination,” or making recommendations that are just “off.” These aren’t exactly things one measures or evaluates objectively.

However, one way to gauge whether a usability consultant has the right “instincts” for interpreting data and making effective recommendations that heal a sick UI is to look at verifiable outcomes.

My former business partner Kathryn Summers and I once handed over a re-designed website to a group of researchers at the University of Baltimore so they could conduct a double-blind performance comparison between our new version and the live site. We had re-designed the site by observing user behaviors, deciding each time “what was real,” and then making leaps of imagination about what specific changes to the UI might improve outcomes.

Across eight key tasks selected by the site owners, the new site measured a 52% improvement in success rate, a 164% improvement in time-on-task (users completed tasks in well under half the original time), and a 23% improvement in subjective satisfaction (Summers, 2005). The “proof” wasn’t only that iterative usability testing is an effective method; it was also that the usability consultants involved in the project had the “right stuff” when it came to observing user problems in the lab, and translating those observations into valid and helpful revisions to the interface.
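A note on reading that time-on-task figure: a task time can't be reduced by more than 100%, so the 164% is best read as a speed improvement, the time saved expressed as a fraction of the new, faster time. The times below are made up purely to show the arithmetic:

\[
\text{improvement} = \frac{t_{\text{old}} - t_{\text{new}}}{t_{\text{new}}}, \qquad \frac{132\ \text{s} - 50\ \text{s}}{50\ \text{s}} = 1.64 = 164\%.
\]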

My current role conducting usability in an e-commerce retail environment has given me what most usability pros crave – a clear connection between our recommendations and measurable outcomes.

We recently worked with a major health performance retailer that had launched a new UI that was resulting in reduced sales. We observed user behaviors in the lab, and then made some recommendations. The recommendations resulted in improved sales – a verifiable outcome.

Last year, we did a usability study of a major fashion denim retailer. Again, we observed some user behaviors in the lab, and then made some recommendations. After adopting those recommendations the jean-maker found that they could measure verifiable improvements in outcomes.

Finally, we did a usability study of a major fashion design house’s retail website. They had redesigned their checkout process and were not seeing the completion rates they wanted. We brought the site into the lab, observed user behaviors, and made some recommendations. After adopting those recommendations the fashion brand measured fewer users abandoning their carts and more completing their purchases.

If you want these kinds of results – one way to improve the quality of your usability analysis is to get some experience creating UI. Usability professionals are more like editors than writers. But spending some time at the writing desk will help when it comes time to edit someone else’s work.

Whether as a UI designer, an information architect, or a visual designer, getting some experience in the hot seat of having to create will help you when you’re doing usability analysis and trying to decide how to come up with a recommendation that will solve a problem.

Another way to improve the quality of usability analysis is to watch many, many hours of user behavior, observe the recommendations and changes made to UI that are intended to overcome problems, and see the outcomes. If you can watch others conduct iterative rounds of user observation, UI modification, and outcome measurement -- you can develop much improved instincts for what really works.

As they say in the investment business, past performance is no guarantee of future results. However, I’ve found that those who have real-world experience, either experimenting themselves, or watching others go through the process of observing, modifying, measuring – over and over – are far more likely to make useful and valid conclusions about usability in the future.

If you are just starting out, and don’t yet have the opportunity to watch the whole process over and over, or haven’t developed that much experience – there are a few strategies that can help you do better usability analysis.

Some practitioners only take notes “real-time” and then walk away with some written text against which they try to think up hasty recommendations. But during testing, there are often many things competing for a usability researcher’s attention, such as clients, other observers, and multiple portions of an interface. Like it or not, when we’re in the moment, we have a “filter” turned on in our brain that is constantly choosing what to pay attention to, and what to tune out.

However, video capture technologies such as TechSmith’s Morae or Tobii Studio don’t have the same kind of filter as your brain. Their greatest weakness is that they can’t analyze a UI or make recommendations. But their greatest strength is that they are capture devices. They get it all.

Sometimes ideas about how to solve a UI problem can be sparked by watching a particular user interaction a second, or even a third time. Stepping away from the lab, sitting in a quiet place, and giving your brain a chance to watch what users do and think carefully about what you’re seeing can result in ideas that are more informed and nuanced than the obvious stuff your brain has time to come up with “live” in the lab.

Admittedly, it can be tough in today’s hectic UI development environments – with budget and time pressures – to have the time to review footage from testing. But even a partial review of especially important or perplexing interactions will result in 1) better understanding of what is wrong, and 2) more creative ideas about how to fix it.

Whether you are trying to do usability analysis yourself, or you are hiring someone to do it for you – don’t skimp. In today’s world of small sample sizes, good analysis is subjective. Use these guidelines to help:

Be independent. Make sure the person doing the analysis is objective. They should have no relationship, financial or otherwise, with those who created the designs.

Experience matters. The person doing the analysis should have watched many hundreds of hours of user behaviors, not only on UI similar to what is tested, but on a wide range of interaction designs with a wide range of users including seniors, children, and those whose first language isn’t English.

Watch the tapes. Or ask if the person you’re hiring watches the tapes as part of their analysis process.

Be a “creator,” not just an editor of someone else’s creations. Spend time designing, laying out pages, doing information design, technical writing, information architecture, etc. Or make sure the person you hire has done more than edit the work of others.

Usability analysis is hard. The wrong conclusions and recommendations can result in the UI being worse than it was when you started. But experience, careful footage review, independence, and creativity can result in the kinds of user performance improvements referenced above, and ultimately deliver on the promise of usability testing – a more usable UI.

Usability Testing: Critical Success Factor #2, Observe "Real" Behavior that is User-Directed, NOT Moderator-Contrived

The next critical success factor for successful usability testing is to observe real behavior.

Leading tasks, highly scripted sessions that place page or layout elements out of context, and lots of conversation with the moderator are some of the more common mistakes I see teams making when they think they're evaluating the usability of an interface objectively.

Instead, what they're doing is taking users on a guided tour of an interface, and in the worst cases even trying to convince the user their design is good. Even when they're doing a decent job being objective, those who use highly scripted, interview-based "usability" sessions, are at best gathering small sample qualitative preference data.

There's a big difference between preference data and behavioral data. Some may have heard a good illustration from NN/g's Kara Pernice. She uses the example of a cappuccino machine. Imagine you are standing in front of a cappuccino machine. It is brand new and the box and manual were thrown away. If you wanted to know if the design of the cappuccino machine was usable -- you could assign 20-30 people who had never seen or used it before to walk up and make themselves a cappuccino. If you did this in different regions of the country, or even around the world, you would find there is little regional variation in that behavioral data. You could have confidence that despite your small-sample qualitative methods, you had identified any usability challenges.

However, if your goal really were to find out what flavor of cappuccino people liked -- this one-on-one qualitative method would be all wrong for that kind of preference data. For one thing, you would quickly discover that regional variations across the country and throughout the world would become very important. You would also discover that 20-30 people recruited via a recruiting agency database are not a statistically valid "sample" of the broader population of cappuccino drinkers. You could over-react when 15 out of your 30 people told you they liked peanut-butter flavored cappuccino, and your boss would be angry when you went to market trumpeting a new Skippy flavored brew that didn't sell well.

The tricky part of "user testing" is that it is often funded by the marketing group, or other product managers who aren't exclusively interested in the time-on-task, error rate, or learnability of a cappuccino interface. Empathizing with their need to be reassured about how well the cappuccino machine is going to sell will be important to communicating with them effectively. But if the team is really focused on improving the product and repairing any usability flaws -- you'll need to educate them about methods.

In my current role with an e-commerce retail focus, I regularly have clients come to me with "comps" for a new product page layout, or homepage. They tell me they want to do a "usability test" of the new page, or see if their new images work well and contribute to a purchase decision. There are several "usability" consultancies out there that are happy to take the clients' money, and using the clients' very stilted "usability" test script, bring in a mere 5-10 users, and proceed to "lead the witness."

These consultancies will call it a listening session, or usability group, or whatever the moniker, but by plonking users in front of a single page, outside the context of a realistic behavior (in this case making a real purchase on the wider Web), pointing to a new element and asking users to talk about whether they like it, or if it is helpful, or to describe its usability, they're unfortunately not learning much that is valid or reliable.

The marketing or product leads will sit behind the mirror and furiously scribble down comments both positive and negative, but as with the cappuccino machine preference example, they're using a flawed methodology that has serious potential to not only fail to uncover real behavioral usability problems, but mislead researchers and teams into thinking users prefer one interface element over another.

So how do you avoid this problem?

You do it by watching real behavior that is as non-leading as possible. There are always going to be test effects and distortions caused by the fact that we most often study users outside the normal environment of their home or office, on a computer or browser they are not familiar with, in a situation where they know they are being watched, with a sometimes learned motivation to speak in the animated and adjective-laden style that they think will get them invited back to another focus group to make another 100 bucks, etc.

The potential sources of variance for lab-based testing are well known and well-documented. So I try not to pretend to myself, or to my clients, that the lab doesn't impact what we see. But by following some simple rules we can limit those effects as much as possible.

For starters, I try to set an overall goal for users, and then leave the room. It's a bit difficult for users to verbally describe behavior (instead of actually doing things and trying out the design) if there is no one else in the room to talk to. Second, if I'm interested in a particular part of an interface, such as a new informational element on a pharmaceutical company's homepage, or a new larger, interactive, zoomable image module on a retailer's product page, I'm much better off if I can observe user interactions with that element that are natural and un-scripted.

As I'm currently in the retail e-commerce space, I insist that users have the broad goal of making a purchase. While I do have to limit them to one particular website (broader studies of them purchasing, say, a pair of pants without any limit to where they can go would have obvious strengths in terms of learning user behavior patterns with search engines, comparison behaviors between sites, etc.), I don't sit next to them and tell them which pages to click, or stop and point out elements of the interface and say "ooh, what do you think of that? Do you like it? How much do you like it on a scale of say 1-10?"

Whenever possible, I like to see users interact with fully clickable, functional prototypes or live sites. Again, in an e-commerce context I like them to be using their own credit card, making selections and actually purchasing such that they know this stuff is actually going to get shipped through the mail and arrive on their doorstep.

You'd be amazed at the difference in behavior between a user who is "pretending" to shop, and one that knows this item they're evaluating will either have to be used, worn, or shipped back via the hassle of a return.

After only a few minutes, I find that users forget I'm even on the other side of the mirror or watching on a dual screen monitor.

As a result, when 20-30 or so users arrive on the homepage the team wanted tested, or land on a product page with the new image "zoom" functionality, I get to see 1) what other elements of the page or overall site they use to solve their problems, 2) at what point in their process they do interact with the new element, and 3) for how long.

Because we use eyetracking technology, I'm able to watch their eyegaze in real-time from behind the mirror and understand intra-page navigation.

Now, dear reader, I suspect you're going to ask -- what if they don't interact with my new interface element, the new whiz-bang thing that the person paying for the study is so desperately wanting feedback on?

Well, sometimes that does happen. And that of course is instructional in and of itself. If 30 folks come in with the goal of making a real purchase, and zero of them use the new zoom feature (that is supposed to help them choose between products), that should give the team pause. But despite my commitment to natural, non-scripted user behaviors, I'm of course a big fan of the good old-fashioned debrief. After we've observed a natural purchase, we then transition to having users talk us through what they've done -- we've already seen the natural behavior so we don't risk altering or influencing what they do by asking follow-up or probing questions.

And if, during the natural user-guided portion of the session, the user didn't interact with an important element, I'm happy to assign a moderator-contrived task during the debrief, or "prompt" them to notice something and try interacting with it so I can seek feedback. Although I know I'm leading the user at that point -- I am at least able to place their comments and behaviors in the context of the more natural behavior I have just observed. Again, I am likely to get some preference data, but at least it's not all I'm getting out of the study.

So to sum up, watching real behavior is a critical success factor for effective usability evaluation. Instead of tightly scripted, moderator-contrived tasks (with me sitting close to the user and breathing down their neck), assigning a broad goal and letting the user "do their thing" is more likely to uncover unexpected problems and give us confidence as to whether the design really works. Leaving the room can often help users relax and start "doing," instead of merely talking about doing. As Jakob Nielsen has said, what users do, versus what users say they do, can be very different things.

Usability Testing: Critical Success Factor #1, Test with Real Users, Test with More than 5 Users

Usability testing has become one of those flexible words that teams have stretched to include almost any activity that purports to assess usability. There may be users involved; there may not. There may be behaviors observed; there may not. I've seen focus groups masquerading as usability testing. I've seen ethnography masquerading as usability testing. I've seen phone interviews masquerading as usability testing. And worst of all, I've seen "expert opinions" masquerading as usability testing.

While those of us in the business are gratified by how mainstream usability has become, we also get concerned when we see a wide variety of methods being used by inexperienced teams. In some cases, the wrong methods are applied in the wrong situations, and the result can be faulty conclusions and bad design decisions.

One of the most quoted phrases I hear from non-practitioners dates to a 1993 paper written by Tom Landauer and Jakob Nielsen. The paper was intended as a mathematical means of illustrating the diminishing ROI of testing with lots and lots of users, but most who refer to it only display the graph from Jakob Nielsen's 2000 blog post.

The problem with this is that most people merely look at the graph, hear that the paper was presented at some fancy-pants ACM CHI conference in Amsterdam, and walk away sure in the knowledge that they have discovered ultimate methodological truth. They never read the paper themselves - or even bother to visit the summary Jakob published on his website in 2000 - something a simple Google search could reveal in seconds.

This is a problem, because the original graph is most meaningful for a homogeneous group of users. Obviously, some fundamental usability issues, such as navigation bar confusion, can be discovered by users with very distinct differences, such as men and women, or seniors and tweens.

However, in many cases, I work with clients who have user groups with very distinct differences in terms of domain knowledge, familiarity with the web, familiarity with the sponsoring organization, etc.

In those cases, I strongly encourage teams to test with 6-8 users from each distinct group. Before I'm ready to get up in front of a bunch of stakeholders who are spending hundreds of thousands of dollars on a website and genuflect and pronounce the site "usable" or "unusable," I like to see 25-30 users.
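Those numbers follow from the same 1-(1-λ)^n discovery model sketched in the Critical Success Factor #3 entry above, applied per group rather than overall. A minimal sketch -- the group count and the per-user discovery rate λ = 0.31 are assumptions for illustration only:

```python
# If groups differ enough that a problem visible to one group is
# invisible to another, the 1 - (1 - L)**n coverage applies per group.
L = 0.31           # assumed per-user problem-discovery rate (illustrative)
groups = 4         # e.g. novices, domain experts, seniors, ESL users
users_per_group = 7

coverage = 1 - (1 - L) ** users_per_group
print(f"{users_per_group} users/group -> {coverage:.0%} of that group's problems")
print(f"total recruits: {groups * users_per_group} users")
```

Seven users in each of four distinct groups lands at 28 recruits -- squarely in the 25-30 range above -- while, under the model's assumptions, still surfacing roughly 90% of each group's problems.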

Along with not testing with enough people, the other common mistake I see inexperienced teams make is that they don't test with real users.

Only one in four American adults has a college degree (Source: U.S. Department of Commerce, Economics and Statistics Administration, 2007 American Community Survey, Educational Attainment in the United States).

Nearly 60% of US households defined as a family of four make less than $50k in combined household income (Source: U.S. Census Bureau, Census 2000 Summary File 3, Matrices P52, P53, P54, P79, P80, P81, PCT38, PCT40, and PCT41).

A full 20% of US adults read at a 5th grade level or below, and the median reading level of US adults is 8th grade (Source: 2003 National Assessment of Adult Literacy, NAAL).

Despite these facts, I frequently work with teams who recommend that they invite other employees from within the same company to come over and try out their design so they can conclude it is usable. Only slightly better are the suggestions to "test" a website or product with "friends and family."

Sociologists have demonstrated that most people's circle of friends doesn't deviate from their own narrow range of education, income, or ethnicity.

So if you want to say that you're conducting usability testing, be sure to use broader recruiting methods, preferably by hiring a professional recruiting agency, especially if you're trying to assess a product or website that has a broad consumer audience.

Granted, if you're working on accounting software that is only used by CPAs, then you can probably get away with working with a group of 5 CPAs provided by the site sponsor as a user group. But if you're intending to assess a site with a broader consumer audience - get out of your office, away from your neighborhood, and test with a decent group of "real" users. Try 6-8 of each distinct group. You'll be glad you did.

Understanding users

In a maturing information-based economy, usability has become a differentiating factor that consumers consider when making purchase decisions.

A web search about a potential new electronics purchase, such as a digital camera, Blu-ray player, or television, can uncover numerous reviews that include commentary not just on the physical specs (megapixels, screen resolution, etc.), but also on the ease of navigating the menu screens, usability, and overall user experience.

Those of us in the industry who get a view inside companies' process for creating products like electronics, software, or websites aren't surprised when their usability is criticized by reviewers.

This is because most organizations use the wrong methods to understand their users.

Although products like consumer electronics or software are used widely, the web has become so ubiquitous, and such a natural part of the fabric of getting things done in modern life, that methods used to understand web users deserve special attention.

Part of the reason is that unlike a digital camera or software, where a consumer buys first before actually experiencing usability, on the web, the consumer experiences usability first - and then makes the decision to:

buy a product,

renew their insurance policy without calling the call center,

change their address on their own,

open an account with a new bank,

post a listing or classified to sell their car,

upload pictures and start an auction,

etc.

Is usability the only factor that contributes to the decision? Obviously not. I've watched users go to great lengths and overcome serious usability obstacles to accomplish something they're very passionate about, or to get a product at a significant discount.

However, in most cases, users have many options, and they will go elsewhere if they get the sense that a particular bank's website is always going to be a disaster. And in the case of an impulse purchase, any added hassle or distraction caused by a tough-to-use website can easily interrupt or stall what was limited motivation anyway.

So the important question is: how do site owners avoid that outcome?

When bad web usability happens, part of the reason is that site creators use the wrong methods to understand users. Whether website creators have a formal "methodology" drawn up in a fancy diagram and posted on the wall or not, they are using an approach to understand what is going on.

One common method is to rely solely on analytics. While essential, analytics is inferential data that should not be your sole source of understanding users. With analytics you get the "what" (where did users enter the site, what pages did they visit, what links did they click), but you have to guess at the "why."

During a recent engagement, we observed users adding items to the cart of a major shoe retailer. A "mini-cart" deployed, but it retracted on its own without the user having to click a "CLOSE" button or otherwise consciously tell the system "OK, I've seen this; now make it go away." One outcome observed during that study was that users would then add the item to their cart multiple times, have trouble editing quantity, and then remove it altogether. Hardly the outcome the site owners were looking for.

The company had used analytics for years, but never knew that users were placing items in their cart without ever seeing the cart deploy and retract.

Had they relied solely on analytics and never done usability testing, they would never have known.

In the end, there is no substitute for watching actual users try out a product. Whether it is a printer, a digital camera, or a website, watching real people do real things with the product is key to discovering problems.