Large, Moderated & Remote: Tips for the Big User Test

Over the course of six weeks earlier this summer, my TBG teammates and I went on a journey. This adventure consisted of administering 50 one-on-one website usability tests to participants around the globe using a combination of WebEx and Morae Usability Software.

While a prototypical moderated user test for Web development consists of six to eight users, we found that the large user group supported a much richer qualitative study (almost an interview). This study went beyond providing the client with strict usability data; it helped paint a picture of the website, the brand identity, and a global view of the site’s content strategy.

A remote moderated user test is always an adventure (see Bahl and Fern’s recent article). Conducting a remote moderated user test internationally, with a user group of 50, was like crossing the Grand Canyon!

Here is a summary of our journey, broken out into a handy guide that lists lessons learned from before, during, and after the usability study.

Before

1. All good adventures begin with a leap into the unknown.

Hold, please! What’s ahead is unknown. All the more reason to prepare.

Have you conducted a user test in a lab? Yes? Great! Have you worked with participants of different ages, genders, and languages of origin? Absolutamente? Fantastico! Have you tested diverse media such as websites, games, and mobile apps? Indeed? Bonus!

In other words, divide the unknown into its components: remote; international; moderated; user test. Consider lessons learned from tasks completed that are like those components and build an action plan accordingly.

A wise explorer surveys where she’s going in relation to where she’s been.

Now leap.

2. Don’t go it alone.

I didn’t conduct a 50-person usability study alone; a team of project managers, Web strategists, and usability experts at TBG worked together to make it happen. At the outset, the team worked tirelessly to write, test, and refine the test plan and moderator script. We also anticipated technical malfunctions that might occur during the test and planned accordingly (see #1).

Of course, there were still unplanned technical hurdles during some tests, and the moderator script invariably needed tweaking. The team supported these needs once the test was under way, too, not only in their project roles, but with their unique aptitudes for those roles.

For example, our chief surveyor, Maura H., created and maintained the schedule for dozens of participants in multiple time zones. Kristine W. was our Sam Gamgee (she’s going to love that one), solving technical challenges with finesse and keeping our spirits up despite the unavoidable exhaustion of conducting a large-scale user test. Maura, Kristine, or both were also present in each session, using Morae Observer to collect valuable test data.

These and other team members contributed to the project in ways that I could not have, or certainly could not have done as well. We were ultimately successful because we didn’t go it alone and because we were a team of diverse roles, skills, and aptitudes.

3. Expect the unexpected.

Maura wisely added two weeks to the end of the testing period as a buffer in case participants needed to be rescheduled. This was prudent. For a remote test of 50, conducted via WebEx with participants in multiple countries, roughly 10 needed to be rescheduled and three were no-shows. The two-week buffer gave us ample time to reschedule those participants and find substitutes.
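To make the buffer arithmetic concrete, here is a rough sketch in Python. The participant, reschedule, and no-show counts come from the figures above. The five-day work week and the pace of two sessions per day are assumptions made for illustration, not the team's actual schedule, and the function name is hypothetical.

```python
# Figures from the study: 50 participants, roughly 10 rescheduled, 3 no-shows.
PARTICIPANTS = 50
RESCHEDULED = 10
NO_SHOWS = 3


def buffer_slots(weeks, workdays_per_week=5, sessions_per_day=2):
    """Session slots available in a buffer period.

    The defaults (five workdays, two sessions a day) are assumptions.
    """
    return weeks * workdays_per_week * sessions_per_day


reschedule_rate = RESCHEDULED / PARTICIPANTS
no_show_rate = NO_SHOWS / PARTICIPANTS

# Worst case: every rescheduled session and every substitute lands in the buffer.
extra_sessions = RESCHEDULED + NO_SHOWS
slots = buffer_slots(weeks=2)

print(f"Rescheduled: {reschedule_rate:.0%}, no-shows: {no_show_rate:.0%}")
print(f"Extra sessions needed: {extra_sessions}, buffer slots available: {slots}")
```

Under these assumptions, about a quarter of the sessions needed a second slot, and a two-week buffer covers them with room to spare.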

During

4. Improvise.

Specifically, be prepared to evolve the test script throughout the testing period. Despite copious preparation of the test script, a moderated user test will invariably elicit unexpected user responses that impact the test script.

In the early stages of the test, we learned a lot about the script and how it works with real users, in real time. This is a common testing experience with moderated user tests. We moderators find that some of the tasks need more descriptive language, while others need less.

A large-scale moderated test has additional parameters. For example, I found that having 50 subjects gave me the time to identify stable patterns in the data and to evolve the related tasks so as to elicit richer qualitative feedback.

After the first 20 participants, I noticed eight overarching patterns in the test results. One of those patterns emerged in a task about the affordance of the search function. I expected users to breeze through it, but they didn’t, and this became one of the most important findings of the test. While the task itself did not change, I was able to enrich the feedback by asking follow-up questions. These questions helped me understand a larger pattern in the data about task orientation.

5. Keep your eye on the prize.

Fifty hours of data is a lot of data. Consider that each test hour will require between two and four hours to analyze. This includes activities like parsing the quantitative findings, documenting these, and locating and documenting the qualitative data such as quotations or user flows. Compile as much of this daily as you can.
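As a back-of-the-envelope check on that workload, the sketch below multiplies the test hours by the two-to-four-hour range given above. The eight-hour working day used to convert hours into days is an assumption, and the function name is hypothetical.

```python
def analysis_hours(test_hours, low_ratio=2, high_ratio=4):
    """Return the (low, high) range of analysis hours for a given number of test hours."""
    return test_hours * low_ratio, test_hours * high_ratio


HOURS_PER_WORKDAY = 8  # assumption, used only to express the range in days

low, high = analysis_hours(50)
print(f"Analysis effort: {low} to {high} hours")
print(f"Roughly {low / HOURS_PER_WORKDAY:g} to {high / HOURS_PER_WORKDAY:g} working days")
```

For 50 test hours, that is 100 to 200 hours of analysis, which is why compiling a little each day matters.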

One great benefit of a remote, moderated user test is that the test subject cannot see the moderator; they are using their computer monitor to view the prototype. I used this test context as an opportunity to record the success rate of each task using paper notes. I originally thought of this as a protective redundancy in case there was an error with the technical data (which was being recorded via the Morae testing software). But it was far more helpful than that. After each session, I used my handwritten notes to quickly record the task success rates in a spreadsheet. If I had time, I then parsed the technical data.

Heuristics like these are a lifesaver when it comes to producing the test report.
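If you prefer a script to a spreadsheet, the same daily tally can be sketched in a few lines of Python. The participant IDs, task names, and results below are invented for illustration; they are not data from the study, and this is not how the Morae software records results.

```python
from collections import defaultdict

# Hypothetical handwritten notes, typed up as (participant, task, succeeded).
notes = [
    ("P01", "search", False),
    ("P01", "navigation", True),
    ("P02", "search", True),
    ("P02", "navigation", True),
    ("P03", "search", False),
    ("P03", "navigation", True),
]


def success_rates(observations):
    """Return the share of successful attempts for each task."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for _participant, task, succeeded in observations:
        attempts[task] += 1
        if succeeded:
            successes[task] += 1
    return {task: successes[task] / attempts[task] for task in attempts}


rates = success_rates(notes)
for task, rate in sorted(rates.items()):
    print(f"{task}: {rate:.0%}")
```

Appending each day's notes to the list keeps a running success rate per task without waiting for the full technical data.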

6. Prepare for exhaustion (no, seriously, prepare for it).

Conducting a 50-person study is a complex task. It requires sustained attention over a period of two to three months, the capacity to interview diverse audiences several times a day, and the odd balancing act of scientific rigor and dexterity with test data.

The physical challenges, in particular, were a shock to me and the team. I am trained as a public speaker, and I have spoken to international audiences; nevertheless, it was a challenge to interview one to three individuals a day for six weeks. Not only was the physical act of speaking difficult, it was often discombobulating to reorient technically (the call quality was sometimes not the best), and to adjust my tone to the speaking styles, languages of origin, and Web expertise of the test subjects.

I recognized that the basic physical demands of the test would take a toll unless I did something. So I contacted a theater friend to learn more about voice acting. He recommended Be Heard the First Time by Susan D. Miller, which helped me better understand how to sit properly, breathe, and solve other physical constraints associated with the speech context of the user test.

After

7. Descend slowly.

Given the complexity of conducting the test, it’s crucial to approach the reporting phase as thoughtfully as you approached the planning stage. A study group of 50 elicits rich qualitative data (in addition to the quantitative data), and this type of data takes time to evaluate and report. I recommend a three-pronged approach to this stage: write preliminary findings, walk away from the data and do something else, and then return to the data and reporting (with ample time).

It is valuable to write a preliminary set of findings immediately following the conclusion of the test. This will benefit any stakeholder who is eager to see early assessments. Next, it’s crucial to get some distance from the work. It was hard, complex, and a long haul. Time away from the test and test data following its conclusion will reinvigorate the reporting phase and help produce better research. I recommend one to two weeks. Finally, build in ample time to write the test report. For a study group of 50 and a 50- to 100-page report, three to four weeks is necessary to cull the data, organize it, write the report, and edit the report.

Conclusion

With the right preparation, conducting a large, moderated, and remote user test is a valuable research endeavor. It enriches the quantitative findings with nuanced qualitative data. This results in a more robust portrait of the user, the prototype, and the project goals moving forward. For these reasons, the big user test is an adventure well worth taking!