Do Usability Evaluators Do What We Think Usability Evaluators Do?

In this paper, I review the findings of ongoing research in usability and user experience analysis. In particular, I first discuss how real designers and usability evaluators in their own workplaces use findings from usability testing to drive design decisions within a decision-making space. Second, I investigate how designers and evaluators consciously or unconsciously alter raw usability findings when they develop their recommendations. Finally, I explore what these findings might mean for usability education. Ultimately, I ask whether these usability evaluators and designers do what we think they should be doing.

Categories and Subject Descriptors

H.5.2 [Information Systems]: User-Centered Design

General Terms

Design, Human Factors


INTRODUCTION

Since the initial call in the early 1980s for design based on the needs of users rather than the anticipations of designers, designers have struggled with how to maintain “focus upon the characteristics and needs of the intended user population” throughout the design process [1]. Many approaches to achieving a User-Centered Design (UCD) process have been proffered, including human-centered design, participatory design, contextual design, activity-centered design, and persona-driven design, among many others. Despite the prominence of UCD in the design and technical communication fields, much research on UCD is prescriptive: it tells how UCD should be done. Researchers have rarely taken a descriptive look at how designers and technical communicators actually carry out the tasks associated with UCD and how data collected from UCD processes is used in the design process. In other words, little research has examined whether designers truly do what the prescriptions say they should be doing.

My ongoing research empirically and ethnographically investigates how designers follow and deviate from standard UCD prescriptions. In this paper, I look specifically at the role of the usability evaluator in assessing data and in applying that data in decision-making processes. Drawing on a previous study and on current research, I discuss usability findings as a persuasive technique in decision-making sessions, as well as how those findings are reported. I also discuss the potential effect of these studies on the future of usability education.

USABILITY FINDINGS AS PERSUASION

Usability has long been lauded for its central focus on the reader, its potential for financial savings, and its ability to promote a successful design experience. In some ways, however, the goal of using usability findings to inform design has given way to the notion that simply doing usability testing improves design. Yet data can influence design only after it is collected, analyzed, and disseminated, and little is known about how that data is processed and then used to influence design. When designers have usability data, do they use it to influence decision-making? If so, how?

Use of Usability Findings by Novices

To address these questions, in a previous study I observed and recorded the bi-weekly team meetings of a group of novice designers for one year [2]. These novice designers were graduate students, but the project I observed was outside the classroom: the students were paid for their work but did not receive course credit for it. Over the course of the year, I observed 25 student designers (along with their full-time project manager and assistant project manager) as they worked to revise documents for the United States Postal Service (USPS). The general purpose of the meetings was to learn what each sub-group had completed in the interim, to reach consensus on a way forward, and to decide what was to be done before the next meeting. Reports on usability testing were generally presented early in each meeting and took up 15-45% of the meeting’s agenda.

Under their contract with the USPS, 35% of the design team’s time was to be dedicated to researching user behavior and testing document prototypes via think-aloud protocols. The designers conducted the usability testing of the documents themselves; such “do-it-yourself” usability testing has been praised by Krug and others as having the potential to improve design because of the designers’ intimate knowledge of the prototype and the users [3].

I assessed the data collected in this longitudinal investigation to determine how these novice designers used data from their usability sessions to inform the direction of their design. Although these designers claimed to be advocates of UCD, they used many different types of evidence to support their design decisions, including appeals to user data, authorities, storytelling, and designer opinion. And although 35% of the project’s time and budget was dedicated to user-centered design processes, only 12% of the appeals used in the decision-making sessions actually invoked user data. Appeals that seemingly referenced only the designer’s own opinion were far more common, at 20% of the appeals.

Therefore, despite claiming adherence to UCD, the novices routinely argued for design decisions based not on the large amounts of usability data they had collected but on their own intuition. According to Landauer, designers’ “intuitions about what will make a system useful and useable for the people who will use it are, on average, poor” [4]. Interestingly, the novice designers relied on their own opinions even when usability data making the same point was readily available. For this group, then, evidence of opinion, rather than evidence of data, appears to have been perceived as more persuasive, and the novice designers may have put forth their own opinions strategically, as a means of persuasion.

Additionally, the final versions of the documents created by the novice designers were very well received by both the stakeholders and the public, and the documents themselves won international awards from the Society for Technical Communication. Thus, it appears that using user-derived data to argue for or against a particular design decision is not necessary in order to create successful documents.

Use of Usability Findings by Professionals

It may be tempting to think that the novice designers relied so heavily on personal opinion and shied away from data because they lacked experience and professional savvy, both in usability testing and in the decision-making space. However, initial findings from a subsequent study of professional designers show that the patterns of evidence used by design professionals are quite similar to those of novice designers.

To determine how professional designers make decisions, I observed and recorded the meetings of a group of professional designers at a top-tier design firm as they considered potential solutions for a client’s proprietary medical device. I embedded with this team for a week, observing over 40 hours of workplace interactions and recording 17 hours of discussion across nine meetings. This group was highly dedicated to persona-driven design.

Before my arrival, the team had done several weeks of field work, which consisted of interviewing stakeholders and observing how various people interacted with the current version of the device. Based on those findings, the designers developed eight personas to guide them as they prototyped. The five members of the team had varying amounts of design experience, ranging from two years to two decades.

While thorough analysis of this data is still ongoing, preliminary analyses suggest that the kinds of evidence employed by this team of professional designers were similar to those employed by the novice designers, with appeals to user data, authorities, storytelling, and designer opinion used with great regularity. The only major difference lay in appeals through storytelling. While the novice designers told hypothetical stories to support their claims (“We don’t want to use that typeface because we don’t want to offend our readers”), the professional designers told anecdotal stories (“I remember working on this project where we were supposed to put everything in reverse type and everyone hated it”).

In addition to using similar kinds of appeals in the decision-making sessions, the professional designers appear, at least in preliminary assessment, to use those appeals in proportions similar to those of the novice designers. Indeed, the experienced professional designers appear to invoke their data-driven personas less often in decision-making meetings than the novice designers invoked their usability data. Furthermore, the professional designers appear to invoke their own personal opinion more often than the novice designers did.

Like the novice designers, the professional designers received excellent feedback on their prototypes, and a member of the project team later informed me that the final artifact was well received by the stakeholders and users.

Discussion of the Use of Usability Findings by Novices and Professionals

If these two groups were strictly adhering to the principles of UCD, then every design decision could be attributed to a particular data point from usability testing or a particular aspect of a persona. However, it appears that less than 20% of the decisions made within these two sets of decision-making meetings incorporated user data. Despite the scarcity of user data in the decision-making sessions, these groups were able to create documents and products that were usable and well liked.

Therefore, strict adherence to UCD, which provides an “objective record of what users do under…various circumstances” [5], may not be needed to create useful, usable, and desirable products. I am not suggesting, however, that UCD methods are a waste of time or altogether unnecessary. It may be that these team members share a deeply connected common ground that allows more formal aspects of argumentation to be omitted. To establish that common ground, these designers must take part in the testing or the persona creation, so that the gaps left open in their arguments are filled appropriately.

Furthermore, some information gathered in testing or in persona creation may be tacit knowledge: knowledge that is highly individualized, highly complex, and difficult or impossible to articulate [6]. In these groups, those who conducted the testing or the interviews may find it difficult or impossible to convey critical information accurately to their colleagues, even though they individually have that information at their disposal.

Finally, the act of doing UCD and usability work may be useful for designers because, at the very least, it gives them a sense of audience [7]. These novice and professional designers may gain critical knowledge about human behavior and design that aids them in creating their artifacts, even if they never voice the specific results of that research in the decision-making sessions.

Future Research Related to Usability Findings

These two studies raise several questions that may inform future studies of how designers use usability findings. First, if these designers rely on sources of evidence other than usability or user-derived data (such as storytelling and their own opinion), are they truly operating within a UCD framework? I believe that these two groups of designers made a good-faith effort to keep users at the forefront of their design; therefore, I am reluctant to say that they are not conducting UCD. Nonetheless, significant decisions about each design were made without the benefit of user data. Is there a threshold proportion of decisions that must be grounded in user data for a design process to be deemed UCD? Does the act of doing usability testing or developing personas inherently make a design process UCD, regardless of whether the data from the fieldwork is ever invoked? Is a re-evaluation of the definition of UCD in order?

Second, previous research has suggested that corporations develop better artifacts when UCD processes are observed. However, both groups were able to create artifacts that were well received by the stakeholders and the intended audience, despite using user data less than 15% of the time to drive design decisions. Do the outcomes of products and artifacts truly suffer if the designers rely on their opinions rather than on usability findings?

Finally, I suggested earlier that even though user data is not always, or even often, made explicit, value still exists in its implicit aspects. It may also be that competing values subordinate the importance of adhering to UCD protocol. These groups may place importance on expedience and therefore sacrifice invocations of user data for speed [8]. Or they may value politeness (in Brown and Levinson’s sense) and fear that their colleagues will view repetition of previously discussed user data as impolite. It may be, therefore, that while UCD is valued, other aspects of communication are valued more.

REPORTING OF USABILITY FINDINGS

Another under-researched aspect of usability is how usability evaluators analyze and report their findings from user-centered studies. Any given usability session can produce dozens of potential findings, yet usually not all of them are reported. How are certain findings given preference over others? How do evaluators select which findings to include and exclude? Do biases filter the findings in the transition from raw data to reported usability finding?

Novice Evaluators’ Reporting of Usability Findings

To answer these questions, I compared the language used by usability study participants in a think-aloud testing session with the language the novice designers used in their oral reports about those sessions [9]. These were the same novice designers observed in the previous study of usability findings as persuasion in decision-making meetings.

In this discourse analysis study, I found that only about a quarter of all potential findings were reported to the group. Of the findings that were reported, 16% had no basis in the usability testing. For instance, a designer might report that a participant “really liked the chart…said that it made him feel confident in the document,” yet at no point in the usability session did the participant mention confidence or say anything positive about the chart. Furthermore, of the 84% of findings that did have some basis in the usability sessions, nearly a third seemed to describe inaccurately what happened in the session. For example, a designer reported that a participant liked a particular chart, which was indeed the participant’s initial statement. But later in the same session, the participant stated, “Wow, it looks good, but now that I’m trying to, ya know [make a decision with it], it’s kinda confusing.”

Additionally, certain biases appear to affect what the novice designers chose to report. These novice usability evaluators appeared to seek confirmation for issues they had identified before the usability test. Further, they never presented to the group a finding that ran counter to an opinion or claim they had offered before the test, although they had ample opportunity to do so. These biases, whether intentional or unintentional, may indicate that despite evaluators’ best (and perhaps not-so-best) efforts to remain objective and critical during usability testing, subjectivity and selectivity creep in as evaluators transform raw data into reports and recommendations.

Professional Evaluators’ Reporting of Usability Findings

As with the use of usability data, it is tempting to attribute these less-than-ideal analyses to the observed evaluators’ being novices, and to assume that more experienced professionals assess data with a more objective and less biased eye. However, a very small pilot study does not appear to support this notion.

For this preliminary study, I was given captured video of a usability test conducted by two different evaluators at the same company, along with the usability statement each evaluator wrote for that test. (At this company, a brief one-page statement was written after every test, and a comprehensive report was compiled after each round of testing was completed. Each round consisted of four to five tests.) After analyzing the results, I conducted a brief interview with each evaluator.

For the less experienced evaluator (2 years), 10% of the reported findings seemingly had no basis in the session she conducted, and of the 90% that had some basis in the usability test, only 15% seemed to describe inaccurately what happened during the session. Both percentages are lower than the novices’ (16% and nearly a third, respectively), indicating that she relied on the usability data in her reporting more than the novices did. This evaluator made extensive use of time stamps in her report and, in her interview, indicated that she tried not to interpret the results too much until she was ready to aggregate the entire report.

The second, more experienced evaluator (15 years) had strikingly different results. Approximately 40% of her reported findings seemingly had no basis in the session she conducted, and of the 60% that had some basis in the usability test, 15% seemed to describe inaccurately what happened during the session. That 40% far exceeds the corresponding figures for her colleague (10%) and for the novice evaluators (16%). In the interview, the experienced evaluator indicated that her job is to find “the problems with the software,” and that she has “been doing this long enough to know what is going to be a problem even if they [the user in the usability session] don’t have a problem.” In other words, the experienced usability evaluator routinely incorporates her own anticipations into the usability report alongside the findings stemming directly from the usability session.

Clearly, a larger sample is needed to determine whether the findings from these sessions are commonplace or idiosyncratic.
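To put the three groups on a common footing, the reported percentages can be combined into a single figure: the share of all reported findings that were both grounded in the session and accurate. The short Python sketch below does only that arithmetic. It is illustrative: the group labels and the function name `accurate_share` are hypothetical, each professional figure rests on a single evaluator, and treating the novices' "nearly a third" as exactly one third is an assumption that makes the novice figure a lower bound.

```python
from fractions import Fraction

# Share of reported findings that are both grounded in the session AND
# accurate, computed from the percentages reported in the text.
# Assumption: the novices' "nearly a third" inaccurate is treated as exactly
# 1/3, so the novice result is a lower bound.


def accurate_share(no_basis: Fraction, inaccurate_of_grounded: Fraction) -> Fraction:
    """Fraction of all reported findings that are grounded and accurate."""
    grounded = 1 - no_basis
    return grounded * (1 - inaccurate_of_grounded)


# (share with no basis, share of grounded findings that were inaccurate)
GROUPS = {
    "novice evaluators": (Fraction(16, 100), Fraction(1, 3)),
    "less experienced professional (2 yrs)": (Fraction(10, 100), Fraction(15, 100)),
    "experienced professional (15 yrs)": (Fraction(40, 100), Fraction(15, 100)),
}

if __name__ == "__main__":
    for name, (no_basis, inaccurate) in GROUPS.items():
        share = accurate_share(no_basis, inaccurate)
        print(f"{name}: {float(share):.1%} grounded and accurate")
```

Under these assumptions the less experienced professional comes out at 76.5%, the novices at roughly 56% (or slightly more), and the experienced professional at 51%, driven mainly by her 40% of ungrounded findings. With one evaluator per professional group, these figures describe individuals, not populations.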

Discussion and Future Research on Usability Reporting

Although more research is needed, these two studies indicate that factors beyond the session data affect how findings from usability sessions are reported. The novices appear to highlight findings that support claims they personally made previously, while omitting entirely findings that run counter to claims they made in an earlier meeting. For at least one professional, personal experience led her to infuse her report with notions that were perhaps valid but that fell outside the realm of the usability test she conducted. These results indicate that turning raw data into reports and recommendations is not a simple task: important aspects of the testing can be dropped, and personal preference can be included.

Several potential lines of research emerge from these studies. First, do the evaluators themselves detect any problems in their own analyses or in the analyses of others? If they are unable to detect these problems, it is not surprising that they mix verifiable and unverifiable findings in their reports. Do the evaluators see themselves as objective reporters of data, or do they envision themselves in another way?

Second, what considerations do evaluators consciously take into account when reporting on a usability session? Do they discount data that they know the stakeholders or others on the project team would object to? Do they filter the results through other considerations, such as whether the participant was an ideal candidate?

Finally, what is the role of heuristic evaluation and expert review in reporting on usability testing? The experienced professional I interviewed stated that she would report potential findings even if the user she observed had no problem with the aspect in question. In her role as a usability expert, she felt comfortable, and even compelled, to report such issues in the subsequent usability statement. Nielsen and others have indicated that heuristic evaluation is a valid part of usability assessment [10]. But should usability evaluators distinguish between findings that come from data and findings that come from experience? Is a report on user-based usability testing a proper place for expert evaluations, especially when there is no demarcation between findings that are user-based and findings that are evaluator-created?

THE FUTURE OF USABILITY EDUCATION

Both of these tracks of research lend themselves to future considerations as to how we, as academics and educators, teach user-centered design and usability testing to our students.

Best Practices of Analysis

Usability and user experience testing have become standard aspects of technical communication and design, yet little is known about how evaluators analyze data for reporting purposes. Textbooks and professional guides focus mainly on the acts and tasks of doing usability and user experience research, with little instruction or guidance on how to analyze the results in practice and incorporate those findings into the design solution. It is easy to say that these guides should teach novice evaluators how to carefully interpret their findings into meaningful recommendations and contributions. However, Molich’s Comparative Usability Evaluation (CUE) studies have shown that evaluators approach analysis from different angles and sometimes produce wildly differing usability findings [11]. With so many different approaches to usability and user experience analysis, it is difficult to ascertain what an idealized way to assess the data might be, which also makes it unsurprising that professional usability and user experience guides give analysis relatively short shrift.

Therefore, to provide our students with the most helpful and practical instruction on analyzing usability data, more research is needed to determine how data is actually analyzed in situ. By analyzing how a wide range of evaluators assess data, we can better determine what is useful in reports and what is irrelevant. Further, only once a set of best practices is determined can we begin to assess whether “good” usability analysis inherently leads to “good” outcomes, or whether outcome is unrelated to the quality of the data assessment.

Data-Centric Design

While most previous studies examine usability effectiveness through the success of outcomes, these studies have paid particular attention to the language evaluators use in their usability sessions, their reports, and their decision-making sessions. That language shows that designers and technical communicators use all kinds of appeals and evidence to support their design decisions, including appeals to authority, appeals through storytelling, and appeals to their own personal opinions. In the academy, we have traditionally taught our students to use data-driven user-centered design to drive their decisions. Yet these “real world” ethnographic studies have found that user-derived data is but one of many resources designers use for persuasive ends. Are we doing our students a disservice by enforcing data-driven design in the classroom, only to send them into a professional environment where such a strict approach might be considered inflexible or even obstinate? Do our students have to go through a period of “re-learning” once they leave the classroom to adjust their expectations of what it means to be a professional?

Additionally, given that these studies show that personal opinion does have a place (whether warranted or not) in the decision-making space, should we help our students identify when opinion is useful and when it should be avoided? Since some prominent designers have openly criticized an over-reliance on data, including user-derived data [12], should we help our students cultivate their experiences into opinions in ways that allow them to present those opinions in a justified, rather than haphazard, manner? Or, if we allow our students to present solutions based on opinion rather than on user-derived data, are we no longer teaching UCD? Should we teach the strictest interpretation of UCD (designs driven solely by user data), knowing that this interpretation is likely to be diluted once the student enters the profession?

CONCLUSIONS

This paper has reviewed ongoing research on how designers and usability evaluators assess and use findings from usability and user experience testing. These studies have found that designers routinely use appeals other than appeals to user-derived data to support their claims in decision-making sessions, and that evaluators allow factors beyond the session data to alter their usability recommendations. In other words, these studies suggest that, no, usability evaluators are not, in fact, doing what we think usability evaluators are supposed to be doing.

Therefore, we as a field must do more research on how evaluators actually behave in usability analysis settings in order to inform our students more fully and to provide them with best practices for usability and user experience data assessment. While there may never be a truly definitive set of best practices, recognizing that many different usability and user experience paths can lead to successful design solutions may better prepare our students for the challenges they will face as design and usability professionals.

Method Madness: A Usability Testing Pilot Research Case Study

This case study analyzes the methodology and procedures used during a pilot study on mobile usability and preferences conducted at a small Midwestern state college. The pilot study set out to test features of the pre-redesign University of Wisconsin-Stout website as seen through the screen of a mobile device and then to ascertain what students wanted to see in a redesigned version of the mobile interface.

The problems encountered during the research surprised the researcher more than the findings of the pilot study did. Future researchers would be well advised to attend to passing trends in mobile technology and to avoid limiting their sample size through their choice of delivery method and user pool.

INTRODUCTION

Mobile internet access is one of the fastest-growing trends in today’s rapidly advancing technology. Smartphones and other devices have enabled people to do things that were unimaginable just a few years ago: phones are used for everything from texting to searching the web to making movies. The more uses people find for these devices, the more integral the devices become. They have changed the way we think about society, community, and even education [12]. Studies show that as of 2010, ninety-two percent of undergraduate students were wireless users, whether through laptops or cell phones. Ninety-six percent of undergraduates owned a cell phone, and sixty-three percent of them used their cell phones to access the internet [11].

The rapid rise in the use of mobile devices has affected campuses around the country, including the subject of this case study, the University of Wisconsin-Stout. The campus had recently undertaken a complete redesign of its website. At a presentation updating faculty and staff on the status of the redesign, the executive director of enrollment management stated that online mobile computing on campus had risen 800% in the preceding year. She also reported that updating the campus’s mobile presence was one of the next items on the redesign list.

Because the information in a mobile presence is usually derived from a “parent” website, mobile landing screens are a good example of the need to pare complex information down to the barest of bones. To keep the interface simple and easy to use, web developers must distill all the ideas and tools of a website down to a limited number of the most important ones. Depending on the design, a mobile interface has room above the fold for approximately twelve icons or eight menu bars.

For a University of Wisconsin-Stout mobile interface, 70,000 pages will need to be pared down to the handful of features that students will most want to see first on a mobile landing screen. This is an incredible feat, and one not to be taken lightly. Surveying students to find out what they deem most important is paramount to a successful mobile presence, and this pilot study was a first step in that process.
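The paring-down step can be made concrete with a minimal sketch. It assumes, hypothetically, a survey in which each student lists the features they consider most important; it counts how many students named each feature and keeps as many top-ranked features as fit above the fold, using the approximate capacities of twelve icons or eight menu bars given above. The function name `pick_landing_features`, the survey format, and the sample feature names are illustrative assumptions, and a plain vote count is only one possible ranking, not the method the pilot used.

```python
from collections import Counter

# Approximate above-the-fold capacity of a mobile landing screen, taken from
# the text (about twelve icons or eight menu bars, depending on the design).
CAPACITY = {"icons": 12, "menu_bars": 8}


def pick_landing_features(responses: list[list[str]], layout: str) -> list[str]:
    """Return the most-requested features that fit above the fold.

    Hypothetical survey format: each response is the list of features one
    student marked as important. Each student counts at most once per
    feature; ties are broken alphabetically so the result is deterministic.
    """
    votes = Counter(feature for response in responses for feature in set(response))
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in ranked[: CAPACITY[layout]]]


if __name__ == "__main__":
    # Hypothetical responses, for illustration only (not pilot data).
    survey = [
        ["course schedule", "email", "campus map"],
        ["email", "dining menus", "course schedule"],
        ["email", "library hours"],
    ]
    print(pick_landing_features(survey, "menu_bars"))
```

Because capacity is a property of the layout rather than of the data, the same tally serves both the twelve-icon and the eight-menu-bar designs. A real analysis would also have to consider how representative the respondents are, which is the sample-size concern raised at the start of this case study.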

The three-part pilot featured in this case study was created to gather information on the usability of the then-current University of Wisconsin-Stout website for students using mobile devices, and to ascertain students’ opinions on features, organization, and layout possibilities for a future mobile interface based on the upcoming redesign.

First, the pilot study asked users to perform scenario-based tasks to test the usability of some functions of the current University of Wisconsin-Stout website before the redesign was undertaken. Second, it set out to determine which features students considered most important for a redesigned University of Wisconsin-Stout mobile interface. Third, it gave students the opportunity to examine and rank other universities’ mobile web presences to discover their preferences with regard to information organization and the overall attractiveness of the interface.

Although the pilot study yielded answers to the research questions, the outcomes of the study itself are less important than the lessons learned during the execution of the experiment. This case study was undertaken because so many things went wrong during the pilot. An analysis of methodologies and procedures was warranted because the pilot could be further developed for actual use at the University of Wisconsin-Stout.

This case study reviews some of the literature on mobile usability testing as it relates to higher education, describes the pilot study methodology and the problems encountered during the pilot, and suggests alternate methodologies that could have made the pilot more relevant.

LITERATURE REVIEW

Mobile usability is not a new concept. Coursaris and Kim (2011), for example, review 100 empirical studies of mobile usability dating back to 2000. Although not as sophisticated as today’s devices, personal digital assistants (PDAs) met their criteria for inclusion as mobile devices. The authors set forth a plausible framework for future usability studies that takes into account contexts of use as well as usability dimensions.

Kukulska-Hulme’s (2007) work outlines many of the limitations of mobile technology and some of the approaches researchers have used to overcome these obstacles, including consideration of context and content adaptability.

Pawson and Greenberg (2009) describe a research approach called Extremely Rapid Usability Testing, which uses the setting of a trade show to test the functionality of a company’s product. This method has potential for testing usability within educational contexts.

In their comparison of laboratory and field usability testing, Kaikkonen et al. (2005) found that both groups of participants identified the same problems. In other words, context did not affect testers’ ability to perform the tasks asked of them in the experiment.

Studies show a growing desire for training delivered on mobile devices. Norman (2011) lays out its benefits, which include the ability to study during downtime, enhanced content retention, convenience, and improved user confidence.

O’Bryan et al. (2010) argue that in order to design an effective mobile interface, designers must understand their audience and how mobile users employ their devices. However, according to El-Hussein and Cronje (2010), those responsible for design decisions often base their ideas on what they themselves experienced in an environment rather than on what users actually need.

No study of usability would be complete without mention of Nielsen’s concept of discount usability testing, which is given a more current treatment in Ghanam and Maurer’s (2007) update on contextual uses for the methods.

METHODOLOGY AND PROCEDURE

Product Tested

The pilot study was designed to test features of the then-current University of Wisconsin-Stout website as seen through the screen of a mobile device. The product being tested was the web software; Apple iPhones and iPod Touches were the hardware through which the interface was tested. Many different software systems are employed on the University of Wisconsin-Stout website, and knowing whether they work on a mobile interface is essential as the university moves forward with new initiatives to keep up with evolving trends affecting student users.

Background and Objectives

As students come to rely more and more on mobile devices, the importance of examining student use and preference increases. For this study, students were asked to take part in a usability test of the then-current University of Wisconsin-Stout website, in its pre-redesign form, on iPhone or iPod Touch devices.

Since the University of Wisconsin-Stout was redesigning its website during the time of the pilot study, and the redesign team stated that they were going to redesign the school’s mobile interface, the pilot was also designed to find out, hypothetically, what types of information would be most important for students to be able to access. In addition, the pilot set out to determine what type of interface University of Wisconsin-Stout students preferred in terms of information organization and aesthetic.

Target Users and Demographics

For the pilot study, the user pool was initially planned to be graduate students who owned iPhones or iPod Touches. Rather than being handed an unfamiliar device, participants used their own, which eliminated the technology barriers inherent in testing on unfamiliar hardware [8].

The reasoning was that, because University of Wisconsin-Stout is a laptop campus for undergraduates, the students most likely to use mobile interfaces would be graduate students, who almost universally live off campus and were not furnished with laptop computers as part of their tuition.

Methodology Overview

The pilot study reviewed here included a screener, a pre-test questionnaire, and a post-test questionnaire; the questions under study, however, were divided among three main sections.

First, usability testing was conducted using an iPod Touch or iPhone mobile device, in order to learn about users’ experiences and perceptions using the then-current, pre-redesign University of Wisconsin-Stout website. This was actually the only portion of the experiment that truly adhered to the tenets of usability testing, although the results of the next two areas were more relevant for the future of the mobile interface at the institution.

Next, surveys were used to identify which features students wanted to have included on a new mobile version of the University of Wisconsin-Stout website. As previously mentioned, the site has tens of thousands of pages. Asking users what is most important to them is the first step in making the complex tangle of these pages into a usable interface that caters to the needs of the users, whether experienced or not.

Finally, participants were shown paper prototypes of five different universities’ mobile websites to pinpoint their organizational and aesthetic preferences for a higher education mobile website. The five sample sites were chosen for their variety of layouts (some with icons, some with lists, and so on) to ensure that respondents had a wide enough range of interfaces to choose from. In summary, the pilot study comprised the following components:

Pre-test questionnaire administered at the beginning of the in-person test to determine device usage and experience.

Usability testing to determine usability of the pre-redesign University of Wisconsin-Stout website as accessed by an iPhone or iPod Touch, and establish the need for improvements when the site goes mobile.

Survey to determine which information sets/features are most important to students so those features can be included on the mobile version of the website.

Paper prototyping combined with surveys to determine which other universities’ mobile websites have a desirable organizational structure and attractive appearance.

Post-test survey to rate the experiences using the current site and exploring possible future sites, as well as capture any impressions not covered in the administered questions.

FINDINGS

It is said that even the best-developed plans are bound to run into problems upon execution [9]. Although the pilot study obtained answers to the questions posed by its hypotheses, this case study focuses on the methodological obstacles discovered during the course of the pilot: sampling problems, miscalculations about distribution channels, and unnecessary technological constraints.

Sampling

Initially, a screener that would qualify respondents for the pilot was e-mailed to 100 graduate students. Five students responded, but none qualified: each owned a mobile device, just not an iPhone or iPod Touch. Because of the low response rate, a new mailing list was requested and the screener was sent to all graduate students enrolled at the University of Wisconsin-Stout. No further responses were received.

The pilot was conducted as the final project for a graduate course, and the resulting time pressure prompted the next decision.

Because of the lack of responses, the pilot was opened up to both graduate and undergraduate students. Changing the participant pool from the group approved by the Institutional Review Board (graduate students) rendered the results of the study itself unusable under IRB regulations. Once the participant pool had been opened, the pilot was advertised with signs posted in one of the academic buildings, Applied Arts, that announced the incentive and gave students contact information for the pilot.

Faculty in the Applied Arts building were also enlisted, and they sent a few students to the testing lab to participate.

In all, there were seven qualified respondents. Given that the campus has 8,000 students, a final sample of seven was far too small to gauge what students would want in a new mobile website. A larger sample would yield more statistically reliable results and would better represent the student population as a whole, rather than a few recruits gathered mainly from one academic building whose preferences reflected the aesthetic of students trained in the Fine Arts.
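To make the scale of the problem concrete, the following sketch computes the conventional 95% margin of error for a proportion (for example, the share of students who want a given feature) at several sample sizes, using the campus enrollment of 8,000 cited above. It assumes simple random sampling, a worst-case proportion of 0.5, and z = 1.96; the function name is illustrative. Because the pilot actually drew a convenience sample from one building, the real uncertainty would be larger than this formula suggests.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from a
    simple random sample of size n drawn from a finite population.

    Assumes worst-case p = 0.5 and applies the finite population correction.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


CAMPUS_SIZE = 8000  # enrollment figure cited in the case study

for n in (7, 100, 367):
    pct = margin_of_error(n, CAMPUS_SIZE) * 100
    print(f"n = {n:>3}: +/- {pct:.1f} percentage points")
```

Under these assumptions, seven respondents yield a margin of roughly plus or minus 37 percentage points, while about 367 respondents would be needed to narrow it to plus or minus 5.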

Distribution Channels

Other constraints also limited the response rate. First, the study was designed to have respondents come to the testing site. Second, the study was conducted while most students on campus were beginning their own final projects. Requiring students to come to the test site interrupted their day and cost them time.

As the respondents trickled in, they were directed to the testing center. In a one-on-one setting, students were greeted, introduced to the purpose of the study and were given an overview of what was being asked of them. They were told that they would be able to ask the moderator questions at any point during the study.

All participants who qualified for the pilot ended up filling out the screener at the same time as the rest of the study materials. Participants performed the tasks and filled out the questionnaires at their own pace, with minimal interruptions. All students completed every component of the study.

All respondents were given designated space within the academic building to perform the tasks and answer the questionnaires.

Students were given their choice of an eight-ounce bag of candy to compensate them for their time. Two students declined compensation due to dietary restrictions.

Technology Choice

Perhaps the most egregious oversight made during the pilot was the insistence that Apple products be used as the hardware on which to perform the tests. Technology is a fast-moving target. Those who do not heed the trends will be left in the dust.

Not only was the pilot carried out during one of the peak weeks of activity in the semester, it was also conducted during the rapid rise of the Android operating system. The experimenter’s familiarity with Apple products was the driving force behind the choice of hardware.

As designed, the pilot had no Apple-specific directions or exercises, so limiting it to Apple devices served no methodological purpose. Although the experimenter was peripherally aware of Android’s push into the mobile market, failing to connect that development to the execution and success of the pilot was shortsighted and severely limited the results.

Consequently, the study participants consisted of University of Wisconsin-Stout undergraduate and graduate students who owned iPhones or iPod Touches.

DISCUSSION

This pilot set out to test the usability of the then-current, pre-redesign University of Wisconsin-Stout website on mobile devices, to rank student preferences for topics to be considered for a future University of Wisconsin-Stout mobile site, and to gather student opinions on information organization and appearance.

Although answers were found for all of the questions posed in the pilot, further study with a larger sample should be conducted to challenge or affirm the current findings.

In addition, the pilot was conducted as part of an assignment for a class, within the timeframe of an academic semester. With a classroom context as the background, rather than a planned need for the research results, there were different expectations and criteria than if the study were actually being conducted in earnest. Many of the time constraints would probably not be of concern to researchers conducting the study for information’s sake rather than for a grade.

If a future study were conducted, I would recommend the following to future researchers.

Sampling

The number of participants needed for a usability study is hotly contested [4], [7], [9]. Nielsen tells us that 85% of usability problems will be found by as few as five users [9]. For the portion of the pilot that involved usability testing of the then-current website on a mobile device, that number could well have provided the information needed to improve the tested functions. The pilot, however, featured two additional sections that asked for student opinions on future features and design for a new mobile site.
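The five-user figure rests on a simple problem-discovery model published by Nielsen and Landauer, in which the share of problems found by n users is 1 - (1 - L)^n, where L is the probability that a single user encounters a given problem. The following sketch evaluates that model. The default L = 0.31 is the average Nielsen reported across projects and is an assumption here, since the true rate for the University of Wisconsin-Stout website was never measured.

```python
def proportion_found(n_users, discovery_rate=0.31):
    """Expected share of usability problems found by n_users testers under
    the Nielsen-Landauer model: 1 - (1 - L) ** n.

    discovery_rate (L) is the probability that one user exposes any given
    problem; 0.31 is an assumed, project-averaged value.
    """
    return 1 - (1 - discovery_rate) ** n_users


for n in (1, 3, 5, 7, 15):
    print(f"{n:>2} users: {proportion_found(n):.1%} of problems")
```

With L = 0.31, five users find about 84% of problems (the basis of the commonly cited 85% figure), and the seven participants the pilot recruited would find about 93%. The model addresses only problem discovery in task-based testing; it says nothing about how representative participants’ opinions are, which is why the survey and prototype sections needed a larger sample.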

Consequently, one of the first difficulties to address is ensuring that the sample size is large enough and is representative of a wider variety of students – both graduate and undergraduate. Restricting the potential user pool to a fraction of the campus, while well reasoned, was an unnecessary constraint and made the pilot much more challenging than it needed to be.

In addition, during the course of an academic semester, timing is important when it comes to recruiting participants. Delivering the pilot earlier in a semester, when students aren’t under their own deadlines, could be a better choice. Earlier creation and delivery of at least the screeners, if not the whole pilot, would have increased the chances that students would take the time to participate.

If researchers were to experiment with a new method, the Extremely Rapid Usability Testing [10] mentioned in the literature review is one of the methods that could adapt very well to the academic environment. Whether at a booth during orientation, at on-campus job fairs, or other campus activities, this type of testing could be extremely beneficial for sampling. When students attend these events, they are there because they want to be, they are focused on the event, and don’t have other immediate concerns that would prevent them from participating in the study.

Delivery Method

The pilot could have been delivered electronically, but it was reasoned that the study would be more authentic if the surveys were delivered in person in a monitored environment. Since no part of the pilot involved an interactive testing method (think-aloud, card sorting, etc. [4]), there was no reason to have participants come to the lab. Students would have been far better served by being allowed to participate electronically.

In addition to being convenient for participants, electronic delivery using survey software allows for easier collection and interpretation of results. It might also attract more participants, and it would reduce hand-tallying and the room for human error when determining which features were most important to students and which design they liked best.
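As an illustration of the tallying that survey software can automate, the following sketch aggregates ranked-choice responses with a Borda count, one common way to combine rankings (not necessarily the method the pilot used). The function name, feature names, and responses are hypothetical placeholders, not data from the pilot.

```python
from collections import Counter


def borda_tally(rankings):
    """Aggregate ranked-choice survey responses with a Borda count.

    Each response lists options from most to least preferred; an option in
    position i of a k-item ranking earns k - 1 - i points. Returns
    (option, points) pairs sorted from highest to lowest score.
    """
    scores = Counter()
    for ranking in rankings:
        k = len(ranking)
        for position, option in enumerate(ranking):
            scores[option] += k - 1 - position
    return scores.most_common()


# Hypothetical responses for illustration only; not data from the pilot.
responses = [
    ["course schedule", "campus map", "dining menus", "library hours"],
    ["campus map", "course schedule", "library hours", "dining menus"],
    ["course schedule", "library hours", "campus map", "dining menus"],
]

for feature, points in borda_tally(responses):
    print(f"{feature}: {points}")
```

Because the tally is computed rather than counted by hand, adding late responses or re-running the analysis introduces no new transcription errors.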

The whole point of usability testing is to focus on the user experience [9]. If researchers are to gain the knowledge they seek, they must make sure to focus on the experience of the users during the testing phase as well.

Technology Choice

Mobile devices are continuously being updated and improved to keep up with market demand. Designers face the challenge of taking complex information usually spread across a series of web pages and making it usable, legible, and attractive to users of mobile devices. They face the constraints of small screens, a small number of input controls, and different operating systems. They are also under time pressure to turn out innovative new products faster than their competitors [2].

This speedy evolution has important implications for those conducting studies of mobile devices. Researchers studying usability, especially of quickly developing technologies, would be remiss not to watch the trends around them closely. Expanding the criteria to include all types of mobile devices, instead of limiting them to Apple’s operating system, would have eased the constraints on sample size and would have been technologically responsible. Other systems are gaining in popularity, and researchers would be negligent not to include them.

Consulting the many websites dedicated to technology trends would be useful before creating and delivering the study. In addition, including a wider range of device types would be prudent for a healthy and representative sample.

CONCLUSION

This case study investigated the methods and delivery of a pilot study that gauged the usability of the University of Wisconsin-Stout’s then-current website, and also surveyed student choices regarding a future mobile presence.

There were many challenges faced and overcome during the course of the pilot. Most of them were preventable.

From a usability standpoint, the most relevant portion of the pilot was the first section of the study, wherein students tested the functionality of the then-current University of Wisconsin-Stout website through the medium of the device hardware. Although that information was interesting, it was less significant than the portion that asked about students’ preferences for future features.

Allowing students to express their opinions would have given the development team answers to the puzzle of how to distill the complex information, tools, and features contained within 70,000 web pages into the dozen most important to their users.

If institutions of higher education are to embrace the task of creating a mobile interface that students will respond to and use, they would be well advised to give students the means of expressing themselves and having their opinions heard. This can only be done if future studies are designed to succeed and to recruit a significant, representative sample of the student body.

[6] Kukulska-Hulme, A. 2007. Mobile usability in educational contexts: what have we learnt? The International Review of Research in Open and Distance Learning 8, 2 (June 2007). URL: http://www.irrodl.org/index.php/irrodl/article/view/356/879.