Ann Blandford

Professor

Personal Homepage

http://www.ucl.ac.uk/uclic/people/a_blandford/

Employer

University College London

Email

a.blandford@ucl.ac.uk

Ann Blandford is Professor of Human-Computer Interaction in the Department of Computer Science at University College London, and served as Director of UCL Interaction Centre (UCLIC) (2004-2011). Her teaching includes User-Centred Evaluation Methods on the MSc in HCI with Ergonomics at UCL. She started her career in industry, as a software engineer, but soon moved into academia, where she developed a focus on the use and usability of computer systems. Ann leads research projects on human error and on interacting with information, with a focus on modelling situated interactions. In particular, she leads an EPSRC Platform Grant on Interactive Systems in Healthcare, and an EPSRC Programme Grant, CHI+MED, on Human-Computer Interaction for Medical Devices. She has been technical programme chair for several conferences, the most recent being NordiCHI 2010. See http://www.ucl.ac.uk/uclic/people/a_blandford/ for more detail.

15.12 Commentary by Ann Blandford

Gilbert Cockton’s article on Usability Evaluation does a particularly good job of drawing out the history of “usability” and “user experience” (UX), and highlighting the limitations as well as the importance of a classical “usability” perspective. For several years, I taught a course called “Usability Evaluation Methods”, but I changed the name to “User-centred Evaluation Methods” because “usability” had somehow come to mean “the absence of bad” rather than “the presence of good”. Cockton argues that “user experience” is the more positive term, and we should clearly be aiming to deliver systems that have greater value than being “not bad”.

However, there remains an implicit assumption that evaluation is summative rather than formative. For example, he discusses the HEART measures of Happiness, Engagement, Adoption, Retention and Task success, and contrasts these with the PULSE measures. Used effectively, these can give a measure of the quality (or even the worth) of a system, alone or in the product ecologies of which it is a part. However, they do not provide information for design improvement. A concern with the quantifiable, and with properties of evaluation methods such as reliability (e.g. Hertzum & Jacobsen, 2001), has limited our perspective in terms of what is valuable about evaluation methods. Wixon (2003) argues that the most important feature of any method is its downstream utility: does the evaluation method yield insights that will improve the design? To deliver downstream utility, the method has to deliver insights not just about whether a product improves (for example) user happiness, but also why it improves happiness, and how the design could be changed to improve happiness even further (or reduce frustration, or whatever). This demands evaluation methods that can inform the design of next-generation products.

Of course, no method stands alone: a method is simply a tool to be used by practitioners for a purpose. As Cockton notes, methods in practice are adopted and adapted by their users, so there is in a sense no such thing as a “method”, but a repertoire of resources that can be selected, adapted and applied, with more or less skill and insight, to yield findings that are more or less useful. To focus this selection and adaptation process, we have developed the Pret A Rapporter framework (Blandford et al, 2008a) for planning a study. The first important element of the framework is making explicit the obvious point that every study is conducted for a purpose, and that that purpose needs to be clear (whether it is formative or summative, focused or exploratory). The second important element is that every study has to work with the available resources and constraints: every evaluation study is an exercise in the art of the possible.

Every evaluation approach has a potential scope — purposes for which it is and is not well suited. For example, an interview study is not going to yield reliable findings about the details of people’s interactions with an interface (simply because people cannot generally recall such details), but might be a great way to find out people’s attitudes to a new technology; a GOMS study (John and Kieras, 1996) can reveal important points about task structure, and deliver detailed timing predictions for well structured tasks, but is not going to reveal much about user attitudes to a system; and a transaction log analysis will reveal what people did, but not why they did it.

Cockton draws a distinction between analytical and empirical methods, where analytical methods involve inspection of a system and empirical methods are based on usage. This is a good first approximation, but hides some important differences between methods. Some analytical methods (such as Heuristic Evaluation or Expert Walkthrough) have no direct grounding in theory, but provide more or less support for the analyst (e.g. in the form of heuristics); others (including GOMS) have a particular theoretical basis which typically both constrains the analyst, in terms of what issues can be identified through the method, and provides more support, yielding greater insight into the underlying causes of any issues identified, and hence a stronger basis to inform redesign. In a study of several different analytical methods (Blandford et al, 2008c), we found that methods with a clear theoretical underpinning yielded rich insights about a narrow range of issues (concerning system design, likely user misconceptions, how well the system fits the way users think about their activities, the quality of physical fit between user and system, or how well the system fits its context of use); methods such as Heuristic Evaluation, which do not have theoretical underpinnings, tend to yield insights across a broader range of issues, but also tend to focus more on the negative (what is wrong with a system) than the positive (what already works well, or how a system might be improved).

Cockton rightly emphasises the importance of context for assessing usability (or user experience); surprisingly little attention has been paid to developing methods that really assess how systems fit their users in their various contexts of use. In the context of e-commerce, such as his van hire example, it is widely recognised that the Total Customer Experience matters more than the UX of the website interface (e.g. Minocha et al, 2005): the website is one component of a broader system, and what matters is that the whole system works well for the customers (and also for the staff who have to work within it). The same is true in most contexts: the system has to perform well, it has to be usable and provide a positive user experience, but it also has to fit well into the context of use.

In different contexts, different criteria become prominent. For example, for a banking system, security is at least as important as usability, and having confidence in the security of the system is an important aspect of user experience. A few days ago, I was trying to set up a new standing order (i.e. regular payment from my bank account to a named payee) to pay annually at the beginning of the year ... but the online banking system would only allow me to set up a new standing order to make a payment in the next four months, even though it would permit payment to be annual. This was irritating, and a waste of time (as I tried to work out whether there was a way to force the system to accept a later date for first payment), but it did not undermine my confidence in the system, so I will continue to use it because in many other situations it provides a level of convenience that old-fashioned banking did not.

Cockton points out that there are many values that a system may offer other than usability. We have recently been conducting a study of home haemodialysis. We had expected basic usability to feature significantly in the study, but it does not: not because the systems are easy to use (they are not), but because the users have to be very well trained before they are able to dialyse at home, their lives depend on dialysis (so they are grateful to have access to such machines), and being able to dialyse at home improves their quality of life compared to having to travel to a dialysis centre several times a week. The value to users of usability is much lower than the values of quality of life and safety.

Particularly when evaluating use in context, there doesn’t have to be an either-or between analytical and empirical methods. In our experience, combining empirical studies (involving interviews and observations) with some form of theory-based analysis provides a way of generalising findings beyond the particular context that is being studied, while also grounding the evaluation in user data. If you do a situated study of (for example) a digital library in a hospital setting (Adams et al, 2005), it is difficult to assess how, or whether, the findings generalise to even a different hospital setting, never mind other contexts of use. Being able to apply a relevant theoretical lens (in this case, Communities of Practice) to the data gives at least some idea of what generalises and what doesn’t. In this case, the theory did not contribute to an understanding of usability per se, but to an understanding of how the deployment of the technology influenced its acceptance and take-up in practice. Similarly, in a study of an ambulance dispatch system (Blandford and Wong, 2004), a theory of situation awareness enabled us to reason about which aspects of the system design, and the way it was used in context, supported or hindered the situation awareness of control room staff. It was possible to apply an alternative theoretical perspective (Distributed Cognition) to the same context of use (ambulance dispatch) (Furniss and Blandford, 2006) to get a better understanding of how the technology design and workspace design contribute to the work of control room staff, including the ways that they coordinate their activity. By providing a semi-structured method (DiCoT) for conducting Distributed Cognition analyses of systems (Blandford and Furniss, 2006), we are encoding key aspects of the theory to make it easier for others to apply it (e.g. McKnight and Doherty, 2008), and we are also applying it ourselves to new contexts, such as an intensive care unit (Rajkomar and Blandford, in press). Even though particular devices are typically at the centre of these studies, they do not focus on classical usability of the device, or even on user experience as defined by Cockton, but on how the design of the device supports work in its context of use.

Another important aspect of use in context is how people think about their activities and how a device requires them to think about those activities. Green (1989) and others (Green et al, 2006) developed Cognitive Dimensions as a vocabulary for talking about the mismatch between the way that people conceptualise an activity and the way they can achieve their goals with a particular device; for example, Green proposes the term “viscosity” to capture the idea that something that is conceptually simple (e.g. inserting a new figure in a document) is practically difficult (requiring each subsequent figure to be renumbered systematically in many word processors). We went on to develop CASSM (Blandford et al, 2008b) as a method for systematically evaluating the quality of the conceptual fit between a system and its users. Where there are different classes of users of the same system, which you might regard as different personas, you are likely to find different qualities of fit (Blandford et al, 2002). CASSM contrasts with most established evaluation methods in being formative rather than summative; in focusing on concepts rather than procedures; in being a hybrid empirical-analytical approach; and in focusing on use in context rather than either usability or user experience as Cockton describes them. It is a method for evaluating how existing systems support their users in context, which is a basis for identifying future design opportunities to either improve those systems or deliver novel systems that address currently unmet needs. Evaluation should not be the end of the story: as Carroll and Rosson (1992) argue, systems and uses evolve over time, and evaluation of the current generation of products can be a basis for designing the next generation.

This commentary has strayed some way from the classical definitions of usability as encapsulated in many of the standards, and cited by Cockton, to focus more on how to evaluate “quality in use”, or the “extent to which a product can be used by specified users to achieve specified goals” within their situated context of use. Cockton argues that “several evaluation and other methods may be needed to identify and relate a nexus of causes”. I would argue that CASSM and DiCoT are examples of formative methods that address this need, focusing on how products are used in context, and how an understanding of situated use can inform the design of future products. Neither is a silver bullet, but each contributes to the agenda Cockton outlines.