Part I: PROBLEMS WITH THE GUIs WE HAVE

When we set about learning any interface feature that is new to us, we proceed in two phases, the first of which gradually grades into the second. In the first, or learning, phase we are actively aware of the new feature, and seek to understand and master it. If that feature is well-designed, and if we use it repeatedly, we eventually enter the desirable second, or automatic, phase, in which we have formed a habit, and use the feature habitually, without thought or conscious effort.

Interface features are created to help you accomplish some task. If a feature forces you to stop thinking about your task and begin paying attention to the feature (an egregious case is where the software crashes, but even a momentary difficulty can derail your train of thought) then it is said to interfere with the task, and you have not entered the automatic phase with respect to that feature. Creating interfaces that allow users to develop automaticity across all tasks should be a primary goal of interaction designers. Such interfaces will be easier to learn and use, more productive, and far more pleasant than what we have today.

In spite of a commonly believed myth to the contrary, we are not novices or experts with regard to whole systems or applications, but go through the learning and automatic phases more or less independently with regard to each feature or set of similar features. If learning one of a set of features makes you automatic on the entire set, or greatly decreases the time it takes to become automatic on the rest of the set, we say that the set of features exhibits consistency.

In keeping with an industry that has, until recently, been primarily concerned with introducing computer applications to a rapidly widening audience, most interface design work has concentrated on facilitating the learning phase.
The current trend that culminated in Macintosh OS- and Windows-style graphic user interfaces (GUIs) began when designers at Xerox PARC made operating systems more comprehensible by introducing a desktop metaphor, giving a graphic representation to previously invisible system features, and making their relationship to your task more understandable. Programs became task-oriented, and turned into applications, which were presented each in its own separate region of the display, or window, and each with its own characteristic behavior. During this period, not much attention was paid to those qualities of interface features that allow you to enter the automatic phase.

The design principles that encourage the development of automaticity in a user are quite different from, though not incompatible with, those for learnability. Unfortunately, some of the methods that have been used to enhance learnability in GUIs make it impossible to achieve automaticity throughout the interface. The present paradigms cannot be evolved or reworked to solve this problem; novel approaches to interface design at the system level as well as in the details are required. Because some of the present methods for promoting learnability cannot be used, they must be replaced by other methods equally or more learnable, methods which are also compatible with automaticity.

For an interface feature to be humane it must be easily learned and it also must become automatic without interfering with the learning of or habituating to other features. The present blend of hard-to-learn keyboard shortcuts and difficult-to-automatize menu choices fails on both counts. Adaptive menus and other features that are changed by the system in response to your patterns of use defeat habituation (controls suddenly shift from where you have learned to expect them to be). Normal human inertia makes it difficult to effect sweeping changes, even when the need for them is clear.
It is widely recognized by users and commentators that present-day interfaces and their supporting software systems are not satisfactory, yet to many people inside the industry the need to revise our present interface protocols does not seem pressing. Evidence from research in cognitive psychology, user testing, and user complaints should be heeded. This evidence needs to be augmented with empirical and objective measures based on demonstrated productivity; we underuse time analysis tools such as GOMS and its successors, formal measures of complexity, and measures of interface efficiency.

To develop an interface that can be operated automatically by a human places constraints on the design, constraints that we learn about from cognitive psychology’s studies of habit formation. For a feature to be habituating, for example, it must be usable without requiring that the user make any decisions. It is better to provide only one way of accomplishing a task when the time lost in deciding which method to use is greater than the time lost by choosing the slower of the methods. This is often the case, and a system that is designed so that the user does not have to make method decisions is called a monotonous system.

In operating an interface we combine or “chunk” sequences of actions into gestures, which, once started, proceed automatically. Because we form gestures, techniques such as having a user respond Y or N to an “Are you sure?” verification do not provide safety: The typed “Y” becomes part of the gesture.

Interfaces must be designed to accommodate our ability to pay conscious attention to only one object or situation, called our locus of attention, at a time. When we perform multiple tasks simultaneously, unless all but one of them are being performed habitually, they will interfere with each other and we are more likely to make errors.
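The monotony criterion stated above (offer a single method when deciding costs more than the slower method does) can be made concrete with a keystroke-level estimate of the kind GOMS-family analyses use. The sketch below is illustrative only: the operator times are the commonly quoted keystroke-level averages and are treated here as assumptions, and the function names, the operator strings, and the idea of pricing a decision at one mental-preparation operator are hypothetical choices made for this example, not part of the original text.

```python
# Commonly quoted keystroke-level operator times, in seconds.
# Assumed values for illustration; real times vary by user and device.
OPERATOR_SECONDS = {
    "K": 0.2,   # press a key
    "P": 1.1,   # point at a target with a mouse
    "H": 0.4,   # move a hand between keyboard and mouse
    "M": 1.35,  # prepare mentally
}


def estimate_seconds(operators: str) -> float:
    """Sum the operator times for a method written as a string such as 'MKK'."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


def prefer_single_method(fast: str, slow: str, decision_seconds: float) -> bool:
    """Return True when deciding between the two methods costs more time
    than would be lost by always using the slower one."""
    time_lost_to_slow = estimate_seconds(slow) - estimate_seconds(fast)
    return decision_seconds > time_lost_to_slow


# Two hypothetical methods that differ by two keystrokes (about 0.4 s).
# If choosing between them costs one mental operator (1.35 s, an assumption),
# the criterion favors offering only one method.
close_call = prefer_single_method("MKK", "MKKKK", decision_seconds=1.35)

# A hypothetical menu path that is several seconds slower than a shortcut:
# here the decision is cheaper than the time the slower method would lose.
wide_gap = prefer_single_method("MKK", "HMPKPK", decision_seconds=1.35)
```

With these assumed numbers, `close_call` is true and `wide_gap` is false: the closer two methods are in speed, the more a forced choice between them costs relative to what it saves.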
We cannot routinely pay attention to both system state (or mode) and our task at the same time, which leads to performing a gesture appropriate to one mode (for example, in one application) while we are actually in another mode (say, at the desktop). To eliminate this difficulty we must abandon the inherently modal desktop-and-applications style interface that is now prevalent.

The window metaphor introduced to isolate applications also has exacerbated the difficulties of navigation. A windowed or paged environment, as found on the web, information appliances, and computers, is an example of how to lay out a maze: You are in a little room with many narrow doorways, through which you can access other rooms whose contents you can see but dimly (say, in the form of a tab, link, or menu label). Our evolved abilities to navigate positionally and by landmarks are of little use.

Text searches are an aid to navigation but their most commonly available form, with Boolean combinations of patterns, does not work well for most users, as research has shown. Text searches do not work at all for finding non-textual objects, and they are usually launched by means of dialog boxes, which are usually modal (including the so-called “nonmodal dialog boxes”).

Another common feature of present systems, file names, causes difficulties in that it is vexing to have to come up with unique file names (within a limited number of characters) whenever you wish to save your work; it is even more difficult to try to remember file names at some later date. It is possible to eliminate file names altogether. In addition, a user should never have to explicitly save or store work. The system should treat all produced or acquired data as sacred and make sure that it does not get lost, without user intervention. The user may, of course, deliberately delete anything.
The ability to delete (or make any changes) means that universal, unlimited-level undo and redo should be inherent to all systems; this is so fundamental that a dedicated key should be devoted to this pair of functions (perhaps replacing the troublesome Caps Lock key).

In short, the major paradigms that underlie most of today’s interfaces, plus many other methods and details not mentioned in this summary, are far from helpful to good human-machine interaction. Available interface building tools invariably embody these ineffective paradigms and methods, and condemn those who use them to create interfaces that do not advance the state of the art.

The problems cited are not, in general, alleviated by alternative or additional human-machine channels, such as speech recognition, the simultaneous use of two mice or additional controls on mice or pens, or eye and head motion detection (although such innovations do address other interaction problems). Even given perfect speech recognition or direct mental input, the question of what you say (or think) to accomplish a task and how the system responds to your words (or thoughts) brings us back to the fundamental questions that have been raised.
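The unlimited-level undo and redo called for above can be sketched with two stacks of earlier and later states. This is a minimal illustration under stated assumptions, not a description of any particular system: the `History` class and its method names are hypothetical, and content is modeled as a plain string so that every change, including a deliberate deletion, can be reversed.

```python
class History:
    """Unlimited-level undo and redo over snapshots of some content.

    Hypothetical sketch: each change records the previous state, so nothing
    the user does (including deleting everything) is lost.
    """

    def __init__(self, initial=""):
        self._past = []      # states that undo can return to, oldest first
        self._present = initial
        self._future = []    # states that redo can return to

    @property
    def content(self):
        return self._present

    def apply(self, new_content):
        """Record a change. A fresh change discards any pending redo states."""
        self._past.append(self._present)
        self._present = new_content
        self._future.clear()

    def undo(self):
        """Step back one change; do nothing if there is nothing to undo."""
        if self._past:
            self._future.append(self._present)
            self._present = self._past.pop()

    def redo(self):
        """Step forward one change; do nothing if there is nothing to redo."""
        if self._future:
            self._past.append(self._present)
            self._present = self._future.pop()


history = History("draft")
history.apply("draft, revised")
history.apply("")          # the user deliberately deletes everything
history.undo()             # the deletion is reversed
recovered = history.content
```

Because the stacks have no fixed depth, the number of undo levels is limited only by storage. A real system would likely store changes rather than whole snapshots, which this sketch does not attempt.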

Part II: WHAT INTERFACES SHOULD HAVE

A useful starting set of solutions to the problems outlined above includes:

A better text search methodology, effective both within a local document or system and with respect to extremely large data spaces such as the web

A method of eliminating all modal aspects of the basic human-machine interface, a method that is readily learned by newcomers and which is habituating

An improved navigation method, as applicable to finding your way around within a picture or memo as within a collection of images, documents, or networks; a method which makes use of inborn and learned human navigational skills

A set of detail improvements to some existing mechanisms that make them consistent with the goals and principles of the rest of the design.

Better text searching requires that the search be extremely fast (the next instance appears within human reaction time), interactive at the typed character (or spoken morpheme) level, and not based on dialog box interaction. You should be able to change the pattern (what you are seeking an instance of) at any time, including during a search. The results should be shown in context and not as a list of documents or sites. A search mechanism that is sufficiently fast and powerful also can serve as a cursor positioning mechanism in text. Such a cursor positioning tool can be significantly faster than graphical pointing devices and can unify local and internetworked information retrieval.

In present systems, work gets done in applications (which are sets of commands that apply to certain kinds of objects). Tasks are not accomplished at the desktop, and desktops (or launching areas in general) should disappear as interfaces improve. The idea of an application is an artificial one, convenient to the programmer but not to the user. From a user’s point of view there is content (a set of objects created or obtained by the user) and there are commands that can operate on objects. Commands should be independent of applications and be applicable at any time and to any object.

If there are times when a gesture that used to invoke one command now invokes another (or invokes no command), then the system’s interface suffers from being modal. If applying a command to an object does not make sense, then the object should be automatically transformed so that the command can apply to it; for example, a spelling check applied to an incoming fax requires that the fax first be run through an OCR program to change it from being a bitmap to a sequence of characters. If nothing can be done, then the system should do nothing. The present paradigm of desktop, applications, and documents can be replaced by a simpler, modeless concept of content and commands.
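The content-and-commands idea, including the fax example, can be sketched as a small dispatcher: a command declares the kind of object it needs, and the system looks for a transformer when the object at hand is of another kind. Everything named here is hypothetical and chosen only for illustration, including the `Content` type, the two registries, the tiny word list, and the stand-in OCR function, which simply relabels a string rather than reading a real bitmap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    kind: str   # for example "bitmap" or "text"
    data: str   # toy payload; a real bitmap would not be a string


KNOWN_WORDS = {"hello", "world"}   # toy dictionary for the example


def fake_ocr(obj):
    """Stand-in for OCR: pretends the bitmap's payload is its recognized text."""
    return Content("text", obj.data)


def spell_check(obj):
    """Return the words in a text object that are not in the toy dictionary."""
    return [word for word in obj.data.lower().split() if word not in KNOWN_WORDS]


# Transformers are keyed by (kind they accept, kind they produce).
TRANSFORMERS = {("bitmap", "text"): fake_ocr}

# Each command records the kind of object it operates on.
COMMANDS = {"spell_check": ("text", spell_check)}


def run(command_name, obj):
    """Apply a command to any object, transforming the object first if needed.

    If no transformer exists, nothing can be done, so nothing is done
    (the function returns None).
    """
    required_kind, command = COMMANDS[command_name]
    if obj.kind != required_kind:
        transformer = TRANSFORMERS.get((obj.kind, required_kind))
        if transformer is None:
            return None
        obj = transformer(obj)
    return command(obj)


incoming_fax = Content("bitmap", "helo world")
fax_result = run("spell_check", incoming_fax)        # transformed, then checked
sound_result = run("spell_check", Content("sound", "beep"))  # no transformer
```

The same gesture (`run("spell_check", ...)`) means the same thing whatever the object is, which is the modeless property the text asks for. This sketch only tries a single transformation step; chaining several transformers is left out.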
In such an environment, vendors will sell command sets and transformers rather than applications, and a user may not have to deal with a huge application when all he wants is a few new abilities. Such a reorganization will also eliminate much redundant code now present in the multiple applications we use (consider how many different text editors reside on a typical personal computer: There’s one for the word processor, one for the file name editor, one for dialog boxes…. But there need only be one set of commands for word processing functions).

The twin problems of navigation and limited display size can both be ameliorated by using a video camera paradigm, where the user can zoom in and out and pan horizontally and vertically over a universe of objects. Objects (documents, pictures, games, anything that has a visual representation) can be grouped into visible clumps and clusters, which can be marked with colors and shapes, and left in locations that are in themselves memorable (the address book is in the upper left corner of the world). Zooming out from your computer can give you a view of your local network, and going still farther, the web comes into view, as organized by a universe vendor (comparable to today’s portal vendors).

This is a summary of a longer work, and can therefore touch on only a fraction of the book’s topics. Even a book of encyclopedic dimensions cannot cover so vast a field as interface design. The message that The Humane Interface brings, aside from its methodological specifics, is that major improvements in interface design are both profitable and moral: profitable because a good interface is cheaper to implement, is more productive, is easier to maintain, has lower training costs, and requires less customer support than a bad interface; moral because it brings smiles to the faces and erases furrows from the brows of users. One can do good and yet do well by rethinking interface design.