Why type systems matter for UX: an example

Applications are bad enough in that they trap potentially useful building blocks for larger program ideas behind artificial barriers, but they fail at even their stated purpose of providing an 'intuitive' interface to whatever fixed set of actions and functionality their creators have imagined. Here is why: for all but the simplest applications, there are multiple contexts within the application, and there needs to be a cohesive story for how to present only 'appropriate' actions to the user and prevent nonsensical combinations based on context. This becomes serious business as the number of actions and contexts offered by an application grows. As an example, if I have just selected a message in my inbox (this is a 'context'), the 'send' action should not be available, but if I am editing a draft of a message it should be. Likewise, if I have just selected some text, the 'apply Kodachrome style retro filter' action should not be available, since that only makes sense applied to a picture of some sort.

These are just silly examples, but real applications will have many more actions to organize and present to users in a context-sensitive way. Unfortunately, the way 'applications' tend to do this is with various ad hoc approaches that don't scale very well as more functionality is added--generally, they allow only a fixed set of contexts, and they hardcode what actions are allowed in each context. ('Oh, the send function isn't available from the inbox screen? Okay, I won't add that option to this static menu'; 'Oh, only an integer is allowed here? Okay, I'll add some error checking to this text input') Hence the paradox: applications never seem to do everything we want (because by design they can only support a fixed set of contexts and because how to handle each context must be explicitly hardcoded), and yet we also can't seem to easily find the functionality they do support (because the set of contexts and allowed actions is arbitrary and unguessable in a complex application).
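To make the contrast concrete, here is a minimal Python sketch of deriving the available actions from the type of the current selection, rather than from a hardcoded menu per screen. The type names and the 'archive' action are hypothetical, made up for illustration; only 'send' and the retro filter come from the examples above.

```python
from dataclasses import dataclass


@dataclass
class InboxMessage:
    subject: str


@dataclass
class Draft:
    subject: str


@dataclass
class Picture:
    filename: str


# Each action declares, once, the type of thing it applies to.
# Nothing here says which screen or menu the action lives on.
ACTIONS = [
    ("send", Draft),
    ("archive", InboxMessage),  # hypothetical action, for illustration
    ("apply retro filter", Picture),
]


def available_actions(selection):
    """Return the names of actions whose input type matches the selection."""
    return [name for name, accepted in ACTIONS if isinstance(selection, accepted)]
```

Adding a new action means adding one entry with its type; no per-context menu has to be edited, and 'send' never shows up for an inbox message because the types don't line up.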

Today I was forced to edit a Microsoft Word document, containing comments (made by me, and by others) and tracked changes. I found myself wanting to delete all comments, and accept all tracked changes. It took a few minutes to figure out, and I very quickly gave up trying to actually discover the functionality within Word's actual UI and resorted to using Google. God help me if I wanted to, say, delete only comments made by me within the last ten days.

This problem isn't at all unique to Word; Word just happens to be a large application, and like all large apps it has no cohesive story for organizing all its functionality. There are menus with hundreds of entries, and toolbars. Of course, the designers of Word have tried to create some reasonable taxonomy for grouping these functions, but on some level the location of any one function is arbitrary and unguessable. It's the same thing in any complex application: Eclipse, IntelliJ, Photoshop, Illustrator, and so on, which is part of the reason why there's an entire subdivision of the publishing industry devoted to books explaining how to get things accomplished with these applications. Every single complex app is like a foreign language, with its own unique and arbitrary vocabulary and grammar, which must simply be memorized in order to become productive.

Nowadays, I think people try to write simpler applications than Word, with less functionality. This makes the UX problem more tractable while throwing the baby out with the bathwater. I want to be able to do complex things when I interact with some piece of software; I just also want all this complex functionality to be actually discoverable! Furthermore, it should be just as obvious how to assemble functionality that didn't previously exist, using existing functions.

Here is the kind of interaction I would like to have instead. I click on a comment. A status bar indicates that I have selected something of type Comment. Now that I have a handle to this type, I ask for functions accepting a List Comment. The delete comment function pops up, and I select it.

The UI asks that I fill in the argument to the delete comment function. It knows the function expects a List Comment and populates an autocomplete box with several entries, including an all comments choice. I select that, and hit Apply. The comments are all deleted.
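The two lookups in this interaction (which functions accept a List Comment, and which values could fill that argument) can be sketched in a few lines of Python. This is only an illustration under assumptions: the registry, the Document type, and every function name below are hypothetical, and the matching is exact rather than anything cleverer.

```python
import inspect
from dataclasses import dataclass
from typing import List, get_type_hints


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class Document:
    comments: List[Comment]


doc = Document([Comment("me", "fix typo"), Comment("alice", "reword this")])


def all_comments() -> List[Comment]:
    return list(doc.comments)


def delete_comments(targets: List[Comment]) -> None:
    doc.comments = [c for c in doc.comments if c not in targets]


def word_count() -> int:
    return sum(len(c.text.split()) for c in doc.comments)


# Hypothetical registry of everything the environment exposes.
REGISTRY = [all_comments, delete_comments, word_count]


def functions_accepting(tp):
    """Functions with at least one parameter annotated with exactly tp."""
    found = []
    for f in REGISTRY:
        hints = get_type_hints(f)
        param_types = [hints[name] for name in inspect.signature(f).parameters]
        if tp in param_types:
            found.append(f)
    return found


def suppliers_of(tp):
    """Zero-argument functions returning tp: candidates for the autocomplete box."""
    return [
        f
        for f in REGISTRY
        if not inspect.signature(f).parameters
        and get_type_hints(f).get("return") == tp
    ]


menu = functions_accepting(List[Comment])  # the delete comment function pops up
choices = suppliers_of(List[Comment])      # the autocomplete offers all comments
menu[0](choices[0]())                      # Apply: every comment is deleted
```

Note that word_count never appears in either list, because its signature doesn't mention List Comment; that is the whole discoverability story in miniature.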

If I want, I can insert a filter in between the call to all comments and the function to delete those comments. Of course, the UI is type directed--it knows that the filtering function must accept a Comment, and prepopulates an autocomplete with common choices--by person, by date, etc.
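As a rough Python sketch of that filtering step, here is the 'delete only comments made by me within the last ten days' case from earlier. All names (by_author, newer_than, both, and so on) are hypothetical, and the date is pinned so the example is reproducible.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List


@dataclass
class Comment:
    author: str
    created: date
    text: str


# A filter is just a function from Comment to bool, so the UI can
# offer any function of this type in the autocomplete.
Predicate = Callable[[Comment], bool]


def by_author(name: str) -> Predicate:
    return lambda c: c.author == name


def newer_than(days: int, today: date) -> Predicate:
    cutoff = today - timedelta(days=days)
    return lambda c: c.created >= cutoff


def both(p: Predicate, q: Predicate) -> Predicate:
    return lambda c: p(c) and q(c)


def filter_comments(comments: List[Comment], keep: Predicate) -> List[Comment]:
    return [c for c in comments if keep(c)]


def delete_comments(comments: List[Comment], targets: List[Comment]) -> List[Comment]:
    """Return the comments that remain after removing the targets."""
    return [c for c in comments if c not in targets]


today = date(2024, 1, 20)  # pinned for the example
comments = [
    Comment("me", date(2024, 1, 15), "fix typo"),
    Comment("me", date(2023, 12, 1), "old note"),
    Comment("alice", date(2024, 1, 18), "reword this"),
]

# all comments -> filter (by me, last ten days) -> delete
targets = filter_comments(comments, both(by_author("me"), newer_than(10, today)))
remaining = delete_comments(comments, targets)
```

The filter slots in between the two existing steps without either of them changing, which is the point: new functionality is assembled from existing functions whose types line up.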

What I (most likely) don't want is full text search through the documentation. That is a much too sloppy way of querying for the functions I want, and it's not nearly as seamless as the above.

I'm not necessarily advocating any particular presentation of the above interaction, though that is an interesting UX problem to solve. I'm just saying, this is the sort of discoverable, pleasing, logical interaction I want to have with software I use. The goal of software is to provide a kind of programming environment, and we ought to use the powerful tools of programming--namely type systems and type directed editing--to organize and unify the functionality that software allows users to interact with.

9 comments:

I like your idea here. But I think you're not addressing the issue that designing the typed API is still difficult and possibly limiting in a similar way to Word's interface. For instance, in your example of clicking on a comment, you make the implicit jump from Comment to [Comment]. How are we supposed to know that when you click on a single comment we actually want functions of a list of comments? Or, coming from the other direction, what if the delete function was a function of a single comment instead of a list of comments?

Well, you could just search all type signatures in your API and return everything that has Comment anywhere in the signature. But then what if the delete function was a function of CommentId? Or Int64?

I think this is a really interesting idea, and would love to see it implemented. But part of me thinks that the idea is roughly isomorphic to graphical programming languages. And we all know how those have historically worked out.
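The signature search raised in this comment, and the CommentId objection to it, can be sketched in Python as follows. The function names are hypothetical and the search is deliberately naive: it walks each annotation and its type arguments looking for the target type.

```python
from typing import List, get_args, get_type_hints


class Comment:
    pass


class CommentId(int):
    pass


def delete_one(c: Comment) -> None: ...
def delete_many(cs: List[Comment]) -> None: ...
def delete_by_id(i: CommentId) -> None: ...
def word_count(text: str) -> int:
    return len(text.split())


def mentions(tp, target) -> bool:
    """True if target is tp itself or appears among tp's type arguments."""
    if tp is target:
        return True
    return any(mentions(arg, target) for arg in get_args(tp))


def search(functions, target):
    """Names of functions with target anywhere in their signature."""
    return [
        f.__name__
        for f in functions
        if any(mentions(tp, target) for tp in get_type_hints(f).values())
    ]


API = [delete_one, delete_many, delete_by_id, word_count]
hits = search(API, Comment)
```

Searching for Comment finds both the single-comment and the list-of-comments variants, so the Comment versus [Comment] question is handled, but delete_by_id is missed: nothing in its signature says a CommentId has anything to do with a Comment.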

I think of those as fun UX problems to solve, rather than fundamental limitations :) And I have ideas in this area that I'll write up at some point. But beyond working out a nice way to present this sort of functionality, the more important thing to me is that 'applications' get reconceptualized as programming environments rather than appliances, and the focus becomes how to make a pleasing, logical, interactive, and discoverable interface to that programming environment. Current apps take the opposite approach, of starting with a fixed appliance, and then bolting on features and ad hoc programmability after the fact. The result is a confusing, undiscoverable mess.

I want to respectfully and cheerfully call out the awfulness of this idea. The only thing harder to change than an application the size of Word is a type system the size of Word. I detect in your post the confidence of someone who has no plans to implement their idea, and I think that's for the best!

However, I do like how you're looking for fundamental application improvement -- at the highest level of what an application is like -- by starting over at the lowest level of how it's built. The connection doesn't feel strong or necessary in this case, but there are such connections to be forged, they're needed, & I applaud the search.

> The only thing harder to change than an application the size of Word is a type system the size of Word.

I don't actually understand what you're saying here. I am not suggesting incrementally modifying Word to work like how I envision. Word is too big and too old a codebase for that to be possible. Furthermore, I have fundamental problems with the whole concept of large applications so I don't really think applications like Word need to exist *as such*. In my hypothetical world, Word does not exist; instead *word processing functions and capabilities exist*, and they are accessed within a unified programmable, interactive environment. See my earlier post.

"A type system the size of Word" - not sure what you mean here. The type system is written once, and it's small, and ignorant of the domain. Just like the type system for a programming language. It would not know anything about Word, drawing, email, Twitter, or anything else. The type system is just the formal language for describing possible values; it does not need to know anything about what sort of types users or developers of functionality will create.

Okay, but maybe your argument is just that it would be impossible or difficult to provide types for all the data and functionality of Word? But we know this is possible, because that is what the programmers who wrote Word did! They came up with types for all the data and functionality in the application. That's how they implemented the application in a typed programming language... they then threw out all this type information and composability and left users with a fixed appliance.

Okay, so the type system and API of Word's implementation are probably not something we'd ever want users to have to look at, but there's nothing fundamental here. Yes, good API design is hard, and there are plenty of shitty APIs, but we know a lot more about type systems and API design than we used to, and I don't see fundamental barriers.

If you do have an actual, precise argument about why what I imagine is fundamentally impossible, though, well that would be very interesting to hear!

Yes, as Pete and Rafael have mentioned, you are revisiting some of the ideas of naked objects.

The idea of selecting an object and then asking for actions to perform upon it is called "contributed actions" in Apache Isis... because the actions may be on the object itself (OO as your mom taught you), or might have been 'contributed' by other domain services.

In Isis we've recently extended the notion to contributed properties and collections; ie tell me about other related objects to this one.

Contributed actions etc also are of interest because they help keep the model decoupled; rather important for more complex domains.

More broadly, your post is really about the old nouns vs verbs debate; where do you want to start? For a long while now most UIs have put the emphasis on verbs (eg File>New). But the recent Win8 UI tends to put more emphasis on nouns, so perhaps things will balance up a bit.

The Isis website has just been updated with some screenshots if you want to take a look-see of the current state of the art on this.