Here’s what you need to know about the future of gesture-based UI design

Smartphone screens across the globe are embracing clean design in an effort to feature content on high-end pixel real estate. Buttons, both physical and their onscreen counterparts, are being thrown overboard; clutter is getting banished. And that means gestures — scrolling, swiping, tapping, pinching, flicking — are becoming the dominant form of the smartphone UI, and voice and facial commands are starting to follow along, too.

Don Norman, a renowned cognitive scientist and author of The Design of Everyday Things, would likely say that if most humans are not using the gestures on smartphones to their potential, and smartphones were made to be used by humans, we should consider it a flaw in the phones’ design (because there’s very little we can do about the flaws in humans).

So what determines a solid gesture that will remain in the lexicon for years to come and will be universally used across cultures? User interface designers, researchers and engineers have to take into account human flaws, as well as a host of other considerations, when they’re thinking up the gestures that they hope will become as commonplace as the pinch to zoom, or pull to refresh. We’ve interviewed several UI designers and this is our look at some of the issues they face, and what they think about the future of gesture design (we’ll be focusing on some of these issues at our RoadMap conference in November in San Francisco).

Designing a touch screen gesture

Your phone has a range of sensors (capacitive touch sensors, optical sensors, accelerometers and image sensors) to interpret multiple forms of human input. For touch gestures, your phone measures where your finger is and where it has moved, as well as its speed, timing and angle, to determine which task to execute. Since humans, unlike phones, don’t all work in the same way, these gestures don’t have to be exactly the same but, rather, should fall within a range of acceptable motions.
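To make that idea concrete, here is a minimal sketch of how a touch trace might be sorted into a tap or a swipe using the distance, timing, speed and angle of the finger’s movement. It is an illustration only, not how any particular phone does it: the TouchSample type, the classify function and every threshold value are assumptions made up for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class TouchSample:
    x: float  # horizontal position in pixels
    y: float  # vertical position in pixels (grows downward, as on most screens)
    t: float  # timestamp in seconds


# Illustrative thresholds (assumptions, not values from any real phone).
TAP_MAX_DISTANCE = 10.0   # pixels: less movement than this can count as a tap
TAP_MAX_DURATION = 0.3    # seconds
SWIPE_MIN_SPEED = 200.0   # pixels per second
ANGLE_TOLERANCE = 30.0    # degrees of allowed deviation from each direction


def classify(samples):
    """Classify a single-finger touch trace as a tap, a swipe, or unknown."""
    if len(samples) < 2:
        return "unknown"
    start, end = samples[0], samples[-1]
    dx, dy = end.x - start.x, end.y - start.y
    distance = math.hypot(dx, dy)
    duration = end.t - start.t

    if distance <= TAP_MAX_DISTANCE and duration <= TAP_MAX_DURATION:
        return "tap"
    if duration <= 0 or distance / duration < SWIPE_MIN_SPEED:
        return "unknown"

    # Angle of travel: 0 = right, 90 = down, 180 = left, 270 = up.
    angle = math.degrees(math.atan2(dy, dx)) % 360
    for name, target in (("swipe-right", 0), ("swipe-down", 90),
                         ("swipe-left", 180), ("swipe-up", 270)):
        # Smallest difference between the two angles, wrapping around 360.
        diff = abs((angle - target + 180) % 360 - 180)
        if diff <= ANGLE_TOLERANCE:
            return name
    return "unknown"


# Two people swiping up in slightly different ways get the same result.
straight = [TouchSample(100, 400, 0.0), TouchSample(100, 200, 0.2)]
wobbly = [TouchSample(100, 400, 0.0), TouchSample(110, 300, 0.1),
          TouchSample(115, 200, 0.2)]
print(classify(straight), classify(wobbly))
```

The angle tolerance is what lets an imperfect, slightly slanted swipe still count, while a slow drag or a true diagonal falls outside the acceptable range and is left unrecognized.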

In order to teach people how to use their phones, designers rely on a number of visual, audio and sometimes tactile cues. A visual signifier would be, say, something peeking out from the corner of the screen to tip off users that there’s an action they need to do to get the information. The phone’s positive or negative feedback — buzzing, loading symbol, working or not — feeds the learning experience.

Phones rely on prompts that are supposed to teach you gestures step by step. But human beings have only a limited capacity, depending on their needs, for this sort of tutoring. If we don’t learn the gesture quickly, we’ll shut off the annoying prompts and fail to learn it at all, says Yaro Brock, cofounder of Cookie Jar UX and a longtime user experience researcher.

Quick capture gesture on Moto X

Your phone’s gesture prompts are based on what user experience engineers and designers know (or think they know) about you. The prompts usually go through testing with many regular people, which typically involves handing over the device and stepping in with hints when the users are stumped. Amendments are then made so that phone gestures work more intuitively.

The reality is that people are often juggling many things while using their phones, frequently with only one hand free. This greatly restricts the range of motions that a user can execute. People also have limited cognitive space for retaining gestures: of course we can learn, but only to an extent.

Twitter patent for Loren Brichter’s pull-to-refresh gesture

According to David Winkler, senior user experience manager at T-Mobile, “If it’s too hard or if it fails too many times people stop trying.” Take, for example, the tap-to-transfer mechanism on the Samsung Galaxy S III. While it looks cool, it’s difficult to set up and the phones that work with it are limited. Long story short: People aren’t really happily tapping phones together to share videos like they were in those commercials.

But after we experience a gesture on enough devices and apps, it starts to become part of our gestural language. Once that happens we even come to expect that such a gesture will have a similar result across platforms and apps.

What makes a good gesture?

The best gestures are simple and can be made with a free hand. Think of Loren Brichter’s pull to refresh — it just makes sense, and it suits a need many people have. Or consider Flipboard’s flipping motion — it’s natural and easy. Here’s a non-exhaustive list of features a good gesture will likely have.

It’s easy to do. This means it’s not only a simple movement, but one that we can do one-handed while on a bumpy subway or with a bag of groceries.

It’s easy to remember. Easy gestures are easy to remember, but you also have to feel that a gesture is important enough to want to remember it.

It’s intuitive. This means it feels like the way you would naturally behave, and that the motion, on a subconscious level, corresponds with the action it performs.

It serves a useful purpose. This might seem obvious, but it’s easy for designers to get carried away with a gesture whose worth isn’t commensurate with the effort it takes to make it happen. Users aren’t going to go out of their comfort zone — they’re certainly not going to tap twice and twist — for a result that isn’t going to make their lives easier.

It is a joyous experience. Winkler says a good gesture feels “like magic.” The user experience is the X factor and it’s not easy to explain. But it is part of the way we think about—and enjoy—technology. According to Brock, “We’re in the era of actually living with things, and they have meaning in our lives. Like a pair of old shoes or jeans that aged well, you live with these things and they conform to you.”

The future of gestures

The future will likely bring more gestures, but the reality is that most will fail. Eventually human beings will reach a ceiling on how many gestures they can remember, or want to remember.

Down the road gestures will start to combine with automated information from sensors, from your location, and from other personal data.

Google Now tracks where you are and provides information accordingly.

Already the Galaxy S 4 tracks your eyes so it automatically scrolls down when you hit the bottom of the page, and Moto X is always listening, ready to activate when you say, “OK Google Now.” The Google Now app itself gives you card suggestions based on where you’re located and accommodates changes in behavior like vacations. Systems like Flutter use facial expressions and non-touch hand gestures (say, waving your hand in front of the phone instead of putting your greasy fingers on it) to command phones, but bigger movements aren’t ideal for cellphone use in public.

These might not broadly be considered gestures, but they make it so that we have to rely less on what is often an imperfect command.

“Devices and phones are going to start to step up and meet users half way,” Winkler said. “The phone will be more context-aware — it will know you’re talking, know where you are — and it will become a little bit smarter.”

Put another way, we’ll have to remember fewer gestures, allowing phone users to be a little dumber.