I make the obscure reference jokes so you don't have to. Tech and mobile and music.

Oct 27, 2015

Siri’s Next Trick Needs To Be Multitasking

Siri was released to great fanfare in 2011 as the exclusive headline feature of a tock-year iPhone: the iPhone 4s (née S). The device itself was a great rev: the first serious camera on an iPhone and a huge processor speed bump meant it was one of the years that iOS really flew. If you’re going to be on a two-year iPhone cycle, people in the know will tell you the s models are the ones to get. While they lack the wow factor of a new visible hardware design, they tend to be much better equipped in terms of processing power-to-OS-feature ratio, and the physical designs themselves are tweaked to perfection thanks to the lessons learned from the previous model (think: antennagate, scuffgate, bendgate, kill me now).

That year, Siri was indeed impressive, at least compared to other digital assistants, or to anyone who hadn’t used voice recognition since DragonDictate circa ’93. The voice recognition was up there with the best people had previously seen (thanks to it basically being the best people had previously seen). Siri managed what seemed like a great leap forward by having the cloud do the heavy lifting: most of the processing is done on remote servers rather than locally, which helps in terms of speed, accuracy and long-term improvement, which we’ll get to in a moment. So Siri was a short-term success, but after a few weeks the initial excitement and intrigue died down and Siri use dwindled to mostly starting timers and reminding us to put the bins out.

This has been the path of digital assistants many times over, but unlike many before it, Siri seemed to have done just enough to make sure that this time they wouldn’t go away. By 2014 Google and Microsoft had entered the space (with Google Now and Cortana respectively), and this year has seen news of Facebook’s upcoming assistant ‘M’.

Apple hasn’t stood still though, and between 2011 and 2015 Siri has improved considerably. Firstly, all that processing happening on remote servers has been improving Siri’s recognition and understanding. Secondly, Siri now has access to the current context of the device it’s running on, such as the time of day, the apps running and the content on the screen. Thirdly, the iPhone 6s is over ten times faster than the iPhone 4s. Finally, on new hardware Siri is always listening.

Fast-forward four years then, and Siri is clearly the best way to carry out a whole variety of tasks on your iOS device. And so what’s next? Well, this autumn Siri is making the jump to the living room with the 4th generation Apple TV. If there’s an input mechanism that Siri can beat, it’s an on-screen keyboard controlled by a D-pad. On the Apple TV, Apple have allowed Siri to use previous commands as context, as a way to hone a search. Think “Show me all the Woody Allen films. Just the funny ones”. Siri on Apple TV is going to be huge.

The next logical step for Siri will be the Mac. OS X Mountain Lion brought dictation: using the same cloud-based voice recognition but without the full Siri experience, you could now dictate instead of type. Many are expecting Siri to come to the Mac platform soon, but how will Siri need to adapt on this platform that has such a different set of input methods?

I think the answer is this: Siri needs to stop being modal and start multitasking.

Siri on iOS is used in two main situations: when touch is not possible as an input mechanism, and when voice is more efficient, convenient or intuitive. Up until very recently, only the first of these really applied: there were very few cases where it was more efficient to use your voice if you were already holding an unlocked device in your hand. As of iOS 9 though, Siri is context-aware, greatly opening up the times and places voice can be the smarter input choice over touch.

One thing that doesn’t seem to have changed though is the way use of Siri interrupts your flow: you stop using touch, you use voice to interact and achieve a goal, then you go back to using touch. In developer-speak, Siri acts modally, like a settings dialog or an alert view. While Siri is active you are blocked, deliberately, from interacting in any other way with your device. On iOS, currently, this doesn’t matter too much: generally you will only be doing one thing at a time anyway. But with multitasking now available on the latest iPads, and with the iPad Pro (which really showcases the multitasking use-cases) launching next month, this modal paradigm will become a sticking point. And when Siri makes it to the Mac, it will be very noticeable (I would go as far as to say unworkable) if Siri blocks the UI and everything stops in its tracks while you talk. The answer is that Siri’s interface needs to become modeless so that it can listen to commands as we carry on interacting the way we always do.
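For the developers reading, here is a rough sketch of the difference between modal and modeless handling. It is a toy simulation in Python's asyncio, not how Siri or any Apple API actually works: the names voice_command, touch_events, modal_session and modeless_session are all made up for illustration. In the modal version, touch input waits until the voice command has finished; in the modeless version, both are handled side by side.

```python
import asyncio


async def voice_command(log, steps=3):
    """Hypothetical stand-in for an assistant listening to and acting on a command."""
    for i in range(steps):
        log.append(f"voice step {i + 1}")
        await asyncio.sleep(0)  # hand control back to the event loop between steps


async def touch_events(log, count=3):
    """Hypothetical stand-in for the user scrolling with touch."""
    for i in range(count):
        log.append(f"scroll {i + 1}")
        await asyncio.sleep(0)


async def modal_session():
    """Modal: nothing else is handled until the voice command has completed."""
    log = []
    await voice_command(log)
    await touch_events(log)
    return log


async def modeless_session():
    """Modeless: voice and touch are handled concurrently on the same event loop."""
    log = []
    await asyncio.gather(voice_command(log), touch_events(log))
    return log


if __name__ == "__main__":
    print("modal:   ", asyncio.run(modal_session()))
    print("modeless:", asyncio.run(modeless_session()))
```

In the modal log every voice step comes before the first scroll; in the modeless log the scrolling starts before the voice command has finished. That second shape is the one I am arguing for.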

Imagine this: you are browsing recipes in Safari and want to save one to your recipes collection. Right now, you can say: “Hey Siri, add this to my recipes note” and the link will be appended to the end of your note entitled Recipes. While this is, let’s be honest, pretty impressive, why stop there? Why should you not carry on scrolling through the website while you carry out this task? You can multitask, your touch-input methods can multitask: why not your voice input?

Another example: you’re writing in a text editor on your iPad, and you remember something for later: “Hey Siri, remind me to take the recycling out when I leave the house later”. But why stop the flow of writing while Siri listens and acts?

One more: it’s first thing in the morning and you want to open a few documents that you’re going to be working on, and you want to check your calendar to see what time your first interruption/meeting will be. These are two actions that can be carried out with your keyboard and mouse or trackpad, or with voice: “Open the last three documents from yesterday”, “What’s my schedule like today?”. But surely the most efficient path here would be carrying out one with your hand and the other with your voice.

If and when voice input mechanisms and the digital assistants they drive can be always on and modeless like this, we will have input multitasking. Don’t underestimate how powerful this will be, and how much it will change the way we interact with our devices.