
Can Siri go deaf, mute and blind?

Earlier in “Is Siri really Apple’s future?” I outlined Siri’s strategic promise as a transition from procedural search to task completion and transactions. This time, I’ll explore that future in the context of two emerging trends:

Internet of Things is about objects as simple as RFID chips slapped on shipping containers and as vital as artificial organs sending and receiving signals to operate properly inside our bodies. It’s about the connectivity of computing objects without direct human intervention.

The best interface is no interface is about objects and tools that we interact with that no longer require elaborate or even minimal user interfaces to get things done. Like self-opening doors, it’s about giving form to objects so that their user interface is hidden in their user experience.

Apple’s strength has always been the hardware and software it creates that we love to carry, touch, interact with and talk about lovingly — above their mere utility — like jewelry, as Jony Ive calls it. So, at first, it seems these two trends — objects talking to each other and objects without discernible UIs — constitute a potential danger for Apple, which thrives on design of human touch and attention. What happens to Apple’s design advantage in an age of objects performing simple discreet tasks or “intuiting” and brokering our next command among themselves without the need for our touch or gaze? Indeed, what happens to UI design, in general, in an ocean of “interface-less” objects inter-networked ubiquitously?

Looks good, sounds better

Fortunately, though a star in her own right, Siri isn’t wedded to the screen. Even though she speaks in many tongues, Siri doesn’t need to speak (or listen, for that matter) to go about her business, either. Yes, Siri uses interface props like fancy cards, torn printouts, maps and a personable voice, but what makes Siri different is neither visuals nor voice.

Despite the knee-jerk reaction to Siri as “voice recognition for search,” Siri isn’t really about voice. In fact, I’d venture to guess Siri initially didn’t even have a voice. Siri’s more significant promise is about correlation, decisioning, task completion and transaction. The fact that Siri has a sassy “voice” (unlike her competitors) is just endearing “attitude”.

Those who are enthusiastic about Siri see her eventually infiltrating many gadgets around us. Often seen liaising with celebrities on TV, Siri is thought to be a shoo-in for the Apple TV interface Oscars, maybe even licensed to other TV manufacturers, for example. And yet the question remains, is Siri too high maintenance? When the most expensive BOM item in an iPhone 5 is the touchscreen at $44, nearly 1/4 costlier than the next item, can Siri afford to live outside of an iPhone without her audio-visual appeal?

Well, she already has. Siri Eyes Free integration is coming to nine automakers early this year, allowing drivers to interact with Siri without having to use the connected iPhone screen.

Given Siri Eyes Free, it’s not that difficult to imagine Siri Touch Free (see and talk but not touch), Siri Talk Free (see and touch but not talk) and so on. People who are impatient with Apple’s often lethargic roll out plans have already imagined Siri in all sorts of places, from aircraft cockpits to smart wristwatches to its rightful place next to an Apple TV.

Over the last decade, enterprise has spent billions to get their “business intelligence” infrastructure to answer analysts’ questions against massive databases from months to weeks to days to hours and even minutes. Now imagine an analyst querying that data by having a “natural” conversation with Siri, orchestrating some future Hadoop setup, continuously relaying nested, iterative questions funneled towards an answer, in real time. Imagine a doctor or a lawyer querying case histories by “conversing” with Siri. Forget voice, imagine Siri’s semantic layer responding to 3D gestures or touches on glass or any sensitized surface. Set aside active participation of a “user” and imagine a monitor with Siri reading microexpressions of a sleeping or crying baby and automatically vocalizing appropriate responses or simply rocking the cradle faster.
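The "nested, iterative questions funneled towards an answer" pattern can be sketched in miniature. Everything below is hypothetical — invented sample data and function names standing in for whatever query layer a future Siri might orchestrate, not any real Siri or Hadoop API:

```python
# Toy model of conversational funneling: each follow-up question narrows
# the previous result set, the way an analyst "conversing" with an agent
# would drill down towards an answer. All names here are illustrative.

CASES = [
    {"year": 2011, "region": "EU", "outcome": "won"},
    {"year": 2012, "region": "US", "outcome": "lost"},
    {"year": 2012, "region": "EU", "outcome": "won"},
]

def refine(results, **criteria):
    """Apply one conversational follow-up as a filter over prior results."""
    return [r for r in results if all(r.get(k) == v for k, v in criteria.items())]

# "Show me cases from 2012" ... "now just the ones in the EU"
step1 = refine(CASES, year=2012)
step2 = refine(step1, region="EU")
```

The point of the sketch is only the shape of the interaction: state carries forward between turns, so each utterance is a delta against the previous answer rather than a fresh query.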

Scenarios abound, but can Siri really afford to go fully “embedded”?

There is some precedent. Apple has already created relatively successful devices by eliminating major UI affordances, perhaps best exemplified by the iPod nano ($149) that can become an iPod shuffle ($49) by losing its multitouch screen, made possible by the software magic of Genius, multilingual VoiceOver, shuffle, etc. In fact, the iPod shuffle wouldn't need any buttons whatsoever, save for on/off, if Siri were embedded in it. Any audio functionality it currently has, and much more, could be controlled bi-directionally with ease, in all instances where Siri is functional and socially acceptable. A 3G radio plus embedded Siri could also turn that tiny gadget into so many people's dream of a sub-$100 iPhone.

Grounding Siri

Unfortunately, embedding Siri in devices that look like they may be great targets for Siri functionality isn’t without issues:

Offline — Although Siri requires a certain minimum horsepower to do its magic, much of that is spent ingesting and prepping audio to be transmitted to Apple's servers, which do the heavy lifting. Bringing that processing down to an embedded device that doesn't require a constant connection to Apple may be computationally feasible. However, Apple's ability to advance Siri's voice input decoding accuracy and pattern recognition depends on constant sampling of and adjusting input from tens of millions of Siri users. This would rule out Siri embedded into offline devices and create significant storage and syncing problems with seldom-connected devices.

Sensors — One of the key reasons why Siri is such a good fit for smartphones is the number of on-device sensors and the virtually unlimited range of apps it's surrounded with. Siri is capable of "knowing" not only that you're walking, but that you've also been walking wobbly, for 35 minutes, late at night, in a dark alley, around a dangerous part of a city, alone… and of silently sending a pre-designated alert on your behalf. While we haven't seen examples of such deep integration from Apple yet, embedding Siri into devices that lack multiple sensors and apps would severely limit her potential utility.

Data — Siri's utility is directly indexed to her access to data sources and, at this stage, third parties' search (Yelp), computation (WolframAlpha) and transaction (OpenTable) facilities. Apple regularly adds such partners in different domains and is expected to continue doing so. Siri embedded in radio-lacking devices that don't have access to such data and processing may therefore be too crippled to be of interest.

Fragmentation — People expect to see Siri pop up in all sorts of places and Apple has taken the first step with Siri Eyes Free where Siri gives up her screen to capture the automotive industry. If Siri can drive in a car, does that also mean she can fly on an airplane, sail on a boat or ride on a train? Can she control a TV? Fit inside a wristwatch? Or a refrigerator? While Siri — being software — can technically inhabit anything with a CPU in it, the radio in a device is far more important to Siri than its CPU, for without connecting to Apple (and third party) servers, her utility is severely diminished.

Branding — Siri Eyes Free won't light up the iPhone screen or respond to commands that would require displaying a webpage as an answer. What look like reasonable restrictions on Siri's capabilities in this context shouldn't, however, necessarily signal that Apple would create "subsets" of Siri for different domains. More people will use and become accustomed to Siri's capabilities in iPhones than in any other context. Degrading that familiarity significantly just to capture smaller markets wouldn't be in Apple's playbook. Instead of trying to embed Siri in everything in sight and thus diluting its brand equity, Apple would likely pair Siri with NFC or Bluetooth interfaces to devices in proximity.

What’s Act II for Siri?

In Siri’s debut, Apple has harvested the lowest hanging fruit and teamed up with just a handful of already available data services like Yelp and WolframAlpha, but has not really taken full advantage of on-device data, sensor input or other novel information.

As seen from the outside, Siri's progress at Apple has been slow, especially compared to Google, which has had to play catch-up. But Google must recognize a strategically indispensable weapon in Google Now (a Siri-for-Android, for all practical purposes) as a hook to those Android device manufacturers that would prefer to bypass Google's ecosystem. None of them can do anything like it for some time to come, Samsung's subpar attempts aside.

If you thought Maps was hard, injecting relationship metadata into Siri — fact by fact, domain by domain — is likely an order of magnitude more laborious, so Apple has its work cut out for Siri. It'd be prudent not to expect Apple to rush into embedding Siri in its non-signature devices just yet.

The iPod Shuffle as a model for a sub-$100 iPhone is the most fascinating angle I have come across. From a hardware and UI perspective, a properly executed "iPhone Shuffle" would be ingenious. What I am unsure of is how iCloud could be integrated into a display-less system. And without iCloud, a cellular radio would be wasted potential.

So while I agree that Siri is not about voice or visuals (without her current interface facilities, to quote Borges quoting Milton, she would "lose no more than the inconsequential skin of things"), I believe her greatest potential may be in integrating her fully into iCloud, as opposed to embedding her in devices and things.

Today, Siri is unable to retrieve, correlate, or transact with the majority of data that I generate for myself using my iOS devices (documents, sent emails, the song I've played the most this month, the longest bike ride I've logged, to name a few). So while she lives on Apple's servers, she doesn't seem to know much about anything in my iCloud. Let the tailor-made apps take advantage of iOS hardware/sensors and let Siri help me make sense of the data I generate when using those apps. She could be my iMovie for all the billions of bytes I record on my iOS "camcorders" that are so far stuck in iCloud.

That said, I am highly skeptical of "intuiting," especially the sort espoused by Google Now. As the saying goes: you should not assume, for when you do you make an ass out of u and me. Or as Ricky Roma said it best in Glengarry Glen Ross (http://www.youtube.com/watch?v=GbcfyLfPdlc): you should "never open your mouth until you know what the shot is."

Siri is a bag of tricks that gives the illusion of an agent. It’s not a do-anything AI. “Reading microexpressions of a sleeping or crying baby and automatically vocalizing appropriate responses or simply rocking the cradle faster” would be an entirely different piece of software. It’s not “interfaceless” either. The “illusion” of agency is the interface. Siri is nothing more than the carefully crafted software that appears to (but does not) use language in a natural way, engage the user conversationally, have a personality, etc. It’s a depiction of agency, an agent user interface, not an agent. If you start stripping aspects of the UI away it really won’t amount to much.

Nothing personal but I think with this, and the last articles on Siri, you have completely lost the thread.

It would take an essay to explain, so I won’t, but generally I think you are making the classic futurist mistake here.

To me this is just a lot of mundane and highly unlikely, “supposin,” based on faulty information. It seems that (like most of the public), you’ve been sold a lot of magic beans on the actual capabilities of AI.

I just noticed this the other day when I switched my iPhone to UK English — I was shocked that Apple didn’t keep gender the same across regions (at least where culturally possible) for branding/marketing’s sake.

I must say, the UK male voice is much smoother and less robotic sounding than the American Siri voice to my ear. Though, it may just be that the addition of an accent (as I hear it) lends a human quality. How do the two voices compare from your perspective?

To my UK ears it still sounds pretty robotic, with weird emphases that presumably the US one also has. It’s a quite well-spoken voice, sort of like a BBC newsreader.

Funny you say the accent makes it sound more human – now that I think about it, the American Siri sounds more human to my ears. Perhaps it’s just that the novelty is distracting, and would fade with increased familiarity.

Regarding updates, it seems to me that this is a result of Apple's recent push to obsolete devices by disabling functionality. For example, an iPad mini has Siri but the iPad 2 doesn't, even though the devices are pretty much identical bar the screen size; an iPhone 4 on iOS 6 doesn't get spoken turn-by-turn directions, although it gets live routing, you just have to look at the screen; and just the other day the Mac App Store informed me that a Mac Pro with 16GB of RAM and an ATI 5870 graphics upgrade isn't eligible for 10.8, and worse than that, 10.7 wasn't offered as an alternative.
This practice compares poorly with everybody else, especially Google, which iterates all the time: while there's a lot of fragmentation across Android OS versions, there's no fragmentation within those versions; if you can run the latest OS you get all the features.
I think this is going to blow up in Apple’s face before long.

I can think of a few more hurdles Apple will face that will likely surround many of the use cases you list:

1.) Siri is far from magic right now. I won't let her buy me movie tickets, let alone drive a car for me. In real life, the only thing I use her for is to text my partner so we can crack up at the way Siri butchers the messages so fantastically. To Apple's credit, they marked the implementation as "beta," but that doesn't change the fact that it's really quite poor, to the point of becoming a regular punchline on primetime TV.

2.) Apple's sloth-like rollouts (related to #1). I think this has hampered all of Apple's services. Their desire to wait until something is "perfect" and release it as a "feature" of a major update has slowed down progress or outright killed services. Quick, iterative updates based on today's data and real-world usage will be necessary to get Siri (Maps, iCloud, etc.) where people need them to be (this is an area that Google is more culturally adapted for and why their voice recognition, maps, etc. are often more accurate). These products should be improving every day, not every 6-12 months.

3.) Emotion. The more Siri knows, the more potential there is for repulsion, similar to the uncanny-valley effect in computer animation. Will there be mass acceptance of objects learning behavior at a granular, personal level, or will there be a gut-level "ew, turn it off" reaction? Which objects will be accepted and which won't? I think that depends largely on the perceived usefulness of the functionality (and the individual), but I just think of the many brouhahas over Facebook privacy issues and laugh at how minimal those will seem in a world where my wristwatch knows when I've been drunk.

4.) Is voice really the answer? I feel that voice recognition, to some degree, is a product of sci-fi culture. It seems great in fictional representations, but in reality, I’ve watched people red-faced with embarrassment, cupping their iPhone to their mouth on the subway trying to get Siri to do something for them. It’s very often not an ideal interface, or even the quickest way to get things done (cars and living rooms being possible exceptions). In fact, it’s kind of a clunky and dumb way to interact with a computer. Then again, maybe the devices will learn to hear whispers and learn shorthand phrases unique to each user, allowing for a measure of privacy and personalization.

Regardless, it’ll be fascinating to watch and take part in the progress! Thanks for the article.

p.s. — “Siri reading microexpressions of a sleeping or crying baby,” now that sent shivers down my spine! You should write dystopian fiction in addition to insightful articles! Haha.

Good points. Re: fast roll-out/iterative updates. Yes, it would be nice, though I suspect where Siri is concerned Apple are fully aware there is a major pitfall to negotiate, one requiring extreme care. The race is on to provide third-party interface support, and whoever does so first has the potential to see a vibrant ecosystem grow around it. However, whoever does so first also risks being locked into an inferior solution. The moment you publish a third-party API, you are locked into meeting the rules you establish for third parties. For a web service, you can set up a new integration point or a new version of the API. For voice, though, it's a different matter: you get locked into supporting any semantic patterns/rules your developers have come to rely on being available. There is only one user interface. How to do third-party integration is a big, big challenge. The answer isn't to offer third parties guarantees on how certain commands will be interpreted; indeed, the real challenge is what to offer them whilst avoiding all such guarantees, so your interface won't get locked in and limited to a single point in time. The ramifications of this constraint aren't widely understood. On the web you can offer two websites, but you can't offer two agents for someone to talk to, unless perhaps you offer Siri, then Fred, then Doug and allow the user to switch. Hmmm. But that's still a huge problem for third-party integration, as third parties would have no place when you move from the Siri they have integrated against to a new Doug agent, which they have not. On the web you can have both services bookmarked, but with voice everything is intermediated by the medium and rules of the agent the user happens to be using.
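The "there is only one user interface" constraint can be illustrated with a toy dispatcher. Everything here is invented for illustration — no such Siri API exists — but it shows why third-party utterance patterns all compete in a single shared namespace, where a broad pattern can shadow a narrow one, unlike web endpoints that can be versioned and addressed independently:

```python
# Toy voice-command dispatcher: every partner registers patterns against
# the SAME front end, so registration order and pattern breadth determine
# who receives an utterance. Partner names and patterns are hypothetical.
import re

HANDLERS = []  # (compiled pattern, partner) — one global interface

def register(pattern, partner):
    """Register a partner's utterance pattern with the single voice front end."""
    HANDLERS.append((re.compile(pattern, re.I), partner))

def dispatch(utterance):
    """Route an utterance to the first partner whose pattern matches."""
    for pat, partner in HANDLERS:
        if pat.search(utterance):
            return partner
    return None

register(r"\bbook\b.*\btable\b", "OpenTable")   # narrow: "book ... table"
register(r"\bbook\b", "BookSeller")             # broad: any "book" utterance
```

Because the broad `BookSeller` pattern was registered second, `dispatch("book a table for two")` still reaches `OpenTable`; reverse the registration order and the restaurant partner is silently shadowed. On the web each partner would simply get its own URL, and no such collision could occur.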

Your final point: agree about the embarrassment factor. But I disagree that the negatives of voice are as sweepingly broad as you make out. Try to use a hammer for every job and it will be bad at most. However, there are many, many use cases where voice is hands down superior. But yes, as you point out, the important thing is the user has to have a very high level of confidence that it will work before bothering to try. So with that in mind: yes, every time, setting a timer by voice when cooking dinner is hands down more useful than doing it via the touch interface. It's an order of magnitude faster, and a far better-suited interface for a man with oil or dough on his hands.

But yes, to agree with your point, that task is simple enough for Siri to get right consistently enough to pass the reliability hurdle; unfortunately, many if not most of the more complex use cases don't as yet pass it (especially, as you have noted, in public)! The point here is that as reliability increases, the situations for which it will be the "better" interface choice will multiply. Think how you would use Siri on a TV when the remote is by another chair. If it worked reliably, pretty soon for many tasks you would stop reaching for the remote even if it were nearby.

You make good points on Siri access by third parties via a formalized SDK, which I intend to write about in the future.

One thing that'll help on that front is the very structured nature of the data/metadata available at those third parties/partners. Rules about navigating that data, and disambiguation of intent on at least a statistical basis, for example, could be kept separate and customized per partner as part of a licensing process. But given the opportunities in integration, this is still a huge undertaking for Apple to get right.

TheBasicMind — Agreed, and I didn’t mean to sound brutish in my assessment of voice. I just think it is worth considering voice one of many tools and not necessarily THE tool of the future. Cooking is a great example of it as a success! Hadn’t thought of that, as I don’t cook. Also, as I mentioned, I think the living room is a potential goldmine. I’ve been looking forward to it coming to Apple TV for a while.

Kontra — I'm a tech geek, so I'd love it if Siri actually worked as efficiently as described here, and I'm constantly trying things out on it to see if they do work. But I think that many users will need a slower adoption approach to avoid the creepiness. Again, TheBasicMind's example of using Siri while cooking is a great one for slowly introducing it into one's life through pure functionality.

Siri was a brilliant innovative (unsurprisingly given that this is Apple) idea when it was introduced and is still very impressive. But after the first few weeks, when the novelty wore off, it’s now borderline useless. Apple really needs to invest more in this if it wants Siri to go anywhere. As a person with an iPhone and an Android tablet, Google Now is far more useful and relevant!

I wouldn’t be so presumptuous as to use the past tense when referring to Siri, which has crept almost unnoticed into my DAILY usage as a simple and unfailing task accomplisher – one prolonged touch from sleep sets a timer, searches a topic, speaks current temperature, calculates quickly, runs an app, and is slowly taking over from tactile swiping and typing for more and more mundane usage as I discover with my i-Devices.

The need for “constant sampling of and adjusting input … would rule out Siri embedded into offline devices.”

That’s probably the least limiting of all the challenges. Any device that wants Siri will just have to afford to give her access to nutrition—the newspapers and a few comedians’ work as well as her voice-recognition—from time to time. And while nobody thinks Apple has totally nailed device synchronization, they’ve at least made a credible start in that direction.