Apple's "intelligent personal assistant" came before Google Now and Cortana and Alexa, but all of those assistants have caught up to and lapped Siri in one important way: they let third-party applications and services use them to do stuff. As it stands today, Siri can be used to launch third-party apps, but it isn't able to do anything else.

That will change in iOS 10, which is extending Siri to third-party developers via "SiriKit." Developers can't quite do everything that Apple can do—even when letting developers in, Apple still holds them at arm's length with clearly defined extension points and rules—but the company is making it possible to do more stuff without actually launching an app and digging around. Based on the developer documentation that Apple has published so far, here are the kinds of things that third-party apps are going to be able to do with Siri in iOS 10 and what developers have to do behind the scenes to make it work.

Supported apps (and platforms)

Third parties can use Siri in six different kinds of apps, though those applications encompass a wide range of common and popular App Store offerings: audio and video calling apps; messaging apps; payment apps; apps that allow searching through photo libraries; workout apps; and ride booking apps.

That covers a lot of ground, but there's a lot missing: music and video apps like Spotify and Netflix, mapping apps like Google Maps, and third-party to-do list apps don't fall under any of these umbrellas, possibly because some of them conflict too directly with Apple apps and services like Apple Maps and Apple Music (Apple Maps in particular is becoming more deeply integrated into the OS with every new update). Hopefully Siri will become useful to a wider variety of apps later on, but for now the first-party apps are definitely still in a privileged position.

And Apple didn't draw too much attention to it yesterday, but these Siri capabilities are only available in iOS 10, not in the newly Siri-fied macOS Sierra.

The SiriKit order of operations

Here's how Apple breaks down a Siri interaction and the work developers have to do.

Andrew Cunningham

The main part of it is the Intents extension, which tells Siri what your app can do and is responsible for handling user requests behind the scenes when they come in.

Developers can also define custom UI for Siri responses.

Here's an example of a command coming in. This particular app has fed Siri some vocabulary suggestions based on contact names in the app.

How the input looks to the Intents extension: you want to send a message that says "you're my only hope" to Obi-Wan.

Because it has "vocabulary" about your contacts in the app, it knows that "Obi-Wan" is also known as "Old Ben Kenobi."

And here's all the user sees: their query and your app's response based on that query.

Is your app one of the supported types of apps? Great! Here's what happens next.

As explained in the developer sessions and in Apple's documentation, a SiriKit interaction has four parts: Speech, when the user gives the app a command; Intent, which interprets that speech and matches it up with something the application can do; Action, when your app actually does the thing specified in the Intent; and Response, where the user confirms that the command has been interpreted correctly and that they actually want to perform the specified action.

A user's interaction with SiriKit begins with a voice command. Developers can define certain general and user-specific vocabulary terms to help Siri interpret those voice commands properly, but Siri is doing the voice-to-text translation and decides what app to pass commands off to. There are specific rules that govern what and how many words developers can define for Siri—user-specific vocabulary includes registering contact names from an in-app contacts list, custom user-defined photo tags and photo album names, and the names of custom workouts. "Global" vocabulary used by all users of your app is limited to ride options (think "UberX" or "UberPool") and workout names.

All of this is intended to make Siri more accurate; if it knows you have a contact named Craig in your app, it's more likely to hear "Craig" correctly instead of "Greg." If you use a ride-sharing service named Flarp to get from place to place—there's literally no name too silly these days—you can help Siri understand that "call me a Flarp to Newark Airport" means that you want a car to drive you to the airport, not that you want to "call a florist at the Newark Airport."
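To make the alias idea concrete, here's a minimal sketch in plain Swift of what vocabulary-assisted matching might look like behind the scenes. The types and names are invented for illustration; a real app would register its terms through SiriKit's INVocabulary API and let Siri do the matching, rather than resolving strings by hand like this.

```swift
// Illustrative only: a toy model of per-user vocabulary matching.
// VocabularyStore is a made-up type, not part of any Apple framework.

struct VocabularyStore {
    // Maps a spoken alias to the canonical in-app name.
    private var aliases: [String: String] = [:]

    mutating func register(alias: String, canonicalName: String) {
        aliases[alias.lowercased()] = canonicalName
    }

    // Resolve a spoken token against registered vocabulary,
    // falling back to the raw token when nothing matches.
    func resolve(_ spoken: String) -> String {
        aliases[spoken.lowercased()] ?? spoken
    }
}

var store = VocabularyStore()
store.register(alias: "Obi-Wan", canonicalName: "Old Ben Kenobi")
print(store.resolve("Obi-Wan"))   // resolves to the in-app contact name
```

The fallback matters: anything Siri hears that isn't registered vocabulary still gets passed through unchanged, which is why registering names like "Flarp" or "UberPool" up front improves recognition without breaking ordinary requests.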

Once you have defined your vocabulary, you use Intents extensions to define what services your app offers and how it responds to specific user commands. Intents extensions are composed of three kinds of objects: the "intent object" is the input, populated with data sent from Siri ("call me a Flarp to Newark Airport"); the "handler object" interprets that command and matches those words up to something your app actually offers (the ability to call a car that will drive you to the airport); and the "response object" makes that interpretation of the command visible to the user ("OK, I'll call you a Flarp to Newark Airport").
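Sketched in plain Swift, that three-object split might look something like the following. The struct names are hypothetical stand-ins for the Intents framework's real classes and handler protocols, just to show how the pieces relate:

```swift
// Hypothetical sketch of the intent / handler / response split.
// None of these types exist in SiriKit; they model the roles only.

// The "intent object": structured input parsed from the user's speech.
struct RideIntent {
    let rideOption: String      // e.g. "Flarp"
    let destination: String     // e.g. "Newark Airport"
}

// The "response object": what the user ultimately sees.
struct RideResponse {
    let confirmationText: String
}

// The "handler object": maps the intent onto something the app can do.
struct RideHandler {
    func handle(_ intent: RideIntent) -> RideResponse {
        RideResponse(confirmationText:
            "OK, I'll call you a \(intent.rideOption) to \(intent.destination).")
    }
}

let response = RideHandler().handle(
    RideIntent(rideOption: "Flarp", destination: "Newark Airport"))
print(response.confirmationText)
// prints "OK, I'll call you a Flarp to Newark Airport."
```

In the real framework, Siri constructs the intent object for you and your extension only supplies the handler and response; the division of labor, though, is the same.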

Apple also explained this interaction using a Star Wars reference in a presentation yesterday, perhaps aware that nerds understand things better when they know how they relate to Star Wars. An app called "Hologram" was being used to send a message to Old Ben Kenobi, conveying that Kenobi was the sender's only hope. Here's how it breaks down:

The user says "Send a Hologram to Obi-Wan saying you're my only hope."

Siri interprets that command. Because the developer has registered in-app contact names as recognized vocabulary, Siri knows that a message to "Obi-Wan" is intended for "Old Ben Kenobi."

From this interaction, the Hologram app's Intents extension knows what you want to do: you want to send a message that says "you're my only hope" to Old Ben Kenobi.

The message is created.

The user is asked to confirm that their request has been handled correctly—in this case, they're shown the Hologram message and asked whether they would like to send it or cancel the request.

If you have ever used Siri to send an iMessage, this workflow should sound pretty familiar. When things are working properly, the first step and the last one are the only ones the user actually sees; everything in between is invisible.
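The whole four-phase flow above can be sketched as a toy pipeline in plain Swift. Everything here is invented for illustration, including the crude string parsing that stands in for Siri's actual speech interpretation and vocabulary matching:

```swift
// Toy end-to-end model of the Speech -> Intent -> Action -> Response
// flow, using the Hologram example. All names are hypothetical.

struct MessageIntent {
    let recipient: String
    let body: String
}

// Phases 1-2: "Speech" becomes an "Intent". This crude parser stands
// in for Siri's interpretation plus registered contact vocabulary.
func interpret(speech: String, contacts: [String: String]) -> MessageIntent? {
    // Expected shape: "Send a Hologram to <name> saying <message>"
    guard let toRange = speech.range(of: " to "),
          let sayingRange = speech.range(of: " saying ") else { return nil }
    let spokenName = String(speech[toRange.upperBound..<sayingRange.lowerBound])
    let body = String(speech[sayingRange.upperBound...])
    return MessageIntent(recipient: contacts[spokenName] ?? spokenName,
                         body: body)
}

// Phases 3-4: "Action" (create the message) and "Response" (confirm).
func confirmationPrompt(for intent: MessageIntent) -> String {
    "To \(intent.recipient): \"\(intent.body)\". Send or cancel?"
}

let contacts = ["Obi-Wan": "Old Ben Kenobi"]
let speech = "Send a Hologram to Obi-Wan saying you're my only hope"
if let intent = interpret(speech: speech, contacts: contacts) {
    print(confirmationPrompt(for: intent))
}
```

As in the real thing, only the first step (the spoken command) and the last (the confirmation prompt) are visible to the user; the parsing and handling in between happen out of sight.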

When asking the user to confirm their request, developers can either use Apple's standard UI without doing any extra work, or they can create a separate Intents UI extension that uses their app's branding and styling and provides extra options or information. These custom Siri responses can't contain ads, but a messaging app could display your text with its own typefaces and styling, and a ride booking or workout app could give you extra details about how long your car will take to arrive or how long your workout will last.

As it stands today, SiriKit is very much in line with all the iOS extensions Apple has defined since iOS 8. Yes, it lets developers and users do more with their phones and tablets. But third-party apps still don't get to do everything Apple's apps can do, and Apple is the one calling the shots (the App Review process will also add another wrinkle once iOS 10 is released, since that's when we find out the difference between what Apple's APIs technically allow developers to do and how Apple actually wants those APIs to be used in practice). It's a good start, but there's still a lot of untapped potential.

Promoted Comments

This is so inexplicable. Why can't I pass a search to a weather app to see the weather for a city? Why can't I start a podcast in a 3rd-party podcast app like Downcast? (Honestly, that's the main thing I'm using my phone for in the car, much more than sending SMS by voice.) Why can't I ask Daylite to dial a contact? Why can't I get tomorrow's weather for a city I'm traveling to from my weather app of choice?

Honestly, the places they've opened up are the least useful to me. In an exercise app I'm probably going to be breathing too heavily to use Siri accurately. 3rd-party messaging makes sense, as do 3rd-party callers. But why photo search and not weather search or some other search? I can look in some photo app but not ask IMDB to look up some actor? (Not that Amazon would likely support it, given they're a competitor.)

Andrew Cunningham
Andrew wrote and edited tech news and reviews at Ars Technica from 2012 to 2017, where he still occasionally freelances; he is currently a lead editor at Wirecutter. He also records a weekly book podcast called Overdue. Twitter: @AndrewWrites