Understand How Users Interact with Skills

When a user speaks to a device with Alexa, the speech is streamed to the Alexa service in the cloud. Alexa recognizes the speech, determines what the user wants, and then sends a structured request to the particular skill that can fulfill the user's request. All speech recognition and conversion is handled by Alexa in the cloud.

Every Alexa skill has an interaction model defining the words and phrases users can say to make the skill do what they want. This model determines how Alexa communicates with your users.

The following sections provide examples and more detail about how users communicate with Alexa and what you need to do as a developer when designing a skill.

How Users Interact with Alexa

End users interact with all of Alexa's abilities in the same way: by waking the device with the wake word (or pressing a button on a device such as the Amazon Tap) and then asking a question or making a request.

On Alexa-enabled devices with a screen, the end user can also touch the screen to interact with Alexa, if the skill supports this interaction.

For example, users interact with the built-in Weather service like this:

User: Alexa, what's the weather?

Alexa: Right now in Seattle, there are cloudy skies…

Interacting with a custom-built skill to look up tide information is very similar, although the user must include a name identifying the skill ("Tide Pooler" in this example):

User: Alexa, get high tide for Seattle from Tide Pooler

Tide Pooler: Wednesday February 17th in Seattle, the first high tide will be around 1:42 in the morning, and will peak at about 10 feet…

Similarly, a user can tell a skill to control a particular cloud-enabled device such as a light. In this case, the light must be connected to the Internet (such as via a smart home hub). The user names the specific light ("living room lights" in this example):

User: Alexa, turn on the living room lights

(A light previously configured and named "living room lights" is turned on.)

Alexa: OK.

What is an Interaction Model?

In the context of Alexa, an interaction model is somewhat analogous to a graphical user interface in a traditional app. Instead of clicking buttons and selecting options from dialog boxes, users make their requests and respond to questions by voice:

| Action | Voice User Interface (Interaction Model) | Typical Graphical User Interface |
| --- | --- | --- |
| Make a request | User says, "Alexa, get high tide from tide pooler." | User clicks a button. |
| Collect more information from the user | Alexa replies, "For what city?" and then waits for a response. | App displays a dialog box and waits for user to select an option. |
| Provide needed information | User replies, "Seattle." | User selects options and chooses OK. |
| User's request is completed | Alexa speaks the requested information: "Wednesday February 17th in Seattle, the first high tide will be…" | App displays the results of the request. |

When users speak questions and make requests, Alexa uses the interaction model to interpret and translate the words into a specific request that can be handled by a particular skill. The request is then sent to the skill.
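To make the idea of translating words into a specific request more concrete, here is a toy sketch. It is not how Alexa works internally: Alexa's language understanding is far more flexible than pattern matching. The intent name `GetHighTideIntent`, the slot name `City`, the dictionary layout, and the `resolve` function are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, simplified interaction model: each intent lists sample
# utterances, and {City} marks a slot to be filled from the user's words.
INTERACTION_MODEL = {
    "invocation_name": "tide pooler",
    "intents": {
        "GetHighTideIntent": [
            "get high tide for {City}",
            "get high tide",
        ],
    },
}


def sample_to_regex(sample):
    """Turn a sample utterance into a regex with one named group per slot."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", sample)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)      # literal words
        else:
            pattern += f"(?P<{part}>.+)"    # slot placeholder
    return re.compile(pattern + r"\Z", re.IGNORECASE)


def resolve(utterance):
    """Return the matching intent and slot values, or None if nothing matches."""
    text = utterance.strip()
    for intent, samples in INTERACTION_MODEL["intents"].items():
        for sample in samples:
            match = sample_to_regex(sample).match(text)
            if match:
                return {"intent": intent, "slots": match.groupdict()}
    return None
```

With this sketch, `resolve("get high tide for Seattle")` returns `{'intent': 'GetHighTideIntent', 'slots': {'City': 'Seattle'}}`, while an utterance that matches no sample returns `None`.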

You define your own interaction model when creating a custom skill. The Smart Home Skill API, Video Skill API, Music Skill API, and others provide a built-in interaction model.

Examples of Interaction Models

Interact with a Custom Skill

Note this phrase a user can speak:

User: Alexa, get high tide for Seattle from Tide Pooler.

"Tide Pooler" is the invocation name that identifies a particular skill. When invoking a custom skill, users must include this name.

"get high tide for Seattle" is a phrase in Tide Pooler's interaction model. This phrase is mapped to a specific intent supported by this skill.

Alexa uses this custom interaction model to create a structured representation of the request called an intent. Alexa sends the intent to the Tide Pooler skill. The skill can then look up tide information and send back a response.
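As a rough illustration of what a skill might do with that intent, the sketch below handles a simplified request and builds a spoken response. The envelope fields (`request`, `intent`, `slots`, `outputSpeech`, `shouldEndSession`) follow the general shape of custom skill JSON, but the intent name `GetHighTideIntent`, the slot name `City`, and the reply wording are assumptions, and the actual tide lookup is left as a stub. Check the request and response reference before relying on any field.

```python
def build_response(text, end_session):
    """Wrap plain text in a minimal response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event):
    """Route a simplified skill request to a reply (illustrative only)."""
    request = event["request"]
    if request["type"] != "IntentRequest":
        # For example, the user opened the skill without making a request.
        return build_response("Welcome to Tide Pooler.", end_session=False)

    intent = request["intent"]
    if intent["name"] == "GetHighTideIntent":  # hypothetical intent name
        slots = intent.get("slots") or {}
        city = slots.get("City", {}).get("value")  # hypothetical slot name
        if not city:
            # Collect more information and keep the session open for the answer.
            return build_response("For what city?", end_session=False)
        # A real skill would look up tide data here; this sketch only echoes the city.
        return build_response(
            f"Looking up the first high tide in {city}.", end_session=True
        )

    return build_response("Sorry, I can't help with that.", end_session=True)
```

When the city is missing, the handler asks "For what city?" and leaves the session open, which mirrors the "collect more information" step of a voice interaction.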

Interact with the Smart Home Skill API

Note this phrase a user can speak:

User: Alexa, turn on the living room lights

"turn on the…" is a phrase recognized by Alexa's built-in interaction model. Alexa recognizes that this is a request to turn on a light.

The words "living room lights" identify a particular device that the user has previously configured and named. Note that this is the name of the device to control, not the name of the skill. The user does not need to say an invocation name for this type of skill.

Alexa uses the pre-built interaction model for smart home requests to create a structured representation of the request, called a device directive. Alexa sends the device directive to the specific skill that can control the "living room lights" device. This skill turns on the specified lights by communicating with the device cloud over the Internet, then returns a response indicating whether it was successful.
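The following sketch shows one way a skill might react to such a directive. The message layout is loosely modeled on the Smart Home Skill API's power controller messages and should be verified against the current reference. The `turn_on_device` callback, which stands in for the call to the device cloud, and the endpoint ID used in the usage example are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def handle_directive(event, turn_on_device):
    """Handle a simplified TurnOn directive and build a response event.

    turn_on_device is a hypothetical callback that asks the device cloud
    to switch on the endpoint with the given ID.
    """
    directive = event["directive"]
    header = directive["header"]
    endpoint_id = directive["endpoint"]["endpointId"]

    response_header = {
        "namespace": "Alexa",
        "payloadVersion": "3",
        "messageId": str(uuid.uuid4()),
        "correlationToken": header.get("correlationToken"),
    }

    if header["namespace"] == "Alexa.PowerController" and header["name"] == "TurnOn":
        turn_on_device(endpoint_id)
        now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return {
            "event": {
                "header": {**response_header, "name": "Response"},
                "endpoint": {"endpointId": endpoint_id},
                "payload": {},
            },
            # Report the new state so Alexa knows the request succeeded.
            "context": {
                "properties": [
                    {
                        "namespace": "Alexa.PowerController",
                        "name": "powerState",
                        "value": "ON",
                        "timeOfSample": now,
                        "uncertaintyInMilliseconds": 0,
                    }
                ]
            },
        }

    # Anything this sketch does not understand is reported as an error.
    return {
        "event": {
            "header": {**response_header, "name": "ErrorResponse"},
            "endpoint": {"endpointId": endpoint_id},
            "payload": {
                "type": "INVALID_DIRECTIVE",
                "message": "Directive not supported by this sketch.",
            },
        }
    }
```

For example, passing a `TurnOn` directive for an endpoint with the made-up ID `living-room-lights` calls `turn_on_device("living-room-lights")` once and returns an event named `Response` that reports `powerState` as `ON`.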

Interact with the Video Skill API

Note this phrase a user can speak:

User: Alexa, play Manchester by the Sea

"play…" is a phrase recognized by Alexa's built-in interaction model. Alexa recognizes that this is a request to play content.

The words "Manchester by the Sea" identify a particular video title. Note that the user does not need to say an invocation name for this type of skill.

Alexa uses the pre-built interaction model for video skills to create a structured representation of the request, called a device directive. Alexa sends the device directive to the specific skill that controls playback of video content. This skill plays the content by communicating with the service that controls the content, and returns a response indicating whether it was successful.

Interact with the Flash Briefing Skill API

Note this phrase a user can speak:

User: Alexa, what's my flash briefing?

"What's my flash briefing" is a phrase recognized by Alexa's built-in interaction model. Alexa recognizes that this is a request to read and stream content from feeds that the user previously selected.

Alexa uses the pre-built interaction model for content requests to invoke the flash briefing feature. Alexa loads the content feeds the user has selected to include in their flash briefing and streams the content.

Interact with the Music Skill API

Note this phrase a user can speak:

User: Alexa, play Poker Face by Lady Gaga on skill name.

"play…" is a phrase recognized by Alexa's built-in interaction model. Alexa recognizes that this is a request to play content.

"Poker Face by Lady Gaga" identifies a particular song title.

"skill name" identifies the skill to invoke to play this content.

Alexa uses the pre-built interaction model for music skills to create a structured representation of the request, then sends that request to the specified skill. This skill communicates with the service that manages the content, then returns a response with the requested content for playback on an Alexa-enabled device.