Now that we’re comfortable with intents, utterances, and slots (and custom slots), let’s introduce another major component of the Alexa Skills Kit: sessions.

We’re going to build an application that allows users to ask this:

Alexa, ask Movie Facts about Titanic

Alexa should respond with some facts about the movie Titanic. Then our users should be able to ask context-based questions without restating the invocation name "Movie Facts," such as:

Who directed that

Who starred in that

Alexa should respond with the director of Titanic and a list of the cast.

Alexa should remember that the user asked about Titanic in the first request, and answer subsequent requests in the context of that movie.

Additionally, a user will be able to ask:

Start over

And then follow up with:

Ask about Beauty and the Beast

Alexa should then answer questions about Beauty and the Beast in the same way.

1. An Introduction to Sessions

To build this conversational interface, we will need to make use of Alexa’s ability to manage sessions.

A conversational interface allows users to engage in dialogue with technology, with the technology providing meaningful responses based on the context of the dialogue.

A session represents a single conversation between a user and our skill. As developers, we control whether each response ends or continues the session. If we end it, the user will need to start their next phrase with "Alexa, ask Movie Facts..."

If we leave it open, the user has eight seconds to respond and continue the conversation. If there is no reply after eight seconds, Alexa will provide a reprompt (defined by us) and wait for another eight seconds before closing the session herself. During this session, we can persist attributes (more on that later).
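In the response JSON, that reprompt sits next to outputSpeech inside response, and setting shouldEndSession to false is what keeps the session open. A minimal sketch of such a response body, using only Ruby's standard json library (the method name build_response_with_reprompt is hypothetical):

```ruby
require 'json'

# Hypothetical helper: builds an Alexa response body that keeps the
# session open and supplies a reprompt, spoken if the user stays silent.
def build_response_with_reprompt(speech, reprompt, session_attributes = {})
  {
    version: "1.0",
    sessionAttributes: session_attributes,
    response: {
      outputSpeech: { type: "PlainText", text: speech },
      reprompt: {
        outputSpeech: { type: "PlainText", text: reprompt }
      },
      # false: keep the session open and wait for the user's reply
      shouldEndSession: false
    }
  }.to_json
end

puts build_response_with_reprompt(
  "Which movie would you like to know about?",
  "Sorry, I didn't catch that. Which movie would you like to know about?"
)
```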

Set Up a New Skill and Application

Set up a new skill with the invocation name "Movie Facts", and a new Sinatra application. As before, we'll use ngrok to tunnel our development server over HTTPS, and give the ngrok HTTPS URL to our skill as its endpoint.

Feel free to use another method of exposing a Ruby application to Alexa over HTTPS. We'll assume you're using an ngrok tunnel, but you can adapt as needed.

Before we try to build our Movie Facts skill, let's get to grips with some key concepts about sessions: what they are, how we use them, and why they're handy. We'll build a simple VUI (voice user interface) that responds to the following:

Alexa, ask Movie Facts to talk to me

Alexa should respond with "This is the first question," but only on the first request. On all subsequent requests, Alexa should respond with a count of how many questions the user has asked.

In other words, when a user asks:

Alexa, ask Movie Facts to talk to me

Alexa should respond with "This is question number {number}," where {number} depends on how many times the user has asked Movie Facts to talk to them.

Set Up a Minimal Interaction

Let’s set up a minimal intent schema using the intent name MovieFacts:

{
  "intents": [
    {
      "intent": "MovieFacts"
    }
  ]
}

We’ll add a simple utterance:

MovieFacts talk to me

Now, in our Sinatra application, we can return a minimal response. Let's also print the request so we can have a look at it.
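A minimal sketch of such a handler, assuming a single POST route at '/' and hypothetical helper names speech_response and talk_to_me. The Sinatra wiring is shown as a comment so the logic itself runs with only Ruby's standard json library:

```ruby
require 'json'

# Hypothetical helper: wraps some text in a minimal Alexa response body.
def speech_response(text, session_attributes = {})
  {
    version: "1.0",
    sessionAttributes: session_attributes,
    response: {
      outputSpeech: { type: "PlainText", text: text },
      shouldEndSession: false # keep the session open for the next question
    }
  }.to_json
end

# Hypothetical handler for the MovieFacts intent. In the Sinatra app it
# would be called from the POST route, roughly:
#
#   post '/' do
#     parsed_request = JSON.parse(request.body.read)
#     puts parsed_request # print the request so we can inspect it
#     talk_to_me(parsed_request)
#   end
def talk_to_me(parsed_request)
  if parsed_request["session"]["new"]
    speech_response("This is the first question.")
  else
    speech_response("This is question number two.")
  end
end
```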

Testing in the Service Simulator or on any Alexa-enabled device, we hear the first message on the first request and the second message on every subsequent request.

If you’re using the Service Simulator, don’t forget to hit the "Reset" button, or refresh the page, to start a new session with Alexa.

Persisting Information to the Session Attributes

However, there’s a problem: at the moment, the user will first hear, “This is the first question,” and then they’ll hear “This is question number two” forever—regardless of how many times they ask. We need a way to persist information about how many questions the user has asked, and reference it between requests.

To persist information between requests in this way, we can use session attributes. Sessions can store information about what a user has said to Alexa in the past, and our Sinatra application can use that persisted information to construct a response.

First, we need to initialise the session attributes for our first response to include a new attribute, numberOfRequests:

# for brevity, here's just the Ruby code making the first response
if this_is_the_first_question
  return {
    version: "1.0",
    # here, we can persist data across multiple requests and responses
    sessionAttributes: {
      numberOfRequests: 1
    },
    response: {
      outputSpeech: {
        type: "PlainText",
        text: "This is the first question."
      }
    }
  }.to_json
end

Using puts to output the request body, notice how the next request now carries the number of requests made so far, in a session attribute called numberOfRequests (under session.attributes).
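The handler for subsequent requests then needs to read that count back, add one, and return the updated value in sessionAttributes, because Alexa only sends back the attributes we returned in our previous response. A minimal sketch, assuming a hypothetical count_questions method called from the POST route:

```ruby
require 'json'

# Hypothetical handler: reads numberOfRequests from the incoming Session
# Attributes, increments it, and sends the new count back so Alexa
# returns it to us on the next request.
def count_questions(parsed_request)
  # Treat a missing attributes hash (or missing key) as "no questions yet"
  attributes = parsed_request["session"]["attributes"] || {}
  count = attributes["numberOfRequests"].to_i + 1

  text = count == 1 ? "This is the first question." : "This is question number #{count}."

  {
    version: "1.0",
    sessionAttributes: { numberOfRequests: count },
    response: {
      outputSpeech: { type: "PlainText", text: text }
    }
  }.to_json
end
```

Because the updated count is written back on every response, the number keeps growing for as long as the session stays open.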

Now, we are persisting—and acting on—data across multiple interactions. Try it out in the Service Simulator!

In the Service Simulator, remember to hit the "Reset" button, or refresh the page, to start a new session with Alexa.

Different Ways of Restarting a Session

One final thing: what if we want to allow users to start the count over? To do that, we have two choices:

End the session.

Clear the session attributes.

These should be used in two different circumstances:

End the session when Alexa tells the user something. For instance, "Goodbye."

Clear the session attributes when Alexa asks the user something. For instance, "Okay, starting over. Would you like to talk to me?"

The user's experience is different in each case:

End the session: The user has to start the interaction over from the beginning by saying, "Alexa, ask Movie Facts to talk to me." This is similar to the user logging out of a web app.

Clear the session attributes: The existing session continues, but the application forgets everything that's happened so far. The user can just say, "Talk to me" (i.e. no invocation name is required). This is similar to the user restarting some process in a web app, without logging out.
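In terms of the response JSON, the two options differ in just a couple of fields. A minimal sketch contrasting them (both method names are hypothetical):

```ruby
require 'json'

# Hypothetical helper for option one: say goodbye and end the session.
# The user must say the invocation name again to start over.
def end_session_response
  {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: "Goodbye." },
      shouldEndSession: true
    }
  }.to_json
end

# Hypothetical helper for option two: keep the session open but
# return empty Session Attributes, so the skill forgets the count.
def clear_attributes_response
  {
    version: "1.0",
    sessionAttributes: {},
    response: {
      outputSpeech: {
        type: "PlainText",
        text: "Okay, starting over. Would you like to talk to me?"
      },
      shouldEndSession: false
    }
  }.to_json
end
```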

Let’s allow users to say:

Start over.

Alexa should respond with:

Okay, starting over. Would you like to talk to me?

And the user should answer with:

Talk to me.

Since we don't want the user to restate the invocation name, we'll go with the second option: clearing the session attributes.

Starting a Session over Using AMAZON.StartOverIntent

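Amazon provides a built-in intent for starting an interaction from the beginning: AMAZON.StartOverIntent. Rather than defining our own, let's use it by adding {"intent": "AMAZON.StartOverIntent"} to the intents array in our intent schema. Built-in intents already understand phrases such as "start over", so we don't need to write sample utterances for it.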

In our Sinatra application, let’s add a response just for requests to clear the session. In the response, we clear the session attributes, but don't end the session:

# dig returns nil (instead of raising) for requests without an intent
if parsed_request.dig("request", "intent", "name") == "AMAZON.StartOverIntent"
  return {
    version: "1.0",
    # returning an empty hash here replaces
    # any existing Session Attributes
    sessionAttributes: {},
    response: {
      outputSpeech: {
        type: "PlainText",
        text: "Okay, starting over. Would you like to talk to me?"
      },
      # Let's be really clear that we're not
      # ending the session, just restarting it
      shouldEndSession: false
    }
  }.to_json
end

This response will now start the session over. However, when the user next says "Talk to me," their session will not be "new"; it will just have empty session attributes. So we need to upgrade our this_is_the_first_question variable:

# This is the 'first question' IF
# the 'new' session key is true OR
# the Session Attributes are empty
this_is_the_first_question = parsed_request["session"]["new"] || parsed_request["session"]["attributes"].empty?

In fact, we can refactor this: any 'new' session will have empty Session Attributes anyway. So our final this_is_the_first_question variable only needs to check parsed_request["session"]["attributes"].empty?.
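If you prefer a predicate method over a local variable, here is a minimal sketch (the name first_question? is hypothetical) that also tolerates a request with no attributes key at all. Whether Alexa ever omits that key is an assumption here; the `|| {}` simply makes the check safe either way:

```ruby
# Hypothetical helper: a request counts as the "first question" when it
# carries no Session Attributes. `|| {}` guards against the attributes
# key being absent altogether (an assumption about what Alexa may send).
def first_question?(parsed_request)
  (parsed_request["session"]["attributes"] || {}).empty?
end
```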

As well as being restarted with a built-in intent, a session can end at any time in one of three circumstances:

The user says “exit”

The user does not respond or says something that does not match an intent you have defined

An error occurs

In any of these cases, your Sinatra application will receive a special type of request: a SessionEndedRequest. Your application cannot return a response to a SessionEndedRequest, but you may wish to use these requests to do some cleanup.
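Because a SessionEndedRequest carries no intent, code that reads request["intent"]["name"] straight away will raise on it. A minimal sketch of checking the request type first (the method name handle is hypothetical, and the intent branch is only a placeholder for the real handlers):

```ruby
# Hypothetical dispatcher fragment: check the request type before touching
# request["intent"], because SessionEndedRequests carry no intent.
def handle(parsed_request)
  request = parsed_request["request"]
  if request["type"] == "SessionEndedRequest"
    # Do any cleanup here; reason is a string such as "USER_INITIATED"
    warn "Session ended: #{request["reason"]}"
    return nil # nothing is spoken back for this request type
  end
  # Placeholder: the real handlers would build a response for this intent
  request["intent"]["name"]
end
```

In the Sinatra route, a nil result can simply be sent back as an empty body.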

Now a user can reset their session and start the question count over! Next, let's do something a little more complex.

2. Querying IMDb

First, we want users to be able to ask:

Alexa, ask Movie Facts about {some movie name}

Let's change our first utterance so that it accepts the name of a movie:

MovieFacts about {Movie}

Adding a MOVIE Slot

If your skill is an English (US) skill, you can use Amazon’s built-in AMAZON.Movie Slot Type to pass the name of the movie. If not, you’ll need to define a custom slot type with the names of several movies, to guide voice recognition for whichever movie the user requests. Assuming the latter, let’s define a custom slot type, named MOVIE, with a definition containing a few example movies:

titanic
jaws
the perfect storm

If you would prefer an exhaustive list of movies from the Internet Movie Database (IMDb), you can build your slot values from the list of titles that IMDb publishes.

Add a slot with the appropriate slot type to your intent schema, and test that your slot is filled appropriately by printing requests to your Sinatra application.
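A minimal sketch of both halves, assuming the slot is named Movie (to match the {Movie} utterance) and uses the custom MOVIE type; an English (US) skill could use AMAZON.Movie as the type instead. The schema is written as a Ruby hash so it can be printed as JSON, and movie_slot_value is a hypothetical helper:

```ruby
require 'json'

# Intent schema with a Movie slot of the custom MOVIE type, written as a
# Ruby hash so it can be printed as JSON and pasted into the skill.
INTENT_SCHEMA = {
  intents: [
    { intent: "MovieFacts", slots: [{ name: "Movie", type: "MOVIE" }] },
    { intent: "AMAZON.StartOverIntent" }
  ]
}.freeze

# Hypothetical helper: returns the spoken movie name, or nil if the
# slot is missing or was left unfilled.
def movie_slot_value(parsed_request)
  parsed_request.dig("request", "intent", "slots", "Movie", "value")
end

puts JSON.pretty_generate(INTENT_SCHEMA)
```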

Querying IMDb Using a Gem

In our Sinatra application, let's use the open-source imdb gem to query IMDb for information about whichever movie the user wants to know more about.

Remember to run gem install imdb from the command line before you start your Sinatra application (or use a more rigorous dependency management tool such as Bundler).
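A minimal sketch of a MovieFacts handler built on the gem. It assumes Imdb::Search.new(title).movies (the call used later in this tutorial) and a plot_synopsis method on each movie; check both against the gem's documentation. The search is passed in as a keyword argument (a hypothetical design choice) so the logic can be exercised with a stub instead of a live IMDb query:

```ruby
require 'json'

# Hypothetical MovieFacts handler. `search` is any callable that takes a
# title and returns a list of movie objects; by default it uses the imdb
# gem's Imdb::Search (requires `gem install imdb` and `require 'imdb'`).
def movie_facts(parsed_request, search: ->(title) { Imdb::Search.new(title).movies })
  movie_title = parsed_request["request"]["intent"]["slots"]["Movie"]["value"]
  movie = search.call(movie_title).first

  # Guard against searches that return nothing
  text = movie ? movie.plot_synopsis : "Sorry, I couldn't find #{movie_title}."

  {
    version: "1.0",
    # Remember which movie we're talking about, but only if we found it
    sessionAttributes: movie ? { movieTitle: movie_title } : {},
    response: {
      outputSpeech: { type: "PlainText", text: text },
      shouldEndSession: false
    }
  }.to_json
end
```

Storing movieTitle in sessionAttributes here is what lets later follow-up questions refer back to this movie.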

Once you’ve verified this is all working in the Service Simulator, let’s move on to the final section: using the session to make a conversational interface.

3. Building a Dialogue

So far, our users can ask Alexa:

Alexa, ask Movie Facts about {some movie name}

Alexa will respond with the plot synopsis for the first movie matching the name the user provides. For example, if a user asks “Alexa, ask Movie Facts about Titanic,” Alexa will respond with a plot synopsis for the 1997 movie Titanic.

We'd love our users to ask follow-up questions about the movie they initially queried, but how can we do that without requiring the user to give the movie name a second time? Let's use session attributes!

Remembering the Movie the User Asked About

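We can persist the title of the requested movie in the session attributes (under movieTitle) when we answer the initial request, then read it back in a new FollowUp intent. FollowUp takes a slot named Role, backed by a custom slot type whose values include "directed" and "starred in", so a sample utterance such as FollowUp who {Role} that covers both questions. Its handler: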

# After the block that handles the first request
if parsed_request.dig("request", "intent", "name") == "FollowUp"
  # Fetch the movie title from the Session Attributes
  movie_title = parsed_request["session"]["attributes"]["movieTitle"]
  # Search again for this movie, and pull out the first result
  movie = Imdb::Search.new(movie_title).movies.first
  # Find out which Role the user was interested in:
  # this could be 'directed' or 'starred in' (or any other values
  # we provided to our Custom Slot Type)
  role = parsed_request["request"]["intent"]["slots"]["Role"]["value"]
  # Construct response text if the user wanted to know
  # who directed the movie
  if role == "directed"
    response_text = "#{movie_title} was directed by #{movie.director.join(", ")}"
  end
  # Construct response text if the user wanted to know
  # who starred in the movie
  if role == "starred in"
    response_text = "#{movie_title} starred #{movie.cast_members.join(", ")}"
  end
  # Fall back to a prompt if the Role slot held anything else
  response_text ||= "You can ask who directed #{movie_title}, or who starred in it."
  # Pass the response text to the response, and remember to
  # store the movie title in the Session Attributes again so users
  # can make further requests about roles in this movie
  return {
    version: "1.0",
    sessionAttributes: {
      movieTitle: movie_title
    },
    response: {
      outputSpeech: {
        type: "PlainText",
        text: response_text
      }
    }
  }.to_json
end

Routing Multiple Intents

We now have three intents the user can trigger (alongside any other built-in intents we choose to support): AMAZON.StartOverIntent, MovieFacts, and FollowUp. In each case, our Sinatra application does something different:

AMAZON.StartOverIntent: Clear the session and start again, ready to ask about a new movie.

MovieFacts: Retrieve the synopsis of a movie, ready for follow-up questions about that movie.

FollowUp: Give more information about a given movie.

Your intent schema will generally map one-to-one onto actions in your application. In other words, our post '/' route acts as a kind of router, with intents as the possible routes.

As a result of this three-intent system, we no longer need the this_is_the_first_question check: each intent has its own handler, so the route only has to pick the right one. Let's upgrade our code to reflect that.
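One way to express that router is a lookup table from intent name to handler. A minimal sketch, with hypothetical stub handlers that return plain strings instead of full Alexa JSON so the routing itself is easy to see; the fallback message for unrecognised intents is also an assumption:

```ruby
# Hypothetical handlers; in the real app each would build the JSON
# response for that intent.
HANDLERS = {
  "MovieFacts" => ->(req) { "movie facts for #{req.dig("request", "intent", "slots", "Movie", "value")}" },
  "FollowUp" => ->(req) { "follow-up about #{req.dig("session", "attributes", "movieTitle")}" },
  "AMAZON.StartOverIntent" => ->(_req) { "starting over" }
}.freeze

# Route a parsed Alexa request to the handler for its intent.
def route(parsed_request)
  request = parsed_request["request"]
  # SessionEndedRequests have no intent and get no response
  return nil if request["type"] == "SessionEndedRequest"

  handler = HANDLERS[request.dig("intent", "name")]
  # Unknown intents (or requests without one) get a fallback message
  return "Sorry, I didn't understand that." unless handler

  handler.call(parsed_request)
end
```

Adding a new intent then means adding one entry to HANDLERS rather than another if block in the route.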

After "Alexa, ask Movie Facts about Titanic," a user can follow up with "Who directed that," and Alexa responds with "Titanic was directed by James Cameron." Great! And, because we're storing the movie title in the Session Attributes, our users can continue querying:

Who starred in that

Alexa responds with a list of cast members for the 1997 movie Titanic. And, because we’ve added a session-clearing intent, users can ask:

Start over

And they’ll be offered the chance to start querying a new movie. When they query the new movie, the user doesn't have to state the invocation name or "Alexa":

Ask about Beauty and the Beast

Awesome!

4. Improving the User Experience (UX)

Let's look at some ways we can improve the user's interaction with this application.

Limiting Response Text

At the moment, the user can ask:

Alexa, ask Movie Facts about Titanic

Alexa will respond with the entire plot synopsis for the 1997 movie Titanic. It's pretty long! The user will be waiting around for a while before they get a chance to query the movie further. Let's improve the user experience by chopping it off after the first 140 characters of synopsis:

# In the MovieFacts handler: fetch the first matching movie,
# then keep only the first 140 characters of its synopsis
# (assuming the synopsis comes from the gem's plot_synopsis method)
movie_title = parsed_request["request"]["intent"]["slots"]["Movie"]["value"]
movie = Imdb::Search.new(movie_title).movies.first
response_text = movie.plot_synopsis.slice(0, 140)

That slightly improves the UX!

Extra credit: Extracting sentences from strings is a tough task. However, there are regexes which can approximate it. Upgrade this response-shortening to extract the first few sentences of each response, rather than arbitrarily chopping off the response in the middle of a word.

Adding Prompts

The user may not know that they can query Alexa for further information about a movie. Alexa should prompt them. Let's append some strings to our responses, giving the user prompts for their next action:

# Construct response text if the user wanted to know
# who directed the movie
if role == "directed"
  response_text = "#{movie_title} was directed by #{movie.director.join(", ").slice(0, 140)}. You can ask who directed #{movie_title}, ask who starred in it, or start over."
end
# Construct response text if the user wanted to know
# who starred in the movie
if role == "starred in"
  response_text = "#{movie_title} starred #{movie.cast_members.join(", ").slice(0, 140)}. You can ask who directed #{movie_title}, ask who starred in it, or start over."
end

Now that we've implemented some more signposting for the user, our skill is easier for them to use.

Extra Credits

Extra Credit #1: It can take a while to search IMDb and then whittle down the results to a single movie. Using a more sophisticated set of session attributes, try persisting the relevant movie information in the session, and answering subsequent user requests from the session instead of querying IMDb.

Extra Credit #2: Our codebase is looking pretty scrappy, and it’s highly procedural. There are a few things that feel like they’re violating the "don’t repeat yourself" rule by duplicating knowledge about the system at several points. Try refactoring the procedural codebase into something a little more OO. If you do it right, you’ll wind up with the start of a useful framework that could abstract some of the messy JSON manipulation we’ve been doing. This will be the subject of module 4.

Extra Credit #3: It's important to know that a request to your application is coming from Alexa, and not from anywhere else (say, a user trying to access your application via cURL from the command line). To that end, Amazon recommend that, before taking any action on a request, developers verify that it comes from the application they expect. JSON requests from the Amazon Alexa Service include a key for exactly this: session.application.applicationId, whose value is a string. For extra credit, add a guard clause to verify that the request came from your application, and return an appropriate HTTP error if it does not.
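As a starting point, a minimal sketch of the comparison itself. The ALEXA_APP_ID environment variable and the placeholder ID are assumptions (copy your real skill ID from the developer console), and from_our_skill? is a hypothetical name; responding with 403 is one reasonable choice of error status:

```ruby
# Hypothetical guard: compare the incoming applicationId with the ID of
# our own skill. The default below is a placeholder, not a real ID.
EXPECTED_APP_ID = ENV.fetch("ALEXA_APP_ID", "amzn1.ask.skill.your-skill-id")

def from_our_skill?(parsed_request)
  parsed_request.dig("session", "application", "applicationId") == EXPECTED_APP_ID
end

# In the Sinatra route (sketch):
#   post '/' do
#     parsed_request = JSON.parse(request.body.read)
#     halt 403 unless from_our_skill?(parsed_request)
#     ...
#   end
```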

Extra Credit #4: It's pretty easy to crash our application: for instance, if the user asks for a movie that doesn't exist. Upgrade the application's handling of the MovieFacts intent to handle the case where the user's requested movie cannot be found.
