
Editor’s note: This is part one of our Makers Academy series for Ruby developers. Learn more about this free training on the Alexa Skills Kit in this blog post. And check out the full training course for free online.

Welcome to the first module of Makers Academy's short course on building Alexa skills using Ruby. Amazon's Alexa Skills Kit allows developers to extend existing applications with deep voice integration and to build entirely new applications that leverage cutting-edge voice-control technology.

This course covers all the terminology and techniques you need to push fully functional skills, built with Ruby and Sinatra, live to Alexa-enabled devices around the world.

What's in This Module?

This module is a basic introduction to scaffolding a skill and interacting with Alexa. It introduces:

Intent schemas

Utterances

Alexa communication paradigm

Tunneling a local application using ngrok over HTTPS

Connecting Alexa to a local development environment

Alexa-style JSON requests and responses

During this module, you will construct a simple skill called “Hello World.” While building this skill, you will come to understand how the above concepts work and play together. The build happens in four steps:

1. Amazon-side Setup: Setting up the Voice User Interface (VUI)

Sign in to the Amazon Developer Portal, click “Alexa” on the navigation bar, then click “Get started with the Alexa Skills Kit.”

Click “Add a new skill”

Use a “default custom interaction model”

Set up the skill:

Language

Name (“Hello World”)

Invocation Name (“Hello World”)

The invocation name is the phrase a user speaks to access a particular skill. For example: "Alexa, ask Hello World to say hello world."

Intent Schemas

Now that we have a new skill, let's construct the intent schema.

The intent schema lists all the possible requests Amazon can make to your application.

{
  "intents": [
    {
      "intent": "HelloWorld"
    }
  ]
}

The minimal intent schema is a JSON object with a single property: intents. This property lists all the actions an Alexa skill can take. Each action is a JSON object with a single property: intent. The intent property gives the name of the intent.

Utterances

Now that we have the intent schema, let's make the utterances. Utterances map intents to phrases spoken by the user. They are written in the following form:

IntentName utterance

In our case, we have only one Intent: HelloWorld, and we'd like the user to say the following:

Alexa, ask Hello World to say hello world.

Our utterances are:

HelloWorld say hello world

We've now set up our skill on Amazon's Alexa Developer Portal.

2. Setting up the Backend: A local Tunneled Development Environment

Our second step is to set up our local Ruby application to be ready to receive encrypted requests from Amazon’s servers (i.e., HTTP requests over SSL, or “HTTPS” requests).

We will walk through setting up a Ruby server using Sinatra. The server will run locally and be able to receive HTTPS requests through a tunnel.

Download ngrok, unzip the package, and move the executable to your hello_world_app directory

Start ngrok using ./ngrok http 4567

Copy the URL starting with “https” and ending with “.ngrok.io” from your ngrok terminal to the clipboard

In a second terminal, start your Sinatra application using ruby server.rb.

3. Linking the Alexa VUI to Our Backend via the Endpoint

Our third step is to link the skill we set up on Amazon (1) with the tunnel endpoint (2) so our skill can send requests to our local application.

Configuring the Endpoint in the Alexa Skills Portal

When a user invokes an intent, Amazon sends a POST request to the specified endpoint (web address).

Head back to your Alexa skill (for which you just entered intents and utterances). Hit “Next,” then set up the endpoint.

Use HTTPS, not AWS Lambda (there is currently no Ruby support on Lambda)

Geographical region: Europe (we picked Europe because Makers Academy is in the UK, but you would select North America if you’re in the United States)

Paste your application’s endpoint into the text input field

If using ngrok, your endpoint is the URL you copied, starting with "https" and ending with ".ngrok.io".

You won't need account linking for this skill.

Configuring SSL

Amazon Alexa only sends requests to secure endpoints: ones secured using an SSL certificate (denoted by the 'S' in HTTPS). Since we used ngrok to set up our HTTPS endpoint, we can use ngrok's wildcard certificate instead of providing our own.

If you used ngrok to set up a tunnel, select “My development endpoint is a sub-domain of a domain that has a wildcard certificate from a certificate authority.”

Hit “Next” again.

Testing in the Service Simulator

The Service Simulator in the Amazon Alexa Developer Portal allows you to try out utterances. Once you’ve written an utterance into the Service Simulator, you can send test requests to the application endpoint you defined. You can see your application’s response to each request that you send.

Use the Service Simulator to test that the say hello world utterance causes Amazon to send an intent request to your local application. Observe that the request body printed to the command line matches the JSON request shown in the Service Simulator.

You will receive an error in the Service Simulator because you aren't sending a response to this request just yet. That's not a problem for now; check your logs to view the request that was sent.
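For orientation, the logged request body will look something like the sketch below (heavily trimmed; the session and request IDs are elided placeholders, and the real payload contains additional fields):

```json
{
  "version": "1.0",
  "session": {
    "new": true,
    "sessionId": "..."
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "...",
    "intent": {
      "name": "HelloWorld"
    }
  }
}
```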

You’ve now hooked up your local development environment to an Alexa skill.

4. Responding to Alexa Requests

Now we have set up an Alexa skill (1), built a local development server with an endpoint tunneled via HTTPS (2), and can make requests from Amazon to our local development server through that endpoint (3).

Our final step is to construct a response from our endpoint such that Amazon can interpret the response to make Alexa say, “Hello, world” to us.

Building the JSON Response

Amazon sends JSON requests and expects JSON responses in a particular format. Let's build the response here.

response (object): required. Tells Alexa how to respond, including speech, cards, and prompts for more information.

outputSpeech (object): Tells Alexa what to say.

type (string): required. Either "PlainText", where Alexa will guess pronunciation, or "SSML" (Speech Synthesis Markup Language), where you can specify pronunciation very tightly.

EXTRA CREDIT: Change the response to use custom pronunciation using SSML.

text (string): required for plain text responses. Tells Alexa exactly what to say.
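Putting those fields together with Ruby's standard json library, the response body can be built as in the sketch below. The "version" and "shouldEndSession" fields round out a minimal valid response, and the helper name is illustrative:

```ruby
require 'json'

# Builds the minimal JSON response described above: a top-level "version",
# and a "response" object containing plain-text outputSpeech.
def hello_world_response
  {
    version: '1.0',
    response: {
      outputSpeech: {
        type: 'PlainText',
        text: 'Hello, world'
      },
      shouldEndSession: true
    }
  }.to_json
end
```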

EXTRA CREDIT: Play around with this response, restarting the server and sending an Intent Request from the Service Simulator each time.

Testing Our Response in the Service Simulator...and Beyond!

Now that we've built a JSON response, we can restart the server and test out the new response in the Service Simulator.

If you would like to try out your new Hello World skill live, say “Alexa, ask Hello World to say hello world” to any Alexa-enabled device registered to your developer account. You can also try it out in Echosim.io, the browser-based Alexa skill testing tool.

Next Steps


The Alexa Skills Kit (ASK) enables developers to build capabilities, called skills, for Alexa. ASK is a collection of self-service APIs, documentation, templates, and code samples that make it fast and easy for anyone to add skills to Alexa.