
At AWS re:Invent 2016, one of the cool new features released was Polly, an amazingly slick synthetic voice engine. At the time of writing, Polly supports a total of 47 male and female voices spread across 24 languages.

From the moment I saw the demo I knew I could use this to generate on-the-fly audio for use on Twilio.

Using AWS Polly with Twilio allows you to make use of multiple languages, dialects and pronunciations of words. Take, for example, the word “live” in the phrases “I live in Seattle” and “Live from New York”: Polly knows that this pair of homographs is spelled the same but pronounced quite differently.

In this example I’m going to be using AWS Polly, Twilio TwiML (the Twilio XML-based markup language) and NodeJS to produce an app that generates MP3 files on demand, which can then be nested in TwiML <Play> verbs.

Create a new IAM user that has programmatic access (this will generate an access key ID and a secret access key).

Attach the “AmazonPollyFullAccess” policy to the user and finish the account creation steps.

Phase Two: Create a new NodeJS project

In a terminal, navigate to where you keep your projects and create a new directory:

mkdir nodeJSPollyTwiML && cd nodeJSPollyTwiML

Inside the directory, initialise the project using:

npm init

The initialise script will ask you some basic questions about the application; name, keywords etc. I will leave this up to you.

Once the initialise script has finished, we can install the modules the application needs:

npm install --save aws-config aws-sdk body-parser express forms

With the modules installed, we can begin to build out the application.
Call up your favourite text editor – mine currently is Atom, which is made by the GitHub team.

Atom allows you to keep a project directory on the left for easy navigation as well as colour coding all the files based on their git state.

The structure of the app is going to be:

├── server.js
├── config.js
└── audioFiles

server.js is responsible for all the application processing.

config.js is where the system configuration is stored.

audioFiles houses the saved audio files.

Before we can write any server code we need somewhere to store our AWS credentials. Create a new file called config.js and add your AWS credentials to it.

In the configuration file the settings are broken out into two parts: one for production systems and one for test systems (the default in this case). As we are still building this app, I will be working from the test environment.

Using module.exports, we can now pull the config file into server.js and load the credentials when we need them!

Open server.js and load the modules needed; these should match the dependencies listed in package.json after npm install completed.
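A minimal sketch of what such a config.js might look like. The key names, the placeholder values, the region and the getConfig helper are assumptions for illustration, not the original listing:

```javascript
// config.js (sketch): every value below is a placeholder, not a real credential.
const config = {
  // Test settings, used when no other environment is selected.
  default: {
    accessKeyId: 'YOUR_TEST_ACCESS_KEY_ID',
    secretAccessKey: 'YOUR_TEST_SECRET_ACCESS_KEY',
    region: 'us-east-1', // assumed region; pick one where Polly is available to you
  },
  // Production settings, kept separate from the test keys.
  production: {
    accessKeyId: 'YOUR_PRODUCTION_ACCESS_KEY_ID',
    secretAccessKey: 'YOUR_PRODUCTION_SECRET_ACCESS_KEY',
    region: 'us-east-1',
  },
};

// Return the settings for the named environment, falling back to the test block.
function getConfig(environment) {
  return Object.prototype.hasOwnProperty.call(config, environment)
    ? config[environment]
    : config.default;
}

module.exports = { getConfig };
```

With a file like this, server.js could load its settings with require('./config').getConfig(process.env.NODE_ENV), which falls back to the test block when NODE_ENV is not set.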

Audio is requested with a path of the form /play/Carla/Hi%20Mathew.%20this%20is%20Carla%20from%20Amazon%20Web%20services. Breaking this down, the first part, ‘play’, refers to the Twilio verb Play. If the application were built out further, you could use other verbs or commands to define other paths; for example, /host/ could generate the audio file but return its URL, letting the application host the file.

Next, ‘Carla’ refers to the Polly voice we want to use. As mentioned before, AWS Polly has a total of 47 male and female voices. Each of these voices has a name, so it’s easy to select the voice you want by using that name.

The last part, ‘Hi%20Mathew.%20this%20is%20Carla%20from%20Amazon%20Web%20services.’, is the message that needs to be converted into speech. To ensure that the message is transmitted correctly you will need to URL-encode the string, which (among other things) converts spaces into %20.
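As a small sketch of the encoding step, JavaScript's built-in encodeURIComponent can produce the path for you. The helper name buildPlayPath is hypothetical:

```javascript
// Build the request path for the /play/ endpoint.
// buildPlayPath is a hypothetical helper name, not part of the original app.
function buildPlayPath(voiceID, message) {
  // encodeURIComponent turns spaces into %20 and escapes characters such as
  // "/", "?" and "&" that would otherwise change the meaning of the URL.
  return '/play/' + encodeURIComponent(voiceID) + '/' + encodeURIComponent(message);
}

const examplePath = buildPlayPath(
  'Carla',
  'Hi Mathew. this is Carla from Amazon Web services.'
);
console.log(examplePath);
// → /play/Carla/Hi%20Mathew.%20this%20is%20Carla%20from%20Amazon%20Web%20services.
```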

When an HTTP GET request comes in whose path starts with /play/, the route handler is called. voiceID holds the requested Polly voice, and textToConvert holds the URL-encoded text that needs to be converted.

To make the request to AWS Polly, the pollyParameters object needs to be populated; it consists of the chosen voice and the text to convert. The output format is fixed to MP3 in this example.
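The parameter object described above might be built along these lines. OutputFormat, Text and VoiceId are the parameter names Polly's synthesizeSpeech call expects; the helper names and the route pattern mentioned in the comments are assumptions for illustration:

```javascript
// Build the request object for Polly's synthesizeSpeech call.
// buildPollyParameters is a hypothetical helper name used for illustration.
function buildPollyParameters(voiceID, textToConvert) {
  return {
    OutputFormat: 'mp3', // fixed to MP3 in this example
    Text: textToConvert, // the message to be spoken
    VoiceId: voiceID,    // the Polly voice name, e.g. 'Carla'
  };
}

// Express-style sketch: with a route such as '/play/:voiceID/:textToConvert'
// (an assumed pattern), Express exposes the URL-decoded path segments on
// req.params, so the text arrives here with its spaces restored.
function parametersFromRequest(req) {
  return buildPollyParameters(req.params.voiceID, req.params.textToConvert);
}

const pollyParameters = parametersFromRequest({
  params: { voiceID: 'Carla', textToConvert: 'Hi Mathew.' },
});
console.log(pollyParameters);
```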

Now the application is ready to call Polly:

polly.synthesizeSpeech(pollyParameters, pollyCallback);

Here the app passes the parameters as well as a callback that will be invoked when the job is finished.

Once Polly has finished generating the audio file it will run the callback and (if successful) pass back the audio file.

The callback pollyCallback is now responsible for two things: saving the file to disk and passing the file back in response to the user’s request.
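A sketch of such a callback follows. It assumes the AWS SDK for JavaScript (v2) running in Node, where the audio arrives as a Buffer in data.AudioStream, and an Express-style response object. The factory name makePollyCallback and the error messages are hypothetical:

```javascript
const fs = require('fs');

// Create a callback for polly.synthesizeSpeech that saves the audio to
// filePath and then sends it back on the Express-style response object res.
// makePollyCallback is a hypothetical name; the original callback may differ.
function makePollyCallback(res, filePath) {
  return function pollyCallback(err, data) {
    if (err) {
      // Polly rejected the request (unknown voice, bad credentials, ...).
      res.status(500).send('Polly request failed: ' + err.message);
      return;
    }
    if (!data || !Buffer.isBuffer(data.AudioStream)) {
      res.status(500).send('Polly returned no audio');
      return;
    }
    // 1. Save the MP3 to disk.
    fs.writeFile(filePath, data.AudioStream, function (writeErr) {
      if (writeErr) {
        res.status(500).send('Could not save audio file');
        return;
      }
      // 2. Pass the audio back to whoever requested it.
      res.set('Content-Type', 'audio/mpeg');
      res.send(data.AudioStream);
    });
  };
}
```

Inside the route handler this would be used as polly.synthesizeSpeech(pollyParameters, makePollyCallback(res, filePath)), where filePath points into the audioFiles directory.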

Now that we have an HTTP-addressable endpoint, we can integrate our audio files into Twilio’s TwiML. When Twilio requests TwiML from your application, you can point <Play> verbs at this endpoint.

When you compile your TwiML, make sure that the text to speak has been URL-encoded; otherwise Twilio will reject the TwiML as non-compliant.

GitHub: https://github.com/dotmat/nodeJSPollyTwiML
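As an illustration, TwiML that plays Polly-generated audio could be assembled like this. The host name https://example.com and the helper buildPlayTwiml are assumptions; substitute the address where your own app is reachable:

```javascript
// Build a TwiML document whose <Play> verb points at the /play/ endpoint.
// buildPlayTwiml and the example host name are assumptions for illustration;
// baseUrl is expected to be a plain origin with no characters that need
// XML escaping.
function buildPlayTwiml(baseUrl, voiceID, message) {
  const audioUrl = baseUrl + '/play/' +
    encodeURIComponent(voiceID) + '/' + encodeURIComponent(message);
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<Response>\n' +
    '  <Play>' + audioUrl + '</Play>\n' +
    '</Response>';
}

console.log(buildPlayTwiml('https://example.com', 'Carla', 'Hi Mathew.'));
// <?xml version="1.0" encoding="UTF-8"?>
// <Response>
//   <Play>https://example.com/play/Carla/Hi%20Mathew.</Play>
// </Response>
```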

Conclusion:

Integrating AWS Polly into your Twilio apps is now fast and easy. With a single HTTP request you can have a chosen voice convert text into an audio file, which Twilio can then play back to a caller or customer.

Betterments:

At present the application is unsecured: anyone with access to your app or domain could quickly start using Polly and increase your AWS spend. While Polly is very cheap ($0.000004 per character, or about $0.004 per minute of generated audio), it’s still something that should be addressed. I would recommend implementing some kind of auth that checks against a database of known users (Basic Auth, for example).

For common messages that you use over and over, it’s pointless to keep generating audio for one-time use. If you had a prebuilt database of the common messages your application uses, you could reference these from the API. In the GitHub repo I expanded the code to allow you to pull audio files by calling the MP3 file name.

The app currently does no housekeeping of audio files: each time you make a request to Polly it generates an audio file. A good next tool would be something that deletes audio files older than X days or weeks.
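One way the Basic Auth idea could be sketched is as an Express-style middleware. The knownUsers table below stands in for a real user database and, like the function names and the realm, is hypothetical; a production system would store hashed passwords rather than plain text:

```javascript
// Stand-in for a real user database (hypothetical example data).
const knownUsers = { twilio: 'change-me' };

// Check an HTTP Basic Auth header against a table of known users.
function isAuthorised(authorizationHeader, users) {
  if (typeof authorizationHeader !== 'string') return false;
  const match = /^Basic ([A-Za-z0-9+/=]+)$/i.exec(authorizationHeader);
  if (!match) return false;
  // The credentials are "username:password", base64-encoded.
  const decoded = Buffer.from(match[1], 'base64').toString('utf8');
  const separator = decoded.indexOf(':');
  if (separator === -1) return false;
  const username = decoded.slice(0, separator);
  const password = decoded.slice(separator + 1);
  return Object.prototype.hasOwnProperty.call(users, username) &&
    users[username] === password;
}

// Express-style middleware: reject the request before Polly is ever called.
function requireBasicAuth(req, res, next) {
  if (isAuthorised(req.headers.authorization, knownUsers)) {
    next();
    return;
  }
  res.set('WWW-Authenticate', 'Basic realm="polly"');
  res.status(401).send('Authentication required');
}
```

In an Express app, a middleware like this would be passed to the route ahead of the handler, so unauthenticated requests never reach Polly.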
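A minimal sketch of such a clean-up tool, using only Node's built-in fs and path modules. The function name deleteOldAudioFiles and the choice to judge age by last-modified time are assumptions:

```javascript
const fs = require('fs');
const path = require('path');

// Delete .mp3 files in directory whose last-modified time is more than
// maxAgeDays ago. Returns the names of the files that were removed.
// deleteOldAudioFiles is a hypothetical helper, not part of the original app.
function deleteOldAudioFiles(directory, maxAgeDays, now = Date.now()) {
  const cutoff = now - maxAgeDays * 24 * 60 * 60 * 1000;
  const removed = [];
  for (const name of fs.readdirSync(directory)) {
    // Leave anything that is not an MP3 alone.
    if (path.extname(name).toLowerCase() !== '.mp3') continue;
    const fullPath = path.join(directory, name);
    const stats = fs.statSync(fullPath);
    if (stats.isFile() && stats.mtimeMs < cutoff) {
      fs.unlinkSync(fullPath);
      removed.push(name);
    }
  }
  return removed;
}
```

Calling deleteOldAudioFiles('./audioFiles', 7) from a daily scheduled job would then keep roughly a week of audio on disk.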

I hope this tutorial has been helpful. Please reach out if you have any issues or questions!