Voice-controlled PHP apps with API.ai

In this tutorial we’ll be looking into Api.ai, an API that lets us build apps which understand natural language, much like Siri. It accepts either text or speech as input, parses it, and returns a JSON string that can be interpreted by the code that we write.

Concepts

Before we move on to the practical part, it’s important that we first understand the following concepts:

agents – agents are applications. We create an agent as a means of grouping individual entities and intents.

entities – entities are custom concepts that we want to incorporate into our application. An entity is essentially a list of words that can be used to refer to a concept, and we give the concept meaning by adding examples. A sample entity would be ‘currency’. We define it by adding synonyms such as ‘USD’, ‘US Dollar’, or just ‘Dollars’. Each synonym is then mapped to a reference value that can be used in the code. Api.ai already provides some built-in entities such as @sys.number, which refers to any number, and @sys.email, which refers to any email address. We can use the built-in entities by specifying @sys as the prefix.

intents – intents allow us to define which actions the program will execute depending on what a user says. A sample intent would be ‘convert currency’. We then list all the possible phrases or sentences a user might say to convert currency. For example, a user could say ‘how much is @sys.number:number @currency:fromCurrency in @currency:toCurrency?’. In this example, we’ve used two entities: @sys.number and @currency. Adding a colon after an entity lets us define an alias for it, which we can then use in our code to get the entity’s value. Since @currency appears twice, we give each occurrence a different alias so that we can treat them separately in our code. To turn the intent back into something a human would say, just substitute actual values for the entities. A user might say ‘How much is 900 US Dollars in Japanese Yen?’ and Api.ai would map ‘900’ to @sys.number, ‘US Dollar’ to the fromCurrency @currency, and ‘Japanese Yen’ to the toCurrency @currency.

contexts – contexts represent the current context of a user expression. For example, a user might say ‘How much is 55 US Dollars in Japanese Yen?’ and then follow with ‘what about in Philippine Peso?’. Api.ai, in this case, uses what was previously spoken by the user, ‘How much is 55 US Dollars,’ as the context for the second expression.

aliases – aliases provide a way of referring to a specific entity in your code, as we saw earlier in the explanation for the intents.

domains – domains are pre-defined knowledge packages. We can think of them as collections of built-in entities and intents in Api.ai. In other words, they are tricks that Api.ai can perform with little to no setup or coding required. For example, a user can say, ‘Find videos of Pikachu on YouTube.’ and Api.ai would already know how to parse that, returning ‘Pikachu’ as the search term and ‘YouTube’ as the service. From there, we can use the returned data to navigate to YouTube and search for ‘Pikachu’. In JavaScript, it’s only a matter of setting location.href to point to YouTube’s search results page.
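A minimal sketch of what that might look like, assuming Api.ai has already handed us the search term (the helper name here is hypothetical):

```javascript
// Hypothetical helper: given the search term parsed out by Api.ai,
// build the URL of YouTube's search results page.
function buildYoutubeSearchUrl(searchTerm) {
  return 'https://www.youtube.com/results?search_query=' +
    encodeURIComponent(searchTerm);
}

// In the browser, navigating is then a single assignment:
// location.href = buildYoutubeSearchUrl('Pikachu');
```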

To use domains for your agent, select your agent from the console and then click on the domains menu at the top. From there, enable the domains knowledge base and fulfillment. Note that domains are currently in beta, but you can always use the API console to test them.

Enabling the knowledge base turns on the domains functionality, while enabling fulfillment allows the use of third-party services such as Small Talk and Weather. This means we won’t need to make a separate request to a specific API if the service we need already integrates with Api.ai.

Getting the Current Time in a Specific Place

Now that we have an understanding of the main concepts, we can proceed with building a simple app. The first thing that we’re going to build is an app for getting the current time in a specific place.

To get started, go to the agents page and create a new agent by clicking the ‘Create Agent’ button. On the page for creating a new agent, enter a name, description, and language, then save.

This gives you the subscription key, developer access token and client access token. You can use these to make requests to the API, either from the client (browser) or from the server. One advantage of making the requests from the server is keeping your credentials hidden.

The agent that we’ve created will be using domains, which means we don’t need to set up entities and intents. What we do need is a little help from two Google APIs: the Geocoding API and the Timezone API. The Geocoding API converts the location we get from Api.ai into coordinates, which we then use to query the Timezone API for the current time at that location. Go to your Google Console and enable the Timezone API. The Geocoding API doesn’t require an API key to be supplied, so we don’t need to enable it.

Next, install Guzzle. We will be using Guzzle 5 to make a request to Api.ai.

composer require guzzlehttp/guzzle:~5.0

Then, create a new PHP file (time.php) and add the following code so we can use Guzzle from our file.
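A sketch of that bootstrap code, assuming a standard Composer setup (the credential values are placeholders):

```php
<?php
// time.php — pull in Composer's autoloader so Guzzle is available.
require 'vendor/autoload.php';

// Placeholder credentials from the agent's settings page.
$access_token = 'YOUR_DEVELOPER_ACCESS_TOKEN';
$subscription_key = 'YOUR_SUBSCRIPTION_KEY';

// Guzzle 5 clients take a 'base_url' option; every request path
// we use later is resolved against it.
$client = new GuzzleHttp\Client([
    'base_url' => 'https://api.api.ai/v1/'
]);
```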

Naturally, in a real app, you’d probably keep credentials outside of app logic, in some kind of local configuration file.

We can now make a request to Api.ai. To make a request, we need to pass in the developer access token and subscription key as headers. We then pass the body of the request as JSON; it should contain the query and the lang keys. The query is submitted from the client-side through a POST request. An example of a query for this app would be “What time is it in Barcelona, Spain?” or “What’s the current time in Ikebukuro, Japan?”. The response returned is a JSON string, so we convert it to an array by calling the json method on the $response.
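A sketch of that request, assuming Guzzle 5, a $client instance pointed at v1 of the Api.ai API, and the header names Api.ai expects (the credential variables are placeholders):

```php
<?php
// Forward the user's query to Api.ai. The access token and
// subscription key are passed as headers; the body is JSON.
$response = $client->post('query', [
    'headers' => [
        'Authorization' => 'Bearer ' . $access_token,
        'ocp-apim-subscription-key' => $subscription_key,
        'Content-Type' => 'application/json'
    ],
    'json' => [
        'query' => $_POST['query'], // e.g. "What time is it in Barcelona, Spain?"
        'lang' => 'en'
    ]
]);

// Guzzle 5 responses have a json() helper that decodes the
// response body into an array.
$result = $response->json();
```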

If we get a status code of 200, it means the request was successful. The data that we need are stored in the result item. In this case, we only need to extract the location from the parameters. If a location isn’t returned, then we just tell the user that the location isn’t found.

if (!empty($result['result']) && !empty($result['result']['parameters']['location'])) {
    $location = $result['result']['parameters']['location'];
} else {
    echo "Sorry, I could not find that location.";
}

If a location is found, we make a request to the Google Geocoding API to convert the location to coordinates. If the status is OK, this means that we got a result. So we just extract the latitude and longitude values from the first result.
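Continuing from the request above, a sketch of the geocoding step (no API key is needed for this endpoint):

```php
<?php
// Convert the location string from Api.ai into coordinates
// using the Google Geocoding API.
$geo_response = $client->get('https://maps.googleapis.com/maps/api/geocode/json', [
    'query' => ['address' => $location]
]);
$geo = $geo_response->json();

if ($geo['status'] == 'OK') {
    // Take the coordinates of the first (best) match.
    $coordinates = $geo['results'][0]['geometry']['location'];
    $lat = $coordinates['lat'];
    $lng = $coordinates['lng'];
}
```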

Next, we get the current unix timestamp and pass it, along with the latitude and longitude, as query parameters in our request to the Google Timezone API. We then extract the timeZoneId, which we can use to temporarily set the timezone with the date_default_timezone_set function. Finally, we output the formatted time to the user.
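That step might look like the following sketch, where $timezone_apikey is a placeholder for the Google API key and the output format is an assumption:

```php
<?php
// Ask the Google Timezone API for the timezone at the coordinates,
// anchored to the current unix timestamp.
$timestamp = time();
$tz_response = $client->get('https://maps.googleapis.com/maps/api/timezone/json', [
    'query' => [
        'location' => $lat . ',' . $lng,
        'timestamp' => $timestamp,
        'key' => $timezone_apikey
    ]
]);
$tz = $tz_response->json();

if ($tz['status'] == 'OK') {
    // Temporarily switch PHP's timezone, then format the current time.
    date_default_timezone_set($tz['timeZoneId']);
    echo 'The current time in ' . $location . ' is ' . date('g:i A');
}
```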

Let’s walk through the speech-recognition.js file. First is the global variable that we will use to store the current speech recognition object.

var recognition;

Next is the startRecognition method. It creates a new speech recognition object, which prompts the user for permission to use the microphone. We set the language to English and start the speech recognition, then listen for the onstart event, which is triggered once recognition has begun. When it fires, we call the updateRec method, which changes the text of the button for starting and stopping speech recognition. We also listen for the onresult event, triggered when the user has stopped speaking for a couple of seconds. The event contains the results of the speech recognition; we loop through them and use the transcript item of each to get the text we need. Once that’s done, we call the setInput method, which changes the value of the query text field and calls the send method to submit the query to the server. Next, we call the stopRecognition method to stop the speech recognition and update the UI. We do the same in the onend event.
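Putting those steps together, a sketch of startRecognition might look like this, assuming the webkit-prefixed Web Speech API; updateRec, setInput, send, and stopRecognition are the helpers described above:

```javascript
// Start listening for speech; prompts the user for microphone access.
function startRecognition() {
  recognition = new webkitSpeechRecognition();
  recognition.lang = 'en-US';

  // Recognition has started: update the start/stop button's text.
  recognition.onstart = function() {
    updateRec();
  };

  // Fired once the user has stopped speaking for a moment.
  recognition.onresult = function(event) {
    // Collect the transcript from each recognition result...
    var text = '';
    for (var i = event.resultIndex; i < event.results.length; i++) {
      text += event.results[i][0].transcript;
    }
    // ...put it in the query field (setInput also calls send),
    // then stop recognizing and update the UI.
    setInput(text);
    stopRecognition();
  };

  recognition.onend = function() {
    stopRecognition();
  };

  recognition.start();
}
```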

Most of the code in the speech-recognition.js file is from this gist which shows an example of how to use Api.ai on the client-side.

Next is the main.js file where we submit our query to the server. Once we get a response, we use responsive-voice to speak it out and also output it in the response container. That way, we can check the response visually.
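A sketch of that send function, assuming the element ids, the time.php endpoint, and the use of fetch (the responsiveVoice global comes from the responsive-voice library):

```javascript
// Submit the query to the server, then speak and display the reply.
function send() {
  var query = document.getElementById('query').value;

  fetch('time.php', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: 'query=' + encodeURIComponent(query)
  })
  .then(function(response) { return response.text(); })
  .then(function(text) {
    responsiveVoice.speak(text);                              // read it aloud
    document.getElementById('response').textContent = text;   // show it on the page
  });
}
```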

For our second app, a currency converter, we create a ‘currency’ entity and a ‘convert currency’ intent. The ‘user says’ section is where we define examples of what the user can say to trigger this specific intent. What we’re doing here is using entities as substitutes for actual values the user might use: @sys.number can refer to any number, and @currency can refer to any currency we added when we created the currency entity. Adding a colon after the entity lets us assign an alias to it, which we can then use in the code to get the value the user supplied.

The ‘action’ section is where we define the action or method to execute when this specific intent is used. In this case we won’t define anything, since we’re creating an app that only does one thing.

The ‘fulfillment’ section is where we define a template for the speech that we want to output once the intent is used. For example, we can put the following:

$number $fromCurrency is equivalent to $result $toCurrency

This will then be available on the speech item in the result that we get. From there, we can perform string replacement to substitute the actual values for those variables. But let’s just leave it blank for this app.
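Purely as an illustration of that replacement (the values here are made up, since we’re leaving the template blank in this app):

```php
<?php
// The fulfillment template as it would arrive on the speech item.
$speech = '$number $fromCurrency is equivalent to $result $toCurrency';

// Swap each $-variable for the value we extracted or computed.
$output = strtr($speech, [
    '$number'       => '900',
    '$fromCurrency' => 'USD',
    '$result'       => '99270',
    '$toCurrency'   => 'JPY'
]);

echo $output; // prints "900 USD is equivalent to 99270 JPY"
```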

Once you’re done, click on the ‘save’ button to save the intent.

Now we’re ready to proceed with the code. Create an exchange-rate.php file in your working directory, then add the following code:
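A sketch of that file, mirroring time.php; as before, the credential values are placeholders:

```php
<?php
// exchange-rate.php — same setup as time.php, with one addition:
// the API key for currencylayer.com.
require 'vendor/autoload.php';

$access_token = 'YOUR_DEVELOPER_ACCESS_TOKEN';
$subscription_key = 'YOUR_SUBSCRIPTION_KEY';
$currencylayer_apikey = 'YOUR_CURRENCYLAYER_API_KEY';

$client = new GuzzleHttp\Client(['base_url' => 'https://api.api.ai/v1/']);

// Forward the user's query to Api.ai, exactly as before.
$response = $client->post('query', [
    'headers' => [
        'Authorization' => 'Bearer ' . $access_token,
        'ocp-apim-subscription-key' => $subscription_key,
        'Content-Type' => 'application/json'
    ],
    'json' => [
        'query' => $_POST['query'],
        'lang' => 'en'
    ]
]);

$result = $response->json();
```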

As you can see, the code above is basically the same as in our previous app. The only difference is the new $currencylayer_apikey variable, which stores the API key we got from currencylayer.com, an API that gives us the current exchange rate from one currency to another. If you wish to follow along, go ahead and sign up for an API key.

Next, we check if there are any results and extract the data that we need: the currency the user wishes to convert from, the currency to convert to, and the amount.
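A sketch of that extraction and the rate lookup, assuming the aliases we defined in the intent (number, fromCurrency, toCurrency) and currencylayer’s live endpoint, which quotes every currency against USD:

```php
<?php
// Pull out the aliased entity values from the Api.ai result.
$parameters = $result['result']['parameters'];
$fromCurrency = $parameters['fromCurrency'];
$toCurrency = $parameters['toCurrency'];
$amount = $parameters['number'];

// Fetch the latest quotes for both currencies from currencylayer.
$rate_response = $client->get('http://apilayer.net/api/live', [
    'query' => [
        'access_key' => $currencylayer_apikey,
        'currencies' => $fromCurrency . ',' . $toCurrency
    ]
]);
$rates = $rate_response->json();

// currencylayer quotes everything against USD (e.g. "USDJPY"),
// so we convert via USD: amount * (USD→to / USD→from).
$usd_to_from = $rates['quotes']['USD' . $fromCurrency];
$usd_to_to = $rates['quotes']['USD' . $toCurrency];
$converted = $amount * ($usd_to_to / $usd_to_from);

echo $amount . ' ' . $fromCurrency . ' is equivalent to ' .
    round($converted, 2) . ' ' . $toCurrency;
```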

Conclusion

In this tutorial, we have learned how to use Api.ai to create voice-enabled PHP apps. Browser support is still pretty limited because the Web Speech API isn’t widely implemented yet. But Api.ai supports platforms other than the Web: Android, Cordova, .NET, and iOS are a few examples. This means we can use Api.ai without worrying about support on those platforms. Be sure to check out their docs if you want to learn more. The files that we’ve used in this tutorial are available in this GitHub repository.

Wern is a web developer from the Philippines. He loves building things for the web and sharing the things he has learned by writing in his blog. When he's not coding or learning something new, he enjoys watching anime and playing video games.