How We Built a Google Home Office Assistant

A Connected Butler Story

We do a lot of work with Conversational Interfaces at Connected Lab. We even hosted the first Amazon Echo and Alexa Hackathon in Canada. The recently released Google Home is a voice-activated speaker similar to the Amazon Echo, so we were excited to spend a week experimenting with Google Home and its developer API.

After some initial experimentation with the Google-provided samples, we decided to build Connected Butler, a voice assistant that would greet office guests and inform the Connected employee of their arrival through Slack.

To build custom applications for Google Assistant, which powers Google Home, we used a Google Home device and the Actions on Google developer tools. This post highlights our experience using Conversation Actions to build an application that interacts with a user through a voice interface.

How Connected Butler Works

The guest interacts with the Connected Butler app using Google Home, while Google Home (via Actions on Google) interfaces with an “Agent” hosted on API.AI. API.AI allows us to quickly build up a simple voice conversation and pull out the data we need. This includes getting the guest’s name and finding out who at Connected Lab they are visiting.

API.AI defines the conversation state logic, handles natural language processing, and hooks into our custom web app. In our case, the web app is hosted on AWS, but it can be hosted anywhere as long as it has an exposed web-facing endpoint. Our web app, written in Node.js, handles any non-trivial logic as well as Slack integration to notify Connected Lab team members about a guest arriving to see them.

The list of Connected Lab employees is simply a list of Slack users extracted through the Slack API via the web app component. While API.AI simplifies the voice interaction with the guest and parses out information like the guest’s name and the host’s name, the main challenge is matching the host’s name as heard by Google Home to an actual Connected Lab team member on that list.

This is how we match what the guest said to a first and last name on that list:

Step 1: We use the Double Metaphone phonetic algorithm to generate two phonetic codes for each Connected Lab employee’s first name, and two more for their full name, to be used as Map keys. The Map values are sets of the Slack user objects that correspond to those codes.

For my name, “Leo Kaliazine”, there are two phonetic codes: “LKLSN” and “LKLTSN”. The Map keys “LKLSN” and “LKLTSN” would both point to the same Slack user object.

Step 2: We then apply the Double Metaphone algorithm to the host’s name received from API.AI.

Step 3: We attempt to find a match between the user input codes and the generated Maps. If a successful match is made, we message that host in Slack, notifying them that they have a guest waiting.
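The three steps above can be sketched in Node.js roughly as follows. This is a minimal sketch, not our production code: the `phoneticCode` stand-in below just strips vowels to keep the example self-contained, whereas the real app uses the Double Metaphone algorithm (available as an npm module), which yields a primary and an alternate code per name, and the user objects come from the Slack API rather than a hard-coded array.

```javascript
// Toy stand-in for Double Metaphone: uppercase consonant skeleton.
// The real implementation returns two codes (primary and alternate).
function phoneticCode(name) {
  return name.toUpperCase().replace(/[^A-Z]/g, '').replace(/[AEIOU]/g, '');
}

// Step 1: index every employee under the codes for their first name
// and for their full name. Values are Sets, since codes can collide.
function buildIndex(slackUsers) {
  const index = new Map();
  for (const user of slackUsers) {
    const codes = [
      phoneticCode(user.firstName),
      phoneticCode(`${user.firstName} ${user.lastName}`),
    ];
    for (const code of codes) {
      if (!index.has(code)) index.set(code, new Set());
      index.get(code).add(user);
    }
  }
  return index;
}

// Steps 2-3: encode the guest's utterance and look it up in the index.
function findHosts(index, spokenName) {
  return index.get(phoneticCode(spokenName)) || new Set();
}

const index = buildIndex([
  { firstName: 'Leo', lastName: 'Kaliazine', slackId: 'U01' },
  { firstName: 'Dan', lastName: 'Lee', slackId: 'U02' },
]);
```

Because the values are sets, a lookup can return zero, one, or several candidate hosts, which is exactly what drives the outcomes described next.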

Possible outcomes include:

If we find a single match: We send the host the notification message.

If we find several matches (e.g. the guest spoke only the host’s first name, and several employees at the company share it): We tell the guest that we found several matches, list their full names, and ask the guest to repeat the full name of the person they’re here to see.

If no exact double-metaphone match is found: We calculate the Jaro-Winkler distance (a measure of similarity between two strings) between the user input codes from Step 2 and the Map keys from Step 1.
a) If the similarity is greater than 90%, we consider that a match and send the host the notification message.
b) Otherwise, we list the full names of the top 4 matches to the guest and ask them to say the full name of the person they are visiting. We then jump back to Step 2, but this time the algorithm matches against full names only.

If all else fails: A message is sent to a dedicated Slack channel monitored by the Operations team.
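The similarity measure used in the fallback above can be implemented directly; any string-similarity npm module would do the same job. The sketch below computes Jaro-Winkler similarity from its standard definition, returning a score between 0.0 (no overlap) and 1.0 (identical), where the fallback treats anything above 0.9 as a match.

```javascript
// Jaro-Winkler similarity: Jaro similarity plus a bonus for a shared
// prefix, which suits name matching (first letters are rarely misheard).
function jaroWinkler(s1, s2) {
  if (s1 === s2) return 1.0;
  if (!s1.length || !s2.length) return 0.0;

  // Characters "match" if equal and within this window of each other.
  const window = Math.max(Math.floor(Math.max(s1.length, s2.length) / 2) - 1, 0);
  const s1Matches = new Array(s1.length).fill(false);
  const s2Matches = new Array(s2.length).fill(false);

  let matches = 0;
  for (let i = 0; i < s1.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(i + window + 1, s2.length);
    for (let j = lo; j < hi; j++) {
      if (!s2Matches[j] && s1[i] === s2[j]) {
        s1Matches[i] = s2Matches[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0.0;

  // Count transpositions: matched characters that appear out of order.
  let transpositions = 0;
  let k = 0;
  for (let i = 0; i < s1.length; i++) {
    if (!s1Matches[i]) continue;
    while (!s2Matches[k]) k++;
    if (s1[i] !== s2[k]) transpositions++;
    k++;
  }
  transpositions /= 2;

  const jaro =
    (matches / s1.length +
      matches / s2.length +
      (matches - transpositions) / matches) / 3;

  // Winkler boost: reward a common prefix of up to 4 characters.
  let prefix = 0;
  while (prefix < Math.min(4, s1.length, s2.length) && s1[prefix] === s2[prefix]) {
    prefix++;
  }
  return jaro + prefix * 0.1 * (1 - jaro);
}
```

In the Butler flow, the guest’s input codes from Step 2 are scored against every Map key from Step 1, and the best-scoring key above the 90% threshold wins.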

One of the biggest challenges we faced when building Connected Butler was matching the host’s name said by the guest to an actual Connected team member. API.AI has a first/last name speech parsing feature (accessible via API.AI’s predefined entities @sys.given-name and @sys.last-name). The list of given names it recognizes is based on 2,500 names from the U.S. Social Security Administration (SSA) popular names list, which doesn’t cover all of our employees’ names. To work around this, we use the @sys.any entity, which allows us to accept less common names.

Since we use @sys.any, Google Home has trouble recognizing some non-English names. For example, a Connected Lab employee whose name is “Uzair”, pronounced o͞oze(ə)r, is often interpreted by Google Home as “Who’s there”. Solution: Use Jaro-Winkler similarity to find the closest phonetic matches.

Some employee names create double-metaphone collisions with other names. For example, the double-metaphone code for “Dan Lee” (“TNL”) is the same as that for “Daniel”. Solution: If a second round of matching is necessary, we list the potential full-name matches to the guest, then match the guest’s response against full names only. This produces fewer double-metaphone collisions.

Other Considerations

Some considerations when building an application for Google Assistant:

Although the Google Actions SDK provides everything you need to build a Google Assistant app (with Conversation Actions), it does not manage user interaction for you. We highly recommend using supported tools such as API.AI to simplify and speed up the conversation development experience.

API.AI has a free tier that should be good enough for many use cases, but it does have some limitations. Production applications that require things like higher bandwidth, an SLA, or pre-built domains will need to upgrade to the paid “Preferred” tier.

API.AI is still in a relatively early development stage. Their web-based development interface does not currently have a version control system or support for multiple developer accounts for a single API.AI agent. However, it’s relatively simple to export/import the agent in a JSON format and store it in your own version control repository.

Conclusion

Google Actions together with API.AI allowed us to quickly build Google Assistant conversation apps. API.AI is a very useful and powerful language-processing tool, but it is still in the early stages of becoming a production development platform. It supports a large number of third-party integrations (e.g. Actions on Google, Alexa, Skype, Slack), which means you can build one app that converses with users on many different platforms!

Google is also working on Direct Actions, which will handle all user interaction for a large number of app categories (e.g. Music, Video, Messaging). In the future it should be much easier to integrate your product or service on devices running Google Assistant in the supported categories.

If you like this article, please favourite it and share it with your network. Stay connected by signing up for our mailing list here. Thanks for reading!

Leo is a Sr. Software Engineer at Connected. He is a passionate Android developer who loves all things Google. When he’s not coding, Leo is a legend on the volleyball court.