Your first Google Assistant skill

How to build a conversational app for Google Home or Google Assistant

Smart home speakers, assistant platforms and cross-device solutions let you talk to your smartwatch and see the result on your TV or your car’s dashboard. Personal assistants and VUIs (voice user interfaces) are slowly appearing around us, and it’s pretty likely that they will make our lives much easier.

Because of my great faith that natural language will be the next human-machine interface, I decided to start a new series of blog posts and an open source project to show how to create a new kind of app: conversation-oriented, device-independent assistant skills which give us freedom in the platform or hardware we use and bring the most natural interface for humans: voice.

This post is part of a series about building a personal assistant app designed for voice as the primary user interface. More posts in the series:

WaterLog assistant skill

In this post we’ll start with the simplest implementation of an assistant skill. WaterLog is an app which lets us track daily water intake by talking or writing in natural language directly to Google Assistant. The first version of the app will be able to log how many liters or milliliters of water we have drunk during the day.

For the sake of simplicity we’ll skip the theory behind VUI design and focus only on the technical aspects of building a fully working implementation.

Here are scenarios of possible conversations (happy paths):

New user

User: Ok Google, Talk to WaterLog
WaterLog: Hey! Welcome to Water Log. Do you know that you should drink about 3 liters of water each day to stay healthy? How much did you drink so far?
User: I drunk 500ml of water
WaterLog: Ok, I’ve added 500ml of water to your daily log. In sum you have drunk 500ml today. Let me know when you drink more! See you later.

Returning user

User: Ok Google, Talk to WaterLog
WaterLog: Hey! You have drunk 500ml today. How much water should I add now?
User: 100ml
WaterLog: Ok, I’ve added 100ml of water to your daily log. In sum you have drunk 600ml today. Let me know when you drink more! See you later.

Returning user asking for logged water

User: Ok Google, Ask WaterLog how much water have I drunk today?
WaterLog: In sum you have drunk 600ml today. Let me know when you drink more! See you later.

If you would like to test this skill on your device, it’s available live in the Google Assistant directory, or on the website:

Getting started

The app is extremely simple, but even this kind of project requires tying a few pieces together to make it work. While we have a lot of freedom when it comes to platform selection (we could build our app in many different languages and host it on any cloud, such as Google Cloud Platform or Amazon Web Services), to begin with we’ll choose the most commonly recommended tech stack:

When it’s done, you will be asked to pick a tool or platform to build the assistant skill. As I said, it’ll be Dialogflow. If you do it right, your apps (Actions and Dialogflow) should be connected. You can check this in the Dialogflow agent settings (see the Google Project property):

Dialogflow agent

The first big piece of our assistant app is the conversational agent, which in our case is built on the Dialogflow platform. Its most important role is to understand what the user says to our app and convert natural-language sentences into actions and properties which can be handled by our code. This is exactly what Dialogflow intents do.
According to the documentation:

An intent represents a mapping between what a user says and what action should be taken by your software.

Let’s start defining our intents. Here is the list of sentences which we would like to handle:

Default Fallback Intent

This is the only intent we leave untouched for now. As the name says, it is triggered if a user’s input is not matched by any of the regular intents or enabled domains (see the documentation). It’s worth mentioning that this intent isn’t even passed to our application code; it’s handled entirely by the Dialogflow platform.

welcome_user

Intent used to greet our user. It is triggered whenever the user asks for our app (e.g. Ok Google, talk to WaterLog) without any additional intention.

— Config —
Action name: input.welcome
Events: WELCOME, GOOGLE_ASSISTANT_WELCOME (events are additional mappings which allow intents to be invoked by an event name instead of a user query).
Fulfillment: ✅ Use webhook (the welcome_user intent will be passed to our backend).
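To make the welcome flow concrete, here is a minimal sketch of how a webhook might choose between the two greetings from the example conversations. The function name `buildWelcomeMessage` and its `loggedTodayMl` parameter are hypothetical and not taken from the WaterLog repository; only the response texts come from the scenarios earlier in this post.

```javascript
// Hypothetical helper: picks a greeting based on how much water
// (in milliliters) the user has already logged today.
function buildWelcomeMessage(loggedTodayMl) {
  if (loggedTodayMl > 0) {
    // Returning user: remind them of today's total.
    return `Hey! You have drunk ${loggedTodayMl}ml today. How much water should I add now?`;
  }
  // New user (or nothing logged yet today): give the introduction.
  return 'Hey! Welcome to Water Log. Do you know that you should drink about ' +
    '3 liters of water each day to stay healthy? How much did you drink so far?';
}

console.log(buildWelcomeMessage(0));
console.log(buildWelcomeMessage(500));
```

Both greetings end with a question, so the conversation stays open and the next user input can be matched by log_water.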

log_water

This intent is used to save how much water the user would like to log during the conversation. There are a couple of cases which we would like to handle in the same way. Let’s list some of them:

Ok Google, Talk to WaterLog to log 1 liter of water: the intent is triggered immediately when the user invokes our action. In this case the welcome intent is skipped. More about assistant invocation can be found in the Actions on Google documentation.

Log 500ml of water: said in the middle of the conversation, when the app is waiting for the user’s input.

500ml: usually given as an answer to the assistant’s question:
WaterLog: …how much water did you drink today?
User: 500ml

To handle cases like these, we need to provide example utterances that users might say. Dialogflow’s machine learning then uses these examples to teach our agent to understand user input. The more examples we provide, the smarter our agent becomes.

Additionally, we need to annotate the fragments of our examples which need to be handled in a special way, so that, for example, our app knows that the utterance:

I have drunk 500ml of water

contains a number and a unit of volume for the water that has been drunk. All we have to do is select the fragment and pick the correct entity (there are plenty of built-in entities, see the documentation).
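Once a fragment is annotated, the webhook receives the matched value as a parameter instead of raw text. As a rough sketch, assuming the volume arrives as an object with `amount` and `unit` fields (check the shape your agent actually sends), it could be normalized to milliliters like this. The `toMilliliters` helper and the unit table are hypothetical.

```javascript
// Assumed conversion table; extend it with whatever units your agent reports.
const ML_PER_UNIT = new Map([
  ['ml', 1],
  ['l', 1000],
]);

// Hypothetical helper: converts an { amount, unit } parameter to milliliters.
function toMilliliters(volume) {
  const factor = ML_PER_UNIT.get(volume.unit);
  if (factor === undefined) {
    throw new Error(`Unsupported unit: ${volume.unit}`);
  }
  return Math.round(volume.amount * factor);
}

console.log(toMilliliters({ amount: 500, unit: 'ml' })); // 500
console.log(toMilliliters({ amount: 1.5, unit: 'l' }));  // 1500
```

Storing a single unit keeps the daily sum trivial, whichever unit the user happened to say.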

Fulfillment: ✅ Use webhook
Google Assistant: ✅ End conversation (pick this to let Google Assistant know that the conversation should end here).

get_logged_water

Intent used to tell the user how much water they have drunk in the current day. As with log_water, there are different ways to invoke this intent:

Ok Google, ask WaterLog how much water did I drink today? This is called instead of the welcome intent when the action is known.

How much did I drink? This is asked in the middle of a conversation with our app.

— Config —
Action name: get_logged_water
User says:

Fulfillment: ✅ Use webhook
Google Assistant: ✅ End conversation
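On the backend, answering this intent comes down to summing the entries logged on the current day. The sketch below illustrates that idea with an in-memory array; the entry shape (`ml`, `loggedAt`), the helper names and the sample dates are all made up for illustration, and the real app's storage is not shown in this post.

```javascript
// True when both dates fall on the same local calendar day.
function isSameDay(a, b) {
  return a.getFullYear() === b.getFullYear() &&
    a.getMonth() === b.getMonth() &&
    a.getDate() === b.getDate();
}

// Hypothetical helper: adds up the milliliters logged on the day of `now`.
function sumLoggedToday(entries, now) {
  return entries
    .filter((entry) => isSameDay(entry.loggedAt, now))
    .reduce((total, entry) => total + entry.ml, 0);
}

// Response text taken from the example conversation.
function buildLoggedWaterMessage(totalMl) {
  return `In sum you have drunk ${totalMl}ml today. Let me know when you drink more! See you later.`;
}

// Sample data for illustration only.
const now = new Date(2018, 0, 15, 18, 0);
const entries = [
  { ml: 500, loggedAt: new Date(2018, 0, 15, 8, 30) },
  { ml: 100, loggedAt: new Date(2018, 0, 15, 12, 0) },
  { ml: 250, loggedAt: new Date(2018, 0, 14, 21, 0) }, // previous day, ignored
];

console.log(buildLoggedWaterMessage(sumLoggedToday(entries, now)));
```

Passing `now` in as a parameter, rather than calling `new Date()` inside the helper, makes the function easy to unit test.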

And that’s it for the Dialogflow configuration for now. If you would like to see the full config, you can download it from the repository (WaterLog.zip file) and import it into your agent.

The code

If you followed the Actions on Google guide (Build fulfillment), you should already have a basic code structure, a fulfillment deployed to Firebase Cloud Functions, and a connection between it and the Dialogflow agent through the fulfillment config.
Now let’s build the code for the WaterLog app. The repository with the final implementation is available on GitHub:

In our Cloud Function we define a mapping from intents to the functions that need to be called as the fulfillment for the conversation.
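The idea of such a mapping can be sketched without any assistant library, as a plain lookup from action name to handler. This is an illustration rather than the repository's code: `input.welcome` and `get_logged_water` are the action names from the intent configuration, while `log_water` as an action name, the `handleAction` dispatcher and the `actionWelcomeUser` / `actionGetLoggedWater` method names are assumptions.

```javascript
// Action name -> handler. 'log_water' is an assumed action name.
const actionMap = new Map([
  ['input.welcome', (conversation) => conversation.actionWelcomeUser()],
  ['log_water', (conversation) => conversation.actionLogWater()],
  ['get_logged_water', (conversation) => conversation.actionGetLoggedWater()],
]);

// Hypothetical dispatcher: finds the handler for the action name sent
// with the request and calls it with the conversation object.
function handleAction(actionName, conversation) {
  const handler = actionMap.get(actionName);
  if (!handler) {
    throw new Error(`No handler registered for action: ${actionName}`);
  }
  return handler(conversation);
}

// Stub conversation so the sketch runs on its own.
const stubConversation = {
  actionWelcomeUser: () => 'welcome handled',
  actionLogWater: () => 'log handled',
  actionGetLoggedWater: () => 'get handled',
};

console.log(handleAction('log_water', stubConversation)); // log handled
```

Keeping the map separate from the handlers means a new intent only needs one extra entry and one extra method.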
As an example, let’s look at conversation.actionLogWater() (the fulfillment for the log_water intent):
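The real implementation lives in the repository. The following is only a minimal sketch of what such a handler could look like, under several assumptions: `app` is a stand-in object exposing `getArgument` and `tell`, `waterLog` is a hypothetical storage object whose `add` method returns today's new total, and the parameter name `water_volume` with its `{ amount, unit }` shape is invented for illustration.

```javascript
// Hypothetical sketch, not the code from the WaterLog repository.
class Conversation {
  constructor(app, waterLog) {
    this.app = app;           // stand-in for the assistant client object
    this.waterLog = waterLog; // stand-in for the storage layer
  }

  actionLogWater() {
    // Assumed parameter name and shape, e.g. { amount: 100, unit: 'ml' }.
    const volume = this.app.getArgument('water_volume');
    const addedMl = volume.unit === 'l' ? volume.amount * 1000 : volume.amount;
    const totalMl = this.waterLog.add(addedMl);
    // Final response: the log_water intent is configured to end the conversation.
    this.app.tell(
      `Ok, I've added ${addedMl}ml of water to your daily log. ` +
      `In sum you have drunk ${totalMl}ml today. ` +
      'Let me know when you drink more! See you later.'
    );
  }
}

// In-memory stand-ins so the sketch runs without any assistant library.
const spoken = [];
const fakeApp = {
  getArgument: () => ({ amount: 100, unit: 'ml' }),
  tell: (text) => spoken.push(text),
};
let storedTotalMl = 500;
const fakeWaterLog = { add: (ml) => (storedTotalMl += ml) };

new Conversation(fakeApp, fakeWaterLog).actionLogWater();
console.log(spoken[0]);
```

Because both collaborators are injected through the constructor, the handler can be exercised with simple fakes like the ones above.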

Unit testing

While this section isn’t directly connected with assistant apps or voice interfaces, I believe it’s still extremely important for any kind of software we build. Just imagine that every time you change something in the code, you need to deploy the function and start a conversation with your app. In the WaterLog app that was relatively simple (but it still took at least tens of deployments). In bigger apps, having unit tests will be critical. It can speed up development by an order of magnitude.

All unit tests for our classes can be found under the functions/test/ directory. The tests in this project aren’t extremely sophisticated (they use the sinon.js and chai libraries without any additional extensions), but they still helped a lot with getting to production in a relatively short time.
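To show the shape of such a test without pulling in any dependencies, here is a small sketch that uses Node's built-in `assert` module and a hand-rolled spy in place of sinon and chai. The function under test, `tellLoggedWater`, and the `createSpyApp` helper are hypothetical and exist only for this example.

```javascript
const { strictEqual } = require('assert');

// Hypothetical function under test: tells the user today's total.
function tellLoggedWater(app, totalMl) {
  app.tell(`In sum you have drunk ${totalMl}ml today.`);
}

// Hand-rolled spy that records every call to tell(),
// playing the role a sinon spy would play in the real tests.
function createSpyApp() {
  const calls = [];
  return { calls, tell: (text) => calls.push(text) };
}

// The test itself: no deployment and no real conversation needed.
const spyApp = createSpyApp();
tellLoggedWater(spyApp, 600);
strictEqual(spyApp.calls.length, 1);
strictEqual(spyApp.calls[0], 'In sum you have drunk 600ml today.');
console.log('test passed');
```

A check like this runs in milliseconds, compared with a deploy-and-talk cycle for every change.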