Josh Marinacci

Head of Developer Evangelism

I'm the head of developer evangelism at PubNub. At PubNub we run a Data Stream Network: it lets people conduct realtime communication with extremely high security and extremely low latency. We have a lot of customers doing cutting-edge things, which gives me an inside seat on what's coming next. And one of those things is chatbots.

Serverless Chatbots

[image of a chatbot]

So what are chatbots? At first you might think a chatbot is a little program you interact with through text. And that's true: many, if not most, are text driven. But not all of them. Chatbots can use voice too.

Take Amazon Alexa: it's essentially a complex set of chatbots driven by voice recognition. And these things are going to continue to evolve. It won't be just voice. Some chatbots will have images, and video, and maybe more.

Perhaps we'll have assistants in virtual reality and augmented reality.
I don't know what this visualization is showing. I don't think I'd want bar charts so close to my eyes.

But I can tell you about a real example. We sponsored a hackathon a few months ago, and one of the teams made a chatbot that converted photos of sign language into text, and then into speech, in realtime. One day a system like this will be built into our phones, or whatever VR/AR thing comes after phones.

AI + network = Chatbot

So we're going to see a lot of crazy things over the next few years that will have nothing to do with text but could still be considered chatbots. These all have a common thread: they all combine AI with the network. So to build one of these, we need the following ingredients.

Your requirements

Domain Knowledge

Artificial Intelligence

Realtime Infrastructure

First you need your domain knowledge. This is the knowledge about what your bot actually does. If it's a bot for news, then it's the news sources. If it's a bot for ordering pizza, then it's knowledge about the store locations and pizza ingredients. It's whatever your bot does.

Next you'll need some level of artificial intelligence. Some chatbots need full natural language processing. Some need a backend with a rich neural net. Some just need a glorified phone tree, or maybe just some if statements. It really depends on what you are doing.

When you make a chatbot you also need some sort of realtime infrastructure. A chatbot involves constant communication between the end user and your bot, possibly with platform proxies in the way, and possibly with other webservices that provide the knowledge or AI the chatbot needs. You need realtime infrastructure to tie all of this together with very low latency and high security.

Your requirements, not your focus

Domain Knowledge

Artificial Intelligence

Realtime Infrastructure

But here's the thing. The last two items are requirements, but they are not your focus. Imagine you are building a physical robot. Electricity is a requirement, but it's not your focus; you aren't going to invent a new battery or a new charging circuit. That's not the problem you are actually trying to solve. It's the same with chatbots. The bot's infrastructure and AI are required, but they are not your focus. Only the domain knowledge is your focus.

Don't build your own infrastructure

Don't build your own AI

Focus on your domain knowledge

So please, if you remember nothing else from this talk I want you to remember
this slide.

Demo Time

So that's enough high level stuff. Let's look at a simple chatbot.

EmojiBot

This is the EmojiBot. It fixes what you type in by making it heartier.

Now let's dive into some code. This is what makes the EmojiBot work. This is my AI. Now, I'm using "AI" loosely here: it's just a search and replace. So the code is simple. What's more interesting is where this code runs.
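The slide code itself isn't reproduced here, but a search-and-replace "AI" along these lines would do the job (the replacement table is my own hypothetical example, not the actual slide code):

```javascript
// A minimal sketch of the EmojiBot "AI": plain search and replace.
// The replacement table is a hypothetical example.
const replacements = {
  love: '❤️',
  heart: '❤️',
  happy: '😊',
};

function heartify(text) {
  // Swap each known word (case-insensitive) for its emoji; leave the rest alone.
  return text.replace(/\b\w+\b/g, function (word) {
    return replacements[word.toLowerCase()] || word;
  });
}
```

Calling heartify('I love this') swaps "love" for a heart and leaves everything else untouched.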

HeartBot Architecture

The code runs in the realtime infrastructure. I'm using PubNub and BLOCKS. PubNub is realtime as a service: you publish a message to a channel, you subscribe to a channel, and PubNub moves the messages around, anywhere in the world, in 250ms.

[code for publish a message and subscribe]

PubNub BLOCKS runs JavaScript on the edge. What does that mean? As messages move through the network, code you write is executed on them. Where does the code run? It doesn't matter. It always runs in the part of the network closest to where the users are.
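To make that concrete, here is a sketch of a before-publish event handler in this model. In the real BLOCKS runtime the handler is the module's default export and request.ok() is supplied by the platform; heartify is a hypothetical stand-in for whatever transformation your bot performs.

```javascript
// Sketch of a BLOCKS-style before-publish event handler. The handler
// receives the in-flight message, may mutate it, and resolves it back
// into the network with request.ok().
function heartify(text) {
  return text.replace(/love/gi, '❤️'); // hypothetical bot logic
}

function onBeforePublish(request) {
  if (request.message && request.message.text) {
    request.message.text = heartify(request.message.text);
  }
  return request.ok(); // let the (now modified) message continue on its way
}
```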

Serverless !== No Servers

This is serverless. Serverless does not mean that there are no servers. Obviously the code is running somewhere.

Serverless means coding at the level of a function or object, not at the level of an application or server. We don't care where the code is hosted or what the underlying OS is. We just write code that responds to a request. Push a button and it's deployed worldwide, instantly. Have ten users? It works perfectly. Suddenly you get 10,000 users? It still works, with no code changes. Serverless infrastructure is secure and scalable.

By the way: PubNub's free tier is free forever, at one million messages a month. Unless you are going into production with a product, you'll never exhaust this quota.

Cloudinary

So that was the Emoji Bot. Let's look at something more complicated. Something that does something useful.

There's this cool company called Cloudinary. They make an image manipulation service. It's mainly meant for resizing and adjusting images for your blog. But with a little clever work I built a chatbot that manipulates images for you.

ImageBot

please reset the image

please show the image

upload image

please set the width to 500px

please auto-contrast and auto-sharpen

please make it square

please overlay acmelogo at the south west corner

Now, the really cool thing about this is that it's a group chat. Multiple people can collaborate on it at once, each giving commands to the bot to adjust the image until the group likes what it sees. And it will scale no matter how many users we have.

ImageBot Architecture

Here is the image bot. This one doesn't use a full NLP solution,
just some code I put together that looks for commands after
the trigger word 'please'.

When the user types in a message, it comes to the block. The block sends it to a stateless webservice I wrote, which turns the text into a command, or returns false if it didn't see the trigger word. The final result is a command, something like "action: resize".
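The parser itself isn't shown in the talk, but a rules-based version might look like this sketch (the command table is a hypothetical subset of the commands listed above):

```javascript
// Sketch of a rules-based command parser: only messages that start
// with the trigger word "please" are treated as commands. Returns
// false when no rule matches, like the webservice described above.
function parseCommand(text) {
  const lower = text.trim().toLowerCase();
  if (!lower.startsWith('please ')) return false; // no trigger word
  const rest = lower.slice('please '.length);

  const width = rest.match(/^set the width to (\d+)/);
  if (width) return { action: 'resize', width: parseInt(width[1], 10) };
  if (rest.startsWith('make it square')) return { action: 'square' };
  if (rest.startsWith('reset the image')) return { action: 'reset' };
  return false; // trigger word present, but no rule matched
}
```

For example, parseCommand('please set the width to 500px') yields { action: 'resize', width: 500 }.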

The second part of the code uses the action to generate a custom Cloudinary URL and finishes the publish. All of this happens in the split second between the time the user sent their message to the network and the time it is forwarded on to its destination.
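A sketch of that second step: Cloudinary encodes transformations as comma-separated codes in the delivery URL, so building the URL is string assembly. The cloud name "demo" and the image id are placeholders, and the transformation codes are illustrative.

```javascript
// Sketch: turn the accumulated context settings into a Cloudinary
// delivery URL. Cloudinary chains transformations as comma-separated
// codes in the URL path (w_500 = width 500, e_auto_contrast = effect).
// Cloud name and image id below are placeholders.
function buildCloudinaryUrl(context) {
  const transforms = [];
  if (context.width) transforms.push('w_' + context.width);
  if (context.autoContrast) transforms.push('e_auto_contrast');
  const base = 'https://res.cloudinary.com/demo/image/upload/';
  const path = transforms.length > 0 ? transforms.join(',') + '/' : '';
  return base + path + context.imageId + '.jpg';
}
```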

It doesn't matter how many people are listening to this channel. They will all be able to interact with the bot and see what the others are doing, just by listening to the same channel.

If they want their own conversation, they can just use different channels. It's all the same code.

Notice in this code that the settings are stored in a context object which is saved to and loaded from the network. This is very important, because if different people are in different parts of the network, their actions will be run by different instances of the code. The network has a key-value store which can save objects and sync them between nodes, so no matter where you are, the context state will be at the node closest to you.
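The pattern is: load the context, apply the command, save the context back. The real BLOCKS key-value store is a platform module with promise-based get and set; since that runtime isn't available here, a tiny in-memory stand-in plays its part in this sketch.

```javascript
// In-memory stand-in for the network's key-value store, so the
// load/apply/save pattern is runnable here. In BLOCKS the real
// store is a platform module with the same promise-based shape.
const backing = new Map();
const kvstore = {
  get: (key) => Promise.resolve(backing.get(key)),
  set: (key, value) => { backing.set(key, value); return Promise.resolve(); },
};

// Load the channel's shared context, apply one command, save it back.
// Any instance of the code, on any node, sees the same synced state.
function applyCommand(channel, command) {
  return kvstore.get('context-' + channel).then((context) => {
    context = context || {};
    if (command.action === 'resize') context.width = command.width;
    if (command.action === 'reset') context = {};
    return kvstore.set('context-' + channel, context).then(() => context);
  });
}
```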

ImageBot lesson

Don't build your own NLP

So the reason I showed you this demo is to convince you not to do what I did. When I started this project I thought: it's not real NLP, it's just some keyword recognition. But as I added more and more features, the code grew and grew and became harder to understand.

In particular, with a rules-based approach like mine, it gets harder and harder to manage the precedence of the rules. As you add more features, the existing features become more fragile. This approach won't scale.

The solution: don't build your own AI. There are existing NLP solutions that can easily handle this. Which brings me to my next demo.

This is my favorite one. MR ROCKBOT is a chatbot who understands only rock-related facts and really terrible rock jokes. Let's try it out. This is a chat app with a knowledge base of facts from Wikipedia and GIFs from Giphy. It also uses IBM Watson to process questions, as well as to translate from French to English. Ask it to tell you a joke in French. I'm sure it's hilarious, if only I spoke French.

MR ROCKBOT Architecture

So Mr Rockbot has a similar architecture. A message goes to the network compute block, which calls a series of IBM and Wikipedia services to calculate the final answer. In this case I'm using IBM's Alchemy API for language translation and the Conversation API for the natural language part, which is far more effective than the hand-written code from the previous demo.

IBM Watson Text Alchemy API

Mr Rockbot is the most complicated because he uses several services. IBM has an amazing set of APIs under the Watson group. The first thing I did was use the language detection and translation APIs.

Here's the code. It first calls the language detection API to see whether the text is in English or French. If it's in English, it is sent on to the next stage of the pipeline. If it's not in English, the code calls the translate function, which makes another webservice call.
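The real detection and translation steps are HTTP calls to Watson, so in this sketch hypothetical stubs stand in for them; the point is the branching, not the stubs.

```javascript
// Hypothetical stubs stand in for Watson's language-detection and
// translation webservice calls so the pipeline branching is visible.
function detectLanguage(text) {
  // Crude stand-in: real code would POST the text to the detection API.
  return /\b(le|la|est|une?)\b/i.test(text) ? 'fr' : 'en';
}

function translateToEnglish(text) {
  // Stand-in: real code would call the translation API here.
  return '[translated] ' + text;
}

// If the message is already English, pass it straight to the next
// pipeline stage; otherwise translate it first.
function normalize(text) {
  return detectLanguage(text) === 'en' ? text : translateToEnglish(text);
}
```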

IBM Watson Text Alchemy API

The next part is processing the conversation, now that everything is in English. In my original version I used the IBM Alchemy Language API to look at the input text; it pulls out entities and intents. Unfortunately this API is really meant for looking at larger documents, not tiny snippets of text.

IBM Watson Conversations

However, IBM recently introduced a new api specifically for conversational
interactions called, appropriately enough: Conversations. It was originally
called the Dialog Service, so you might see some tutorials with that name.
With Conversations you actually train the system by giving it examples
of the kinds of things you are looking for.
First you create 'intents.' These are the things the user could ask the chatbot to do. For example, if this was a home automation bot you might use an intent like 'turn on'. For this bot the intents were things like 'tell me a joke', 'tell me your favorite ...', and 'what is ...'. Roughly, these are your verbs.
Next you create entities. These are things the user could ask about, or are the
target of an intent. For example: in a home automation bot you might
use an entity like 'lights' or 'door'. You can also specify synonyms
so that it can recognize many forms of the request. For this bot I used
things like music and food, so you can ask 'what is your favorite music'.
Think of these like nouns that the chatbot can understand.
Finally you can create dialogs. These are workflows that the user can go through. This lets you specify what the bot actually says to the end user in different circumstances. If you already have a knowledge base of facts and responses, then you can skip this part and just use the intents and entities.
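Pulling those three pieces together, the training data you feed the tool amounts to something like this shape. The examples here are hypothetical and illustrate the idea, not the exact Watson schema.

```javascript
// Hypothetical sketch of the training data: intents are the verbs
// (with example phrasings), entities are the nouns (with synonyms).
// This illustrates the shape, not the exact Watson schema.
const training = {
  intents: [
    { intent: 'tell_joke', examples: ['tell me a joke', 'make me laugh'] },
    { intent: 'favorite', examples: ['what is your favorite'] },
  ],
  entities: [
    { entity: 'music', synonyms: ['song', 'album', 'band'] },
    { entity: 'food', synonyms: ['meal', 'snack'] },
  ],
};
```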

After you teach Watson about your problem domain you can call it from your serverless code using a simple HTTP POST. It's important to note that Conversations is a stateless API. In order to understand the context of a conversation you have to provide this context on each request using a context structure.
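Because the API is stateless, each call has to carry the context forward. Here is a sketch of assembling the request body; the field names follow the general input/context pattern described above, but treat the exact schema as an assumption.

```javascript
// Sketch of a stateless conversation call: the context object returned
// by the previous response is echoed back with the next request, so the
// service can pick up the dialog where it left off. Field names are an
// assumption, for illustration only.
function buildConversationRequest(text, previousContext) {
  return {
    input: { text: text },
    context: previousContext || {}, // stateless API: the caller carries the state
  };
}
```

Each response's context gets stored and passed into the next buildConversationRequest call.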

I'm storing this in PubNub BLOCKS, our serverless platform. But remember that this code will always be run on the edge nearest to the end user, and that user might move. So I store this context in our Key Value store which is eventually consistent. If the user moves to another part of the network the context will follow them.

Bonus!

Realtime isn't just for text

Don't build your own AI

Don't build your own realtime infrastructure.

Cloudinary

IBM Watson

PubNub

@joshmarinacci

josh@pubnub.com

Today I've shown you how to build three different chatbots using
serverless infrastructure and AI services. At no point did I have
to spin up a server or write multi-threaded code. That's the magic of serverless.
I also didn't have to have a PhD in Machine Learning to build these.
That's the magic of AI services.
So please: don't build your own realtime service or AI. Focus on the problem you are trying to solve, not on infrastructure.
Thank you.