Search This Blog

Wednesday, 3 May 2017

Chatbots that get smarter at #AtlassianSummit

I’m in Barcelona at the moment, at AtlasCamp giving a talk about helpdesk chatbots that get smarter.

It’s easy to write a dumb chatbot. It’s much harder to write a smart one that responds sensibly to everything you ask it. Some famous examples: if a human mentions Harrison Ford, they a probably not talking about a car.

There are three different kinds of chatbot, and they are each progressively harder to get write.

The simplest chatbots are just a convenient command-line interface: in Slack or Hipchat, there are usually “slash” commands. Developers will set up a program that wakes up to “/build” whenever it is entered into a room that pulls the latest sources out of git, compiles it and shows the output of the unit tests. Since this is a very narrow domain, it’s easy to get right, and as it is for the benefit of programmers, it’s always cost-effective to spend some programmer time improving anything that isn’t any good.

The next simplest are ordering bots, that control the conversation by never letting the user deviate from the approved conversational path. If you are ordering a pizza, the bot can ask you questions about toppings and sizes until it has everything it needs. Essentially this is just a replacement for a web form with some fields, but in certain markets (e.g. China) where there are near-universal chat platforms this can be quite convenient.

The hardest are bots that don’t get to control the conversation, and where the user might ask just about anything.

Support bots are examples of that last kind: users could ask the helpdesk just about anything, and the support bot needs to respond intelligently.

I did a quick survey and found at least 50 startups trying to write helpdesk bots of various kinds. It’s a lucrative market, because if you can even turn 10% of helpdesk calls into a chat with a bot, that can mean huge staff cost savings. I have a customer with over 150 full-time staff on their servicedesk -- there are millions of dollars of savings to be found.

Unfortunately, nearly every startup I’ve seen has completely failed to meet their objectives, and customers who are happy with their investments in chatbots are actually quite rare.

I’ve seen three traps:

Several startups have lacked the courage to believe in their own developers. There’s a belief that Microsoft, Facebook, Amazon, IBM and Google have all the answers, and that if we leverage api.ai or wit.ai or lex or Watson or whatever they’ve produced this month that there’s just a simple “helpdesk knowledge and personality” to put on top of it, like icing on a cake. Fundamentally, this doesn’t work: for very soud commercial reasons the big players are working on technology for bots that replace web forms and with that bias comes a number of limiting assumptions.

A lot of startups (and larger companies) believe that if you just scrape enough data from the intranet -- analyse every article in Confluence for example -- that you will be able to provide exactly the right answer to the user. Others take this further and try to scrape public forums as well. This doesn’t work because firstly, users often can’t explain their problem very well, so there’s not enough information up front even to understand what the user wants; and secondly... have you actually read what IT people put into their knowledge repositories?

There are a lot of different things that can go wrong, and a lot of different ways to solve a problem. If you try to make your support chatbot fully autonomous, able to answer anything, you will burn through a lot of cash handling odd little corner cases that may never happen again.

The most promising approach I’ve seen was one taken by a startup that I was working with late last year. When they decided to head in another direction, I bought the source code back off them.

The key idea is this: if our support chatbot can’t answer every question -- as indeed it never will -- then there has to be a way for the chatbot to let a human being respond instead. If a human being does respond, then the chatbot should learn that that is how it should have responded. If the chatbot can learn, then we don’t need to do any up-front programming at all, we can just let the chatbot learn from past conversations. Or even have the chatbot be completely naive when it is first turned on.

The challenge is that in a support chat room, it’s often hard to disentangle what each answer from the support team is referring to. There are some techniques that I’ve implemented (e.g. disentangling based on temporal proximity, @ mentions and so on). A conversative approach is to have a separate bot training room where only cleanly prepared conversations happen. Taking this approach means that we substitute expensive highly-paid programmers writing code to handle conversations and replace them with an intern writing some text chats.

It’s actually not that hard to find an intern who just wants to spend all day hanging out in chat rooms.

Whatever approach you take, you will end up with a corpus of conversations: lots of examples of users asking something, getting a response from support, clarifying what they want, and then getting an answer.

Predicting the appropriate thing to say next becomes a machine learning problem: given a new, otherwise unseen data blob, predict which category it belongs to. The data blobs are all the things that have been said so far in the dialog, and the category is whatever it is that a human support desk agent is most likely to have said as a response.

There is a rich mine of research articles and a lot of well-understood best practice about how to do machine learning problems with natural language text. Good solutions have been found in support vector machines, LTSM architectures for deep neural networks, word2vec embedding of sentences.

It turns out that techniques from the 1960s work well enough that you can code up a solution in a few hours. I used a bag-of-words model combined with logistic regression and I get quite acceptable results. (At this point, almost any serious data scientist or AI guru should rightly be snickering in the background, but bear with me.)

The bag-of-words model says that when a user asks something, you can ignore the structure and grammar of what they’ve written and just focus on key words. If a user mentions “password” you probably don’t even need to know the rest of the sentence: you know what sort of support call this is. If they mention “Windows” the likely next response is almost always “have you tried rebooting it yet?”

If you speak a language with 70,000 different words (in all their variations, including acronyms), then each message you type in a chat gets turned into an array of 70,000 elements, most of which are zeroes, with a few ones in it corresponding to the words you happen to have used in that message.

It’s rare that the first thing a support agent says is the complete and total solution to a problem. So I added a “memory” for the user and the agent. What did the user say before the last thing that they said? I implemented this by exponential decay. If your “memory” vector was x and the last thing you said was y then when you say z I’ll update the memory vector to (x/2 + y/2). Then after your next message, it will become (x/4 + y/4 + z/2). Little by little the things you said a while ago become less important in predicting what comes next.

Combining this with logistic regression, essentially you assign a score for how strong each word is in each context as a predictor. The word “password” appearing in your last message would score highly for a response for a password reset, but the word “Windows” would be a very weak predictor for a response about a password reset. Seeing the word “Linux” even in your history would be a negative strength predictor for “have you tried rebooting it yet” because it would be very rare for a human being to have given that response.

You train the logistic regressor on your existing corpus of data, and it calculates the matrix of strengths. It’s a big matrix: 70,000 words in four different places (the last thing the user said, the last thing the support agent said, the user’s memory, and the support agent’s memory) gives you 280,000 columns, and each step of each dialog you train it on (which may be thousands of conversations) is a row.

But that’s OK, it’s a very sparse matrix and modern computers can train a logistic regressor on gigabytes of data without needing any special hardware. It’s a problem that has been well studied since at least the 1970s and there are plenty of libraries to implement it efficiently and well.

And that is all you have to do, to make a surprisingly successful chatbot. You can tweak how confident the chatbot needs to be before it speaks up (e.g. don’t say anything unless you are 95% confident that you will respond the way that a support agent will). You can dump out the matrix of strengths to see why the chatbot chose to give an answer when it gets it wrong. If it needs to learn something more or gets it wrong, you can just give it another example to work with.

It’s a much cheaper approach than hiring a team of developers and data scientists, it’s much safer than relying on any here-today-gone-tomorrow AI startup, and it’s easier to support than a system that calls web APIs run by a big name vendor.

If you come along to my talk on Friday you can see me put together the whole system on stage in under 45 minutes.