
A blog about Chat bots (mainly Watson), and life in general.


A question recently asked was “How can I get Watson Conversation to repeat what I last asked it?” There are a couple of approaches to solve this, and I thought I would blog about them. First, here is an example of what we are trying to achieve.

One thing to understand going forward: everything you build should be data driven. So while there are valid use cases where this is needed, it doesn’t mean it is needed for every solution you build, unless evidence exists otherwise.

Approach 1. Context Variable.

In this example we create a context variable at every node where we want the system to respond, like so:

This works but prevents easily creating variations of the response. On the plus side you can give normal responses, but when the user asks to repeat, it can give a fixed custom response.

Approach 2. Context variable everything!

Similar to the last approach, except that rather than creating the context variable in the context area, you build it on the fly. So something like so:

This allows you to have custom responses. A disadvantage (albeit minor) is that you increase the chance of a mistake happening in your code. Each response also adds 4 bytes to your overall skill/workspace size. This means nothing for small workspaces, but when you are at enterprise level you need to be careful.
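If you would rather not touch every node, the same idea can be moved to the application layer. This is a different technique from the two dialog approaches above, and only a rough sketch: the `last_response` key, the `repeat` intent name and the fallback text are all made up for illustration.

```python
def respond(context: dict, intent: str, answer: str) -> str:
    """Return the text to show the user, remembering it for a later repeat.

    Assumptions (not from Watson): the context is a plain dict, the repeat
    request is detected as an intent called "repeat", and the previous
    answer lives under the hypothetical key "last_response".
    """
    if intent == "repeat":
        # Replay the stored answer; the fallback wording is illustrative only.
        return context.get("last_response", "I have not said anything yet.")
    # Any normal answer overwrites what we would repeat next time.
    context["last_response"] = answer
    return answer
```

The trade-off is that the repeat logic now lives outside the workspace, so whoever maintains the dialog cannot see or change it.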

“Tools are the subtlest of traps”

Most developers learn the dangers of evil wizards; for everyone else it might not be so obvious. The purpose of the Watson tooling is to democratize AI, that is, to abstract the AI layer from your knowledge worker.

This allows you to utilize the power of AI, without having to search for the mythical person who understands your business and NLP, AI, etc.

While this can remove a lot of the complexity, it can also lead people into a false sense of security that their hand will be held through the whole process.

So I am taking a time out to talk about Watson Knowledge Studio. For those that don’t know what it is, the tool allows you to annotate your domain documents, and surface structured insights from unstructured data. It is extremely powerful and easy to use versus other solutions out there.

The downside is that it is extremely easy to use. So I have seen a number of different people/companies rush in and create a model that disappoints, or in some cases infuriates. It’s not that this is unique to WKS, only that you can overlook important steps in your workflow.

Now IBM does do 3-4 days training, and there are a number of videos (slightly out of date) that cover some of this. But to help some people starting off, I am going to list the main pitfalls you need to watch out for when doing your first WKS project.

Understand what you need to surface from your data!

This happens so often with technical people. They look at the tooling, see how easy it is and run off and annotate the world in their documents.

What normally happens in this regard is a very poor model which surfaces information correctly, but that data is meaningless to the business.

Get your business analysts/SMEs from the start.

You need someone to objectively understand the business problem you are trying to solve. You need to look at your data sources and determine if you can even surface that information (i.e. whether there are enough samples to train).

Limit your Types and Relationships on your first pass.

After you have looked at what you want to surface, you need to focus on a small number of types and relationships. Your BA/SME might have picked 50-100, but generally you should pick around 20-40. There are reasons for this.

Each type/relationship adds more work for your human annotator.

Models can build faster.

As you work through documents you will find that your needs for types/relationships may change.

Don’t reinvent the wheel.

If you have existing annotators that will work as-is, don’t try to integrate them into your model. They may be part of your business requirement, but all you are doing is adding complexity to your model. You can run a second pass on your finished data to get that information.

Understand when to use rule based versus model based.

The purpose of the AI model is to have it train and understand content it has never seen before. To do this requires a lot of up front work on annotation and training.

Compare this to the rule based model. If you know new terms/phrases may not come up, but the nature of how they may be written changes, then rule based may solve your issue.

Personally, I think the AI model is the better choice if you plan to go with WKS. There is easier tooling for rule based, for example Watson Explorer Studio.

Inter-Annotator Agreement is King.

Two things to realize before you start annotating.

Just because you are an expert in the content, doesn’t mean you are an expert at annotating.

The more subject matter experts you have, the less agreement on topics will happen in the real world.

To that end, you need to clearly define your inter-annotator agreement (IAA) so there is no ambiguity or disagreement. Have examples, and also have a single SME as the deciding factor where further disagreements occur.

Not creating a proper IAA can lead to more work for your main SME, and damage your model to the extent of hours/days of wasted work.

Data Wrangling is required.

Most of the work in formatting your data is to keep your human annotator sane. Annotating a document is a mentally exhausting process that normally follows these steps.

Read and understand the paragraph.

Annotate the paragraph with types and relationships.

Read and annotate the co-references.

Fix mistakes as you go.

You want to reduce the amount of time to do this for each document and working set. So if there is information that isn’t required to annotate, remove it. If your documents are very small, then join them together (with some clear marker of a new document).

You want your document to be annotated in 30 minutes or so, and your document set in a day. This will allow you to progress at a reasonable speed, and build frequent models.

On top of this, you should also look at sourcing any dictionaries/terms that can be used to kickstart the annotation. (which most people do)

Lastly, check to see how your documents are ingested into WKS. For example I’ve seen instances of “word.word”. WKS sees this as a single term, and fixing that annotation can be annoying. It may be you need to do some formatting, or limit these mistakes.
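A small clean-up pass before ingestion can limit these. The sketch below is not part of WKS; it assumes the usual culprit is a sentence break that lost its space (a lowercase letter, a period, then an uppercase letter), and `fix_joined_sentences` is a name I made up.

```python
import re

# Assumption: a period squeezed between a lowercase letter and an uppercase
# letter is a sentence boundary that lost its space ("shells.Goldfish").
# Things like "example.com" or "3.14" do not match and are left alone.
_MISSING_SPACE = re.compile(r"(?<=[a-z])\.(?=[A-Z])")


def fix_joined_sentences(text: str) -> str:
    """Insert a space after a period that glues two sentences together."""
    return _MISSING_SPACE.sub(". ", text)
```

Run it over a sample of your documents first and eyeball the changes. Your data may have other patterns (product codes, abbreviations) that need their own rule.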

Build your model soon and often.

You can’t really see what you are doing wrong until 2-3 models in. So it is important to build these models as soon as you can.

To that end I would recommend building a model as soon as you have a working set completed. Try to have sets be annotated within 1-2 days max, at least at the start of the project.

You can quickly see where the IAA is lacking, and if you need to change types/relationships or even data. Doing this sooner rather than later prevents the technical debt of fixing the model.

Let the model work for you.

Once you have gotten 3-4 models created, and you are comfortable with some of the scoring, have it pre-annotate future working sets. It will reduce the mental requirements for the human annotators.

However! If you still have 1-2 of the three areas performing badly, recommend to your human annotator to just delete the poor performing part and redo. For example if Co-Reference doesn’t work well, just delete all co-references and redo. This is considerably faster than trying to manually fix every annotation error.

…

So I hope this helps those in their first journey into using WKS. Be aware that this is by no means a full tutorial.

So this is a long time pet peeve, but recently I have seen a load of these in succession. I am sure a lot of people who know me are going to read this and think “He’s talking about me”. Truth is there is no one person I am pointing my finger at.

Let me start with what triggered this post. Have a look at this screen shot. There are three things wrong with it, although one of the reasons is not visible, but you can guess.

So, disclosure: this is a competitor’s chat bot, and it is also a common pattern I have seen on that chat bot. But I have also seen people do this with Watson Conversation.

Did you guess the issues?

Issue 1: Never ask the end user whether you answered them correctly or not. If your system is well trained and tested, then you are going to know if it answered well or not.

Those who think of a rebuttal to this, imagine you rang a customer support person and they asked you “Did I answer you correctly” every time they gave an answer? What would your action be? More than likely you would ask to speak to someone who does know what they are talking about.

If you really need to get feedback, make it subtle, or ask for a survey at the end.

Issue 2: BUTTONS. I don’t know who started this button trend, but it has to die. You are not building a cognitive conversational system. You are building an application. You don’t need an AI for buttons; any average developer can build you a button-based “Choose your own adventure”.

Issue 3: Not visible on the image is that you are stuck until you click on yes or no. You couldn’t say yes or no, or “I am not sure”. For that matter I have seen cases where the answer is poorly written and the person would take the wrong answer as right, so what happens then? For that matter selecting yes or no does nothing to progress the conversation.

So what is the root cause in all this? From what I have seen normally it is one thing.

Developers.

Because older chat bots required a developer to build, it has sort of progressed along those lines for some time. In fact some chat bot companies tout the fact that it is developer orientated, and in some cases only offer code based systems.

I’ve also gotten to listen to some developers tell me how Watson Conversation sucks (because “tensor flow”), or that they could write something better. I normally tell them to try.

Realistically to make a good chat bot, the developer is generally far down the food chain in that creation. Watson conversation is targeted at your non-technical person.

Here’s a little graphic to help.

Now, your chances of getting all of these are slim, but the people you do get should have some skills in these areas. Let’s expand on each one.

Business Analyst

By far the most important, certainly at the start of the project.

Most failed chat bot projects are because someone who knows the business hasn’t objectively looked at what it is you are trying to solve, and if it is even worth the time.

By the same token, I have seen two business analysts create a conversational bot that on the face of it looked simple, but they could show that it saved over a million euros a year. All built in a day and a half. Because they knew the business and where to get the data.

Conversational Copywriter

Normally even getting a copywriter makes a huge difference, but one with actual conversational experience makes the solution shine. It’s the difference between something clinical, and something your end user can make an emotional attachment to.

Behavioural Scientist

Another thing I see all the time. You get an issue in the chat conversation that requires some complexity to solve. So you have your developer telling you how they can build something custom and complex to solve the issue (probably includes tensor flow somewhere in all of it).

Your behavioural expert on the other hand will suggest changing the message you tell the end user. It’s really that simple, but often missed by people without experience in this area.

Subject Matter Expert (SME)

To be fair, at least on projects I’ve seen there is normally an SME there. But there are still different levels of SMEs. For example your expert in the material, may not be the expert that deals with the customer.

But it is dangerous to think that just because you have a manual you can reference, you are capable of building a system that can answer questions as if it is an SME.

Data Scientist

While you might not need a full blown one, all good conversational solutions are data driven. In what people ask, behaviours exhibited and needs met. Having someone able to sift through the existing data and make sense of it, helps make a good system.

Also, on almost every engagement I’ve been on, people will tell you what they think the end user will say or do. But often that is not the case, and the data shows this.

UI/UX

What the conversational copywriter does for the engaging conversation, the UI/UX does for the system. If you are using existing channels like Facebook, Skype, Messenger, Slack, etc., then you probably don’t need to worry as much. But it’s still possible to create something that can upset the user without good UX.

It’s also a broad skill area. For example, UX for Web is very different to Mobile, IVR, and Robots.

Machine Learning

Watson conversation abstracts the ML layer from the end user. You only need to know how to cluster questions correctly. But knowing how to do K-Fold cross validation, or the importance of blind sets helps in training the system well.
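For anyone who hasn’t met K-Fold before, here is a minimal sketch of the splitting step using only the standard library. The `(question, intent)` tuple shape and the function name are my own assumptions. Training a workspace on each fold and scoring it is left out, since that depends on your setup.

```python
import random
from typing import Iterator, List, Tuple

Example = Tuple[str, str]  # assumed shape: (question text, intent name)


def k_fold_splits(
    examples: List[Example], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[Example], List[Example]]]:
    """Yield (train, test) pairs; every example is tested exactly once."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps runs comparable
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal slices
    for i, test in enumerate(folds):
        # Train on everything that is not in the current test fold.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test
```

You would train on each `train` list, ask the `test` questions, and average the results across the folds. A blind set is different: it is held back before any of this and never used for training.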

It also helps if your developers have at least a basic understanding of machine learning.

I often see non-ML developers trying to fix clusters with comments like “It used this keyword 3 times, so that’s why it picked this over that”, which is not how it works at all.

It also prevents your developers (if they code the bot) from creating something that is entity heavy. Non-ML developers seem to like entities, as they can wrap their heads around them. Fixed keywords and regex all make sense to a developer, but in the long run they make the system unmaintainable (which basically defeats the purpose of using Watson Conversation).

Natural Language Processing (NLP)

I’ve made this the smallest. There was a time, certainly with the early versions of Watson you needed these skills. Not so much anymore. Still, it’s good to understand the basics of NLP, certainly for entities.

Developer

In the scheme of things, there will always be a place for the developer.

You have UI development, application layer, back-end and integration, automation, testing, and so on.

Just development skills alone will not help you in building something that the end user can feel a connection to.

… and please, stop using buttons.

One of the common requests for Conversation is being able to understand the running topic of a conversation.

For example:

USER: Can I feed my goldfish peas?

WATSON: Goldfish love peas, but make sure to remove the shells!

USER: Should I boil them first?

The “them” in the second question is called an “anaphora”. It refers to the peas. So you can’t answer the question without first knowing the previous question.

On the face of it, it looks easy. But you have “goldfish”, “peas” and “shells”, any of which could potentially be the reference, and no one wants to boil their goldfish!

So the tricky part is determining the topic. There are a number of ways to approach this.

Entities

The most obvious way is to determine what entity the person mentioned, and store that for use later. This works well if the user actually mentions an entity to work with. However in a general conversation, the subject of the conversation may not always be mentioned by the person who asks the question.

Intents

When asking a question and determining the intent, it may not always be that an entity is involved. So this is of limited help in this regard.

That said, there are certain cases where intents have been used with a context in mind. So it can be easily done by creating a suffix to the intent. For example:

#FEEDING_FISH_e_Peas

In this case we believe that peas is a common entity that has a relationship to the intent of Feeding Fish. For coding convention we use “_e_” to denote that the following piece of the intent name is an entity identifier.

At the application layer, you can do a regex on the intent name “_e_(.*?)$” for the group 1 result. If it is not blank, store it in a context variable.
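As a rough sketch of that application-layer step, using the regex exactly as given: the `topic` context key and the lowercasing are my own choices for illustration, not a convention of the service.

```python
import re
from typing import Optional

# The pattern from the text: everything after "_e_" up to the end of the name.
ENTITY_SUFFIX = re.compile(r"_e_(.*?)$")


def entity_from_intent(intent_name: str) -> Optional[str]:
    """Return the group 1 result, or None when it is missing or blank."""
    match = ENTITY_SUFFIX.search(intent_name)
    if match and match.group(1):
        return match.group(1)
    return None


def remember_topic(context: dict, intent_name: str) -> dict:
    """Store the entity in the context under the hypothetical key "topic"."""
    entity = entity_from_intent(intent_name)
    if entity is not None:
        context["topic"] = entity.lower()  # lowercased for easier matching
    return context
```

With the intent above, `entity_from_intent("FEEDING_FISH_e_Peas")` gives `"Peas"`, and an intent without the suffix leaves the context untouched.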

Regular Expressions

Like before, you can use regular expressions to capture an earlier pattern to store it at a later point.

One way to approach this is have a gateway node that activates before working through the intent tree. Something like this:

The downside to this is that there is a level of complexity to maintain in a complex regular expression.

You can at least make maintenance a little easier by setting the primary condition check as “true” and then doing the individual checks in the node itself.

Answer Units

An answer unit is the text response you give back to the end user. Once you have responded with an answer, you have created a lot of context within that answer that the user may follow up on. For example:

Even with the context markers of the answer, the end user may never pick up on them. So it is very important to craft your answer that will drive the user to the context you have selected.

NLU

The last option is to pass the questions through NLU. This should be able to give you the key terms and phrases to store as context, as well as create knowledge graph information.

I have the context. Now what?

When the user gives a question that does not have context, you will normally get back low confidence intents, or an irrelevant response.

If you are using Intent based context, you can check the returning intents for a similar context to what you have stored. This also allows you to discard unrelated intents. The results from this are not always stellar, but offer a cheaper one time call.

The other option you can take is to preload the question that was asked and send it back. For example:

PEAS !! Can I boil them first?

You can use the !! as a marker that your question is trying to determine context. Handy if you need to review the logs later.
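Building and spotting that preloaded question is only a couple of lines. A minimal sketch, with function names of my own choosing:

```python
def preload_context(topic: str, question: str, marker: str = "!!") -> str:
    """Prefix the stored topic to a question that came back low confidence."""
    return f"{topic.upper()} {marker} {question}"


def is_context_retry(text: str, marker: str = "!!") -> bool:
    """True when a logged utterance was a context retry, not what the user typed."""
    return f" {marker} " in text
```

For example, `preload_context("peas", "Can I boil them first?")` gives the `PEAS !! Can I boil them first?` line shown above. Pick a marker your users are unlikely to type, or the log check will give false positives.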

As time passes…

So as the conversation presses on, what the person is talking about can move away from the original context, but that context may still remain the dominant one. One solution is to build a weighted context list.

For example:

"entity_list" : "peas, food, fish"

In this case we maintain the last three contexts found. As a new context is found, it is pushed onto the list and the oldest one drops off. Of course this means more API calls, which can cost money.
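Keeping that list up to date could look something like this. It is a sketch under two assumptions of mine: the newest context sits at the front, and a topic that comes up again moves back to the front rather than appearing twice.

```python
from typing import List


def push_context(entity_list: str, new_entity: str, size: int = 3) -> str:
    """Add a context to the comma-separated list, keeping at most `size` items."""
    current: List[str] = [e.strip() for e in entity_list.split(",") if e.strip()]
    # Drop an earlier mention so a repeated topic moves to the front.
    current = [e for e in current if e != new_entity]
    current.insert(0, new_entity)
    return ", ".join(current[:size])
```

So `push_context("peas, food, fish", "bowl")` returns `"bowl, peas, food"`, with `fish` falling off the end.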

Lowering calls on the tree.

Another option is to create a poor man’s knowledge graph. Let’s say the last two contexts were “bowl” and “peas”. Rather than creating multiple context nodes, you can build a tree which can be passed back to the application layer.

"entity" : "peas->food->care->fish"
...
"entity" : "bowl->care->fish"

You can use something like Tinkerpop to create a knowledge graph (IBM Graph in Bluemix is based on this).

Now when a low confidence question is found, you can use “bowl”, “peas” to disambiguate, or use “care” as the common entity to find the answer.
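Finding the common entity from those two strings needs nothing more than splitting on the arrows. A minimal sketch, where the `->` format comes from the example above and the function names are mine:

```python
from typing import List, Optional


def parse_path(path: str) -> List[str]:
    """Split "peas->food->care->fish" into its nodes, most specific first."""
    return [node.strip() for node in path.split("->")]


def common_entity(path_a: str, path_b: str) -> Optional[str]:
    """Return the first node of path_a that also appears in path_b."""
    nodes_b = set(parse_path(path_b))
    for node in parse_path(path_a):
        if node in nodes_b:
            return node  # most specific shared node, walking up from path_a
    return None
```

With the two paths above this returns `"care"`. For anything deeper than a handful of hand-written paths, a real graph store is the better home for this.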

Talk… like.. a… millennial…

One more common form of anaphora that you have to deal with, is how people talk on instant messaging systems. The question is often split across multiple lines.

Normal conversation systems take one entry, and give one response. So this just wrecks their AI head. Because not only do you need to know where the real question stops, but also where the next one starts.

One way to approach this is to capture the average time between each entry from the user. You can do this by passing the timestamps from the client to the backend. The backend can then build an average of how the user talks. This needs to be done at the application layer.
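Here is a rough sketch of what the backend side of that could look like. The `slack` multiplier is my own assumption (how far past their usual gap a user can pause and still be mid-question), so tune it against real logs.

```python
from statistics import mean
from typing import List, Tuple


def average_gap(timestamps: List[float]) -> float:
    """Average seconds between consecutive entries from one user."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return mean(gaps) if gaps else 0.0


def merge_split_entries(
    entries: List[Tuple[float, str]], typical_gap: float, slack: float = 2.0
) -> List[str]:
    """Join entries that arrive within slack * typical_gap of the previous one.

    `entries` are (timestamp in seconds, text) pairs in arrival order.
    """
    merged: List[str] = []
    last_time = None
    for timestamp, text in entries:
        if last_time is not None and timestamp - last_time <= slack * typical_gap:
            merged[-1] = merged[-1] + " " + text  # still the same question
        else:
            merged.append(text)  # long pause: treat as a new question
        last_time = timestamp
    return merged
```

In a live system you cannot see the next entry coming, so in practice you would wait for `slack * typical_gap` seconds of silence before sending the merged text on to the conversation service.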

Sadly no samples this time, but it should give you some insights into how context is worked with a conversation system.

One swallow does not make a summer.

So this is one example, of one phrase. Really, for testing, you should test the whole model. In a demonstration from development, this was able to increase an S2T model’s accuracy from around 50% to over 80%.

You might notice that when you update your entities that Conversation says “Watson is training on your recent changes”. What is happening is that Intents and Entities work together in the NLU engine.

So it is possible to build entities that can be referenced within your intents, something similar to how Dialog entities work.

For this example I am going to use two entities.

ENTITY_FOODSTUFF

ENTITY_PETS

In my training questions I create the following example.

The #FoodStore question list is exactly the same, only the entity name is changed.

Next up, create your entities. It doesn’t matter what the entity itself is called, only that it has one value that mentions the entity identifiers above. I have @Entity set to the same as the value for clarity.

“What is the point?” you might ask? Well you will notice that both entities have a value of “fish”.

When I ask “I want to get a fish” I get the following back.

FoodStore confidence: 0.5947581078492985

Petshop confidence: 0.4052418921507014

So Watson is not sure, as both intents could be the right answer. This is what you would expect.

Now after we delete the “fish” value from both entities, I then add the same training question “I want a fish” to both intents. After Watson has trained and I ask “I want to get a fish”, you get the following back.

Petshop confidence: 0.9754140796608233

FoodStore confidence: 0.02458592033917674

Oh dear, now it appears to be more confident than it should be. So entities can help in making questions ambiguous if training is not helping.

This is not without its limitations.

Entities are fixed keywords, and the intents will treat them as such. So while it will find “fish” in our example, it won’t recognise “fishes” unless it’s explicitly stated in the entity.

Another thing to be wary of is that all entities are used in the intents. So if a question mentioned “toast”, then @ENTITY_FOODSTUFF becomes a candidate in trying to determine which intent is correct.

The last thing to be aware of is that training questions take priority over entities when it comes to determining what is correct.

If we were to add a training question “I want fishes” to the first example and then ask the earlier question, you would find that FoodStore now takes priority. If we add “I want fishes” to both intents and ask the question “I want to get a fish”, you will get the same results as if the entities never had the word “fish” in them.

This can be handy for forcing common spelling mistakes that may not be picked up, or clearly defined domain keywords a user may enter (e.g. product ID).

A common question that comes up is how to handle where the end user makes two utterances, but you only want to take action on one.

The most common being someone saying hello, versus saying hello with a question. You would want the question to take priority.

It’s very easy to do. You just do the following:

Create your first node with a condition of True and create your priority intents under this. Set your top node to jump to the first in that branch.

Create your second node which handles greetings.

Add a True node at the end of your important intents, and let it jump to greeting condition.

And that’s it! But that is the old style conversation way. Just before the new year a new version of conversation was released that makes this so much more simple.

The magic is in the first node.

Here we check to ensure that a greeting hasn’t been mentioned, then check each important intent.

With this method you don’t need any complex branches or jumping around. One important thing is to ensure that your less important intents do not have any training data that may cause them to be picked over the important intents.

There is one feature of Conversation that many people don’t even factor in when creating a conversational system. Let’s take the standard plan to spell it out.

Unlimited API queries/month

Up to 20 workspaces

Up to 2000 intents

Shared public cloud

Yep, you have 20 workspaces to play with! Most people starting off just use it for development, testing and production. But there is so much more. Putting it in context.

Functional Actions

For those new to Conversation, the first experience is normally the car demo. This is a good example of functional actions. In this case you know your application, and you want your end user to interact with it. So the user normally has prompts (conversational or visual) to allow them to refer to your user interface.

These offer the least resistance to build. The user is taught the names of the interfaces by using them, and is unlikely to deviate from that language. In fact if they do use non-domain language, it is more likely a fault of your user interface.

Question & Answers

This is where you have collected questions from the end user, to determine what answer/action needs to be taken.

Often the end user does not understand the domain language of the documentation or business. So training on their language helps make the system better for them.

Process Flows

This is where you need to converse with the user, to collect more information to drive to meeting their needs.

Multiple Workspaces

Most see this as just creating a workspace for development, testing and production. But using these as part of your overall architecture can dramatically increase the functionality and accuracy of the system.

Two main patterns that have been used are off-topic + drill down.

Off-Topic / Chit chat.

In this model we have a primary workspace which stores the main intents, as well as an intent for off-topic + chit-chat. Once one of these is detected, a second call is made out to the related workspace.
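The routing for this pattern at the application layer is short. The sketch below passes in the actual service call as a function, so it does not depend on any SDK. The `send` signature, the workspace IDs and the intent-to-workspace map are all assumptions, and the response is assumed to carry an `intents` list with the top intent first.

```python
from typing import Callable, Dict

# Hypothetical: send(workspace_id, text) returns the service response as a dict.
Send = Callable[[str, str], dict]


def route(text: str, send: Send, primary_id: str, secondary_ids: Dict[str, str]) -> dict:
    """Ask the primary workspace first; make a second call only when needed.

    `secondary_ids` maps an intent name (e.g. "chit_chat") to the workspace
    that handles it.
    """
    first = send(primary_id, text)
    intents = first.get("intents", [])
    top_intent = intents[0]["intent"] if intents else None
    target = secondary_ids.get(top_intent)
    if target is None:
        return first  # a main intent: one call, done
    return send(target, text)  # off-topic or chit-chat: second call
```

Every utterance routed to a secondary workspace costs two API calls, which is why this only pays off for subjects that come up infrequently.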

From a price point of view this works well if you are calling out to a subject matter that is asked infrequently. If the user will often ask these questions though, then the drill down method is a better solution.

Drill Down.

This model is where the user asks a question which has a more expanded field going forward. For example when you enter a bank you may ask the information desk about mortgages, who will direct you to another desk to go into more detail on your questions.

For this to work well, you need clear separation of what each workspace does, so that an off-topic intent is triggered to pass control back to the main workspace.

When planning your model look for common processes vs entities. The example above might not be good to separate by pets, as they will share common questions with a different entity. But you could separate between purchasing, accessories, etc.

As long as the conversation will not switch topics often then costs are kept down.

Multiple Workspace Calls.

This is not a recommended model as your costs go way up. It was originally used when there was a 500 intent limit per space (NLC).

If money is no object, then this model works where you may have more than 2000 intents, or a number of intents that share similar patterns but you need to distinguish between them.

You need to factor in if your conversation service is returning relative or absolute confidences. If relative, then responses are relative to their workspace and not to each other.
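If you do go down this road with absolute confidences, picking a winner across workspaces is a simple comparison. A minimal sketch, assuming each response has an `intents` list with the top intent first; the function name and the dict of responses are mine.

```python
from typing import Dict, Optional, Tuple


def best_intent(responses: Dict[str, dict]) -> Optional[Tuple[str, str, float]]:
    """Return (workspace, intent, confidence) for the strongest top intent.

    Only meaningful with absolute confidences. Relative scores are scaled
    within each workspace, so comparing them across workspaces tells you
    nothing.
    """
    best = None
    for workspace, response in responses.items():
        intents = response.get("intents", [])
        if not intents:
            continue  # this workspace had no opinion
        top = intents[0]
        if best is None or top["confidence"] > best[2]:
            best = (workspace, top["intent"], top["confidence"])
    return best
```

Remember that every entry in `responses` is a paid API call, which is where the cost of this model comes from.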

If you do have numerous intents, it may be easier and better to use a different solution, like Discovery Service for example.

While the cognitive ability of Conversation is what sets it apart from other chat bots, the skills in message shaping can even the odds. It is a common technique used in customer support when you need to give a hard message.

From a Conversation point of view, it allows you to dramatically reduce the level of work required to build and maintain, as well as improving customer satisfaction.

So let’s start with what is message shaping. Take this example: You own a pet store chat bot and the user says:

I wish to buy a pet.

You can start with “What kind of pet?”. Here you have left the users response too open. For a cognitive system, this on the face of it isn’t an issue as a well trained Conversation will handle this well.

“A pet that is popular for millennials” – Now it starts getting crazy.

You will be driven to insanity trying to cater for every possible response coming back from the user. Even if you get a good response like “I want to buy a puppy” you may need to walk through to and fro, only to find that you don’t have that pet in the store to sell them.

So you can reduce complexity by taking control of the conversation. First you need to examine what the user hasn’t said. They haven’t said what kind of pet. This means they are unsure on what they want to get.

As a pet store owner, you know that certain pets are good for certain living conditions. So you can reduce and control the direction by saying something like:

I see you are maybe unsure about the pet you want. I can recommend a low maintenance pet, which is good for apartments or busy lifestyles. Or a high maintenance pet, which is good for families with a home.

Here you have given two options which dramatically narrow the scope of the next response from the user. It is still possible someone may go off script, but it is unlikely. If they do you can do a conversational repair to force them back into the script.

As the flow progresses, you can push the user to pets that you are trying to sell faster.

In doing so however it is important to understand that the end user must have free will, even if it is an illusion. For example if the person wants a puppy, it may be that a certain breed is not available. Rather than saying they can’t have that breed, offer other breeds.

If you give no options to the user, it leads to frustration. Even if given options which are not what the person wants, it is still better than no option. Actually if you shape your messages well you can give two options which lead to the same outcome, and the end user will still feel like they are in control.

Shaping through UI

Now you can see that the end user has not supplied all credit card information. You would need to code complex flows to cater for this. The information is clearly visible, and it could become a nightmare to parse to have it anonymised.

To solve all of this you can use the UI to force the user to structure their own data.

Watson Virtual Agent actually does this out of the box for a number of common areas.

Buttons are for apps, not for conversing.

For UI related prompts, try not to overdo it. For structured data it is fine. For buttons it can also be fine, but if you overdo it then it does not feel intelligent to the end user. As it starts to feel more like an application, the users have different expectations of the responses they get back.

Practice makes perfect

Don’t be fooled that this is easy to do. Most developers I have seen work with conversation fall back to looking for a technical solution, when only changing how you speak to the end user will suffice.

Even people working in support can take 6 months to a year to pick up the skills from no experience. Although it can be a bit harder having to do it on the fly versus creating a script.

For more reading on techniques, here is some stuff to get you started.

If you have gone into your conversation service since yesterday, you will find you have a new feature called System Entities. For the moment you only have number, percentage and currency recognition.

For this introduction I am just going to use @sys-number entity to make Watson do simple math.

First let’s train Watson about the math terms we are going to use. For this we are going to use intents.

Why intents and not entities? Hands down, intents will mean very little training for it to understand similar terms. Also, as system entities are also entities, they can interfere with any logic you put in. I use the number “42” so as to not bias the classification to a particular number.

Next we go to entities and switch on the @sys-number entity.

Now for dialog, first we want to make sure what the person has said is a valid math question, if not we say we don’t understand. We do it with the following conditional statement.

intents.size() >0
AND intents[0].confidence < 0.30

This will ensure that the system only responds if it is confident to do so. Next we put another node in which checks to see if the system is unsure.

Now you will notice we are using entities.size(). This is because the numbers are entities, and @sys-number doesn’t have the size() method. We want to make sure that the end user typed in two numbers before continuing.

Now that we have done that, the procedure is more or less the same for each action, so here is just the addition.

This takes the first and second numeric value and adds them. While conversation will recognise numbers from text and take action on them, it won’t always do this. So we have to use the numeric_value attribute.
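If you would rather do the sum at the application layer, the same logic looks roughly like this. I am assuming each entity in the response looks like `{"entity": "sys-number", "value": "10", "metadata": {"numeric_value": 10}}`. Check that against what your service actually returns before relying on it.

```python
from typing import List, Optional


def sys_numbers(response: dict) -> List[float]:
    """Collect numeric values of every sys-number entity, in order of appearance.

    Assumed (not verified) shape: the number sits under
    entity["metadata"]["numeric_value"].
    """
    numbers = []
    for entity in response.get("entities", []):
        if entity.get("entity") == "sys-number":
            numbers.append(entity["metadata"]["numeric_value"])
    return numbers


def add_first_two(response: dict) -> Optional[float]:
    """Add the first two numbers, or return None if the user gave fewer than two."""
    numbers = sys_numbers(response)
    if len(numbers) < 2:
        return None
    return numbers[0] + numbers[1]
```

Returning `None` plays the same role as the two-number check in the dialog: the caller can ask the user for the missing number instead of guessing.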

While mostly fine, there are issues you won’t be able to easily cater for. For example the importance of the location of the numbers.

Take for example these two questions, which are the same but will give very different answers.

What is ten divided by two?

Using the number two divide up the number ten.

One way to solve this is just to create a second division intent which knows the numbers are reversed, but more likely you can solve this with message shaping.

You will find a similar issue though when you start to use the other system entities. For example if you have @sys-percentage active, then “20%” is not only a percent, but it is also a number. This makes it trickier when trying to read the entity or @sys-number stack.

For what comes back from conversation, you will see that the entity structure has changed.