How Intelligent Agents Leverage NLU, Intents, and Entities (Video)

Ulster University Professor Michael McTear discusses how contemporary natural language-based intelligent agents use intents and entities rather than traditional parse trees in this clip from SpeechTEK 2018.

Michael McTear: One of the basic technologies that's involved in intelligent agents, apart from speech recognition, is natural language understanding. Most platforms nowadays that are built for intelligent agents and conversational interfaces actually use the two concepts of intents and entities rather than doing the very complex parse trees that are the traditional way of doing natural language understanding.

What is an intent? Well, it's basically what the user wants to achieve, so they might want to find a restaurant, get the weather report, or find a recipe. The task for the NLU is to take the user's utterance and match it to one of the intents that is supported by the system, by the app, by the agent. So that's an intent. Entities are the little bits of data, the parameters of the objects--sometimes called slots--so, for example, location or duration; in the personal chef example, that would have been the ingredients and some of the other parameters as well. These then have to be extracted from the user's input so they can be used.
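The relationship described here--an intent naming a user goal, with entities as the slots to be filled--can be sketched as a simple data structure. The intent and slot names below are illustrative, not taken from any particular platform:

```python
# Hypothetical intent definitions: each intent names a user goal and
# lists the entity slots the agent must fill to carry it out.
INTENTS = {
    "find_restaurant": {"slots": ["location", "cuisine"]},
    "get_weather":     {"slots": ["location", "duration"]},
    "find_recipe":     {"slots": ["ingredients", "dish_type"]},
}

def required_slots(intent_name):
    """Return the entity slots still needed to fulfil an intent."""
    return INTENTS[intent_name]["slots"]

print(required_slots("find_recipe"))  # ['ingredients', 'dish_type']
```

The NLU's job, in these terms, is twofold: pick the right key in `INTENTS` from the user's utterance, and extract values for the listed slots from the same utterance.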

Here's an example of an intent for recipe recommendations. Basically, the idea is that when you build the application, you create a lot of the sorts of things the user is likely to say. The particular software that I'm going to illustrate is called Dialogflow. It used to be called API.AI. At one time it was a separate company, but now it's been acquired by Google, so it's really Google Dialogflow. It also automatically highlights potential entities. For example, for "cold" it might automatically assign that to the category of temperature, but you can actually customize it--so if you want, for example, to take "chicken," which naturally falls into one of these categories, as well as being a type of meat that might be tagged as protein.

Basically, what the developer does is to predict the sorts of sample utterances that might be said. The system will automatically use machine learning to classify those in relation to the intents, so that even if the input is not an exact match, but is similar enough to one of the predicted utterances that have been hard coded, it can still map that to a particular intent. That's the beauty of the system. You don't have to exhaustively classify or write down and predict all of the things the user is likely to say, because that always turns out to be impossible--they will say something else that confounds you, and then it crashes. The machine learning uses statistical techniques to classify what the most likely intent is going to be, and that's used in lots of different systems--Dialogflow, as I've mentioned, Microsoft, Facebook, Amazon, IBM, and a host of much smaller companies that are using the same technology.
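The idea of mapping an unseen utterance to the closest hard-coded sample can be illustrated with a toy statistical matcher. This is only a sketch--real platforms such as Dialogflow use far richer models than bag-of-words cosine similarity, and the sample phrases and intent names here are made up:

```python
import math
from collections import Counter

# Hypothetical developer-supplied sample utterances per intent.
SAMPLES = {
    "find_recipe": ["find me a recipe", "what can I cook tonight",
                    "suggest a dish with chicken"],
    "get_weather": ["what is the weather", "weather report for tomorrow",
                    "will it rain today"],
}

def vec(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(utterance):
    """Return the intent whose sample utterances best match the input."""
    scores = {
        intent: max(cosine(vec(utterance), vec(s)) for s in samples)
        for intent, samples in SAMPLES.items()
    }
    return max(scores, key=scores.get)

# Not an exact match to any sample, but similar enough to map correctly:
print(classify("can you suggest a recipe with chicken"))  # find_recipe
```

The point the speaker makes survives even in this toy version: the input sentence appears nowhere in the training samples, yet it lands on the right intent because it is statistically closer to the recipe samples than to the weather ones.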

Entities in this case are very, very simple. As you can see, these are some for dish type, and on the right-hand side, you have synonyms of the entity. So the developer builds some of these things, and then the system uses machine learning to expand on them. With entities, it's really what's called tagging. We used to have tagging for part of speech, but this would be tagging for particular slots, particular entity types.
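The synonym table and slot-tagging idea described above can be sketched in a few lines. The entity values and synonyms here are invented for illustration--in a real platform the developer seeds such a list and the system generalizes it:

```python
# Hypothetical entity definition: canonical value -> surface-form synonyms,
# in the style of a dish-type entity with synonyms listed beside each value.
DISH_TYPE = {
    "soup":  ["soup", "broth", "chowder"],
    "salad": ["salad", "slaw"],
}

def tag_entities(utterance, entity_values, slot="dish_type"):
    """Tag each token that matches an entity value or one of its synonyms,
    returning (token, slot, canonical_value) triples."""
    tags = []
    for token in utterance.lower().split():
        for canonical, synonyms in entity_values.items():
            if token in synonyms:
                tags.append((token, slot, canonical))
    return tags

print(tag_entities("I would like a chowder or a salad", DISH_TYPE))
# [('chowder', 'dish_type', 'soup'), ('salad', 'dish_type', 'salad')]
```

This is the analogue of part-of-speech tagging the speaker mentions: instead of labeling a word as a noun or verb, each matching token is labeled with the slot it fills and normalized to a canonical entity value.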