Policy optimisation in DM

This course covers a wide range of tasks in Natural Language Processing, from basic to advanced: sentiment analysis, summarization, dialogue state tracking, to name a few. Upon completing it, you will be able to recognize NLP tasks in your day-to-day work, propose approaches, and judge which techniques are likely to work well. The final project is devoted to one of the hottest topics in today's NLP: you will build your own conversational chatbot that assists with search on the StackOverflow website. The project is based on the practical assignments of the course, which will give you hands-on experience with tasks such as text classification, named entity recognition, and duplicate detection.
Throughout the lectures, we aim to strike a balance between traditional and deep learning techniques in NLP and cover them in parallel. For example, we will discuss word alignment models in machine translation and see how similar they are to the attention mechanism in encoder-decoder neural networks. Core techniques are not treated as black boxes; on the contrary, you will get an in-depth understanding of what is happening inside. To succeed, we expect familiarity with the basics of linear algebra and probability theory, the machine learning setup, and deep neural networks. Some materials are based on papers only a month old and introduce you to the very state of the art in NLP research.

From this lesson

Dialog systems

This week we will overview so-called task-oriented dialog systems like Apple Siri or Amazon Alexa. We will look in detail at the main building blocks of such systems, namely Natural Language Understanding (NLU) and the Dialog Manager (DM). We hope this week will encourage you to build your own dialog system as a final project!

Instructors

Anna Potapenko

Researcher

Alexey Zobnin

Associate Professor

Anna Kozlova

Team Lead

Sergey Yudin

Analyst-developer

Andrei Zimovnov

Senior Lecturer

Transcript

Hi. In this video, we will talk about the policy learner in the dialogue manager. Let me remind you what policy learning is. We have a dialogue that progresses over time, and after every turn, after every observation from the user, we somehow update our state of the dialogue; the state tracker is responsible for that. Then, once we have a certain state, we have to take some action, and we need a policy that tells us: if you are in this state, this is the action you must take, and this is what we then say to the user.

So what is a dialog policy? It is a mapping from dialog state to agent act. Imagine we are having a conversation with the user. We collect some information from him or her, we have an internal state that tells us what the user essentially wants, and we need to take some action to continue the dialog. That mapping from dialog state to agent act is exactly what the dialog policy is.

Let's look at some policy execution examples. The system might inform the user that the location is 780 Market Street. The user will hear it as the following: "The nearest one is at 780 Market Street." Another example: the system might request the location of the user, and the user will see it as, "What is the delivery address?"

We can train a model to produce an act from a dialog state, or we can do it with hand-crafted rules, which is my favorite. So let's look at the simple approach: hand-crafted rules. You have an NLU and a state tracker, and you can come up with hand-crafted rules for the policy, because if you have a state tracker, you have a state. If you remember the Dialog State Tracking Challenge dataset, the state actually contains a part with requested slots, and we can use that information to understand what to do next: whether we need to tell the user the value of a particular slot, search the database, or do something else.
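A hand-crafted policy of this kind can be sketched in a few lines. This is a minimal illustration, not the course's actual code: the state format (a dict with `requested_slots` and filled `slots`) loosely mirrors the DSTC-style state mentioned above, and all slot names are made up for the example.

```python
# A minimal sketch of a hand-crafted dialog policy.
# State format and slot names are illustrative, not from the course.

def rule_based_policy(state):
    """Map a dialog state to an agent act using hand-crafted rules."""
    requested = state.get("requested_slots", [])
    slots = state.get("slots", {})

    # If the user asked about a slot whose value we know, inform them.
    for slot in requested:
        if slot in slots:
            return ("inform", slot, slots[slot])

    # If a slot required for the task is still missing, request it.
    for required in ("location", "cuisine"):
        if required not in slots:
            return ("request", required, None)

    # Otherwise we have everything we need: query the backend.
    return ("query_database", None, None)

act = rule_based_policy({"requested_slots": ["location"],
                         "slots": {"location": "780 Market Street"}})
# act == ("inform", "location", "780 Market Street")
```

The "inform" branch corresponds to the "780 Market Street" example above, and the "request" branch to "What is the delivery address?".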
So it should be pretty easy to come up with hand-crafted rules for the policy. But it turns out you can do better with machine learning, and there are two ways to optimize dialog policies with machine learning.

The first one is supervised learning: you train the model to imitate the observed actions of an expert. We have some human-human interactions, one of the participants is an expert, and we use those observations to imitate the expert's actions. This often requires a large amount of expert-labeled data, which, as you know, is pretty expensive to collect, because you cannot use crowdsourcing platforms like Amazon Mechanical Turk. And even with a large amount of training data, parts of the dialog state space may not be well covered, and our system will be blind there.

There is a different approach called reinforcement learning. It is a huge field and out of our scope, but it deserves an honorable mention. Given only a reward signal, the agent can optimize a dialog policy through interaction with users. Reinforcement learning can require many samples from the environment, which makes learning from scratch with real users impractical; we would just waste the time of our experts. That's why we need simulated users, built from the supervised data, for reinforcement learning. It is a huge field, and it is gaining popularity in dialog policy optimization.

Let's look at how the supervised approach might work. Here is an example of a model that does joint NLU and dialog management policy optimization. We have four utterances, which are all the utterances we got from the user so far.
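The imitation idea behind supervised policy learning can be sketched in a toy form: given logged (state, expert action) pairs, pick for each observed state the action the expert took most often. Real systems learn a classifier over state features; the state labels and the data below are purely illustrative.

```python
# A toy sketch of supervised policy learning as imitation of an expert.
# States and actions here are illustrative strings, not from the course.
from collections import Counter, defaultdict

def fit_imitation_policy(dialog_log):
    """Learn a state -> action mapping from logged expert decisions."""
    counts = defaultdict(Counter)
    for state, expert_action in dialog_log:
        counts[state][expert_action] += 1
    # For each observed state, imitate the expert's most frequent action.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

log = [("missing_location", "request_location"),
       ("missing_location", "request_location"),
       ("has_all_slots", "query_database")]
policy = fit_imitation_policy(log)
# policy["missing_location"] == "request_location"
```

Note that any state absent from the log gets no entry in `policy` at all, which is exactly the coverage problem mentioned above: the system is blind on parts of the state space the expert never visited.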
We pass each of these utterances through the NLU, which gives us intents and slot tags, and we can also take the hidden vector, the hidden representation of the phrase, from the NLU and feed it to a subsequent LSTM that decides which system action to execute. So we have several utterances and the NLU results, and the LSTM reads those utterances in the NLU's latent space and decides what to do next. This is pretty cool because here we don't need dialog state tracking; we don't have an explicit state. The state is replaced by the state of the LSTM, some latent variables, say 300 of them. Our state is no longer hand-crafted; it becomes a real-valued vector. Then we can learn a classifier on top of that LSTM, and it will output the probability of the next system action.

Let's see how it actually works. If we look at the results, there are three models to compare. The first one is the baseline, a classical approach to this problem: a conditional random field for slot tagging and an SVM for action classification. We report frame-level accuracies, which means we need to be accurate about everything in the current frame after every utterance, and you can see that the accuracy for the dialog manager is pretty bad here, although for NLU it's okay. The second model is Pipeline-BLSTM: it trains the NLU separately and then trains a bidirectional LSTM for dialog policy optimization on top of that model, but the two models are trained separately. The third option is to train these two models, the NLU and the bidirectional LSTM (shown in blue on the previous slides), end to end, jointly. This increases the dialog manager accuracy by a huge margin and actually improves the NLU as well.
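The recurrent policy just described can be sketched with numpy. This is a simplified stand-in, not the paper's model: a plain tanh RNN replaces the LSTM, the utterance vectors stand in for the NLU's hidden representations, the weights are random rather than trained, and the dimensions are made up.

```python
# A simplified numpy sketch of a recurrent dialog policy: utterance
# representations from the NLU update a latent state, and a softmax
# classifier on the final state scores the next system action.
# A plain RNN stands in for the LSTM; weights and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, ACTIONS = 8, 3          # latent state size, number of system acts

W_in = rng.normal(size=(HIDDEN, HIDDEN))    # utterance vector -> state
W_rec = rng.normal(size=(HIDDEN, HIDDEN))   # previous state -> state
W_out = rng.normal(size=(ACTIONS, HIDDEN))  # state -> action scores

def next_action_probs(utterance_vectors):
    """Read the dialog so far and return probabilities over system actions."""
    h = np.zeros(HIDDEN)                    # the dialog "state" is latent
    for x in utterance_vectors:             # read utterances in order
        h = np.tanh(W_in @ x + W_rec @ h)   # recurrent state update
    scores = W_out @ h
    exp = np.exp(scores - scores.max())     # softmax over system actions
    return exp / exp.sum()

utts = [rng.normal(size=HIDDEN) for _ in range(4)]  # 4 utterance vectors
probs = next_action_probs(utts)
```

The point of the sketch is the shape of the computation: no hand-crafted state appears anywhere; the real-valued vector `h` plays the role the state tracker played in the rule-based setup.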
So we have seen this effect of joint training before, and it continues to hold. Okay, what have we looked at? Dialog policy can be implemented with hand-crafted rules if you have a good NLU and a good state tracker. It can be learned in a supervised way from data, and it can be learned jointly with the NLU, in which case you don't need a state tracker, for example. Or you can take the reinforcement learning route, but that is a story for a different course.