This course covers a wide range of tasks in Natural Language Processing, from basic to advanced: sentiment analysis, summarization, and dialogue state tracking, to name a few. Upon completion, you will be able to recognize NLP tasks in your day-to-day work, propose approaches, and judge which techniques are likely to work well. The final project is devoted to one of the hottest topics in today's NLP: you will build your own conversational chatbot that assists with search on the StackOverflow website. The project builds on the course's practical assignments, which will give you hands-on experience with tasks such as text classification, named entity recognition, and duplicate detection.
Throughout the lectures, we aim to strike a balance between traditional and deep learning techniques in NLP and cover them in parallel. For example, we will discuss word alignment models in machine translation and see how similar they are to the attention mechanism in encoder-decoder neural networks. Core techniques are not treated as black boxes; on the contrary, you will get an in-depth understanding of what is happening inside. To succeed, we expect familiarity with the basics of linear algebra and probability theory, the machine learning setup, and deep neural networks. Some materials are based on papers published only a month ago and introduce you to the very state of the art in NLP research.
Do you have technical problems? Write to us: coursera@hse.ru

MV

Definitely best course in the Specialization! Lecturers, projects and forum - everything is super organized. Only StarSpace was pain in the ass, but I managed :)

TL

Jul 08, 2018


Anna is a great instructor. She can explain the concept and mathematical formulas in a clear way. The design of assignment is both interesting and practical.

From the lesson

Vector Space Models of Semantics

This module is devoted to a higher abstraction for texts: we will learn vectors that represent meanings. First, we will discuss traditional models of distributional semantics. They are based on a very intuitive idea: "you shall know a word by the company it keeps". Second, we will cover modern tools for word and sentence embeddings, such as word2vec, FastText, StarSpace, etc. Finally, we will discuss how to embed whole documents with topic models and how these models can be used for search and data exploration.

Instructors

Anna Potapenko

Researcher

Alexey Zobnin

Associate Professor

Anna Kozlova

Team Lead

Sergey Yudin

Analyst-developer

Andrei Zimovnov

Senior Lecturer

Transcript

Hey. Let us understand how to train a PLSA model. Just to recap, this is a topic model that explains words in documents by a mixture of topics. The model has two kinds of probability distributions as parameters: the phi parameters stand for probabilities of words in topics, and the theta parameters stand for probabilities of topics in documents. Now, you have your probabilistic model of the data, and you have your data. How do you train the model? In other words, how do you estimate the parameters? Likelihood maximization is what always helps us here. The top line of this slide is the likelihood of our model, and we need to maximize it with respect to our parameters.

Let us do some modifications to this formula. First, we apply the logarithm, so we get a sum of logarithms instead of the logarithm of a product. Then, we drop the probability of the document, because it does not depend on our parameters; we do not even attempt to model it. What we care about are the probabilities of words in documents, so we substitute them with the sum over topics, which is exactly what our model says. Great, that's it. We want to maximize this likelihood, and we need to remember the constraints: our parameters are probabilities, so they need to be non-negative and normalized.

Now, you can notice that the term we need to maximize is not very nice. We have the logarithm of a sum, and it is not at all clear how to maximize that directly. Fortunately, we have the EM-algorithm; you may have heard about it in another course in our Specialization, but here I want to arrive at it intuitively. Let us start with some data. We are going to train our model on plain text, so this is all we have. Now, let us remember that we know the generative model.
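The setup above, with phi as word-in-topic probabilities, theta as topic-in-document probabilities, and the log-likelihood to maximize, can be sketched with NumPy. This is a toy illustration; the array names and sizes are my own assumptions, not from the course:

```python
import numpy as np

# Hypothetical toy sizes: W words, D documents, T topics.
W, D, T = 6, 4, 2
rng = np.random.default_rng(0)

# n[w, d]: how many times word w occurs in document d (toy counts).
n = rng.integers(0, 5, size=(W, D))

# phi[w, t] = p(w | t): each column is a distribution over words.
phi = rng.random((W, T))
phi /= phi.sum(axis=0, keepdims=True)

# theta[t, d] = p(t | d): each column is a distribution over topics.
theta = rng.random((T, D))
theta /= theta.sum(axis=0, keepdims=True)

# The model: p(w | d) = sum_t phi[w, t] * theta[t, d], i.e. a matrix product.
p_wd = phi @ theta

# Log-likelihood to maximize, with the parameter-free p(d) term dropped:
# L = sum over d, w of n[w, d] * log p(w | d)
log_likelihood = np.sum(n * np.log(p_wd))
```

Note that each column of `phi @ theta` sums to one, so every document gets a valid distribution over words.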
We assume that every word in this text has one topic, assigned when the author decided which word would come next. So let us pretend, just for a moment, just for one slide, that we know these topics. Say we know that the words sky, raining, and clear up come from topic number 22, and so on; we know all these assignments. How would you then calculate the probabilities of words in topics? Suppose you have four words assigned to this topic, and you want to calculate the probability of sky. You just say, "Well, I see one word out of these four, so the probability is one divided by four." By NWT here, I denote the count of how many times a certain word was assigned to a certain topic.

Now, can you imagine how we would estimate the probabilities of topics in the document for this colorful example? It is just the same: we know that four words in the document belong to this red topic, and the document has 54 words in total, which gives the probability shown for this example. Unfortunately, life is not like this. We do not know these colorful topic assignments; what we have is just plain text, and that is a problem. But can we somehow estimate those assignments? Can we estimate the probabilities of the colors for every word? Yes we can: Bayes rule helps us here. What we can do is say that we need the probabilities of topics for each word in each document, and then apply Bayes rule together with the product rule. To understand this, I advise you to momentarily forget about D in all these formulas, and then everything becomes very clear. We apply these two rules and get estimates for the probabilities of our hidden variables, the topics.

Now, it is time to put everything together. The EM-algorithm has two steps, the E-step and the M-step. The E-step is about estimating the probabilities of the hidden variables, and this is what we have just discussed.
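The E-step just described, computing p(topic | document, word) via Bayes rule and the product rule, can be written as a small function. This is a minimal sketch with assumed array shapes (phi is W x T, theta is T x D), not code from the course:

```python
import numpy as np

def e_step(phi, theta):
    """E-step of PLSA: posterior over topics for every (word, document) pair.

    p(t | d, w) = phi[w, t] * theta[t, d] / sum_s phi[w, s] * theta[s, d]
    Returns an array of shape (W, T, D).
    """
    # Broadcast phi[w, t] * theta[t, d] into a (W, T, D) array of joint terms.
    joint = phi[:, :, None] * theta[None, :, :]
    # Normalize over the topic axis so each (w, d) slice is a distribution.
    return joint / joint.sum(axis=1, keepdims=True)
```

Forgetting about the document index d, as the lecture suggests, this is just Bayes rule: the posterior over topics is the joint phi * theta renormalized.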
The M-step is about the updates for the parameters. We have discussed it for the simple case where we know the topic assignments exactly. Now, we do not know them exactly, so computing the NWT counts is a bit more complicated. It is no longer just how many times a word is connected with a topic, but it is still doable: we take the counts of the words and weight them with the probabilities we know from the E-step, and that is how we get estimates for NWT. So it is not an integer counter anymore; it is a float-valued variable that still has the same meaning, still has the same intuition.

The EM-algorithm is a super powerful technique, and it can be used any time you have a model, observable data, and some hidden variables. These are all the formulas we need for now. Just remember that to build your topic model, you repeat the E-step and the M-step iteratively: you scan your data, compute the probabilities of topics using your current parameters, then update the parameters using the current probabilities of topics, and repeat this again and again. This iterative process converges, and hopefully you will get a nice trained topic model.
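Putting the two steps together, the whole iterative procedure might look like the following toy sketch. This is my own illustrative implementation, not the course's reference code; it assumes a dense word-document count matrix and skips smoothing, sparsity, and convergence checks:

```python
import numpy as np

def plsa_em(n, T, n_iter=50, seed=0):
    """Toy PLSA trainer via EM.

    n: (W, D) word-document count matrix; T: number of topics.
    Returns phi (W, T) with p(w | t) and theta (T, D) with p(t | d).
    """
    W, D = n.shape
    rng = np.random.default_rng(seed)
    # Random normalized initialization of both parameter matrices.
    phi = rng.random((W, T)); phi /= phi.sum(axis=0, keepdims=True)
    theta = rng.random((T, D)); theta /= theta.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # E-step: p(t | d, w) via Bayes rule, shape (W, T, D).
        joint = phi[:, :, None] * theta[None, :, :]
        p_tdw = joint / joint.sum(axis=1, keepdims=True)

        # M-step: soft (float-valued) counts n_wt and n_td,
        # i.e. word counts weighted by the E-step probabilities.
        weighted = n[:, None, :] * p_tdw       # (W, T, D)
        n_wt = weighted.sum(axis=2)            # (W, T)
        n_td = weighted.sum(axis=0)            # (T, D)

        # Renormalize the soft counts back into probability distributions.
        phi = n_wt / n_wt.sum(axis=0, keepdims=True)
        theta = n_td / n_td.sum(axis=0, keepdims=True)

    return phi, theta
```

A basic sanity check on such a loop is that the log-likelihood is non-decreasing across iterations, which is exactly the convergence guarantee EM provides.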