This course covers a wide range of tasks in Natural Language Processing, from basic to advanced: sentiment analysis, summarization, dialogue state tracking, to name a few. Upon completion, you will be able to recognize NLP tasks in your day-to-day work, propose approaches, and judge which techniques are likely to work well. The final project is devoted to one of the hottest topics in today’s NLP. You will build your own conversational chatbot that will assist with search on the StackOverflow website. The project will be based on the practical assignments of the course, which will give you hands-on experience with tasks such as text classification, named entity recognition, and duplicate detection.
Throughout the lectures, we will aim at finding a balance between traditional and deep learning techniques in NLP and cover them in parallel. For example, we will discuss word alignment models in machine translation and see how similar they are to the attention mechanism in encoder-decoder neural networks. Core techniques are not treated as black boxes. On the contrary, you will get an in-depth understanding of what’s happening inside. To succeed, we expect your familiarity with the basics of linear algebra and probability theory, the machine learning setup, and deep neural networks. Some materials are based on papers only a month old and introduce you to the very state of the art in NLP research.
Do you have technical problems? Write to us: coursera@hse.ru

MV

Definitely best course in the Specialization! Lecturers, projects and forum - everything is super organized. Only StarSpace was pain in the ass, but I managed :)

TL

Jul 08, 2018

5/5 stars

Anna is a great instructor. She can explain the concept and mathematical formulas in a clear way. The design of assignment is both interesting and practical.

From the lesson

Vector Space Models of Semantics

This module is devoted to a higher abstraction for texts: we will learn vectors that represent meanings. First, we will discuss traditional models of distributional semantics. They are based on a very intuitive idea: "you shall know a word by the company it keeps". Second, we will cover modern tools for word and sentence embeddings, such as word2vec, FastText, StarSpace, etc. Finally, we will discuss how to embed whole documents with topic models and how these models can be used for search and data exploration.

Instructors

Anna Potapenko

Researcher

Alexey Zobnin

Associate Professor

Anna Kozlova

Team Lead

Sergey Yudin

Analyst-developer

Andrei Zimovnov

Senior Lecturer

Subtitles

Hey. You know the basic topic model, which is called PLSA, and now you know how to train it. Now, what are some other topic models in this world? What are some other applications that we can solve with topic modeling? I want to start with a nice application. It is about the diary of Martha Ballard. This is a big diary: she kept it for 27 years. This is why it's rather complicated for people to read and analyze this diary. So, some researchers decided to apply topic modeling to it and see what topics are revealed in this diary. These are some examples of the topics, and you can see just the top most probable words. So, you remember you have your Phi matrix, which stands for the probabilities of words in topics. And these are exactly the words with the highest probabilities. And actually, you can see that the topics are rather intuitively interpretable. So, there is something about gardens, and potatoes, and work in these gardens. There is something about shopping, like sugar, or flour, or something else. So, you can look through these top words, and you can name the topics, and that's nice. What's nicer, you can look into how these topics change over time. So, for example, the gardening topic is very popular during summer in her diary, and it's not very popular during winter, and it makes perfect sense, right? Another topic, which is about emotions, has high probabilities during those periods of her life when she had some emotional events. For example, one moment of high probability there corresponds to the moment when her husband went to prison, and somebody else died, and something else happened. So, the historians can say, "OK, this is interpretable. We understand why this topic has high probability there." Now, to be flexible and to apply topics in many applications, we need to do a little bit more math. So, first, this is the model called Latent Dirichlet Allocation, and I guess this is the most popular topic model ever.
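The "top words per topic" inspection described above is easy to reproduce: given a Phi matrix of p(word | topic), you sort each column and read off the largest entries. A minimal NumPy sketch, with a toy vocabulary and made-up probabilities purely for illustration:

```python
import numpy as np

# Toy Phi matrix: rows = vocabulary words, columns = topics.
# phi[w, t] is p(word w | topic t); each column sums to 1.
vocab = ["garden", "potato", "sugar", "flour", "work", "shop"]
phi = np.array([
    [0.40, 0.05],
    [0.30, 0.05],
    [0.05, 0.35],
    [0.05, 0.30],
    [0.15, 0.05],
    [0.05, 0.20],
])

def top_words(phi, vocab, topic, n=3):
    """Return the n most probable words for one topic."""
    order = np.argsort(phi[:, topic])[::-1][:n]
    return [vocab[i] for i in order]

print(top_words(phi, vocab, topic=0))  # a "gardening"-like topic
print(top_words(phi, vocab, topic=1))  # a "shopping"-like topic
```

Naming the topics from these word lists is then a human step, exactly as the historians did with the diary.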
So, it was proposed in 2003 by David Blei, and actually any paper about topic models now cites this work. But, you know, this is not very different from the PLSA model. Everything that it says is, "OK, we will still have the Phi and Theta parameters, but we are going to have Dirichlet priors for them." The Dirichlet distribution has a rather ugly form, and you do not need to memorize it; you can always just Google it. But the important thing here is that we say that our parameters are not just fixed values; they have some distribution. That's why, as the output of our model, we are also going to have a distribution over the parameters. So, not just two matrices of values, but a distribution over them, and this will be called the posterior distribution, and it will also be Dirichlet, but with some other hyperparameters. In another course of our specialization, devoted to Bayesian methods, you can learn about lots of ways to estimate this model and train it. Here, I will just name a few. One way would be Variational Bayes. Another way would be Gibbs Sampling. All of them involve lots of complicated math, so we are not going into those details right now. Instead, I'm just going to show you the main path for developing new topic models. Usually, people use probabilistic graphical models and Bayesian inference to propose new topic models, and they say, "OK, we will have more parameters, we will have more priors, and they will be connected in this way and that." So people draw these nice pictures about what happens in the models. And again, let us not go into the math details, but instead let us look at how these models can be applied. Well, one extension of the LDA model would be a hierarchical topic model. You can imagine that you want your topics to build some hierarchy. For example, the topic about speech recognition would be a subtopic of the topic about algorithms. And you see that the root topic has some very general lexis, and this is actually not surprising.
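For reference, the "rather ugly form" mentioned above is, for a probability vector $\theta = (\theta_1, \ldots, \theta_K)$ with hyperparameters $\alpha = (\alpha_1, \ldots, \alpha_K)$:

```latex
p(\theta \mid \alpha)
  = \frac{\Gamma\!\left(\sum_{k=1}^{K}\alpha_k\right)}{\prod_{k=1}^{K}\Gamma(\alpha_k)}
    \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad \theta_k \ge 0,\quad \sum_{k=1}^{K}\theta_k = 1 .
```

In LDA this prior is placed on each document's topic distribution (the columns of Theta), and a second Dirichlet prior, with its own hyperparameters, on each topic's word distribution (the columns of Phi).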
So, unfortunately, general lexis is always something that we see with high probabilities, especially for root topics. And in some models, you can try to distill your topics and say, well, maybe we should have some separate topics for the stop words, because we don't want to see them in our main topics; so we can play with that as well. Now, another important extension of topic models is dynamic topic models. These are models that say that topics can evolve over time. So, you have some keywords for a topic in one year, and they change in the next year. Or you can see how the probability of a topic changes. For example, you have some news flow, and you know that some topic about bank-related stuff is super popular this month but not that popular later. OK? One more extension: multilingual topic models. A topic is something that is not really dependent on the language, because mathematics exists everywhere, right? So, we can just express it with different terms in English, in Italian, in Russian, and in any other language. And this model captures that intuition. We have some topics that are the same for every language, but they are expressed with different terms. You usually train this model on parallel data: you have two Wikipedia articles about the same topic, or let's better say about the same particular concept, and you know that the topics of these articles should be similar but expressed with different terms, and that's okay. So, we have covered some extensions of topic models, and believe me, there are many more in the literature. One natural question that you might have now is whether there is a way to combine all those requirements into one topic model. There might be different approaches here, and one approach, which we develop here in our NLP Lab, is called Additive Regularization for Topic Models. The idea is super simple. We have some likelihood for the PLSA model. Now, let us have some additional regularizers.
Let us add them to the likelihood with some coefficients. So, all we need is to formalize our requirements as regularizers, and then tune those tau coefficients to say that, for example, we need a better hierarchy rather than better dynamics in the model. Just to provide one example of what those regularizers can look like: imagine that we want the topics in our model to be as different as possible. To achieve this, we can try to maximize the negative pairwise correlations between the topics. This is exactly what is written down in the bottom formula: you take your pairs of topics and you try to make them as different as possible. Now, how can you train this model? Well, you can still use the EM algorithm. The E-step stays exactly the same as it was for the PLSA topic model. The M-step changes, but very slightly. The only thing that is new here is in green: these are the derivatives of the regularizers with respect to your parameters. You need to add these terms to get the maximum likelihood estimates of the parameters in the M-step. And this is pretty straightforward: you formalize your criteria, you take the derivatives, and you can build this into your model. Now, I will show you one more example of this. In many applications, we need to model not only the words in the texts but some additional modalities. What I mean is some metadata: users, maybe authors of the papers, time stamps, categories, and many other things that can go with the documents but that are not just words. Can we somehow build them into our model? We can actually use absolutely the same intuition. So, instead of one likelihood, let us have some weighted likelihoods: a likelihood for every modality, weighted with some modality coefficients. Now, what do we have for every modality? Actually, we have different vocabularies.
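To make the regularized M-step concrete, here is a minimal NumPy sketch of a decorrelation-style regularizer, $R(\Phi) = -\tau \sum_{t \neq s} \sum_w \phi_{wt}\phi_{ws}$, and the update $\phi_{wt} \propto \max(n_{wt} + \phi_{wt}\,\partial R/\partial \phi_{wt},\, 0)$. This is only an illustration of the idea under those assumed formulas (constant factors from the pair counting are absorbed into tau); the toy counts are invented, and a real implementation such as BigARTM adds much more machinery:

```python
import numpy as np

def decorrelation_grad(phi, tau):
    """dR/dphi for R = -tau * sum over distinct topic pairs of sum_w phi[w,t]*phi[w,s].

    For each entry (w, t) the derivative is -tau times the sum of phi[w, s]
    over all other topics s (constant factors absorbed into tau).
    """
    return -tau * (phi.sum(axis=1, keepdims=True) - phi)

def regularized_m_step(n_wt, phi, tau):
    """ARTM-style M-step: phi_wt proportional to max(n_wt + phi_wt * dR/dphi_wt, 0)."""
    unnorm = np.maximum(n_wt + phi * decorrelation_grad(phi, tau), 0.0)
    return unnorm / unnorm.sum(axis=0, keepdims=True)  # renormalize each topic column

# Toy expected word-topic counts n_wt (as produced by an E-step) and current Phi.
n_wt = np.array([[5.0, 1.0],
                 [4.0, 2.0],
                 [1.0, 6.0]])
phi = n_wt / n_wt.sum(axis=0, keepdims=True)

phi_new = regularized_m_step(n_wt, phi, tau=2.0)
print(phi_new)  # columns still sum to 1; overlapping mass is pushed apart
```

With tau = 0 this reduces to the plain PLSA M-step (normalized counts), which matches the claim that only the green regularizer terms are new.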
So, we treat the tokens of the authors modality as a separate vocabulary, and every topic will now be not only a distribution over words but a distribution over authors as well. Or, if we have five modalities, every topic will be represented by five distinct distributions. One cool thing about multimodal topic models is that you represent all your entities in this hidden space of topics. So, this is a way to somehow unify all the information in your model. For example, you can find the most probable topics for words, and the most probable topics for time stamps, let's say. And then you can compare some time stamps and words and ask, "What are the most similar words for this day?" And this is an example that does exactly this. We had a corpus with time stamps for the documents, and we modeled the topics both for words and for time stamps, and we found that the closest words for the time stamp corresponding to the Oscars date would be Oscar, Birdman, and some other words that are really related to this date. So, once again, this is a way to embed all your different modalities into one space and find a way to build similarities between them. OK. Now, what would be your next steps if you want to build your own topic models? Well, probably you need some libraries. The BigARTM library is the implementation of the last approach that I mentioned. Gensim and MALLET implement the online LDA topic model; Gensim is built for Python and MALLET is built for Java. And Vowpal Wabbit is an implementation of the same online LDA topic model, but it is known to be super fast, so maybe it's also a good idea to check it out. Now, finally, just a few words about the visualization of topic models. You can never read through large collections, and it is not so easy to represent the output of your model, those probability distributions, in such a way that people can understand them. So, this is an example of how to visualize the Phi matrix.
We have a words-by-topics matrix here, and you can see that we group together the words that correspond to each topic, so that we can see that this blue topic is about these terms, another one is about social networks, and so on. But actually, the visualization of topic models is a whole world of its own. This website contains 380 ways to visualize your topic models. So, I want to end this video and ask you to just explore them, maybe for a few moments, and you will get to know that topic models can build very different and colorful representations of your data.