This course will cover the major techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches that can be generally applied to arbitrary text data in any natural language with little or no human effort.
Detailed analysis of text data requires understanding of natural language text, which is known to be a difficult task for computers. However, a number of statistical approaches have been shown to work well for the "shallow" but robust analysis of text data for pattern finding and knowledge discovery. You will learn the basic concepts, principles, and major algorithms in text mining and their potential applications.

Reviews

JS

The content was very useful, and the preparation of the course denoted much care and preparation by the teacher. I would love to see some modern topics like word embeddings covered in the course!

RC

Jun 25, 2017

5 out of 5 stars

very theoretical. very insightful too. finally got a glimpse how the features that I took for granted took shape ... thanks to the instructor, I love the textbook as well ~

From the lesson

Week 1

During this module, you will learn the overall course design, an overview of natural language processing techniques and text representation, which are the foundation for all kinds of text-mining applications, and word association mining with a particular focus on mining one of the two basic forms of word associations (i.e., paradigmatic relations).

Taught by

ChengXiang Zhai

Transcript

[SOUND] So here are some specific examples of what we can't do today. Part-of-speech tagging is still not easy to do 100% correctly. In the examples "he turned off the highway" versus "he turned off the fan," the two "off"s actually have somewhat different syntactic categories. It is also very difficult to get a complete parse correct. Again, the example "a man saw a boy with a telescope" can be very difficult to parse, depending on the context. Precise deep semantic analysis is also very hard. For example, defining the meaning of "own" precisely is very difficult in a sentence like "John owns a restaurant."

So the state of the art can be summarized as follows: robust and general NLP tends to be shallow, while deep understanding does not scale up. For this reason, the techniques that we cover in this course are in general shallow techniques for analyzing and mining text data, and they are generally based on statistical analysis. So they are robust and general, and they fall in the category of shallow analysis. Such techniques have the advantage of being applicable to any text data in any natural language about any topic. But the downside is that they don't give us a deeper understanding of text. For that, we have to rely on deeper natural language analysis, which typically requires human effort to annotate a lot of examples of the analysis that we would like to do; computers can then use machine learning techniques to learn from these training examples to do the task.

So in practical applications, we generally combine the two kinds of techniques, with the general statistical methods as a backbone, as the basis. These can be applied to any text data. On top of that, we use humans to annotate more data and use supervised machine learning to do some tasks as well as we can, especially for important tasks, where we bring humans into the loop to analyze text data more precisely.
But this course will cover the general statistical approaches that generally don't require much human effort. So they're practically more useful than some of the deeper analysis techniques that require a lot of human effort to annotate the text data. So to summarize, the main points to take away are these. First, NLP is the foundation for text mining: obviously, the better we can understand the text data, the better we can do text mining. Second, computers today are far from being able to understand natural language. Deep NLP requires common-sense knowledge and inference, so it only works for very limited domains and is not feasible for large-scale text mining. Shallow NLP based on statistical methods can be done at large scale and is the main topic of this course; such methods are generally applicable to a lot of applications. They are, in some sense, also the more useful techniques. In practice, we use statistical NLP as the basis, and we bring in humans to help as needed in various ways. [MUSIC]
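
As a rough illustration of what "shallow" statistical analysis means here, the sketch below simply counts word tokens in the lecture's two "turned off" sentences. It is not taken from the course materials: the function name `term_frequencies` and the regex tokenizer are assumptions made for this example, and the tokenizer assumes text where words are separated by whitespace or punctuation.

```python
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count lowercase word tokens using no grammar, dictionary, or annotation.

    Hypothetical helper for illustration only; it assumes words are
    delimited by whitespace or punctuation.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


# The two example sentences from the lecture.
docs = [
    "He turned off the highway.",
    "He turned off the fan.",
]

# Accumulate counts over the whole (tiny) collection.
counts = Counter()
for doc in docs:
    counts.update(term_frequencies(doc))

# Both occurrences of "off" land in the same bucket, although the
# lecture notes that they belong to different syntactic categories.
print(counts["off"], counts["highway"], counts["fan"])
```

The same function runs unchanged on other whitespace-delimited languages, which is the robustness the lecture describes. It also merges the two uses of "off" into one count, which is the loss of deeper understanding that the lecture names as the downside.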