In the past few years, there has been a rapid
expansion of activities in Web content mining. This is not surprising because of
the huge amount of valuable information of almost any imaginable type on the Web
and the significant economic benefits of such mining. However, due to the
heterogeneity and lack of structure of Web data, automated discovery of
targeted or unexpected knowledge/information still presents many challenging
problems. This tutorial introduces several such problems and some
state-of-the-art techniques for dealing with them, e.g., data/information
extraction, Web information integration, opinion mining, and information
synthesis. These problems all have strong connections with NLP. The tutorial
pays special attention to such connections and discusses how NLP researchers
may contribute to solving these problems. Many real-life examples are given
to help participants understand research concepts and see how the technologies
may be deployed in real-life applications. The tutorial thus mixes research
and industry perspectives, addressing seminal research ideas and looking at
the technology from an industry angle.