Interactive Machine Learning for Information Extraction

Abstract

Most of the world's knowledge, be it factual news, scholarly research,
social communication, subjective opinions, or even fictional content,
is now easily accessible as digitized text. Unfortunately, due to the
unstructured nature of text, much of the useful content in these
documents is hidden. The goal of "information extraction" is to
address this problem: extracting meaningful, structured knowledge
(such as graphs and databases) from text collections. The biggest
challenges when using machine learning for information extraction
include the high cost of obtaining annotated data and the lack of
guidance on how to understand and fix mistakes.

In this talk, I propose interpretable representations that allow users
and machine learning models to interact with each other: users can
inject domain knowledge into the models, and the models can explain
why they made a specific prediction. I study these techniques using
relation extraction as the application, an important subtask of
information extraction where the goal is to identify the types of
relations between entities that are expressed in text.

I first describe how symbolic domain knowledge, if provided by the
user as first-order logic statements, can be injected into relational
embeddings to improve the predictions. In the second part of the talk,
I present an approach to "explain" machine learning predictions using
symbolic representations, which the user may annotate directly for
more effective supervision. I present experiments that demonstrate
that an interactive interface between a user and machine learning is
effective in reducing annotation effort and in quickly training
accurate extraction systems.

Bio: Sameer Singh is a Postdoctoral Research Associate at the
University of Washington, working on large-scale and interactive
machine learning applied to information extraction and natural
language processing. He received his PhD from the University of
Massachusetts, Amherst, during which he also interned at Microsoft
Research, Google Research, and Yahoo! Labs on massive-scale machine
learning. He was recently selected as a DARPA Riser, won the grand
prize in the Yelp Dataset Challenge, has been awarded the Yahoo! Key
Scientific Challenges and the UMass Graduate School fellowships, and
was a finalist for the Facebook PhD fellowship. (http://sameersingh.org)