LOVe: Linking Objects to Vectors in distributional semantics: A framework to anchor corpus-based meaning representations to the external world

Overview

We use language to talk about the world. If I say "I like J. K. Rowling's books", you understand me because you can connect the expression "J. K. Rowling" to a particular entity in the world, even if that person is not present at the moment. This comes naturally to humans, but it is very difficult for machines. My project aims to understand reference, the crucial property of language that allows us to link it to the external world, using computational tools. Specifically, it exploits distributional semantics and deep learning, a family of methods that can learn to represent language directly from language data generated by humans, such as texts drawn from the internet.

The project has three main objectives:

1) To explore the representation of entity names (such as "J. K. Rowling") in distributional semantics.

2) To connect referential expressions ("the mug", "the book I bought") to objects depicted in images.

3) To develop a semantic framework that combines information about entities and concepts in a meaningful way.

We find that it is possible for current distributional models to learn to refer directly from data, and that distributional representations usefully represent the meaning of entity names.

The project advances our scientific understanding of language, a defining trait of the human species, and makes significant progress towards building machines we can talk to, with the ensuing impact on our everyday lives.

Results

Our first goal was to link referential expressions to the external world, where the world is operationalized in terms of information represented in a database. For feasibility reasons, we focused on public entities (like Obama or Italy), as opposed to private entities like the bird I saw this morning or the neighbor next door. We first showed that it is possible to extract real-world, referential attributes (such as the population of a country or whether it belongs to the European Union) from distributional representations. We then showed that it is possible to learn when an entity (say, Abraham Lincoln) is an instance of a category (say, president) based on the distributional representations of the linguistic expressions ("Abraham Lincoln" and "president", in this case). Finally, we created a challenging dataset for natural language understanding that contains many referential phenomena, and showed that current distributional approaches perform very poorly on it. The dataset will guide the development of newer-generation methods for handling referential phenomena and other semantic phenomena at the discourse level.
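To make the first of these findings more concrete, extracting an attribute from a distributional representation can be pictured as fitting a simple mapping from the vector of an entity name to the value of the attribute. This summary does not describe the models actually used in the project, so the following is only a minimal sketch under stated assumptions: a linear probe trained by gradient descent, with invented two-dimensional vectors, invented attribute values, and hypothetical names (fit_linear_probe, predict).

```python
# Toy illustration only: the vectors and attribute values are invented,
# and the linear probe is an assumption, not the project's actual model.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_linear_probe(vectors, targets, lr=0.5, epochs=1000):
    """Fit weights w and bias b so that dot(w, x) + b approximates each target."""
    dim = len(vectors[0])
    n = len(vectors)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(vectors, targets):
            err = dot(w, x) + b - y          # prediction error on this entity
            for i in range(dim):
                grad_w[i] += err * x[i] / n  # mean-squared-error gradient
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return dot(w, x) + b

# Hypothetical "distributional" vectors for four entity names, paired with
# a made-up numeric attribute that happens to be linear in the vectors.
train_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_attribute = [0.5, 2.5, -0.5, 1.5]

w, b = fit_linear_probe(train_vectors, train_attribute)

# Predict the attribute of an entity whose vector was not seen in training.
held_out_prediction = predict(w, b, [0.5, 0.5])
```

In this toy setting the attribute is recoverable from the vectors by construction, so the probe generalizes to the held-out vector. With real distributional representations, how well an attribute can be recovered is an empirical question.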

Our second goal was to link referential expressions to the external world, where this time the external world is operationalized in terms of visual information (objects represented in images). We showed that our computational model is able to pick the image that corresponds to a referential expression ("the cat"), and to spot cases in which the referential expression is not adequate (for instance, asking for "the cat" when there are several different cats in the situation). We also showed that our model can learn the meaning of quantifiers like "all" and "some" (capturing the difference between "all circles are black" and "some circles are black") directly from images containing objects that correspond to these different expressions. Finally, we showed that our model can combine visual information and linguistically conveyed information: if yesterday I told you that I bought a particular mug, you can combine this information with the visual properties of several objects to pick the right object when today I ask you to pass me the mug that I bought.
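The behavior described for "the cat" can be sketched as follows: the expression refers successfully only when exactly one object in the scene matches it, and reference fails when no object or more than one object matches. The project's model learns this from data; the code below is only a hand-written illustration under stated assumptions, using invented two-dimensional vectors, cosine similarity, a fixed threshold of 0.8, and a hypothetical function name (resolve_reference).

```python
import math

# Toy illustration only: vectors, threshold, and function names are
# assumptions made for this sketch, not the project's actual model.

def cosine(u, v):
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def resolve_reference(expression_vec, object_vecs, threshold=0.8):
    """Return the index of the unique matching object, or None when
    reference fails (no object matches, or more than one does)."""
    matches = [
        i for i, obj in enumerate(object_vecs)
        if cosine(expression_vec, obj) >= threshold
    ]
    return matches[0] if len(matches) == 1 else None

the_cat = [1.0, 0.0]                          # hypothetical vector for "the cat"

one_cat_scene = [[0.9, 0.1], [0.0, 1.0]]      # a cat and an unrelated object
two_cats_scene = [[0.9, 0.1], [1.0, 0.2]]     # two different cats
no_cat_scene = [[0.0, 1.0]]                   # no cat at all

unique_referent = resolve_reference(the_cat, one_cat_scene)     # 0
ambiguous_referent = resolve_reference(the_cat, two_cats_scene) # None
missing_referent = resolve_reference(the_cat, no_cat_scene)     # None
```

Returning None for both failure cases keeps the sketch short. A fuller version could distinguish a missing referent from an ambiguous one.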

Our third goal was to develop a semantic framework that encompasses conceptual and referential aspects of meaning. We made progress in this direction with: 1) the description and formalization of a dual route in composition, conceptual and referential; 2) the discussion of the limitations of distributional models for phenomena beyond the sentence level, pointing to specific directions in which the field needs to move; 3) the summarization and appraisal of the state of the art in the field of Formal Distributional Semantics, in a Special Issue of Computational Linguistics, the top journal in the field.

All the data gathered within the project, as well as all articles, are open and accessible to the scientific community and the whole of society.

Impact

Language is the most natural means of communication for humans, and this project helps us understand how it works. This is generally important for needs as different as language learning and overcoming language impairments. More specifically, this project paves the way for building machines we can talk to, because it helps computers make sense of the world around us in a way that lets them communicate about it. This addresses the digital divide, because in the future we will be able to solve more and more tasks by simply talking to our computers, phones, or tablets, instead of having to delve into application menus and complicated settings. It can also make daily operations such as using a GPS easier for everybody: even if our GPS could see, it would still need to be able to talk about what it sees. This project has made progress in this direction.