Publications: 2008

Web searches tend to be short and ambiguous. It is therefore not surprising that Web query disambiguation is an actively researched topic. However, most existing work relies on the existence of search engine log data in which each user's search activities are recorded over long periods of time. Such approaches may raise privacy concerns and may be difficult to implement for pragmatic reasons. In this work, we present an approach to Web query disambiguation that bases its predictions only on a short glimpse of user search activity, captured in a brief session of about 5--6 previous searches on average. Our method exploits the relations of the current search session in which the ambiguous query is issued to previous sessions in order to predict the user's intentions and is based on Markov logic. We present empirical results that demonstrate the effectiveness of our proposed approach on data collected form a commercial general-purpose search engine.

This paper introduces a new kernel which computes similarity between two natural language sentences as the number of paths shared by their dependency trees. The paper gives a very efficient algorithm to compute it. This kernel is also an improvement over the word subsequence kernel because it only counts linguistically meaningful word subsequences which are based on word dependencies. It overcomes some of the difficulties encountered by syntactic tree kernels as well. Experimental results demonstrate the advantage of this kernel over word subsequence and syntactic tree kernels.

A semantic parser learning system learns to map natural language sentences into their domain-specific formal meaning representations, but if the constructs of the meaning representation language do not correspond well with the natural language then the system may not learn a good semantic parser. This paper presents approaches for automatically transforming a meaning representation grammar (MRG) to conform it better with the natural language semantics. It introduces grammar transformation operators and meaning representation macros which are applied in an error-driven manner to transform an MRG while training a semantic parser learning system. Experimental results show that the automatically transformed MRGs lead to better learned semantic parsers which perform comparable to the semantic parsers learned using manually engineered MRGs.

Recognizing visual scenes and activities is challenging: often visual cues alone are ambiguous, and it is expensive to obtain manually labeled examples from which to learn. To cope with these constraints, we propose to leverage the text that often accompanies visual data to learn robust models of scenes and actions from partially labeled collections. Our approach uses co-training, a semi-supervised learning method that accommodates multi-modal views of data. To classify images, our method learns from captioned images of natural scenes; and to recognize human actions, it learns from videos of athletic events with commentary. We show that by exploiting both multi-modal representations and unlabeled data our approach learns more accurate image and video classifiers than standard baseline algorithms.

Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing methods for learning the logical structure of an MLN are not discriminative; however, many relational learning problems involve specific target predicates that must be inferred from given background information. We found that existing MLN methods perform very poorly on several such ILP benchmark problems, and we present improved discriminative methods for learning MLN clauses and weights that outperform existing MLN and traditional ILP methods.

ML ID: 220

Learning to Sportscast: A Test of Grounded Language Acquisition[Details] [PDF] [Slides] [Video] David L. Chen and Raymond J. MooneyIn Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, July 2008.

We present a novel commentator system that learns language from sportscasts of simulated soccer games. The system learns to parse and generate commentaries without any engineered knowledge about the English language. Training is done using only ambiguous supervision in the form of textual human commentaries and simulation states of the soccer games. The system simultaneously tries to establish correspondences between the commentaries and the simulation states as well as build a translation model. We also present a novel algorithm, Iterative Generation Strategy Learning (IGSL), for deciding which events to comment on. Human evaluations of the generated commentaries indicate they are of reasonable quality compared to human commentaries.

This paper introduces the single-entity-centered setting for transfer across two relational domains. In this setting, target domain data contains information about only a single entity. We present the SR2LR algorithm that finds an effective mapping of the source model to the target domain in this setting and demonstsrate its effectiveness in three relational domains. Our experiments additionally show that the most accurate model for the source domain is not always the best model to use for transfer.

To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Current natural language processing and computer vision systems make extensive use of machine learning to acquire the probabilistic knowledge needed to comprehend linguistic and visual input. However, to date, there has been relatively little work on learning the relationships between the two modalities. In this talk, I will review some of the existing work on learning to connect language and perception, discuss important directions for future research in this area, and argue that the time is now ripe to make a concerted effort to address this important, integrative AI problem.