NAACL-HLT 2012 Workshop on Inducing Linguistic Structure

Welcome to the homepage of the NAACL-HLT 2012 Workshop on Inducing Linguistic Structure. This workshop addresses the challenges of learning linguistic structure in an unsupervised or minimally supervised setting. It encompasses many popular themes in computational linguistics and machine learning, including grammar induction, shallow syntax induction (e.g., parts of speech), learning semantics, learning the structure of documents and discourses, and learning relations within multilingual text collections. Unlike supervised settings, where annotated training data is available, unsupervised induction is considerably more difficult, both in terms of modelling and evaluation.

Unsupervised Part of Speech Inference with Particle Filters; Dubbin and Blunsom

Nudging the Envelope of Direct Transfer Methods for Multilingual Named Entity Recognition; Täckström

13.00-14.15  Lunch break

14.15-15.15  Invited talk: Noah Smith

15.15-15.30  Overview of PASCAL challenge (shared task); Gelling et al.

15.30-16.00  Coffee break and poster session

16.00-17.30  Poster session continues

All the research talks will be presented in short spotlight sessions, as back-to-back 10-minute presentations. This work will also be on display in the afternoon poster session. In addition to these posters, participants in the Grammar Induction Challenge (shared task) will present their work.

Invited Talks

Alexander Clark

What types of linguistic structure can be induced?

In NLP, linguistic structure is typically taken to be data that one wishes to model: data that is accurately represented in annotated corpora like the Penn Treebank, where the role of the computational linguist is to recover this structure using supervised or unsupervised learning. In this talk I will argue that this view is mistaken and misleading, making the uncontroversial claim that syntactic annotations are theoretical constructs rather than data, and the more controversial claim that computational linguists should instead aim to specify well-defined alternative structures that can be induced efficiently.

I will present two simple proposals along these lines: one at the lexical level, giving a precise analogue of part-of-speech tags, and a more controversial one at the level of syntactic structure. The final question is whether these structures are capable of performing the roles that traditional syntactic structures were meant to fulfill: supporting semantic interpretation and explaining certain syntactic phenomena.

Bio

Alexander Clark is in the Department of Computer Science at Royal Holloway, University of London. His research interests are in grammatical inference, theoretical and mathematical linguistics, and unsupervised learning. He is currently president of SIGNLL and chair of the steering committee of the ICGI. His book coauthored with Shalom Lappin, 'Linguistic Nativism and the Poverty of the Stimulus', was published by Wiley-Blackwell in 2011.

Regina Barzilay

Selective-Sharing for Multilingual Syntactic Transfer

Today, we have at our disposal a significant number of linguistic annotations across many different languages. However, to achieve reliable performance on a new language, we still depend heavily on the annotations specific to that language. This limited ability to reuse annotations across languages stands in striking contrast with the unified treatment of syntactic structure given in linguistic theory. In this talk, I will put recent multilingual parsing models into the context of this unified view. I will explain some of the puzzling results in multilingual learning, such as the success of direct syntactic transfer over more sophisticated cross-lingual approaches. Finally, I will demonstrate the benefits of formulating multilingual parsing models that are consistent with this unified view and thereby can effectively leverage connections between languages.

Bio

Regina Barzilay is an Associate Professor in the Department of Electrical Engineering and Computer Science at MIT and a member of the Computer Science and Artificial Intelligence Laboratory. Her research interests are in natural language processing. She is a recipient of various awards, including the NSF CAREER Award, the Microsoft Faculty Fellowship, the MIT Technology Review TR-35 Award, and best paper awards at top NLP conferences. She serves as an associate editor of the Journal of Artificial Intelligence Research (JAIR) and as an action editor for Transactions of the Association for Computational Linguistics (TACL).

Noah Smith

Rethinking Inducing Linguistic Structure

We now have a rich and growing set of modeling tools and algorithms for inducing linguistic structure. In this talk, I'll discuss some of the weaknesses of our current methodology. I'll present a new abstract framework for evaluating NLP models in general and unsupervised structure prediction models in particular. The central idea is to make explicit certain adversarial roles among researchers, so that the different roles in an evaluation are more clearly defined and participants in all roles are offered ways to make measurable contributions to the larger goal. This framework can be instantiated in many ways, simulating some familiar intrinsic and extrinsic evaluations as well as some new evaluations. This talk is entirely based on preliminary ideas (no theoretical or experimental results) and is intended to spark discussion.

Bio

Noah Smith is the Finmeccanica Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Computer Science, as a Hertz Foundation Fellow, from Johns Hopkins University in 2006 and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. His research interests include statistical natural language processing, especially unsupervised methods, machine learning for structured data, and applications of natural language processing. His book, Linguistic Structure Prediction, covers many of these topics. He serves on the editorial board of the journal Computational Linguistics and the Journal of Artificial Intelligence Research and received a best paper award at the ACL 2009 conference. His research group, Noah's ARK, is supported by the NSF (including an NSF CAREER award), DARPA, Qatar NRF, IARPA, ARO, Portugal FCT, and gifts from Google, HP Labs, IBM Research, and Yahoo Research.

The workshop solicits short papers (6 pages of text, plus up to 2 pages of references) for either oral or poster presentation. We will consider allowing additional pages in accepted papers to give authors space to address reviewer comments. Please follow the standard NAACL guidelines for paper formatting. You can submit your papers using the following link.

News

November 13 It has come to our attention that Table 3 in the paper reported the wrong results for Directed Accuracy (computed with a cutoff of 10 rather than no cutoff). Unfortunately this invalidated a small part of the evaluation. A corrected paper with updated results can be found here. We thank Yonatan Bisk and Julia Hockenmaier for pointing out the errors.

April 20 Results for submitted systems and some baselines are now available (see ResultsPos for POS induction, ResultsDep for dependency induction, and ResultsPosDep for joint induction).

April 4 Baseline and evaluation scripts are now available (see SharedTask).

Feb 8 Accommodation after NAACL will be tight due to an overlap with a Formula 1 event. Book your hotel early if you plan to attend the workshops.

Jan 27 Training data has been released.