13 June 2012

Like many people, I spent last week in lovely Montreal (at least lovely for the second half) at NAACL. Despite the somewhat unfortunate submission date, I thought the program this year was quite good. Of course I didn't see every talk and haven't read many of the papers yet, but I figured I'd point out what I saw that I liked and other people can comment likewise.

Identifying High-Level Organizational Elements in Argumentative Discourse (Madnani, Heilman, Tetreault, Chodorow). This is maybe one of the first discourse papers I've seen where I actually believe that they have a chance of solving the problem that they've set out to solve. Here, the problem is separating the meat (content of an essay) from the shell (the bits of discourse that hold the meat together). It's a cool problem and their solution seems to work well. Very nice paper. (And Nitin's talk was great.)

Minimum-Risk Training of Approximate CRF-Based NLP Systems (Stoyanov, Eisner). This paper is basically about training approximate models based on some given loss function. Reminds me a lot of the Ross et al. CVPR 2011 paper on Message-Passing. It's a cool idea, and there's software available. Being me, the thing I wonder the most about is whether you can achieve something similar by being completely greedy, and then whether you need to do all this work to get a good decision function. But that's me -- maybe other people like CRFs :).

MSR SPLAT, a language analysis toolkit (Quirk, Choudhury, Gao, Suzuki, Toutanova, Gamon, Yih, Cherry, Vanderwende). This is a demo of a system where you send them a sentence and they tell you everything you want to know about it. Never run your own parser/NER/etc. again. And, having seen it in action at MSR, it's fast and high quality.

Parsing Time: Learning to Interpret Time Expressions (Angeli, Manning, Jurafsky). This was a great paper about semantic interpretation via compositional semantics (something sort of like lambda calculus) for time expressions. I cannot find myself getting super jazzed up about time, but it's a nice constrained problem and their solution is clean. I'm actually thinking of using something like this (or a subset thereof) as a course project for the intro CL course in the Fall, since I'm always starved for something to do with compositional semantics.
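
For a flavor of what "compositional" means here, this is a toy sketch of my own, not the paper's grammar or learning model. It assumes a time expression denotes a function from a reference date to a date, and that words like "next" and "after" are functions that build bigger denotations out of smaller ones. All the names (`next_`, `after`, `WEEK`, `FRIDAY`) are made up for illustration, and "next Friday" is assumed to mean the first Friday strictly after the reference date, which is only one of the readings English allows.

```python
from datetime import date, timedelta

# Assumption: a "time" denotes a function from a reference date to a date.
WEEK = timedelta(days=7)   # denotation of the duration "a week"
FRIDAY = 4                 # date.weekday() convention: Monday=0 ... Sunday=6


def next_(day_index):
    """'next <day>': the first date strictly after ref that falls on that weekday."""
    def resolve(ref):
        # Days to move forward, always in the range 1..7.
        delta = (day_index - ref.weekday() - 1) % 7 + 1
        return ref + timedelta(days=delta)
    return resolve


def after(duration, time):
    """'<duration> after <time>': shift the denoted date by the duration."""
    return lambda ref: time(ref) + duration


# "a week after next Friday", composed bottom-up from its parts.
a_week_after_next_friday = after(WEEK, next_(FRIDAY))

if __name__ == "__main__":
    ref = date(2012, 6, 13)  # a Wednesday
    print(next_(FRIDAY)(ref))
    print(a_week_after_next_friday(ref))
```

The point is only that the meaning of the whole phrase falls out of applying the pieces to each other; the hard part the paper deals with, choosing among candidate parses and readings, is not attempted here.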

Unsupervised Translation Sense Clustering (Bansal, DeNero, Lin). If you want to build a bilingual dictionary from parallel text, you need to cluster translations into senses. Here's a way to do it. The results improve when bilingual contexts are used, which was nice to see.

I also feel like I should raise my glass to the organizers of NLP Idol and congrats to Ray for winning with Robert Wilensky's paper "PAM." (If anyone can find an online version, please comment!) Though I would actually encourage everyone to read all three papers if you haven't already. They all changed how I was thinking about problems. Here are the others: Burstiness (Church), Context (Akman), Supertagging (Bangalore, Joshi).