03 July 2006

(Minor note: the latest version of the EDT section didn't get folded in properly to the version of the thesis that went up yesterday; this is fixed now.)

Many people have asked me how I settled on a thesis topic, I think largely because they are trying to find their own paths. My story is (at least to me) a bit odd and circuitous, but I'll tell it anyway.

When I came to USC, I knew I wanted to do NLP and, more specifically, I wanted to do summarization. Working with Daniel was a natural choice. I was particularly interested in coming up with good models for getting around the sentence extraction paradigm that dominates the field. My first work was on extending Kevin and Daniel's sentence compression work to the document level by using discourse information. This worked reasonably well. My next milestone was to try to leverage alignment information of document/abstract pairs in order to learn complex transformations. This led to the segment HMM model for doc/abs alignment, that met with reasonable success (considering how darned hard this problem is).

At that point, it became clear that trying to do a full abstractive system just wasn't going to work. So I started looking at interesting subproblems, for instance sentence fusion. Unfortunately, humans cannot reliably do this task, so I threw that out, along with a few other ideas. Around the same time, I began noticing that to have any sort of reasonably interesting model that did more than sentence extraction, I really was going to need to run a coreference resolution system. So I went to the web (this was back in 2003) and found one to download.

Except I didn't because there weren't any publicly available.

So I went to build my own. I wanted to do is "right" in the sense that I wanted a principled, joint system that could be trained and run efficiently. I read a lot of literature and didn't find anything. So then I read a lot of machine learning literature (I was beginning to get into ML fairly hardcore at this point) to find a framework that I could apply. I couldn't find anything.

So I decided to build my own thing and came up with LaSO, which is essentially a formalization and tweak on Mike Collins and Brian Roark's incremental perceptron. My thesis proposal used LaSO to solve the EDT problem, and I presented LaSO at ICML 2005. Also at ICML 2005 I met John Langford, having previously noticed that what I was doing with LaSO looked something like some recent work he had done on reinforcement learning. We had a long dinner and conversation and, after a few visits to Chicago, many emails, and lots of phone calls, came up with Searn. With both LaSO and Searn, I always had in the back of my mind that I didn't want to make any assumptions that would render it inapplicable to MT, since everyone else at ISI only does MT :).

At about the same time, I started thinking about how to do a better job of sentence compression and came up with the vine-growth model that eventually made it into my thesis. This was really the first point that I started thinking about summarization again, and, in the evolution of LaSO to Searn, it now became possible to do learning in this model.

So, in a sense, I had come full circle. I started with summarization, lost hope and interest, began to get more interested in EDT and machine learning, and then finally returned to summarization, this time with a new hammer.