We describe an unsupervised approach to
multi-document, sentence-extraction-based
summarization for the task of producing
biographies. We utilize Wikipedia to automatically
construct a corpus of biographical
sentences and TDT4 to construct a corpus
of non-biographical sentences. We build a
biographical-sentence classifier from these
corpora and an SVM regression model for
sentence ordering from the Wikipedia corpus.
We evaluate our work on the DUC2004
evaluation data and with human judges.
Overall, our system significantly outperforms
all systems that participated in DUC2004,
according to the ROUGE-L metric, and is
preferred by human subjects.
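
For readers unfamiliar with ROUGE-L, the following is a minimal sketch of the idea behind the metric: it scores a candidate summary against a reference by the length of their longest common subsequence (LCS) of tokens. It is a simplified, sentence-level illustration and not the official ROUGE toolkit used in the DUC2004 evaluation. The function names `lcs_length` and `rouge_l`, the whitespace tokenization, the lowercasing, and the default `beta=1.0` are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programming; prev holds the previous row of the table.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Simplified sentence-level ROUGE-L F-score (illustrative assumption).

    Tokenization is plain lowercased whitespace splitting; the real toolkit
    offers options such as stemming that are not modeled here.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)       # fraction of the reference that is covered
    precision = lcs / len(cand)   # fraction of the candidate that matches
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    print(rouge_l("a b c d", "a x c y"))
```

In this sketch, a candidate identical to its reference scores 1.0 and a candidate sharing no tokens with it scores 0.0; larger values of `beta` weight recall more heavily than precision.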