IDEA: Text Summarization

I sat in on a master's thesis seminar that inspired this idea. The problem is to create a summary of a given document. The trade-off is between accuracy and readability.

example problem:

Write a program that will create a 50-word summary of a 300-word document. Summaries should be judged in two categories: Accuracy and Human Readability.

extreme solutions:

1) Take the top 50 words based on frequency count. The accuracy rating will be high, but human readability will be low.

2) Take the first 'n' sentences, up to 50 words. Human readability will be high; however, many important points in the document will be left out, resulting in a low accuracy score.
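
Here is a rough Python sketch of the two extremes, just to show how little code they need. The function names (`top_words_summary`, `lead_summary`), the regex-based word and sentence splitting, and the choice to stop at the first sentence that would go over the limit are all my own assumptions, not part of any contest rules.

```python
import re
from collections import Counter


def words(text):
    # Naive tokenizer (an assumption): lowercase runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def top_words_summary(text, limit=50):
    # Extreme 1: the `limit` most frequent words, most frequent first.
    counts = Counter(words(text))
    return " ".join(w for w, _ in counts.most_common(limit))


def lead_summary(text, limit=50):
    # Extreme 2: leading sentences, stopping before the word limit is exceeded.
    # Sentences are split naively after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chosen, used = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if used + n > limit:
            break
        chosen.append(sentence)
        used += n
    return " ".join(chosen)
```

Note that a plain frequency count will mostly return words like "the" and "a", so a real entry would probably want to filter common words first.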

The objective is to find a balance somewhere in between: a summary that encompasses the most information while still being human readable.

I thought this might be a good contest idea, as anyone who can read a file into a program can participate. Newbs can use simple sentence or word selection algorithms, while more advanced programmers can dip into areas of NLP (natural language processing) or anything else they can think of.

Originally posted by blackrat364: Good God, that sounds complicated. How long are you expecting this contest to take?

It's really not that complicated. You could write a 5-minute program that just selects a few sentences, and it could possibly perform better than someone's 2-week implementation of some nasty NLP-based word selection algorithm.

You basically just need to count some words, decide which words/sentences to take, and write them to a file. The only thing that might take a long time is judging. I volunteer blackrat for that task lol
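
To give an idea of what a quick entry might look like, here is a minimal Python sketch of the count, pick, write pipeline. Everything specific in it is an assumption on my part: the names `summarize` and `summarize_file`, scoring a sentence by the average document frequency of its words, and greedily taking the highest-scoring sentences that still fit in the word limit. It is one possible middle ground, not the "right" answer.

```python
import re
from collections import Counter


def summarize(text, limit=50):
    # Naive sentence split (an assumption): break after '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Count how often each word appears in the whole document.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Average document frequency of the words in this sentence.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(counts[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Visit sentences from highest to lowest score, keeping those that fit.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    picked, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= limit:
            picked.append(i)
            used += n
    # Emit the picked sentences in their original order for readability.
    return " ".join(sentences[i] for i in sorted(picked))


def summarize_file(src, dst, limit=50):
    # Read the document, summarize it, and write the summary to a file.
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(summarize(text, limit))
```

Keeping whole sentences and restoring their original order is what keeps the output readable; the frequency score is the part that tries to keep it accurate.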