In co-operation with DIPF, a Master’s thesis at Technical University Darmstadt presented first approaches to a method that provides machine-based support for manually summarizing data collections in the field of educational research.

Project description

The German Eduserver hosts a large number of manually curated links and information from the domain of educational science that are relevant to both researchers and practitioners. Human curators prepare summaries for the linked sources, such as websites, books and articles. These summaries should guide users of the Eduserver to the links that are of interest to them. Writing them is a time-consuming and tedious task, and because the data comes from various sources, the quality of the background material varies considerably.

Using natural language processing (NLP) methods, we aim to create a framework that suggests potential summaries to support the human curator. Initial results obtained through a master’s thesis at the Technical University Darmstadt in collaboration with DIPF indicate that it is feasible to provide the human curators with sentences to use in a final summary, and that these suggestions are indeed helpful in creating it.

As these are only initial results, we want to build on the framework developed in the course of the master’s thesis to improve its quality, and also to compare our system against other systems using standard evaluation metrics. Here, we can draw on the NLP community's long-standing tradition of shared-task competitions. One of these was the Document Understanding Conference (DUC), which later became part of the Text Analysis Conference (TAC). These competitions have made available large data sets for various tasks (single-document summarization, multi-document summarization, short and very short summaries, and update summaries), together with reference data against which the results of a specific system can be compared. They also helped to establish standard evaluation tools, which we will use to evaluate our system.
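As an illustration of the kind of comparison against reference data described above, the following minimal sketch computes a simple n-gram recall between a system summary and a human-written reference. This is an assumption for illustration only: the project description does not name the evaluation tools, and the measure below is a simplified relative of the n-gram overlap metrics commonly used in DUC-style evaluations, not the official tooling. The function names and the example sentences are hypothetical.

```python
from collections import Counter
import re


def ngrams(text, n):
    """Return a multiset of lowercase word n-grams found in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(candidate, reference, n=1):
    """Share of reference n-grams that also occur in the candidate.

    Counts are clipped, so an n-gram repeated in the candidate is
    credited at most as often as it appears in the reference.
    """
    ref = ngrams(reference, n)
    cand = ngrams(candidate, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example texts, not taken from the Eduserver.
reference = "The portal links curated resources on educational research."
candidate = "The portal offers curated links on educational research."

unigram_score = ngram_recall(candidate, reference, n=1)
bigram_score = ngram_recall(candidate, reference, n=2)
print(f"unigram recall: {unigram_score:.3f}, bigram recall: {bigram_score:.3f}")
```

A recall-oriented measure of this kind rewards a system for covering the content of the human summary. In practice, a full evaluation would rely on the established tools and the multiple reference summaries provided with the competition data sets.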