First Results on Detecting Term Evolutions

Abstract

The archival of content like publications or web pages is
just the first step toward “full” content preservation. It also
has to be guaranteed that content can be found and interpreted
in the long run. The correspondence between the
terminology used for querying and the one used in content
objects to be retrieved is a crucial prerequisite for effective
retrieval technology. However, as terminology evolves over
time, a growing gap opens between older documents in (long-term)
archives and the active language used for querying
such archives. Thus, technologies for detecting and systematically
handling terminology evolution are required to ensure
“semantic” accessibility of archived content in the long
run. The core of our approach is to derive mappings between
terminologies originating from different times by the fusion
of term concept graphs. To verify the suitability of our approach,
we present first results of experiments conducted on
The Times archive that covers 200 years of documents. In
addition, we discuss how our approach can be applied to
web archives and the challenges that arise from this.
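
To make the idea of mapping terminologies across time periods more concrete, here is a minimal sketch in Python. It is not the method used in the paper: the abstract does not specify how term concept graphs are built or fused, so this sketch substitutes a much simpler technique, comparing the co-occurrence neighborhoods of terms from two periods with Jaccard similarity. The function names (cooccurrence_graph, jaccard, map_terms), the 0.6 threshold, and the toy documents are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(documents):
    """Map each term to the set of terms it shares a document with."""
    graph = defaultdict(set)
    for doc in documents:
        terms = set(doc.lower().split())
        for a, b in combinations(sorted(terms), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_terms(old_graph, new_graph, threshold=0.6):
    """Map each old term to the new term with the most similar neighborhood.

    Assumption: a shared context vocabulary indicates that two terms play
    the same role in their respective periods. Terms whose best score is
    below the (hypothetical) threshold are left unmapped.
    """
    mapping = {}
    for old_term, old_ctx in old_graph.items():
        best, best_score = None, 0.0
        for new_term, new_ctx in new_graph.items():
            score = jaccard(old_ctx, new_ctx)
            if score > best_score:
                best, best_score = new_term, score
        if best is not None and best_score >= threshold:
            mapping[old_term] = best
    return mapping


# Toy data, invented for this sketch (not taken from The Times archive).
old_docs = ["wireless broadcast receiver", "wireless valve receiver"]
new_docs = ["radio broadcast receiver", "radio valve receiver"]

mapping = map_terms(cooccurrence_graph(old_docs), cooccurrence_graph(new_docs))
print(mapping)
```

In this toy setting the older term and the newer term appear in identical contexts, so the neighborhood overlap is enough to link them. Real archives would need far more robust context modeling, which is the kind of problem the approach described in the abstract addresses.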