BSc defense Marcel Michels

Metadata

Title

Information Extraction from Wikipedia’s Software Language Infoboxes

Abstract

Wikipedia is one of the world's largest knowledge bases. In addition to the plain text of its articles, it provides infoboxes, which summarize an article's content in structured form. This thesis aims to extract these infoboxes and normalize the data they contain. The processed data is intended to serve as a basis for comparing articles with one another. To this end, the structure and contents of the infoboxes are analyzed to identify problematic constructs in the wiki markup, and the thesis explains how the implementation handles them.