Digital Text Analysis Working Group

The emergence of the computer has led to a considerable number of digital approaches to, and areas of research on, many aspects of texts: computational linguistics, natural language processing (NLP), text mining, and stylometrics, to name but a few. The digital text analysis working group brings together an interdisciplinary group of researchers fortnightly at the GCDH to discuss and better understand these new approaches. In the typical format, one participant presents a tool or method from their own research, followed by intense discussion focusing particularly on how this tool or method could be used in the research of the other participants. The goal is to enhance textual scholarship on the Göttingen Research Campus through the introduction and further development of digital methods.

Computational linguistics focuses on designing algorithms and formal descriptions that can accurately represent, e.g., the morphological and syntactic structures of languages in order to process them computationally. Speech recognition, machine translation, language parsing, part-of-speech tagging, and historical linguistics can all be classed under computational linguistics.

Natural language processing (NLP) is an application of computational linguistics. It typically focuses on using symbolic/logical methods, statistical methods, and machine learning algorithms to extract meaningful information from “natural language,” i.e., non-experimental written or spoken texts. It shares several tasks with computational linguistics, such as language parsing and machine translation, but also covers tasks such as named entity recognition (NER) and sentiment analysis.

Text mining seeks to extract information from texts. It focuses on such tasks as text categorization and clustering, sentiment analysis, and document summarization.

Stylometrics uses strategies from the three approaches mentioned above to identify stylistic elements that can support, e.g., authorship attribution, identification of forgeries, temporal classification of documents, and identification of translations and translation style.
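
As a rough illustration of the idea behind authorship attribution, the following minimal Python sketch compares texts by how often they use a handful of common function words. It is not a method used or endorsed by the working group: the word list, the function names, and the choice of Manhattan distance are all assumptions made for this example, and real studies use much larger word lists and more refined distance measures.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of English function words (an assumption
# for this sketch; real stylometric work uses hundreds of words).
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "that", "it"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def distance(p, q):
    """Manhattan distance between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_author(disputed, candidates):
    """Return the name of the candidate whose profile is nearest to the disputed text.

    `candidates` maps an author name to a sample of that author's writing.
    """
    target = profile(disputed)
    return min(candidates, key=lambda name: distance(target, profile(candidates[name])))
```

Given a dictionary of writing samples keyed by author name, closest_author returns the name whose function-word usage is most similar to the disputed text. With samples as short as a sentence the result is meaningless as evidence; the sketch only shows the shape of the comparison.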

An excellent introduction to several tools for digital text analysis, accessible to beginners yet powerful enough to produce genuine research results, can be found in the wiki "Literatur Rechnen".

For any queries regarding the working group's programme or participation in it, please contact Matt Munson.