Course Content

The lecture offers an introduction to the perspectives, problems, methods, and techniques of text technology. All examples and tutorials use the Python programming language.

Key aspects:

Natural language processing (NLP)

Tokenizing

Segmentation

Part-of-Speech Tagging

Corpora

Statistical analysis

Machine Learning

Categorization and classification

Information Extraction

Introduction to Python

Data Structures

The NLTK library

Structured Programming

The course is based on the Python programming language together with an open-source library called the Natural Language Toolkit (NLTK). NLTK supports exploratory, problem-based learning of theoretical concepts without requiring extensive programming knowledge.
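
To give a flavour of this exploratory style, the following is a minimal sketch of two of the key aspects listed above, tokenizing and simple statistical analysis. It uses only the Python standard library rather than NLTK, so it runs without any installation; the sample sentence, the tokenize function, and its deliberately naive splitting rule are assumptions made for illustration and are not taken from the course materials.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive rule)."""
    # Lowercase first, then keep only runs of the letters a-z.
    # Punctuation, digits, and whitespace are dropped.
    return re.findall(r"[a-z]+", text.lower())


# Made-up sample text, used only for this illustration.
sample = "The cat sat on the mat. The dog sat on the log."

tokens = tokenize(sample)

# Counter is a dictionary that maps each token to its number of occurrences.
frequencies = Counter(tokens)

print(tokens[:5])          # ['the', 'cat', 'sat', 'on', 'the']
print(frequencies["the"])  # 4
```

NLTK provides more robust tools for the same tasks, for example tokenizers that handle punctuation and abbreviations, which is why the course works with the library rather than with hand-written rules like the one above.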

The course assumes familiarity with basic computing concepts but does not assume any knowledge of Python, which will be taught during the course. If you would like to work on your own laptop, please follow the installation instructions at http://www.nltk.org/download.