Projects

I am the principal investigator of this project, funded by the German Research Foundation (DFG).

Emotion analysis in natural language processing aims at associating text with emotions, for instance with anger, fear, joy, surprise, disgust or sadness. This task extends sentiment analysis and adds further qualitative value in applications such as social media analysis or the analysis of fictional stories and news articles.

Existing research has so far mainly focused on the association of text with specific emotion models from psychological research. The development of methods for detecting phrases in text which denote the emotion experiencer (the character or person who feels the emotion), the emotion theme (the cause of the development of an emotion) as well as the modifiers of an emotion (intensifiers and diminishers) has been neglected.

In this project, we aim at filling this gap. We will develop manually annotated corpora from different domains (news, novels, social media) in German and English. Based on these resources, we will develop models which are able to automatically recognize and extract such information. We work on three levels: First, we connect words with emotions (with distributional and lexical methods), including grammatical variants. Second, we analyze these mentions in context, together with modifiers, the feeler and the theme (cause) of the emotion. Third, we model this information in context, i.e., beyond separate mentions. All methods will be analyzed regarding their domain and language independence.

I am a co-principal investigator ("Co-Applicant") in this project (PI Prof. Sebastian Padó). It is funded by the German Research Foundation (DFG).

In many kinds of prose texts, both literary and newswire, reported speech plays an important role as a source of information about characters, their attitudes, and their relationships. Going further, such information can aid in the analysis of patterns of behavior and the construction of social networks.

While readers do not have any problem in assembling representations for complete situations from individual instances of reported speech, this is still a challenging task for computers. Current state-of-the-art methods are generally organized as "pipelines" which start from individual instances of reported speech and proceed incrementally to more global properties of the situation or characters. Since individual instances of reported speech are often short and uninformative, a pipeline procedure often causes prediction errors which cannot be rectified in retrospect.

In this project, we develop joint inference methods to model the various aspects of reported speech (Who is the speaker? Who is the hearer? What is the content? What is the relationship between speaker and hearer?) together instead of individually. The resulting joint model takes account of the interdependencies between these aspects. Thus, information from the different aspects can complement each other. The result of this part of the project is a solid starting point (in terms of natural language processing methods) for the application of such methods to the automatic analysis of reported speech in digital humanities and social sciences.

This algorithmic goal is complemented by a goal from corpus and computational linguistics, namely elucidating the relationship between reported speech and other aspects of semantic analysis. In particular, there appears to be a close relationship between reported speech and (a subset of) semantic roles. Yet, no comprehensive formal analysis has been carried out so far. We will provide a linguistic characterization of the relationship and exploit it algorithmically to further improve the recognition of reported speech as discussed above. The result of this part of the project is the (at least partial) consolidation of two strands of research that have largely been treated as independent so far.

I am the principal investigator of this project, funded by MWK Baden-Württemberg and the University of Stuttgart.

In the life sciences, most information is only available as free text in scientific publications. Automatic methods to extract such knowledge and to provide it in structured databases are challenged by a dilemma: especially if potentially new information is detected in text, it is unclear whether this information is actually correct or whether it was wrongly extracted, for instance because the text is formulated in an uncommon way. In this project, methods will be developed which help to estimate the reliability of information extracted from biomedical publications.