Tweets beyond Facts, the secrets of extracting information from Tweets

On Thursday, September 10th, 2015, in the Mirror Hall of Sofia University, Professor Sabine Bergler gave a talk at the invitation of Mozaika, The Humanizing Technologies Lab, to a select audience that included distinguished members of the Data Science Society. Dr Bergler is a Full Professor at the Department of Computer Science at Concordia University, Montreal, Canada. She holds a Ph.D. on reported speech from Brandeis University (Waltham, Massachusetts, USA) and has degrees from the University of Massachusetts at Amherst and the University of Stuttgart. In 2002 she founded the CLAC Laboratory at Concordia, where she conducts research on computational linguistics. The achievements of the CLAC Lab, carried out under Dr Bergler's direction, include groundbreaking work on sentiment analysis, on embedding predicates as a unified theoretical foundation in semantics, and on computational aspects of bioinformatics. Her students consistently win competitions: on speculative language at BioNLP, on negation focus and modality at worldwide shared task challenges such as *SEM and QA4MRE, and on sentiment analysis at SemEval.

Dr Bergler presented a forward-looking perspective on the extraction of knowledge from texts. She outlined a series of use cases and techniques that make it possible to mine more than just factual information, and she emphasized the necessity and impact of employing standard linguistic knowledge to achieve this. Her examples demonstrated that texts convey non-factual information such as authors' points of view, that negation can be expressed both explicitly and implicitly, and that speculative and figurative language can only be accounted for by a combination of features, some of whose values have to be interpreted inversely. For example, the word "positive", which has a positive connotation as a lexical unit, is clearly negative when it appears in a context such as "positive results for breast cancer".
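The "positive results" example can be illustrated with a minimal sketch. It is not the CLAC pipeline: the lexicon, the list of inverting context words, the negation cues, and the function names are all hypothetical and chosen only to show how a prior lexical polarity can be flipped by context.

```python
import re

# Toy prior-polarity lexicon (illustrative assumption, not a real resource).
PRIOR_POLARITY = {"positive": 1, "good": 1, "negative": -1, "bad": -1}

# Hypothetical context words under which a test result is read inversely.
INVERTING_CONTEXT = {"cancer", "tumor", "infection"}

# Hypothetical explicit negation cues that flip the word right after them.
NEGATION_CUES = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def contextual_polarity(text):
    """Sum word polarities after applying context and negation flips."""
    tokens = tokenize(text)
    invert_domain = any(tok in INVERTING_CONTEXT for tok in tokens)
    score = 0
    for i, tok in enumerate(tokens):
        prior = PRIOR_POLARITY.get(tok)
        if prior is None:
            continue
        value = prior
        if invert_domain:
            value = -value  # e.g. "positive" test result for a disease
        if i > 0 and tokens[i - 1] in NEGATION_CUES:
            value = -value  # explicit negation directly before the word
        score += value
    return score


print(contextual_polarity("positive results for breast cancer"))
print(contextual_polarity("positive feedback from users"))
```

With these toy lists, the first sentence scores below zero and the second above zero, although both contain the same word "positive".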

She outlined the basics of her group's unified approach, which employs linguistic knowledge in shallow processing pipelines, and the effects of applying this method to the different use cases. One convincing result she reported was that her team's algorithms ranked first in pilot worldwide shared challenges in different language processing areas. In such pilot tasks no task-specific text corpora have previously been available for training and testing machine learning methods, and machine-learning-based algorithms fail.

Overall, the talk showed that linguistic principles form a solid baseline for modular, adaptable NLP modules, and that the trigger and linguistic scope approach to speculative language, negation, and modality has proved effective. Although it makes language processing tasks less scalable, relying on syntactic parsing is feasible, even for tweets, given appropriate preprocessing steps. Finally, extra-propositional parts of text prove effective in task-oriented evaluation.
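The idea of a trigger and its scope can be sketched in a few lines. The approach described in the talk relies on syntactic parsing; the sketch below swaps that for a much cruder heuristic (the scope is every token after the trigger up to the next punctuation mark), so it is only an illustration under that assumption. The trigger lists, the preprocessing rules, and all function names are hypothetical.

```python
import re

# Hypothetical trigger lists; real systems use much richer lexicons.
NEGATION_TRIGGERS = {"not", "no", "never", "without"}
SPECULATION_TRIGGERS = {"may", "might", "possibly", "suggests"}
PUNCTUATION = set(".,;!?")


def preprocess_tweet(text):
    """Normalize tweet-specific tokens before linguistic processing."""
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"@\w+", "", text)          # drop user mentions
    text = re.sub(r"#(\w+)", r"\1", text)     # keep hashtag word, drop '#'
    return text.strip()


def tokenize(text):
    """Split lowercased text into word tokens and punctuation marks."""
    return re.findall(r"\w+|[.,;!?]", text.lower())


def find_scopes(text):
    """Return (kind, trigger, scope_tokens) for every trigger found.

    Assumption: the scope runs from the trigger to the next punctuation
    mark. This stands in for a scope derived from a syntactic parse.
    """
    tokens = tokenize(preprocess_tweet(text))
    results = []
    for i, tok in enumerate(tokens):
        if tok in NEGATION_TRIGGERS:
            kind = "negation"
        elif tok in SPECULATION_TRIGGERS:
            kind = "speculation"
        else:
            continue
        scope = []
        for nxt in tokens[i + 1:]:
            if nxt in PUNCTUATION:
                break
            scope.append(nxt)
        results.append((kind, tok, scope))
    return results


tweet = "@bob This drug may reduce symptoms, but it does not cure the disease. http://t.co/x"
for kind, trigger, scope in find_scopes(tweet):
    print(kind, trigger, scope)
```

For the sample tweet, the sketch marks "reduce symptoms" as speculative and "cure the disease" as negated, which is the kind of extra-propositional information a purely factual extraction would miss.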

The talk was full of interesting examples from the bioinformatics domain and from tweets, and it featured the work of several CLAC students and collaborators. It conveyed highly expert content in a form accessible to a non-expert audience, which stayed attentive for close to two hours. Many attendees had questions, and a lively discussion followed during the social networking event at the Krivoto over a glass of wine and a pint of beer.

For more details, see the slides of Dr Bergler's presentation.