Advanced Seminar (Hauptseminar): Sentiment Analysis

Course goals

Students will get an overview of the field of sentiment analysis and the challenges it presents. They will read scientific papers and become familiar with this kind of literature.

Note: The papers read this year differ from those read last year, so students from last year are welcome to join. Please also note that you cannot get credit twice for the same class, even if you give a different presentation.

Schedule and Resources

Submission and e-mail notification will be managed in ILIAS, so please register there.

* If you are unable to access a paper, it may require a subscription held by the university library. Try accessing it from inside the university network.

Course Content

Sentiment analysis automatically identifies opinions expressed in language about real-world items. Most commonly, opinions are classified into the categories "positive" and "negative". Sentiment analysis has become an important topic over the last 10 years, and a large number of publications have appeared in this area. This seminar presents different methods for analyzing opinions on different levels.

Subjectivity Classification

Subjective statements refer to a person's internal state of mind, which cannot be observed. In contrast, objective statements can be verified by observing and checking reality. It is sometimes useful for a sentiment analysis system to filter out objective language and predict sentiment based on subjective language only. Unfortunately, detecting subjectivity is itself a complicated problem.

References: [RW03], [WWH05]

Subjectivity Word Sense Disambiguation

Sentiment analysis often uses dictionaries that list the polarity of each word. However, many words have both subjective and objective senses. Subjective words used in an objective sense are a significant source of error in sentiment classification. Subjectivity word sense disambiguation tries to automatically determine which word instances in a corpus are being used with objective senses.

References: [WM06], [AWM09], [AWCM11]

Polarity Reversers

Determining the polarity of an expression with only a lexicon of positive and negative words is often not sufficient, because many phenomena can influence the polarity. The most obvious example of such an influence is "polarity reversers", words that reverse the polarity of a sentiment word, e.g., "no" or "not". One approach to this problem is to assume that the polarity of a word is known and to classify each sentiment word as reversed or non-reversed according to its context.
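The reversed/non-reversed decision described above can be sketched in a few lines. This is only a minimal rule-based illustration, not the method of any of the referenced papers: the lexicon, the reverser list and the two-token context window are all assumptions made for the example.

```python
# Toy lexicon and reverser list; both are illustrative assumptions,
# not resources taken from the referenced papers.
LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}
REVERSERS = {"no", "not", "never"}
WINDOW = 2  # hypothetical: number of preceding tokens treated as context


def word_polarities(sentence):
    """Return (word, polarity) pairs for every sentiment word in the sentence.

    The lexicon polarity is flipped if a reverser occurs within the
    WINDOW tokens directly before the sentiment word.
    """
    tokens = sentence.lower().split()
    results = []
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        context = tokens[max(0, i - WINDOW):i]
        is_reversed = any(t in REVERSERS for t in context)
        polarity = -LEXICON[token] if is_reversed else LEXICON[token]
        results.append((token, polarity))
    return results


print(word_polarities("the camera is not good"))
print(word_polarities("the battery is great"))
```

A fixed window like this is easy to fool (e.g., by reversers that follow the sentiment word or stand further away), which is one reason to treat the decision as a classification problem over the context.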

References: [ITRO08], [CC08], [WBRK10]

Conditional Sentences

Conditional sentences describe implications or hypothetical situations and their consequences. Some conditional sentences directly express sentiment on a product, but many of them express a hypothetical situation, a wish or a general implication.

References: [NLC09]

Comparative Sentences

A common way to express opinions is by comparing one entity with another. There are different types of comparisons: direct comparisons of two entities, comparisons of an entity to a general standard, and superlatives that set one entity above all others in the comparison set. Simply detecting comparative adverbs or adjectives is not sufficient. A sentence can contain a comparative word without being a comparative sentence ("couldn't agree with you more"), while a comparative sentence does not necessarily include any comparative word ("no joy stick unlike the sony ericsson t60").
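The two example sentences above show why keyword matching fails in both directions. The following sketch makes this concrete with a deliberately naive detector; the cue list is a hypothetical assumption for illustration, and the referenced papers use richer features.

```python
# Hypothetical cue list for illustration only.
COMPARATIVE_CUES = {"more", "less", "better", "worse", "than", "most", "best"}


def has_comparative_cue(sentence):
    """Naive detector: True if any token is a known comparative cue word."""
    tokens = sentence.lower().split()
    return any(token in COMPARATIVE_CUES for token in tokens)


# (sentence, is it actually a comparative sentence?) as stated in the text
examples = [
    ("couldn't agree with you more", False),
    ("no joy stick unlike the sony ericsson t60", True),
]

for sentence, is_comparative in examples:
    predicted = has_comparative_cue(sentence)
    print(f"{sentence!r}: predicted={predicted}, actual={is_comparative}")
```

The detector is wrong on both sentences: it flags the first because of "more" and misses the second because "unlike" is not a comparative adverb or adjective.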

References: [JL06a], [JL06b], [GL08]

Topic Models (CS)

These papers present a framework for extracting the ratable aspects of objects from online user reviews. A statistical model is used to discover topics in text and to extract text snippets supporting the ratings of different aspects.

References: [TM08a], [TM08b]

Linguistic Features (CS)

Many sentiment polarity classifiers use only shallow features such as bag-of-words. To improve classification accuracy, several features based on linguistic analysis and syntactic structures have been proposed.

References: [DLP03], [Ga04], [MTO05]

Opinion Spam

The term "opinion spam" refers to fake reviews written to mislead humans or automatic systems in their evaluation of the opinions about a product or a service. Fake positive reviews are written to artificially improve the perceived opinion of a product or a service; fake negative reviews are written to damage the reputation of a competitor or its products.

References: [JL07], [JL08], [OCCH11]

General Organizational Information

The course is open to students of

M.Sc. Computational Linguistics as a part of the concentration "Statistical Natural Language Processing" or as an elective (3 ECTS or 6 ECTS).

Diplom Computerlinguistik as an elective in the Hauptstudium.

B.Sc. Maschinelle Sprachverarbeitung as an elective in the "Wahlbereich F: Fortgeschrittene Themen der Maschinellen Sprachverarbeitung" in the module "Fortgeschrittene Methoden in der Maschinellen Sprachverarbeitung". The final grade for the module will be an average of the grades in “Sentiment Analysis” and “Natural Language Generation”.

Diplom Informatik as part of the Nebenfach.

This course includes a number of introductory classes about the basics of sentiment analysis and the most important challenges in the area. Afterwards, some specific challenges for automatic sentiment analysis are presented in talks by the students. To get credit for this class, you need to give a presentation and hand in a written report about one of the topics presented above. Every student is required to read all papers to be discussed in class beforehand.

Some previous knowledge of machine learning methods may be helpful (e.g., from the class "statistische Sprachverarbeitung" or "Information Retrieval").

Evaluation

To get credit for this class, you need to give a presentation and hand in a written report about one of the topics presented above. The grade consists of the following parts:

The oral presentation (25 % of grade).
The presentation should be around 30-45 minutes with a discussion afterwards.
No template is given and slides are not mandatory; feel free to use the blackboard or handouts. For slides, LaTeX Beamer is recommended.
The deadline is the date of the presentation; getting feedback beforehand is recommended.

The written report (50 % of grade).
A preliminary report has to be handed in a week before the talk. The final report has to be handed in a week after the talk (both as the original document and as a PDF).
Length: 10-12 pages for B.Sc. and M.Sc. 3 ECTS elective, 5-10 pages for M.Sc. concentration, 5-20 pages for M.Sc. 6 ECTS elective.
A template can be found below.
Reports should focus on the main paper and include other references only as far as necessary to understand this paper. For the M.Sc. 6 ECTS elective, a discussion of related work, including a search for further literature and more detailed comments, is required.

Active participation in class (25 % of grade).
This includes regular attendance and reading all other papers.
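As a worked example of the weighting above, the sketch below combines three part grades into one number. Only the percentages come from this page; the grade scale (1.0 is best), the example grades and the absence of rounding are assumptions.

```python
# Weights as listed above; everything else in this sketch is an assumption.
WEIGHTS = {"presentation": 0.25, "report": 0.50, "participation": 0.25}


def final_grade(grades):
    """Weighted average of the three part grades (no rounding applied)."""
    return sum(WEIGHTS[part] * grades[part] for part in WEIGHTS)


# Hypothetical part grades on a scale where 1.0 is the best grade.
example = {"presentation": 2.0, "report": 1.0, "participation": 3.0}
print(final_grade(example))  # 0.25*2.0 + 0.50*1.0 + 0.25*3.0 = 1.75
```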

A very quick guide to writing your report in LaTeX:
Download the files linked above. Put all of them in one folder. Rename the .tex file to ausarbeitungYOURNAME.tex. Open a terminal, go to that folder and type pdflatex ausarbeitungYOURNAME.tex. After a lot of output on the command line, you should get a file named ausarbeitungYOURNAME.pdf. Voilà, you did it!
Read through ausarbeitungTemplate.tex; it contains examples of italics, bold text, tables, figures and references. Just copy what you need. There are also many resources online, e.g., the LaTeX Wikibook.
If you get an error like "! LaTeX Error: File 'XYZ.sty' not found.", make sure the file is in the same folder. A file ending in .sty is a package. You have two options: (a) remove the line \usepackage{XYZ} (which might cause some commands to stop working or some things to look different), or (b) try to download the file from CTAN and put it into your folder (this might be more complicated).
If you get the warning "LaTeX Warning: There were undefined references.", you will notice some ?? in your document at places where references should be. For references to sections, tables or figures, just run pdflatex ausarbeitungYOURNAME.tex again. For bibliography references, you need to run bibtex ausarbeitungYOURNAME and then run pdflatex ausarbeitungYOURNAME.tex twice more.
If you get the warning "LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right.", some references may be wrong (e.g., section 3 has become section 4, but your reference still says "see section 3"). Rerun pdflatex ausarbeitungYOURNAME.tex to get them right.
Very important: Before you hand in, make sure none of these warnings appear!
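If you prefer not to type the commands by hand, the full sequence from this guide (pdflatex, bibtex, then pdflatex twice) can be scripted. The following is a minimal sketch, assuming pdflatex and bibtex are on your PATH and the script is run from the folder containing the .tex file; the function names are made up for this example.

```python
import subprocess


def build_commands(basename):
    """Return the command sequence from the guide above for a given file name."""
    tex_file = basename + ".tex"
    return [
        ["pdflatex", tex_file],  # first run, collects citations and labels
        ["bibtex", basename],    # resolves bibliography references
        ["pdflatex", tex_file],  # pulls in the bibliography
        ["pdflatex", tex_file],  # fixes remaining cross-references
    ]


def build(basename):
    """Run each command in order and stop at the first one that fails."""
    for command in build_commands(basename):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the commands; call build("ausarbeitungYOURNAME") to run them.
    for command in build_commands("ausarbeitungYOURNAME"):
        print(" ".join(command))
```

The script does not check the log for warnings, so you still need to read the output of the last run yourself.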