Twitter Sentiment Analysis (Almost) from Scratch

A popular application of Natural Language Processing (NLP) is sentiment analysis (SA), i.e., the task of extracting the contextual polarity of a given text. The social network Twitter provides an immense amount of user-generated text in the form of tweets, short messages of at most 140 characters. In this project, we aim to learn a tweet representation from publicly available tweets in order to infer their sentiment. One challenge of this task is that tweets are written by a very diverse set of users, which makes the data highly heterogeneous, unlike conventional corpora written in standard English. Another challenge is the large scale of the problem. We propose a deep learning sentence representation (called the tweet representation) for inferring sentiment from this user-generated data. The representation is learned from scratch, directly from the words in each tweet, over a large unlabeled corpus of tweets. We demonstrate that this approach achieves state-of-the-art results for SA on tweets.