Improving the Sentiment Analysis Process of Spanish Tweets with BM25

@incollection{sixto_improving_2016,
  series = {Lecture {Notes} in {Computer} {Science}},
  title = {Improving the {Sentiment} {Analysis} {Process} of {Spanish} {Tweets} with {BM}25},
  copyright = {©2016 Springer International Publishing Switzerland},
  isbn = {978-3-319-41753-0 978-3-319-41754-7},
  url = {http://link.springer.com/chapter/10.1007/978-3-319-41754-7_26},
  abstract = {The enormous growth of user-generated information of social networks has caused the need for new algorithms and methods for their classification. The Sentiment Analysis (SA) methods attempt to identify the polarity of a text, using among other resources, the ranking algorithms. One of the most popular ranking algorithms is the Okapi BM25 ranking, designed to rank documents according to their relevance on a topic. In this paper, we present an approach of sentiment analysis for Spanish Tweets based combining the BM25 ranking function with a Linear Support Vector supervised model. We describe the implemented procedure to adapt BM25 to the peculiarities of SA in Twitter. The results confirm the potential of the BM25 algorithm to improve the sentiment analysis tasks.},
  language = {en},
  number = {9612},
  urldate = {2016-06-21TZ},
  booktitle = {Natural {Language} {Processing} and {Information} {Systems}},
  publisher = {Springer International Publishing},
  author = {Sixto, Juan and Almeida, Aitor and López-de-Ipiña, Diego},
  editor = {Métais, Elisabeth and Meziane, Farid and Saraee, Mohamad and Sugumaran, Vijayan and Vadera, Sunil},
  month = jun,
  year = {2016},
  doi = {10.1007/978-3-319-41754-7_26},
  note = {00000 },
  keywords = {BM25, Data analysis, Linear support vector, NLP, Natural language processing, Sentiment analysis, Term frequency, Twitter, core-c, machine learning, social networks},
  pages = {285--291}
}

Abstract

The enormous growth of user-generated information on social networks has created a need for new algorithms and methods to classify it. Sentiment Analysis (SA) methods attempt to identify the polarity of a text, drawing on ranking algorithms among other resources. One of the most popular ranking algorithms is Okapi BM25, designed to rank documents according to their relevance to a topic. In this paper, we present an approach to sentiment analysis of Spanish tweets that combines the BM25 ranking function with a Linear Support Vector supervised model. We describe the procedure implemented to adapt BM25 to the peculiarities of SA on Twitter. The results confirm the potential of the BM25 algorithm to improve sentiment analysis tasks.
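
As a rough illustration of the kind of term weighting the abstract refers to, the sketch below computes textbook Okapi BM25 weights for the terms of each tokenized tweet. It is not the authors' adapted procedure, which the abstract does not detail. The function name `bm25_weights`, the defaults `k1=1.5` and `b=0.75`, the smoothed (always positive) IDF variant, and the toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.5, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    Hypothetical helper: textbook Okapi BM25 with a smoothed IDF,
    not the adaptation described in the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weighted = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weighted.append({
            t: math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weighted


# Toy, made-up tweets (assumption), tokenized by whitespace.
tweets = [
    "me encanta este movil".split(),
    "odio este movil".split(),
    "este dia es genial".split(),
]
weights = bm25_weights(tweets)
```

In a pipeline like the one the abstract outlines, per-term weights of this kind could serve as features for a linear support vector classifier. Here a term that occurs in every tweet ("este") receives a lower weight than one that occurs in a single tweet ("encanta").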