Blogging About Data

DATA SCIENCE WARRIOR

Text Mining Airline Tweets

Companies have long relied on consumer feedback to improve the quality of their products and services, and today consumers have an abundance of avenues for providing that feedback, from Yelp to Google+ to Twitter and other social media platforms. This has led to an overwhelming accumulation of data, both structured and unstructured, across all industries globally. While structured data has provided the essential statistical insights that help companies improve their decision-making, most of the data collected is unstructured, in the form of text, video and audio. Unlocking valuable insights from this unstructured data has proven pivotal to the success and continued growth of businesses as they strive to meet consumer demands.

In this project, we explore the use of Text Mining as a means of extracting valuable, quantifiable business insights from consumer tweets. The data, obtained from Kaggle, consists of tweets scraped from Twitter in February 2015 concerning six major U.S. airlines. Using various data exploration techniques, we first draw insights from the results of the earlier sentiment analysis that produced the dataset. We show that the earlier study sought to analyse negative sentiments, but does not explicitly reveal anything about the positives.

Our analysis seeks to dig deeper, analysing the tweets from scratch to determine not only the sentiments but also the major topics for each. We use Latent Dirichlet Allocation (LDA), a natural language topic model, to extract meaning from the tweets, provide insights into the consumer experience, and discover what the companies are doing right and wrong.
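To make the idea of topic modelling concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is an illustration only and not the code used in this project: the sample tweets, the stopword list, the function names (tokenize, fit_lda, top_words) and the hyperparameter values are all assumptions made up for the example.

```python
import random

def tokenize(text, stopwords):
    """Lowercase, skip @mentions, strip non-letters, drop stopwords."""
    words = []
    for raw in text.lower().split():
        if raw.startswith("@"):
            continue
        word = "".join(ch for ch in raw if ch.isalpha())
        if word and word not in stopwords:
            words.append(word)
    return words

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; alpha/beta are illustrative priors."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]            # topic counts per document
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word

def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]

# Invented example tweets (not taken from the Kaggle dataset).
tweets = [
    "@airline flight delayed three hours and no gate agent in sight",
    "@airline great crew and a smooth flight, thank you",
    "@airline lost my bag again, still waiting on hold",
    "@airline friendly crew, great service today",
    "@airline delayed flight and lost bag, terrible",
    "@airline thank you for the smooth flight and great service",
]
stopwords = {"and", "a", "the", "my", "on", "in", "no", "for", "you", "still", "again"}
docs = [tokenize(t, stopwords) for t in tweets]
vocab, doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {', '.join(words)}")
```

Each topic is summarised by its most frequent words, and each row of doc_topic shows how a tweet's words are split across topics. On a corpus this small the topics are unstable; a real analysis would use the full dataset and an established LDA implementation rather than this hand-rolled sampler.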

The insights will be presented in a Shiny app that will allow these companies to better gauge the sentiments of their customer base, discover the areas of their service that need improvement, and support their decision-making process.