Building a Twitter Sentiment Analysis Process in RapidMiner

Introduction

In this tutorial we’re going to walk you through using the “Text Analysis by AYLIEN” Extension for RapidMiner to collect and analyze tweets. If you’re new to RapidMiner, or it’s your first time using the Text Analysis Extension, you should read Part 1 of our Getting Started Blog, which takes you through the installation process. Also, if you haven’t got an AYLIEN account, which you’ll need to use the Extension, you can grab one here.

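Here’s what we’re going to do:

Collect tweets using the Search Twitter Operator

Analyze the sentiment of each tweet using the Analyze Sentiment Operator

Assign the tweets to different categories using the Categorize Operator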

Visualize our results and make them more consumable and understandable

Gathering your tweets

Create a new Process in RapidMiner and add a Search Twitter Operator. Build your desired search as you would using the Twitter search API. As you can see from the screenshot below, we’re searching for tweets containing the keyword “Samsung”. We’ve cleaned up our search a little by removing retweets (-rt) and links (-http). We’ve also limited the number of tweets collected to 20 and restricted the results to English tweets by adding “en” in the language parameter. Finally, we’ve used the “Result type” parameter to indicate that we want only recent or popular tweets to be returned.
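If it helps to see how the search string is put together, here is a minimal Python sketch of the idea. It only builds the query text and collects the settings described above; the function name and the dictionary keys are our own illustrative choices, not RapidMiner or Twitter parameter names.

```python
def build_search_query(keyword, excluded_terms):
    """Join a keyword with negated terms, e.g. 'Samsung -rt -http'."""
    negated = [f"-{term}" for term in excluded_terms]
    return " ".join([keyword] + negated)


# Hypothetical settings that mirror the values used in this tutorial.
# The key names are assumptions made for illustration only.
search_settings = {
    "query": build_search_query("Samsung", ["rt", "http"]),
    "limit": 20,
    "language": "en",
}

print(search_settings["query"])  # prints "Samsung -rt -http"
```

Keeping the excluded terms in a list makes it easy to add or remove filters without retyping the whole query.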

Results from Twitter Search

Firstly, we’ll have a look at what kind of results our search returns. Once you hit run (don’t forget to connect your Operators) the results from the Twitter search are displayed in an ExampleSet tab, like the one below:

Analyzing your results

So now we have a collection of 20 tweets stored in an ExampleSet, ready to be analyzed further. The first thing we’re going to do is try to determine the Sentiment of each tweet, i.e. whether it is Positive, Negative or Neutral.

We do this by adding the Analyze Sentiment Operator to our Process and selecting “text” as our “Input attribute” on the right-hand side, as shown in the screenshot below.

So now we have a relatively simple Twitter Sentiment Analysis Process that collects tweets about “Samsung” and classifies them according to their Polarity.

As is displayed in the ExampleSet below, the results now contain not only the tweets that were pulled in but their corresponding Polarity and Subjectivity as well as a confidence score for both.
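To make the shape of those results a little more concrete, here is a rough Python sketch that treats each row of the ExampleSet as a dictionary and keeps only the tweets whose Polarity was scored with reasonable confidence. The attribute names, the threshold and the sample rows are all assumptions made for illustration; the rows are invented placeholders, not real tweets or real scores.

```python
def confident_rows(rows, min_confidence):
    """Keep rows whose polarity confidence meets the threshold."""
    return [row for row in rows if row["polarity_confidence"] >= min_confidence]


# Invented placeholder rows. Attribute names are assumed, not taken
# from the Extension's actual output.
sample_rows = [
    {"text": "placeholder tweet A", "polarity": "positive",
     "polarity_confidence": 0.95,
     "subjectivity": "subjective", "subjectivity_confidence": 0.90},
    {"text": "placeholder tweet B", "polarity": "neutral",
     "polarity_confidence": 0.55,
     "subjectivity": "objective", "subjectivity_confidence": 0.70},
    {"text": "placeholder tweet C", "polarity": "negative",
     "polarity_confidence": 0.80,
     "subjectivity": "subjective", "subjectivity_confidence": 0.65},
]

# Print only the rows the classifier was fairly sure about.
for row in confident_rows(sample_rows, min_confidence=0.75):
    print(row["polarity"], row["text"])
```

Filtering on the confidence score like this is one way to avoid drawing conclusions from tweets the classifier was unsure about.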

Categorizing your tweets

So we’ve determined the sentiment of the tweets but, like we said in the beginning, we also want to categorize them in some way. We can do this pretty easily using the Categorize Operator from the Text Analysis Extension, but first we need to prepare our data for analysis.

Firstly, we’re going to use a Data to Documents Operator to generate Documents from our existing data set, which makes it easier to categorize.

We’ll then add a Categorize Operator, which classifies our text based on a particular taxonomy. In this case we’re using the IAB QAG taxonomy, a standard used in the digital advertising industry for categorizing content.

Now our Process is starting to take shape. Because we transformed our data into Documents before categorizing them, we need to reverse that step and create a data set from the categorized Documents, which makes the results easier to visualize and understand as a whole.
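If the round trip from data to Documents and back again feels abstract, the Python sketch below walks through the same three steps on plain dictionaries. It is only an analogy: the function names are hypothetical, and the categorizer is a trivial keyword stand-in, not the AYLIEN classifier or the IAB QAG taxonomy.

```python
def data_to_documents(rows, text_attribute="text"):
    """Pull the text out of each row so it can be classified on its own."""
    return [row[text_attribute] for row in rows]


def stand_in_categorizer(document):
    """Placeholder only: NOT the AYLIEN classifier or IAB QAG labels."""
    return "mentions-phone" if "phone" in document.lower() else "other"


def documents_to_data(rows, labels, label_attribute="category"):
    """Attach each label to its original row, keeping existing attributes."""
    return [{**row, label_attribute: label} for row, label in zip(rows, labels)]


# Invented placeholder rows, not real tweets.
rows = [
    {"text": "placeholder tweet about a phone", "polarity": "positive"},
    {"text": "placeholder tweet about something else", "polarity": "neutral"},
]

documents = data_to_documents(rows)
labels = [stand_in_categorizer(doc) for doc in documents]
categorized_rows = documents_to_data(rows, labels)

for row in categorized_rows:
    print(row["category"], row["polarity"], row["text"])
```

The point to notice is that the sentiment attributes survive the round trip, so each row ends up carrying both its Polarity and its category.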

So here’s what our completed Process looks like.

It collects tweets, analyzes the Sentiment of those tweets, prepares them for categorization against a taxonomy and displays the results in an ExampleSet, like the one below.

Cool, huh?

Visualizing the Results

So we have our results stored in a table (ExampleSet) but in order to make them more presentable we want to visualize them a bit better.

RapidMiner lets you display and visualize the results of your Process really easily using simple charts like the ones below, all of which can be created using the Charts widget on the left-hand side of your results display.

Bar chart showing # of positive, negative and neutral tweets:

Pie chart showing # of positive, negative and neutral tweets:

Pie chart showing a breakdown of tweets by their top-level category:
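The numbers behind charts like these are simple tallies. As a rough sketch of that arithmetic, the Python below counts rows per Polarity and works out each category’s share of the total. The rows and category labels are invented placeholders and the attribute names are assumptions; RapidMiner does this for you in the Charts view.

```python
from collections import Counter


def count_by(rows, attribute):
    """Count how many rows carry each value of the given attribute."""
    return Counter(row[attribute] for row in rows)


def shares(counts):
    """Convert raw counts into fractions of the total (pie chart slices)."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Invented placeholder rows, not real results.
rows = [
    {"polarity": "positive", "category": "Category A"},
    {"polarity": "positive", "category": "Category B"},
    {"polarity": "negative", "category": "Category A"},
    {"polarity": "neutral", "category": "Category A"},
]

polarity_counts = count_by(rows, "polarity")        # bar chart heights
category_shares = shares(count_by(rows, "category"))  # pie chart slices

print(dict(polarity_counts))
print(category_shares)
```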

For the Data Junkies among us, however, you may want to export your results and visualize them using something else, like Tableau, for which, by the way, there’s an integration on the way.

We’ve also created a repository for sample Processes that we’ll be adding to on a regular basis. It will be a collection of use-case-focused RapidMiner Processes that can be downloaded and imported directly into RapidMiner. You can find more info in the RapidMiner documentation on our website.

A legal convert with a master’s degree from Smurfit Business School, Mike runs our Sales and Marketing at AYLIEN. Mike gathered his Sales and Marketing experience with technology companies in Sydney and Dublin before getting the startup itch and joining the team at AYLIEN.