Introducing RapidMiner extension for MonkeyLearn

A few weeks ago we released the MonkeyLearn extension for RapidMiner, and since then it has become one of our sales team’s favorite tools for running demos and building proofs of concept for our leads. On top of that, users and customers are using this integration to do some really interesting data analysis, saving hours of manual data processing.

In short, RapidMiner is a platform for data science teams. It unifies all the preparation, development and deployment of machine learning models.

Although its capabilities far exceed what I have used it for, it is so simple to use that even someone like me, with no coding skills whatsoever, can quickly create automated processes and analyses.

In this post, I’ll show you how to use RapidMiner and MonkeyLearn to analyze reviews, including:

Installing MonkeyLearn Extension

Click on the Extensions tab and open up the “Marketplace” within the RapidMiner application.

Use the search bar to search for MonkeyLearn.

Click on the MonkeyLearn extension.

Check the “Select for installation” box.

Accept the terms of service.

Install the package:

Installing MonkeyLearn Extension

Understanding and using the different types of operators

Operators in RapidMiner are the building blocks used to create processes. An operator has input and output ports. Each operator defines the action performed on its input and provides the result as its output.
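If you do write code, one way to picture an operator is as a function that takes rows in and hands rows out. The sketch below is only an analogy in plain Python, not RapidMiner's actual API: `lowercase_text`, `add_length` and `run_process` are hypothetical names I made up for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Operator = Callable[[List[Row]], List[Row]]


def lowercase_text(rows: List[Row]) -> List[Row]:
    """Hypothetical operator: lowercases the 'text' attribute of each row."""
    return [{**row, "text": row["text"].lower()} for row in rows]


def add_length(rows: List[Row]) -> List[Row]:
    """Hypothetical operator: adds a 'length' attribute with the text length."""
    return [{**row, "length": str(len(row["text"]))} for row in rows]


def run_process(rows: List[Row], operators: List[Operator]) -> List[Row]:
    """Feed the output of each operator into the input of the next one."""
    for operator in operators:
        rows = operator(rows)
    return rows


# Made-up example data, only to show rows flowing through two operators.
reviews = [{"text": "Great LOCATION"}, {"text": "Noisy room"}]
result = run_process(reviews, [lowercase_text, add_length])
print(result)
```

Connecting an output port to an input port in the Process tab plays the same role as placing one function after another in the `operators` list.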

MonkeyLearn has two different types of Operators:

Classifier Operator.

Extractor Operator.

When using an operator for the first time you will need to input your API key from MonkeyLearn to connect your account:

In the “Operators” tab, under the extensions folder, open the folder for MonkeyLearn and select a MonkeyLearn Operator.

In the “Parameters” tab, click on the MonkeyLearn logo next to API Token.

To use a Classifier Operator you need to:

Connect the Operator to an Input (by clicking and dragging with the mouse).

Connect the Output to the results port or to other operators.

Select API Token (previously added).

Select Model ID.

Select Input Attribute (this would be the text sent to MonkeyLearn to classify).

Extractor Operator

The MonkeyLearn Extractor Operator lets you use extraction models from the MonkeyLearn API. Extraction models pull data out of text; that is, the result you are looking for already exists within the text. MonkeyLearn has different extraction models for different types of data: keywords, entities, insights and much more.

To use an Extractor Operator you need to:

Connect the Operator to an Input.

Connect the Output to the results port or to other operators.

Select API Token (previously added).

Select Type of Extraction.

Select Input Attribute (this would be the text sent to MonkeyLearn to make the Extraction).

Additionally, you can select Split Rows, which outputs each extraction on its own row instead of putting them all on the same row.
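To make the difference concrete, here is a rough sketch in plain Python of the two output shapes. It is an illustration under assumptions, not the extension's code: the function names, the " | " separator and the sample extractions are all made up, and leaving the original text empty on the extra rows is my reading of the "?" cells described in the results section of this post.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def same_row(content: str, extractions: List[str]) -> List[Row]:
    """Split Rows off: every extraction stays on the row of its source text.

    The " | " separator is an assumption chosen for illustration.
    """
    return [{"Content": content, "Extraction": " | ".join(extractions)}]


def split_rows(content: str, extractions: List[str]) -> List[Row]:
    """Split Rows on: one output row per extraction.

    Only the first row carries the original text; the others leave it
    missing (None here), mirroring the "?" cells in the results table.
    """
    return [
        {"Content": content if i == 0 else None, "Extraction": extraction}
        for i, extraction in enumerate(extractions)
    ]


review = "The staff was friendly but the room was small."
# Hypothetical extractor output; a real model would produce these.
units = ["The staff was friendly", "the room was small"]

print(same_row(review, units))
print(split_rows(review, units))
```

Having one row per extraction is what lets a later operator classify each extraction on its own.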

To include an operator in the Process:

Use the operator search bar to find the correct operator.

Drag and Drop it on the Process tab:

How to include an operator in RapidMiner.

Connecting data and operators:

To connect two operators, or a data source with an operator, click and drag your mouse from the output of the first one to the input of the second.

Visualizing the results

RapidMiner has visualization tools built right into the Studio platform. We can quickly use these to visualize our review results and MonkeyLearn’s predictions on our data. When the process ends, we’ll be taken to the Results tab, which will look something like this:

Visualizing the results of our analysis

Each row presents an opinion unit (OU):

Content: Original Review (where a “?” appears, it means the OU belongs to the review above).

MonkeyLearn Extraction: Each individual Opinion Unit.

Aspect: Topic mentioned in the review.

Path 1 – Category: Full category path.

Path 1 – Probability: Probability of the OU mentioning that Aspect.

Classification: Sentiment behind each OU.

Category 1: Full category path.

Probability 1: Probability of the OU being good or bad.
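If you export these results and want the original review next to every Opinion Unit, the "?" cells can be filled with the review from the row above. Below is a minimal sketch of that fill-down step in plain Python. The column names follow the list above, while `fill_content` and the sample rows are made up for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def fill_content(rows: List[Row]) -> List[Row]:
    """Replace each "?" in Content with the closest real review above it."""
    filled = []
    # Starting at "?" means a leading "?" row with nothing above it stays "?".
    last_content = "?"
    for row in rows:
        if row["Content"] != "?":
            last_content = row["Content"]
        # Copy the row so the input list is left untouched.
        filled.append({**row, "Content": last_content})
    return filled


# Illustrative rows only, not real results.
rows = [
    {"Content": "Great location, but the room was small.",
     "MonkeyLearn Extraction": "Great location"},
    {"Content": "?", "MonkeyLearn Extraction": "but the room was small"},
    {"Content": "Friendly staff.", "MonkeyLearn Extraction": "Friendly staff"},
]

for row in fill_content(rows):
    print(row["Content"], "->", row["MonkeyLearn Extraction"])
```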

Creating some charts:

With these results you can do some really interesting things. For example, you can build simple Bar Charts that quickly help you understand the reviews and their analysis. In this case, I started by analyzing sentiment:

Aggregation: count (So it will count the number of Opinion Units belonging to each tag).

Rotate labels (So it’s easier to see).

Vertical (I prefer visualizing it that way, but you could quickly change it to horizontal).

Creating some visualizations of our data analysis.

From this graph, we can see that far more of the Opinions gathered were good (more than 2,200) than bad (barely 300).

Or we can look at how many Opinion Units mention each Aspect by switching the Group-by Column to Aspect, which outputs a graph like this:

Visualizing the aspect predictions of our reviews.

This shows that most opinions were about location, staff, comfort & facilities, and value for money.
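Behind both charts is the same simple operation: group the Opinion Units by one column and count them. Here is a small sketch of that count aggregation in plain Python. The helper `count_by` is a hypothetical name, and the rows and tag spellings are invented for illustration, so the numbers have nothing to do with the real results above.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def count_by(rows: List[Row], column: str) -> Counter:
    """Count how many rows (Opinion Units) fall under each value of a column."""
    return Counter(row[column] for row in rows)


# Illustrative Opinion Units only; tag names are assumptions.
opinion_units = [
    {"Aspect": "location", "Classification": "Good"},
    {"Aspect": "staff", "Classification": "Good"},
    {"Aspect": "location", "Classification": "Bad"},
]

# Equivalent of Group-by Column = Classification, Aggregation = count.
print(count_by(opinion_units, "Classification"))

# Equivalent of switching the Group-by Column to Aspect.
print(count_by(opinion_units, "Aspect"))
```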

Wrapping up

As we can see, the possibilities of combining RapidMiner with MonkeyLearn are endless, and it’s just a matter of getting started and playing with data. RapidMiner offers much more than I could cover in this guide, but the idea was to get you started on the path of analyzing reviews.

Do you have reviews about your product or services? Are you using this data to inform your decisions?