Sentiment Analysis with Python

Let’s say that you have a lot of text lying around, written by different people. You haven’t read these pieces of text, and you don’t know what they’re talking about. However, they contain data that’s important to you or your business.

They could be online reviews of your products, NPS responses, or news pieces reporting on new technology you’re interested in. The possibilities are endless.

It’d be great to know the overall feeling and opinion expressed by the writer in each piece of text, but this can end up being a mammoth task for one person, especially if there’s a massive amount of data to read and make sense of.

What is Sentiment Analysis?

Sentiment analysis is a set of Natural Language Processing (NLP) techniques that takes a text (in more academic circles, a document) written in natural language and extracts the opinions present in the text.

In a more practical sense, our objective here is to take a text and produce a label (or labels) that summarizes the sentiment of this text, e.g. positive, neutral, and negative.

For example, if we were dealing with hotel reviews, we would want the sentence ‘The staff were lovely‘ to be labeled as Positive, and the sentence ‘The shared bathroom was absolutely disgusting‘ labeled as Negative.

Getting machines to do this is no easy feat, and it involves skills from different fields of knowledge, such as computer science, statistics, and linguistics.

Why is it Important?

Sentiment analysis allows businesses to quickly process and extract actionable insights from huge volumes of text without having to read all of them. More specifically, it’s useful for gauging how an audience feels about something. Be it tweets, product reviews, or NPS comments, sentiment analysis is a tool that enhances an organization’s understanding of customer opinions and actions.

Since it’s automated, sentiment analysis allows you to perform analysis of texts in real time and always against the same set of criteria. You aren’t dealing with several people with different biases at work, but rather with a single unified system that has a consistent output.

How to do Sentiment Analysis in Python?

Now, you can do sentiment analysis by rolling out your own application from scratch, or maybe by using one of the many excellent open source libraries out there, such as scikit-learn.

However, this can end up being a bit of a hassle. Implementing a machine learning solution on your own can be a daunting task that requires resources to build and maybe even hiring expert data scientists. You will need to gather quality data to train the models, source some hardware (maybe even GPUs) to run your software on, and test relentlessly to get a solution that works. Then, when it’s built and it works, more resources are required to integrate the new module into your existing solution, to maintain it, and to keep it updated.

Instead, you might be better off trying a SaaS API for sentiment analysis such as MonkeyLearn. In this tutorial, you will learn how to use MonkeyLearn and try a pre-built sentiment analysis model.

Check out our public sentiment analysis model and test it out. Have a go at writing texts with different sentiments, so you can see how the model performs:

Now, let’s say you are happy with how this model works, and want to use it in production. This, of course, is an entirely different beast.

You can use the MonkeyLearn API to automate access to the models, and programmatically perform sentiment classifications. In the API tab, there are instructions on how to integrate using your own code, whether written in Python, Ruby, PHP, Node or Java:

You can also send plain requests to our API, and parse the JSON responses yourself. However, we built SDKs in multiple languages to make integrating our API simpler for developers.
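If you do want to go the plain-request route, here is a minimal sketch that builds (but does not send) a classification request using only Python’s standard library. The endpoint path, the `Token` authorization scheme, and the `data` field of the JSON body are assumptions based on version 3 of the API, so check them against the API documentation before relying on them; `build_classify_request` is a hypothetical helper name used only for this illustration.

```python
import json
import urllib.request

# Assumed base URL for v3 of the MonkeyLearn API; verify against the API docs.
API_BASE = "https://api.monkeylearn.com/v3"


def build_classify_request(api_key, model_id, texts):
    """Build (without sending) a POST request asking a classifier to label texts."""
    url = f"{API_BASE}/classifiers/{model_id}/classify/"
    body = json.dumps({"data": list(texts)}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    # Assumed authentication scheme: the API key sent as a token header.
    request.add_header("Authorization", f"Token {api_key}")
    request.add_header("Content-Type", "application/json")
    return request


request = build_classify_request(
    "<<Your API key here>>", "cl_pi3C7JiL", ["The restaurant was great!"]
)

# Sending it would look like this (needs a valid API key and network access):
# with urllib.request.urlopen(request) as response:
#     results = json.loads(response.read().decode("utf-8"))
```

The SDK takes care of these details for you, which is why the rest of this tutorial uses it.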

Now that the introductions are out of the way, let’s get down to business. First of all, to use our API you need to get an API key. Sign up for free to get yours. Then, install the Python SDK:

pip install monkeylearn

You can also clone the repository and run the setup.py script:

$ git clone git@github.com:monkeylearn/monkeylearn-python.git
$ cd monkeylearn-python
$ python setup.py install

And that’s it for setup.

You’re ready to run a sentiment analysis on your texts with the following code:

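from monkeylearn import MonkeyLearn

# Authenticate with your MonkeyLearn API key
ml = MonkeyLearn('<<Your API key here>>')

# Texts to classify
data = ['The restaurant was great!', 'The curtains were disgusting']

# ID of the public sentiment analysis model
model_id = 'cl_pi3C7JiL'

# Classify the texts and print the parsed JSON response
result = ml.classifiers.classify(model_id, data)
print(result.body)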

The output will be a Python dict generated from the JSON sent by MonkeyLearn, and should look something like this:

[{
    'text': 'The restaurant was great!',
    'classifications': [{
        'tag_name': 'Positive',
        'confidence': 0.993,
        'tag_id': 33767179
    }],
    'error': False,
    'external_id': None
}, {
    'text': 'The curtains were disgusting',
    'classifications': [{
        'tag_name': 'Negative',
        'confidence': 0.979,
        'tag_id': 33767178
    }],
    'error': False,
    'external_id': None
}]

We return the input text list in the same order, with each text and the output of the model. These results are ready for you to start automating processes and get insights from data.
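To give an idea of how you might consume these results, here is a small sketch that flattens a response into one row per text. It works on a hard-coded copy of the sample output rather than a live API call, and `summarize` is a hypothetical helper, not part of the SDK. It assumes every item has the `text`, `classifications`, and `error` keys shown in the sample.

```python
# Sample response copied from the output shown earlier.
response_body = [
    {
        'text': 'The restaurant was great!',
        'classifications': [
            {'tag_name': 'Positive', 'confidence': 0.993, 'tag_id': 33767179}
        ],
        'error': False,
        'external_id': None,
    },
    {
        'text': 'The curtains were disgusting',
        'classifications': [
            {'tag_name': 'Negative', 'confidence': 0.979, 'tag_id': 33767178}
        ],
        'error': False,
        'external_id': None,
    },
]


def summarize(results):
    """Return (text, tag_name, confidence) for each successfully classified text."""
    rows = []
    for item in results:
        # Skip entries flagged as errors or returned without any classification.
        if item['error'] or not item['classifications']:
            continue
        # Keep the classification with the highest confidence.
        best = max(item['classifications'], key=lambda c: c['confidence'])
        rows.append((item['text'], best['tag_name'], best['confidence']))
    return rows


rows = summarize(response_body)
for text, tag, confidence in rows:
    print(f'{tag:<8} {confidence:.3f}  {text}')
```

With a live call, you would pass `result.body` to the helper instead of the hard-coded list.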

For full documentation of our API and its features, check out our docs.

Creating your own Sentiment Analysis Model

An important thing to remember about machine learning is that a model will perform well on texts that are similar to the texts used to train it. It will not perform so well on texts that are different. This means that if you train a sentiment analysis model using survey responses, it will work great for new survey responses, but not so great for tweets, for example.

Generic sentiment analysis models are pretty good for many use cases and getting started right away, but sometimes it’s not enough – you need a custom model trained with your own data. We put a lot of love into creating our models, and they were trained with a lot of data, but their performance can be improved upon for smaller, more specific problems.

Another reason why you might want to train your own custom model is labeling criteria. One of the main appeals of automatic classification is consistency, but if the original criteria used for labeling are not useful for your case, then the model will not work for you. In other words, a negative for one organization may be a positive for you.

With MonkeyLearn you can do all this: you can create a custom model, upload your own texts, define your own tags, and train models using a simple user interface. This new model will perform well for texts that are similar to the training set, and will follow the exact criteria that you apply.

Data for Training the Model

The single most important thing for a machine learning model is the training data. Without good data, the model will never be good; as the saying goes, garbage in, garbage out.

For this example, you can use this dataset, composed of texts from hotel reviews. The dataset is a CSV file with two columns: Text and Sentiment, which can be either negative or positive.

Not all the texts of the dataset are tagged. MonkeyLearn will train a model with the tagged texts, and then you can keep improving the model by tagging more texts yourself using our UI.
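If you’d like to check how much of a file like this is already tagged before uploading it, a quick sketch along these lines may help. The sample rows are made up for illustration and only mimic the two-column layout described above; `split_by_tag` is a hypothetical helper, and it assumes an untagged row simply has an empty Sentiment cell.

```python
import csv
import io

# Hypothetical sample rows in the same two-column layout as the dataset.
SAMPLE_CSV = """Text,Sentiment
The staff were lovely,positive
The shared bathroom was absolutely disgusting,negative
Breakfast is served until 10am,
"""


def split_by_tag(csv_file):
    """Separate rows that already have a Sentiment value from those that don't."""
    tagged, untagged = [], []
    for row in csv.DictReader(csv_file):
        sentiment = (row.get('Sentiment') or '').strip()
        if sentiment:
            tagged.append((row['Text'], sentiment))
        else:
            untagged.append(row['Text'])
    return tagged, untagged


tagged, untagged = split_by_tag(io.StringIO(SAMPLE_CSV))
print(f'{len(tagged)} tagged, {len(untagged)} untagged')
```

To run this against a real file, open it with `open(path, newline='', encoding='utf-8')` and pass the file object in place of the `io.StringIO` wrapper.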

Now, let’s upload this data to MonkeyLearn to train a sentiment model for hotel reviews.

Training the Sentiment Analysis Model

Creating a custom model is simple. All you need to do is upload your data and tag it if needed, and the model will learn from this data. MonkeyLearn automatically chooses the best parameters and handles the training for you.

1. Create a text classifier

You’ll be prompted to choose a more specific classification model, so we can automatically tune it to your needs. Choose Sentiment Analysis:

2. Upload the data from the dataset

Next, you have to upload the data for your classifier. There are many ways to do this, but in this case choose CSV and upload the example dataset from earlier.

The final step is to tell MonkeyLearn how to interpret the columns in the file. If you were to upload an untagged file this wouldn’t be an issue. However, since our dataset has some tags already, you need to check Advanced and select Use as Tag on the tag column:

3. Test the model

You’re done! The model has been trained and is now ready to use.

In the Run tab, you can find all the options for testing and using your model, just like with the pre-trained sentiment analysis model from before.

4. Keep improving the model

Remember, in the dataset we included some untagged texts as well. You can go to the train tab and keep tagging with active learning, in order to improve your model:

Calling the Model API with Python

With the model trained, you can now call it from your own code.

You can perform a classification and get sentiment labels for your texts in pretty much the same way as you were using the public model from before:

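from monkeylearn import MonkeyLearn

# Authenticate with your MonkeyLearn API key
ml = MonkeyLearn('<<Your API key here>>')

# Texts to classify
data = ['The room was great!', 'The curtains were disgusting']

# ID of your custom sentiment analysis model
model_id = '<<Your model ID here>>'

# Classify the texts and print the parsed JSON response
result = ml.classifiers.classify(model_id, data)
print(result.body)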

And the output for this code will be similar as well:

[{
    'text': 'The room was great!',
    'classifications': [{
        'tag_name': 'positive',
        'confidence': 0.836,
        'tag_id': 103237939
    }],
    'error': False,
    'external_id': None
}, {
    'text': 'The curtains were disgusting',
    'classifications': [{
        'tag_name': 'negative',
        'confidence': 0.924,
        'tag_id': 103237938
    }],
    'error': False,
    'external_id': None
}]
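The confidence values can also guide where you spend your tagging effort. As an illustration only (this is not a MonkeyLearn feature), the sketch below flags texts whose best classification falls under a cut-off, so that a person can review them. The threshold of 0.9 and the `needs_review` helper are hypothetical choices, and the data is a hard-coded copy of the sample output above.

```python
# Sample response copied from the custom model's output shown above.
custom_results = [
    {
        'text': 'The room was great!',
        'classifications': [
            {'tag_name': 'positive', 'confidence': 0.836, 'tag_id': 103237939}
        ],
        'error': False,
        'external_id': None,
    },
    {
        'text': 'The curtains were disgusting',
        'classifications': [
            {'tag_name': 'negative', 'confidence': 0.924, 'tag_id': 103237938}
        ],
        'error': False,
        'external_id': None,
    },
]

REVIEW_THRESHOLD = 0.9  # hypothetical cut-off; tune it for your own data


def needs_review(results, threshold=REVIEW_THRESHOLD):
    """Return texts that errored or whose top confidence is below the threshold."""
    flagged = []
    for item in results:
        # Treat a missing classification as zero confidence.
        top = max((c['confidence'] for c in item['classifications']), default=0.0)
        if item['error'] or top < threshold:
            flagged.append(item['text'])
    return flagged


to_review = needs_review(custom_results)
print(to_review)
```

Texts flagged this way are natural candidates to tag by hand and add to the training data.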

An important side note is that you can do all of this from the API – you can create a classifier, upload data to it, create or delete tags, and so on. If you’re interested in how this is done, check out our API documentation.

Wrap-up

We’ve established that sentiment analysis is a powerful tool with many applications. However, using it to automate processes and get insightful data is not always simple.

Instead of setting up your own algorithms from scratch to run sentiment analysis, we recommend using a pre-built model, at least to start with, so that you can understand how sentiment analysis works, as well as the benefits it can bring to your business. Setting up your own machine learning infrastructure is also time-consuming, not to mention costly, since you’ll need extra resources and hardware.

With MonkeyLearn, you can start doing sentiment analysis right now, either with a pre-trained model or by training your own. We recommend the latter so that you can tailor models to your business using data and tags that are relevant to the problems you’re trying to solve – leading to better insights for your business.

MonkeyLearn also has clear documentation on how to set up your own models using our API. First, you’ll need an API key, which you can get when you sign up for free to MonkeyLearn. Then, all that’s left is to get started with sentiment analysis by installing the Python SDK. So, what are you waiting for?!