Analyzing GitHub Issue Comment Sentiment With Azure

Developers are real passionate about their semi-colons; or lack thereof. Comment threads on GitHub can get a bit…testy…on this topic. What’s a beleaguered repository maintainer to do when an issue comment thread gets out of hand?

GitHub provides community tools maintainers can use to define community standards for their projects. For example, it’s easy to add a code of conduct to a repository. It’s also possible to report offensive comments directly to GitHub. However, a code of conduct is only a set of words on a page. It’s only effective if you enforce it. And face it, enforcing it can be very time-consuming.

What if a bot could help? Now I’m not so naïve as to think you can take the very human problem of enforcing community standards, sprinkle a bit of Machine Learning on it, and watch the problem go away. Clippy taught me that.

The Idea

This was the idea I had in mind when I decided to explore some new technologies. I learn best by building something, so I set out to add sentiment analysis to GitHub issue comments.

Sentiment analysis (also known as opinion mining) is the use of computers to analyze text to try and determine whether a piece of writing is positive, negative, or neutral. It relies on multiple fields related to AI such as natural language processing, computational linguistics, machine learning, and wishful thinking.

To make this work I need to do four things:

Drink some whiskey

Listen to and respond to GitHub issue comments.

Analyze the sentiment of the comment.

Update the comment with a note about the sentiment.

The idea is this: when an issue receives a negative issue comment, I’m going to have my “SentimentBot” update the comment with a note to keep things positive.

DISCLAIMER: I want to be very clear that I chose this behavior as a proof of concept. I don’t think it’d be a good idea on a real OSS project to have a bot automatically respond to negative sentiment. If I were doing this for real, I’d probably have it privately flag comments in some manner for follow-up. You’ll probably see me make this clarification again because people have short memories.

The GitHub Listener

Webhooks are a powerful mechanism to extend GitHub. There are three key steps to set up a webhook.

Set up an application that can receive an HTTP POST from github.com.

Register the application as a webhook on a repository.

Configure the repository events the webhook listens to in the repository settings page.

That first step is a bit of a pain. I need to write an entire application and host it at a publicly available URL? Ugh! So 2015!

All I really want to do is write a tiny bit of code to respond to a Webhook call. I don’t care how it’s hosted.

Serverless architecture to the rescue! The “Serverless” nomenclature has been the source of a lot of snide comments and jokes. The name may lead one to believe we chucked the server and are hosting our code on gumption and hope. But it’s not like that. Of course there’s a server! You just don’t have to worry about it. You just write some code and the Serverless service handles hosting, scaling, etc. all for you.

Azure Functions and AWS Lambda are the two most well known examples of Serverless services. I decided to play around with Azure Functions because they have specific support for GitHub Webhooks. GitHub Webhooks and Azure Functions go together like Bitters and Bourbon. Mmmm, I’ll be right back.

Follow these instructions to set up an Azure Function inside of the Azure Portal that responds to a GitHub webhook in no time. The result is a method that receives the webhook payload as data.

The shape of the data is determined by the event type that the webhook subscribes to. For example, if you subscribe to issue comments like I did, the payload represented by data is the IssueCommentEvent.

In my example, we use a dynamic type for ease and convenience (but at the risk of correctness). However, you can deserialize the payload into a strongly typed class instead. The Octokit.net library provides such classes. For example, I could deserialize the request body to an instance of IssueCommentPayload.
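To make the shape of that payload a bit more concrete, here is a minimal sketch that pulls the interesting fields out of an issue comment event. It is not the code from my function: it uses System.Text.Json rather than dynamic or Octokit.net, and the sample payload is made up and heavily trimmed (real payloads carry many more fields). It needs C# 11 or later for the raw string literal.

```csharp
using System;
using System.Text.Json;

// A made-up, trimmed-down issue comment payload for illustration only.
string samplePayload = """
{
  "action": "created",
  "issue": { "number": 42 },
  "comment": {
    "id": 1001,
    "body": "This is the worst idea ever.",
    "user": { "login": "octocat" }
  },
  "repository": { "name": "hello-world", "owner": { "login": "octocat" } }
}
""";

using JsonDocument doc = JsonDocument.Parse(samplePayload);
JsonElement root = doc.RootElement;

// "action" says what happened to the comment: created, edited, or deleted.
string action = root.GetProperty("action").GetString() ?? "";

// The comment id and the repository owner/name identify the comment to update later.
JsonElement comment = root.GetProperty("comment");
long commentId = comment.GetProperty("id").GetInt64();
string body = comment.GetProperty("body").GetString() ?? "";

JsonElement repository = root.GetProperty("repository");
string owner = repository.GetProperty("owner").GetProperty("login").GetString() ?? "";
string repo = repository.GetProperty("name").GetString() ?? "";

Console.WriteLine($"{action}: comment {commentId} on {owner}/{repo}");
Console.WriteLine(body);
```

GetProperty throws if a field is missing, which is arguably what you want from a sketch like this: a loud failure beats silently analyzing an empty string.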

Analyzing Sentiment

The next step is to write code to analyze sentiment. But how do I do that? A naïve approach would search for my favorite colorful words in the text. A more sophisticated approach is to use something like Microsoft’s Cognitive Services. They have a Text Analytics API you can use for analyzing sentiment.

And of course, there’s a NuGet package for that.

Install-Package Microsoft.Azure.CognitiveServices.Language

I installed the package, wrote a bit of code, and had the sentiment analysis working in short order. The API returns a score between 0 and 1. Scores close to 0 are negative; scores close to 1 are positive.
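A score on its own isn't very actionable, so at some point you have to decide what counts as "negative." Here is a rough sketch of one way to bucket the score. The cut-offs of 0.3 and 0.7 are my own assumption for illustration; nothing in the API dictates them, so tune them against real comments.

```csharp
using System;

// Hypothetical cut-offs, chosen for illustration only.
const double NegativeBelow = 0.3;
const double PositiveAbove = 0.7;

// Maps a sentiment score in [0, 1] to a coarse label.
string Classify(double score)
{
    if (double.IsNaN(score) || score is < 0.0 or > 1.0)
        throw new ArgumentOutOfRangeException(nameof(score), "Expected a score between 0 and 1.");

    if (score < NegativeBelow) return "negative";
    if (score > PositiveAbove) return "positive";
    return "neutral";
}

foreach (double score in new[] { 0.05, 0.5, 0.95 })
{
    Console.WriteLine($"{score} is {Classify(score)}");
}
```

The wide neutral band in the middle is deliberate. Sentiment scores are noisy, and a bot that reacts to every lukewarm comment gets annoying fast.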

Updating the Comment

Now that all the sentiments are determined, let’s do something with that information. For the sake of this proof of concept, I will update overly negative comments with a little reminder to keep it positive. After all, we know how much humans enjoy being chided by a software robot. Again, I want to reiterate that I wouldn’t use this for a real repository. I’d probably just flag the comment for a human to follow up.

I will also update positive comments with a nice thank you for keeping it positive. Gotta reward the nice people from time to time.
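Those two behaviors boil down to picking a note based on the score and appending it to the comment body. Here is a minimal sketch of that step. The note text and the 0.3 and 0.7 cut-offs are assumptions of mine, not what my bot necessarily uses, and the sketch stops short of sending the new body back to GitHub.

```csharp
using System;

// Hypothetical note text; word these however suits your community.
const string NegativeNote = "_SentimentBot says: Let's keep it positive, please._";
const string PositiveNote = "_SentimentBot says: Thanks for keeping it so positive!_";

// Returns the comment body with a sentiment note appended, or the body
// unchanged when the comment is neutral or already carries the note.
string AppendSentimentNote(string body, double score)
{
    string? note = score switch
    {
        < 0.3 => NegativeNote, // assumed cut-off for "overly negative"
        > 0.7 => PositiveNote, // assumed cut-off for "nice people"
        _ => null,             // leave neutral comments alone
    };

    if (note is null || body.Contains(note))
        return body;

    return $"{body}\n\n{note}";
}

string once = AppendSentimentNote("This is the worst idea ever.", 0.1);
string twice = AppendSentimentNote(once, 0.1);

Console.WriteLine(once);
Console.WriteLine(once == twice); // True: the note is only added once
```

The check for an existing note matters more than it looks. Editing a comment raises another issue comment event, so a bot that blindly appends would happily respond to its own edits forever.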

In order to update the comment, I’ll use Octokit.net! Once again, NuGet to the rescue.

Also, I don’t want to pay a lot of money for this demo, so it might fail in the future if my trial of the text analysis service runs out.

Future Ideas

My goal in this post is to show you how easy it is to build a GitHub Webhook using Azure Functions. I haven’t tried it with AWS Lambda. I hope it’s just as easy. If you try it, let me know how it goes!

The possibilities here are legion. With this approach, you can build all sorts of extensions that make GitHub fit into your workflows. For example, you may want to flag first-time issue commenters. Or you may want to run static analysis on PRs. All of that is easy to build!

But before you get too wild with this, note that there are a lot of GitHub integrations out there that might already do what you need. For example, the Probot project has a showcase of interesting apps that range from managing stale issues to enforcing GPG signatures on pull requests. There’s even a sentiment bot in there!

Probot apps are NodeJS apps that can respond to webhooks. I believe they require you to host an application, but I haven’t tried to see if they’re easy to run in a Serverless environment yet. That could be fun to try.