Codeless Sentiment Analysis of HackerNews

This post reports on an experiment with Node-RED (a project supported by the JS Foundation), using its visual data flow editor to fetch HN posts, filter those with positive words in the comments, and tweet them on @HackerGoodNews.

The flow

I recently came across Node-RED and wondered how good it would be as a tool to automate a few things I find more and more annoying to do manually. A friend of mine challenged me to use it to tweet “positive” posts from Hacker News (mostly because there is a sentiment analysis node in the built-in set of nodes). Extra points for doing it without code (even though there is a function node for writing arbitrary JavaScript to process messages).

Here is the result:

It works as follows:

Parse HN's RSS feed creates a new message (Node-RED’s unit of processing) every time a new post appears in HN’s RSS feed.

limit 5 msg/s ensures that no more than 5 messages per second go through the rest of the flow. Hacker News seems to dislike overly frequent requests, such as those made by the comment-fetching node.

delay 1 hour gives some time for the comments to appear

fetch comments requests the comment web page of the post

keep text only extracts the comments from the web page

join merges all the comments into one text

sentiment analyses the text

positive only lets only messages with a positive score through

title > 120 chars ? switches messages depending on whether the title of the post is longer than 120 characters

truncate title shortens the long title

long title produces a tweet with the shortened title (with an ellipsis appended) and the url of the post

short title simply produces a tweet with the title and the url of the post
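The last few nodes (positive only, the title switch, truncate title, long title and short title) can be approximated in a few lines of Python. This is only a sketch of the logic, not what Node-RED runs internally: the function name `compose_tweet`, truncating to exactly 120 characters, and joining title and URL with a single space are all assumptions made for illustration.

```python
from typing import Optional

# Threshold used by the "title > 120 chars ?" switch node.
MAX_TITLE_CHARS = 120


def compose_tweet(title: str, url: str, score: float) -> Optional[str]:
    """Return the tweet text for a post, or None if the post is filtered out.

    Hypothetical helper: the flow itself does this with visual nodes.
    """
    # "positive only": drop anything without a strictly positive score.
    if score <= 0:
        return None
    # "title > 120 chars ?" followed by "truncate title" and "long title".
    # Cutting at exactly MAX_TITLE_CHARS is an assumption of this sketch.
    if len(title) > MAX_TITLE_CHARS:
        return f"{title[:MAX_TITLE_CHARS].rstrip()}… {url}"
    # "short title": the title and the URL as they are.
    return f"{title} {url}"
```

A post with a non-positive score yields `None`, which plays the role of a message that never reaches the Twitter node.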

The sentiment analysis is very simple: it uses the AFINN word list, in which words or phrases have been manually scored between -5 and 5. For example, “you’re a terrific fascist” has a (positive) score of 2 points, because terrific is worth 4 and fascist -2… Well, I said it’s simple, not perfect (and to be honest, I browsed the list for a while to find this example ;) )
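To make the arithmetic concrete, here is a minimal Python sketch of word-list scoring. It is not the code of the sentiment node: `AFINN_EXCERPT` is a two-word stand-in for the real list (only the two scores quoted above), the tokenizer is a simple assumption, and multi-word phrases are ignored.

```python
import re
from typing import Mapping

# Stand-in for the AFINN list: only the two scores mentioned in the text.
AFINN_EXCERPT = {"terrific": 4, "fascist": -2}


def sentiment_score(text: str, lexicon: Mapping[str, int] = AFINN_EXCERPT) -> int:
    """Sum the scores of all known words; unknown words count as 0."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


print(sentiment_score("you're a terrific fascist"))  # 4 + (-2) = 2
```

Because each word is scored in isolation, context never changes the sign of a word, which is exactly why the example above comes out positive.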

As a more honest example, here are the 5 most negative posts of last weekend:

Conclusion

Node-RED makes it really easy to build a tweeting bot that ignores sexual harassment and nuclear disasters without even coding (of course, that relies on a sane community commenting on the news).

The only shortcoming I found concerns the feed parser, which does not remember what was already processed. For example, if I were to restart my Node-RED instance, all the positive posts of the home page would be sent to Twitter. That’s not a problem since Twitter detects and rejects duplicate statuses, but it would be more annoying if the messages were sent by email.

However, another (and maybe the biggest) strength of Node-RED is how easy it is to extend it by writing new nodes (with code, this time). That brings two solutions to my “problem”: either use one of the many existing nodes to hack some persistence into the feed parser (e.g. with a database node or a file node), or write my own feed parser. Good times ahead! :)
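For the curious, the file-based persistence idea boils down to something like the Python sketch below. It only illustrates the principle under stated assumptions and is not a Node-RED node: the helper name `unseen_links`, the JSON file format, and the use of the post link as the identifier are all hypothetical choices.

```python
import json
from pathlib import Path
from typing import Iterable, List


def unseen_links(links: Iterable[str], store: Path) -> List[str]:
    """Return the links not yet recorded in `store`, then record them.

    Hypothetical helper: `store` is a JSON file holding a list of links
    that were already processed, so the memory survives a restart.
    """
    seen = set(json.loads(store.read_text())) if store.exists() else set()
    fresh = []
    for link in links:
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    # Rewrite the whole file; fine for a feed of a few dozen items.
    store.write_text(json.dumps(sorted(seen)))
    return fresh
```

Calling it twice with the same feed items returns them the first time and an empty list the second time, even across restarts, as long as the file is kept.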