Application errors monitoring with HipChat and Bash

The classical support flow – an error occurs, a user opens a ticket / phones support, company investigates, a fix is put in place, ticket is updated / user is called back – is very reactive. Here’s how we made it a bit proactive with HipChat and Bash for free to increase user satisfaction.

The Story behind

At one of the startups I’m involved with, almost an year ago we launched our Alpha product and invited some users to pilot it. We’ve put some basic monitoring to check if all our apps (as we use micro services) are up and running. We’ve integrated a cloud support service with a help bubble at the bottom right corner of our product and there we go.

Without any surprise, errors started popping up in the application logs every now and then, while our pilots were emailing and calling us to complain about them. As result we open an internal ticket, investigate, fix and get back to them with a solution. Sounds familiar, right? Like the classic support model you can see in almost every company – very reactive. At the end of the day most of the errors were caused by invalid user data to which we didn’t provide proper error messages or just required a quick db update – things that take us seconds but causing frustration to our users. Then we asked ourselves “Can we be proactive and turn that flow the other way around?”. An error occurs, we get notified, investigate, fix and call the user with something like “Hey, we have noticed you just had a problem and we are calling to help you / let you know we’ve fixed it.”

Our Solution

There are many application monitoring tools out there that claim to have easy integration but at the time we needed something the idea for HipChat popped up and we didn’t even go into evaluating any other products because that solution proved to be very simple and working for us.

We use HipChat for team communication as it integrates well with the other Atlassian products we use. So the idea is to tie up the application logs with HipChat, so when an error occurs, our team will get notified as soon as possible. Once we know that something has happened it’s very easy to investigate and take an action to resolve it. There’s a dedicated person that contacts users and in most of the cases issues get resolved in a few minutes, giving sense of fulfilment to us and building trust with our users.

Before we start here is an inventory list that will be needed

Source of errors – Application error logs

Access to application logs

Error detector – bash script (code provided below)

Task scheduler – Crontab on Unix

Error reporter – HipChat Integration

Admin access to a HipChat room

Humans to solve the errors and call the users

Let’s do it

Error logs

Let’s assume we have two applications running in Unix environment storing their log files in /var/log/mycompany. The full path to their files is:

/var/log/mycomapny/app1.log
/var/log/mycompany/app2.log

We want to get notified only about Warnings and Errors and here’s how they look like in the logs:

Error detector

The next step is to detect when something that we want to get notified about happens. For that purpose, we use a bash script (detector) that goes through all log records generated in the last 10 minutes and searches for ones, that we are interested in. The run the detector every 10 minutes, we use the default task scheduler of Unix – Crontab – with the following setup:

The most important thing here is the pattern used to detect if a log record is of an interest to us, so change lines 43 and 48 accordingly. Together these two patterns will match the log lines given in Error Logs section above.

Copy the code, open your favourite editor, put it in and save it as i.e. – ~/bin/detector.sh
Don’t forget to make it executable as it will not run – chmod 755 ~/bin/detector.sh

HipChat setup

The next step is to setup HipChat, so that detector can access it through the API.
I would strongly recommend to create a new room that is dedicated to Errors only, so when a new notification comes everyone knows that something has happened and an action is needed now.

Room setup

For this you will need admin access to HipChat or to a specific room.

Go to hipchat.com (or your domain if hosted) and login.

Then go to Rooms.

Create or Select a Room.

Select Integrations from the menu

Select the first option – Build Your Own integration

Name your integration – i.e. Error reporting – and Save

The next page should show something like this:

Take the url from the first box and put it inside your detector script – line 35

Let’s test it

put a fake error line at the end of your log file and run detector manually.

That’s it

Happy Error hunting!

Presented live at Sofia Atlassian User Group meet up, 5th of April 2017