Rapid7 Blog

The Role of Log Analysis in Our Technical Transformation

Hello, Logentries readers – I’m the VP of Technology at Motus, a SaaS company for mobile workforce management, headquartered in Boston. Motus has undergone a huge technical transformation in the last 18 months, and logging has been a big part of that transformation. I wanted to share with you some of our experiences and where we see the future of log management and analysis going here at Motus.

Where We Started…

Eighteen months ago, Motus was hosted on three physical servers, and log management consisted of a terminal window, grep, and a lot of regular expressions. It wasn’t pretty, but it worked – mostly. Our infrastructure team could usually find things when something actively went wrong, but we never caught problems proactively through the logs. We had tried some on-premises log management tools, but they were either too expensive, too limited, or too complicated to make sense.

The real impetus for change came when we started on our migration to a services platform on virtual servers. Suddenly, a window full of terminal screens all tailing log files became impossible to manage. We decided to move to a cloud-based log aggregation service for several reasons:

Manageability – we needed a tool that would help us collect, organize, and search through log data coming from multiple environments

Outsourcing – we did not want to manage the storage, compute, or software needed to deal with the log data

Ease of use – this was a big one. We did not want to have to teach the whole team a complex search language.

After some prototyping and testing, we selected Logentries as our partner and moved on to…

Log Centralization, v1

Because we had a lot of legacy code and infrastructure, we chose to use the Logentries agent to monitor our existing log files. This had the advantage of not requiring us to change code or configuration in our applications – a big help when we were already making a lot of changes. We started collecting log data and almost immediately were able to save time and effort when tracking down production issues.

One early win that we got from using Logentries was in helping us track down an intermittent network issue at our hosting provider. We would see very irregular “storms” of packet loss across our environment, but could never pin it down to any sort of app traffic on our side. We set up alerts in Logentries to help us correlate them, and we added our hosting provider’s networking team to the alerts. Based on the times and durations of the alerts, they were able to track down the issue to a completely unrelated product that was somehow leaking through to our environment. Having the ability to define granular alerts and push that information to our hosting provider was instrumental in figuring out the mystery.

As we grew our services platform, however, it became clear that the way that we organized our logs wasn’t really scalable. Because we used the agent, we had individual hosts in Logentries for each virtual server and each log file. This quickly grew to be very cumbersome when trying to track down issues that crossed servers or applications.

In addition, we were moving very quickly from a virtual server model to a container model using Docker and Apache Mesos, so creating new hosts for all of those was out of the question. That led us to…

Log Centralization, v2

The model that we’ve decided upon for our infrastructure is one log “host” at Logentries per application/environment. For example, we have a userservice-staging log that accumulates logs from all the user service Docker containers running in the staging environment. Each log line contains the host on which the Docker container is running and the container ID, to help developers and infrastructure teams quickly identify the source of any problems. This approach vastly simplifies navigation through Logentries and doesn’t require the development team to know on which hosts each application is deployed.
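To make the idea concrete, here is a minimal sketch in Python of stamping every log line with the Docker host and container ID. It is an illustration only, not the code Motus runs: the `DOCKER_HOST_NAME` environment variable, the `build_logger` helper, and the field order are all assumptions. It relies on the fact that, by default, a container's hostname is its short container ID.

```python
import logging
import os
import socket


class ContainerContextFilter(logging.Filter):
    """Attach the Docker host and container ID to every log record."""

    def __init__(self, docker_host, container_id):
        super().__init__()
        self.docker_host = docker_host
        self.container_id = container_id

    def filter(self, record):
        record.docker_host = self.docker_host
        record.container_id = self.container_id
        return True  # never drop a record, only annotate it


def build_logger(name, stream, docker_host=None, container_id=None):
    """Return a logger whose lines start with '<docker host> <container id>'."""
    # Hypothetical variable name; the Docker host has to be passed in from outside.
    docker_host = docker_host or os.environ.get("DOCKER_HOST_NAME", "unknown-host")
    # Inside a container the hostname defaults to the short container ID.
    container_id = container_id or socket.gethostname()

    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(docker_host)s %(container_id)s %(levelname)s %(name)s - %(message)s"
    ))
    handler.addFilter(ContainerContextFilter(docker_host, container_id))

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    return logger
```

Because the context travels inside each line, many containers can write to one shared log and a reader can still tell at a glance which container on which host produced a given entry.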

We implemented this system in a couple of different ways:

For our legacy applications, we use a custom rsyslog configuration that reads each application’s Logentries token from an environment variable and forwards the log data to Logentries. The container starts rsyslogd at startup and either monitors files or forwards syslog data directly.

For our new service applications, we configure both Tomcat and our application to use the Logentries log4j appender to send log data directly to Logentries. We have a startup script that reads the hostname (Docker container ID) and an environment variable containing the Docker host, and inserts both values into the log4j pattern before the application starts up.
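A startup step along these lines could be sketched as follows. This is a hypothetical Python version, not the actual Motus script: it assumes a log4j 1.x style properties file, and the function names are made up for illustration. It prepends the Docker host and container ID to every `ConversionPattern` it finds.

```python
import re

# Matches lines such as: log4j.appender.le.layout.ConversionPattern=%d %-5p %c - %m%n
PATTERN_LINE = re.compile(
    r"^(\s*log4j\.appender\.[\w.]+\.layout\.ConversionPattern\s*=\s*)(.*)$"
)


def escape_for_pattern(value):
    """Escape '%' so log4j treats the value as literal text, not a specifier."""
    return value.replace("%", "%%")


def inject_context(properties_text, docker_host, container_id):
    """Prefix every ConversionPattern with the Docker host and container ID."""
    prefix = f"{escape_for_pattern(docker_host)} {escape_for_pattern(container_id)} "
    rewritten = []
    for line in properties_text.splitlines():
        match = PATTERN_LINE.match(line)
        if match:
            line = match.group(1) + prefix + match.group(2)
        rewritten.append(line)
    return "\n".join(rewritten) + "\n"


def rewrite_file(path, docker_host, container_id):
    """Rewrite the properties file in place; meant to run before the JVM starts."""
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(inject_context(original, docker_host, container_id))
```

In a container entrypoint, `rewrite_file` would be called with the value of `socket.gethostname()` as the container ID and the Docker host taken from whatever environment variable the scheduler sets. Doing the rewrite before launch keeps the application code itself unaware of where it is running.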

This model has been a huge productivity win. The dev teams can visualize and manage the log data in one view far more easily, and alerts are easier to set up as well.

The Future

As our use of containers grows, having some sort of native solution is increasingly important. Logentries has recently announced their Docker logging solution, and it’s something that we’re actively looking at right now. We look forward to partnering with them as we continue to evolve our infrastructure and platform.