December 16, 2011

Logging is messy. Ever have logs fill up your disk and crash services as a result? Me too, and that sucks.

You can solve the disk-filling problem by rotating logs and expiring old ones, but there's a better solution that solves more problems: ship your logs somewhere else. Shipping logs somewhere centralized lets you get at them more quickly later, when you need to debug or do analytics.

There are plenty of tools in this area to help you solve log transport problems. Common syslog servers like rsyslog and syslog-ng are useful if syslog is your preferred transport. Other tools like Apache Flume, Facebook's Scribe, and logstash provide an infrastructure for reliable and robust log transport. Many of these tools solve more than the transport problem. For example, rsyslog can do more than simply move a log event to another server.

Starting with Log Files

One pleasant feature of all of these systems is that, in most cases, you don't need to make any application-level changes to start shipping your logs elsewhere. If you already log to files, these tools can read those files and ship them out to your central log repository.

Files are a great common ground. Can you 'tail -F' to read your logs? Perfect.

Even rsyslog and syslog-ng, while generally used as syslog servers, can both follow files and stream out logs as they are written to disk. In rsyslog, you use the imfile module. In syslog-ng, you use the file() driver. In Flume, you use the tail() source. In logstash, you use the file input plugin.
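To make the idea of following a file concrete, here is a minimal Python sketch of what such a tool does at its core. It is not how any of the tools above are implemented. The names read_new_lines, follow, and ship are hypothetical, and the sketch assumes UTF-8 logs and detects only truncation (by file size), not a rename-style rotation the way 'tail -F' does.

```python
import os
import time


def read_new_lines(path, offset):
    """Return (complete_lines, new_offset) for data appended since offset."""
    size = os.path.getsize(path)
    if size < offset:
        # The file shrank: assume it was truncated or replaced, so start over.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline so a half-written line is retried later.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow(path, ship, poll_seconds=1.0):
    """Poll path forever, handing each new complete line to ship()."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            ship(line)  # hypothetical callback, e.g. send to a central server
        time.sleep(poll_seconds)
```

Calling follow("/var/log/app.log", print) would print lines as they are appended. A real shipper would also persist the offset across restarts, which this sketch leaves out.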

Filtering Logs

Most of the tools mentioned here support some kind of filtering, whether it's dropping certain logs or modifying them in-flight.

Logstash, for example, supports dropping events matched by a certain pattern, parsing events into a structured piece of data like JSON, normalizing timestamps, and figuring out which events are single-line and which are multi-line (like Java stack traces). Flume lets you implement similar filtering behavior with decorators.

In rsyslog, you can use filter conditions and templates to selectively drop and modify events before they are output. Similarly, in syslog-ng, filters let you drop events and templates let you reshape the output event.
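The filtering behaviors described in this section (dropping events, parsing them into structured data, and merging multi-line events) can be sketched in a few lines of Python. This is an illustration only, not the configuration syntax or internals of any tool named here. The log format (timestamp, level, message), the DEBUG drop rule, and the "indented lines are continuations" heuristic are all assumptions made for the example.

```python
import json
import re

# Hypothetical rule: drop anything that mentions DEBUG.
DROP_PATTERN = re.compile(r"\bDEBUG\b")
# Assumed line format: "<timestamp> <LEVEL> <message>".
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$", re.DOTALL
)


def merge_multiline(lines):
    """Join indented continuation lines (such as stack frames) onto the previous event."""
    event = None
    for line in lines:
        if event is not None and line[:1] in (" ", "\t"):
            event += "\n" + line
        else:
            if event is not None:
                yield event
            event = line
    if event is not None:
        yield event


def filter_events(lines):
    """Yield one JSON string per event that survives the drop rule."""
    for event in merge_multiline(lines):
        if DROP_PATTERN.search(event):
            continue  # dropped in-flight
        match = LINE_PATTERN.match(event)
        # Fall back to an unparsed message when the line does not fit the format.
        fields = match.groupdict() if match else {"message": event}
        yield json.dumps(fields, sort_keys=True)
```

Feeding this the lines of a log file yields one JSON document per surviving event, with a stack trace kept together inside a single message field.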

Final Destination

Where are you putting logs?

You could put them on a large disk server for backups and archival, but logs have valuable data in them and are worth mining.

Recall Sysadvent Day 10, which covered how to analyze logs stored in S3 using Pig on Amazon EC2. "Logs stored in S3" - how do you get your logs into S3? Flume supports S3 out of the box, allowing you to ship your logs up to Amazon for later processing. Check out this blog post for an example of doing exactly this.

If you're looking for awesome log analytics and debugging, there are a few tools out there to help you do that without steep learning curves. Open source tools like Graylog2 and logstash are both popular and have active communities. Hadoop's Hive and Pig can help, but may have slightly steeper learning curves. If you're looking for a hosted log searching service, there's papertrail. Hosted options also vary in features and scope; for example, Airbrake (previously called 'hoptoad') focuses on helping you analyze logged errors.

And then?

Companies like Splunk have figured out that there is money to be made from your logs, and web advertising companies log everything because logs are money. So don't treat your logs like a painful artifact that can only be managed with aggressive log rotation policies.

Centralize your logs somewhere and build some tools around them. You'll get faster at debugging problems and be able to better answer business and operations questions.

Further Reading

Log4j has cool features called MDC and NDC (mapped and nested diagnostic contexts) that let you log more than just a text message.