Lumberjack – Log file parsing and analysis for Clojure

I have just pushed version 0.1.0 of a new project called Lumberjack. The goal is for it to be a library of functions that help parse and analyze log files in Clojure.

At work I occasionally have to pull down log files from our Nginx web servers and visualize them. I decided this would be a useful project to play with on my journey with Clojure and open source software.

The library takes a sequence of Nginx log files, reads them in, and parses them into a structure that can be analyzed. It can also visualize the data as a set of time series graphs using Incanter, which is the only graphing library I have come across so far.
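To give a feel for what parsing a log line into a structure can look like, here is a minimal sketch. It is not Lumberjack's actual code: the names `combined-line-re` and `parse-line`, the map keys, and the assumption that the logs use Nginx's default "combined" format are all made up for illustration.

```clojure
;; Hypothetical sketch, not Lumberjack's API.
;; Assumes Nginx's default "combined" log format:
;;   $remote_addr - $remote_user [$time_local] "$request"
;;   $status $body_bytes_sent "$http_referer" "$http_user_agent"
(def combined-line-re
  #"(\S+) \S+ (\S+) \[([^\]]+)\] \"(\S+) (\S+) ([^\"]*)\" (\d{3}) (\d+) \"([^\"]*)\" \"([^\"]*)\"")

(defn parse-line
  "Returns a map of fields for one log line, or nil when the line does not
  match the assumed format (for example, a malformed request)."
  [line]
  (when-let [[_ ip user ts method uri protocol status bytes referer user-agent]
             (re-matches combined-line-re line)]
    {:remote-addr ip
     :remote-user user
     :time-local  ts          ; left as a string; date parsing is omitted here
     :method      method
     :uri         uri
     :protocol    protocol
     :status      (Long/parseLong status)
     :bytes       (Long/parseLong bytes)
     :referer     referer
     :user-agent  user-agent}))
```

Calling `(keep parse-line lines)` on a sequence of lines would then give a sequence of maps, silently skipping any line that does not match.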

A short, and by no means comprehensive, list of things I would like to support in the future:

Support for reading very long log files through a BufferedReader, so the whole file does not have to reside in memory before parsing, and to take advantage of laziness.

The ability to construct records with only a subset of the parsed data, such as request type and timestamp.

The ability to parse log lines of different types, e.g. Apache, IIS, or other formats.

Additional graphs other than time series, e.g. bar graphs showing the number of hits per IP address.

Possibility of using futures, or another concurrency mechanism, to do some of the parsing and transformation of log lines into the data structures when working on large log files.

The above are just some of my thoughts on updates that might fit well as I start to use the library more and flesh out more use cases.
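As a rough sketch of the first idea in the list (reading through a BufferedReader and leaning on laziness), something like the following could work. The function names `process-lines` and `count-matching` are hypothetical and not part of Lumberjack; `clojure.java.io/reader` and `line-seq` are standard Clojure.

```clojure
(require '[clojure.java.io :as io])

(defn process-lines
  "Opens path with a BufferedReader and calls f on a lazy seq of its lines.
  f must finish consuming the lines it needs before it returns, because
  with-open closes the reader as soon as the body exits."
  [path f]
  (with-open [rdr (io/reader path)]   ; io/reader returns a BufferedReader
    (f (line-seq rdr))))              ; line-seq reads lines lazily, on demand

(defn count-matching
  "Counts the lines in the file at path for which pred is truthy."
  [path pred]
  (process-lines path #(count (filter pred %))))
```

The design point is that the work happens inside `with-open`: returning the lazy sequence itself and consuming it later would fail, since the reader is already closed by then.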

I would love comments on my code, and any other feedback you may have. This is still early, but I wanted to put something out there that might be of use to others as well.

Software Developer in the Dallas/Fort Worth Metroplex. Developed software using Java, the Microsoft .NET stack, Ruby, Clojure, and Erlang, but enjoy playing with and learning about different technologies to find different ways of doing things.
Host of the podcast Functional Geekery (http://www.functionalgeekery.com/) and founder of DFW Erlang user group (http://www.meetup.com/DFW-Erlang-User-Group/).