system logs – analysis (with Splunk)

To recap, a useful system logging solution consists of four components: generation, transport, storage and analysis.

I will argue that if you already have any logs at all, your first step should be to build an analysis capability. This will let you begin to analyze the logs you already have, become familiar with your analysis tool on a smaller dataset, and use the analysis tool to help debug any problems you encounter while building the rest of the system.

I’ve been a big Splunk fan for years. The Splunk folks understand system and network administration and that shows in the design and capabilities of the product. The free “home” license is a great contribution to the community, too.

There is a lot of good documentation out there on getting started with Splunk, so I’ll focus on what it allowed me to find instead of the details of using it. I encourage you to experiment and try different kinds of searches; you’ll be surprised at what you find.

After starting Splunk, I pointed it at my /var/log directory, which has all the usual system logs as well as all my Apache logs. Splunk indexed about 2 million log events in less than 8 minutes, on my low-power Atom CPU with only 2 GB of RAM and a single 150 GB IDE laptop disk.
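
If you want a rough idea of how much data you are about to hand to an analysis tool, a small script can size up the directory first. The following is only a sketch and has nothing to do with how Splunk itself counts events: it assumes one line is roughly one event, skips compressed rotated logs, and the function names count_log_lines and events_per_second are made up for this illustration.

```python
import os


def count_log_lines(root):
    """Return {path: line_count} for regular, uncompressed files under root.

    Assumption for this sketch: one line is roughly one log event.
    """
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith((".gz", ".bz2", ".xz")):
                continue  # skip compressed, rotated logs
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs and broken symlinks
            try:
                with open(path, "rb") as handle:
                    counts[path] = sum(1 for _ in handle)
            except OSError:
                continue  # unreadable file, usually a permissions issue
    return counts


def events_per_second(events, seconds):
    """Average throughput for a given event count and elapsed time."""
    return events / seconds
```

Calling sum(count_log_lines("/var/log").values()) gives an approximate event total for the directory. For the run described above, events_per_second(2_000_000, 8 * 60) works out to a little under 4,200 events per second, and since the indexing took less than 8 minutes, the real rate was somewhat higher.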

In 30 minutes or so, I found (all on a single host, all in the last 30 days):