Tuesday, February 28, 2012

Two weeks ago, along with the Fedora Developer's Conference in Brno, Czech Republic, a couple of logging and auditing folks from Red Hat, Balabit (syslog-ng), the MITRE Corporation, and Adiscon (me) stuck their heads together to talk about the future of structured logging. It quickly became clear that extending syslog in the CEE spirit is the right thing to do.

We observed that almost all technology is present to provide a rich framework to support structured logging. Actually, both syslog-ng and rsyslog provide the necessary plubming since long (for, example, as part of the RFC5424 effort), but that functionality is relatively seldom explored actively by other developers. A core problem in that regard is that most applications rely on the good old syslog() API, which does not provide structured logging by itself. Also, there is no common log storage database available, which tools could be based on.

In order to evolve syslog, we defined a three-layer architecture, with applications and logging libraries/APIs being the top layer, the syslogd the middle layer and the datastore the bottom layer. Multiple APIs must be supported as noone can expect projects to change their existing logging infrastructure. Also, existing frameworks like log4j or log4j and even glibc's syslog() will stay around for a while longer. New libraries (like ELAPI) will probably become more dominant for new applications. So how to tie these different libraries to the syslogd subsystem (the second layer)?

The solution is rather simple: we use what we already achieved in CEE and support cee-enhanced syslog on the system log socket. The core idea is very simple: we use the regular syslog message part, but include JSON-encoded structured data with it. To signify to the syslog system that this is actually cee-enhanced, a cookie string ("@cee:") is used in front of the JSON data. It is then easy to decide for the syslogd which message format it deals with: if the cookie is present and the rest of the message is a valid JSON representation, the message is cee-enhanced. If one of the two conditions fails, it is traditional syslog. As both conditions are checked together, it is highly unlikely that a legacy syslog message will ever fit into that criteria (and if it really does, nothing is lost: after all, the syslogd has correctly understood that format). It must be noted that the necessary parsing and internal plumbin is available both in syslog-ng as well as rsyslog (I committed the missing JSON parser, held back awaiting a more final CEE standard, yesterday).

The interface to the log database layer is currently not as well defined and needs to be worked on. Note that both syslog-ng and rsyslog support multiple datastores, so there already exist solutions. The group as whole was of the opinion that some unified API for a log data store would be useful and something that should be looked at as a longer-term target.

After reaching this rough consensus, we were delighted to see that most of the base technology is already and place and just needs to be tied correctly together. It is more an effort of doing detail implementations and documenting the various pieces (and how they work exactly together) than creating a totally new system (aka "can be quickly done"). We agreed that it probably is best to reach for the low-hanging fruit first: get structured logging integrated first, then do the other steps. So an initial milestone will be making sure cee-enhanced syslog is supported by all of the subsystem and only after this is done reach for the other things.

One of these next things definitely is a dictionary of field names (and exact structure) to be used to describe events in a standard way (for example a logon event). While the whole effort is highly inspired by CEE, it probably is best to try out initial efforts outside of the formal CEE framework. That will enable rapid development, discussion and the capability to check what works in practice. The experience gained in such PoC can than be feed back to the formal CEE process (along the old IETF mantra "running code and rough consensus first").

We agreed that such an effort is best be done in a tranparent and flexible open source process. With that, project lumberjack was born: an effort to provide better structured logging for Linux, being supported by many major players in that arena. We agreed that it would be a good idea if Red Hat provided some of the project infrastructure. This is why you find project lumberjack now at fedorahosted.org (note that the project will probably contain mostly specs and less code, which is kept in the individual project's repositories).

Tuesday, February 21, 2012

Strucuted logging is cool. A couple of month ago, I added support for log normalization and the 0.5 draft CEE standard to rsyslog. At last weeks Fedora Developer's Conference, there was a huge agreement that CEE-like JSON is a great way to enhance syslog logging. To follow up on this concept, I have integrated a JSON decoder into libee, so that it can now decode JSON with a single method call. It's a proof of concept, and for serious use performance optimization needs to be done. Besides that, it's already quite solid.

Also, I just added the mmjsonparse message modification module to rsyslog (available now in git master branch!). It checks if the message contains an "@JSON: " cookie and, if so, tries to parse the resulting string as JSON. If that succeeds, we obviously have a JSON-enhanced message and the individual name/value pairs are stored and can be used both in filters and output templates. This provides some really great opportunities when it comes to processing the structured data. Just think about RESTful interfaces and such!

Right now, everything is at proof of concept level, but works well enough for you to try it. I'll smoothen some edges but will release the versions rather soon. Probably the biggest drawback is that the JSON processor currently flattens the event, with structure being conveyed via field names. That means if you have a JSON object "SUPER" containing a number of fields "field1" to "fieldn", the current implementation will be a single level and the names are "SUPER.field1",... I did this in order to have a quick solution and one that fits into the existing framework. I'll work on creating real structure soon. It's not really hard, but I probably do some other PoCs first ;)

I considered several approaches, among them moving over to libcollection (part of ding-libs) or a pure JSON parser. The more I worked with the code, the more it turned out that libee already has a lot of the necessary plumbing and could simply been enhanced/modified under the hood. The big plus in that approach is that is immediately plugs in into rsyslog and the other solutions that already built on it. This even enables to use the new functionality in the v6 context (I originally thought I'd need to move on to rsyslog v7 for the name-value pair changes). Now that I have written mmjsonparse, this really seems to work out. No engine change was required, and I expect little need for change even for the final version. As such, I'll proceed in that direction. Actually, what I now use is kind of a hybrid approch: I use a lot of philosophy of libcollection, which showed me the right route to take. Then, I use cJSON, which is a really nice JSON parser. In the proof of concept, I use both cJSON's object model and libee's own. I expect to merge them, actually tightly integrating cJSON. The reason is that CEE has evolved quite a bit in the mean time, and many complex constructs are no longer required. As such, I can streamline the library as well, what not only reduces complexity but speeds up the whole process.

I would like to express my sincere thank to Dmitri Pal, Keith Robertson and Bill Heinbockel, which provided great advise and excellent discussion. And the best is that this part of the effort is just the beginning... Stay tuned for more!

Wednesday, February 01, 2012

Anyone out there having problems receiving Java exceptions via syslog? If so, please let me know. We have begun to look into log4j and potential formatting problems. However, we need someone who is really bugged by this in order to see real-world cases so that we can think about real-world cures. So if you have issues, please let me know.