Thursday, February 19, 2009

Now I join the masses of people who are asking for log samples. But I do so for a good reason :) Also, I do not need a lot; a single log message works well for my needs. I need them to improve rsyslog so that the parser can handle exotic message formats even better. So the short story is: if you have a syslog message, please provide it to me.

And here is the long story:

One of the strengths of rsyslog is that it is very much focused on standards. That also means it tries to parse syslog messages according to the relevant RFCs. Unfortunately, syslog has been standardized only recently, and so there is no real standard for what to expect inside the header. So rsyslog's strength is also its weakness: if messages are ill-formed, results are often suboptimal.

I am working around this by doing smart guesswork inside the legacy syslog parser. However, every now and then some folks pop up with problems. And, more importantly, some others do not even ask. On my twitter account, I recently saw one such frustration. In that case, timestamps were duplicated. I guess that was caused by something unexpected inside the timestamp. However, I was not able to get down to the real problem, because I did not have access to the raw message. That's an important point: I need the raw message content, not what usually happens to be in the logfile. The latter is already parsed, processed and recombined, so it does not tell me what the actual message is. But I need the actual message to improve the parser.

What I would like to do is create a very broad test suite with a vast amount of real-life syslog formats. The message text itself is actually not so important to me at this stage; it is the header format that matters. Once I have this, I'd like to analyze the different ways in which the format is malformed and then try to find ways to handle them inside the parser. If I find out that I cannot detect the right format automatically in all cases, I may add ways to configure the different formats. The end result, I hope, will be far more plug-and-play message detection, something that should be of great benefit for all users.

Please contribute your logs! I need logs from many different devices, with many different versions, but I need only a few lines from each one. For each individual contributor, there is not a lot of effort required. Even a single log line would be great (ten or so would be even greater). Just please don't mangle the logs - provide me with raw log messages. That's probably the hardest part. One way to do it is to sniff them off the wire, for example with Wireshark. Another way is to use rsyslog itself. All you need is a special template and an output file using it:

$template rawmsg,"%rawmsg%\n"
*.* /path/to/raw-file.log;rawmsg

Add this to your rsyslog.conf, restart rsyslog, make the device emit a few lines and mail the result to me at rgerhards@gmail.com. You may also simply post the log sample to the sample log thread on the rsyslog forum - whatever you prefer. After you have done that, you can remove the lines from rsyslog.conf again. Before you mail me, it is a good idea to check whether there is any sensitive information inside the log file. Feel free to delete any such lines, but I would appreciate it if you do not modify line contents. Also, it would be useful for me if you let me know which device, vendor and version produced the log.
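If sniffing with Wireshark is inconvenient, a tiny receiver script can do the same job. The following is just a sketch, not part of rsyslog: it assumes your device sends syslog over UDP to the standard port 514 (binding there usually requires root) and writes each datagram, completely untouched, to a file:

```python
# Minimal raw UDP syslog capture - a sketch for collecting unmangled samples.
# Assumes the device sends syslog via UDP to this host; each datagram is one
# raw syslog message and is stored exactly as it arrived on the wire.
import socket

def capture_raw(port=514, count=10, out_path="raw-file.log"):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))  # ports below 1024 usually need root
    with open(out_path, "ab") as out:
        for _ in range(count):
            data, addr = sock.recvfrom(8192)  # one datagram = one raw message
            out.write(data + b"\n")  # no parsing, no reformatting
    sock.close()
```

The important property is that nothing between the wire and the file parses or recombines the message, which is exactly what makes such samples useful.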

I hope that you can help me improve the rsyslog parser even more. Besides, it will probably be a very interesting experiment to see how different syslog messages really are.

Monday, February 16, 2009

Finally having a new stable Debian is very good news in itself - congrats, Debian team. Your work is much appreciated!

But this time, this was even better news for me. Have a look at the detailed release notes and you know why: Debian now comes with a new syslogd, finally replacing sysklogd. And, guess what - rsyslog is the daemon of choice! So it is time to celebrate for the rsyslog community, too.

There were a couple of good reasons for Debian to switch to rsyslog. Among others, an "active upstream" was part of the success - thanks for that, folks (though I tend to think that after the more or less unmaintained sysklogd package, it did not take much to be considered "active and responsive" ;)).

Special thanks go to Michael Biebl, who worked really hard to make rsyslog available on Debian. It is one thing to write a great syslogd; it is a totally different one to integrate it into a distro's infrastructure. Michael has done a tremendous job, and I think this is his success at least as much as it is mine. He is very eager to get all the details right and has very often provided excellent advice to me. Michael, thanks for all of this and I hope you'll share a virtual bottle of Champagne with me ;)

Also, the rsyslog community deserves sincere thanks. Without folks who spread the word and help others get rsyslog going, this project wouldn't see the success it experiences today.

I am very happy to have rsyslog now running by default on Fedora and Debian, as well as a myriad of derivatives. Thanks to everyone who helped make this happen. So on to a nice, little celebration!

Tuesday, February 10, 2009

A couple of days ago, I created an rsyslog group on LinkedIn. Then I was curious what would happen. Well, nothing. Nothing at all. So I thought it was probably not the right time for such a thing.

And, surprise, surprise, I today browsed through LinkedIn and saw there were 16 join requests. Oops... there seem to be no email notifications for them. Bad... Well, I approved all folks. If you were one of them and now read this blog post: please accept my apologies! Obviously, this was just another time I screwed up on the Internet...

To prevent any further such incidents, I have now set the group to automatically approve everyone who is interested in joining. That's great for this type of group, actually I am happy for everyone who comes along ;)

Friday, February 06, 2009

I had an interesting question on the rsyslog mailing list that boils down to when rsyslog closes output files. So I thought I would talk a bit about it in my blog, too. What we need to look at is when a file is closed. It is closed when there is need to do so. So, when is there need? There are currently three cases where need arises:

a) rsyslog is HUPed or terminated
b) an output channel's configured maximum size is reached
c) a dynamically generated file name evaluates to a different file

I think a) needs no further explanation. Case b) should also be self-explanatory: if an output channel is set to a maximum size, and that size is reached, the file is closed and a new one re-opened. So for the time being let's focus on case c):

I simplified a bit. Actually, the file is not closed immediately when the file name changes. The file is kept open, in a kind of cache. So when the very same file name is used again, the file descriptor is taken from the cache, and there is no need to call the open and close APIs (which is very time-consuming). The usual case is that something like HOSTNAME or TAG is used in dynamic filename generation. In these cases, it is quite common that a small set of different filenames is written to. So with the cache logic, we can ensure good performance no matter in what order messages come in (generally, they appear in random order, and thus there is a large probability that the next message will go to a different file on a sufficiently busy system). A file is actually closed only if the cache runs out of space (or cases a) or b) above happen).

Let's look at how this works. We have the following message sequence:

Host  Msg
A     M1
A     M2
B     Ma
A     M3
B     Mb

and we have a filename template that, for simplicity, consists of only %HOSTNAME%. What now happens is that with the first message, file "A" is opened. Obviously, messages M1 and M2 are written to file "A". Now, Ma comes in from host B. As the name is newly evaluated, Ma is written to file "B". Then, M3 goes again to file "A" and Mb to file "B".

As you can see, the messages are put into the right files, and these files are only opened once. So far, they have not been closed (and will not be until a) happens), because we have just two file descriptors and those can easily be kept in the cache (the current default for the cache size is, I think, 100).
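The cache logic described above can be modeled roughly as follows. This is a simplified Python sketch of the idea, not rsyslog's actual C implementation; the class and parameter names are made up for illustration, and the default cache size of 100 simply mirrors the value mentioned above:

```python
# Simplified model of the dynamic-filename (dynafile) cache idea:
# file handles stay open and are reused; a handle is closed only when
# the cache overflows (LRU eviction here) or on an explicit close_all().
from collections import OrderedDict

class DynafileCache:
    def __init__(self, max_size=100):
        self.max_size = max_size
        self.handles = OrderedDict()  # filename -> open file object

    def write(self, filename, msg):
        if filename in self.handles:
            self.handles.move_to_end(filename)  # mark as recently used
        else:
            if len(self.handles) >= self.max_size:
                # cache full: close the least recently used handle
                _, oldest = self.handles.popitem(last=False)
                oldest.close()
            self.handles[filename] = open(filename, "a")
        self.handles[filename].write(msg + "\n")

    def close_all(self):
        # corresponds to case a): HUP or termination
        for f in self.handles.values():
            f.close()
        self.handles.clear()
```

Running the message sequence from the example through such a cache opens files "A" and "B" exactly once each, no matter how the messages from the two hosts interleave.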

Thursday, February 05, 2009

Today, I thought hard about the reliable plain TCP syslog issue. Remember? I have ranted numerous times on why "plain tcp syslog is not reliable" (this link points to the initial entry), and I have shown that, by design, it is not possible to build a 100% reliable logging system without application-level acks.

However, it hit me during my morning shower (when else?) that we can at least reduce the issue we have with the plain TCP syslog protocol. At the core of the issue is the local TCP stack's send buffer. It enhances performance but also causes our app to not know exactly what has been transmitted and what has not. The larger the send buffer, the larger our "window of uncertainty" (WoU) about which messages made it to the remote end. So if we are prepared to sacrifice some performance, we can shrink this WoU. And we can do that simply by shrinking the send buffer. It's so simple that I wonder why a shower was required...

In any case, I'll follow that route in rsyslog in the next days. But please don't get me wrong: plain TCP syslog will not become reliable even if the idea works. It will just be less unreliable - but much less ;)

Wednesday, February 04, 2009

Adiscon, my company, has just released EventConsolidator, an easy-to-use tool for Windows event log consolidation targeted at the small business market. Unlike our full-blown EventReporter and MonitorWare Agent products, this is a purely agentless solution to monitor the standard Windows Event Log files. Also, it does not (yet) store any events in a central repository but rather works on the native event logs of the Windows machines it monitors.

The tool comes with basic display and searching abilities and some preconfigured reports. This is Adiscon's first move into that market segment. I am very interested to see what feedback we get for that tool. We are very open to all customer suggestions. I have to admit that I argued it may be better to do the essentials first and then look at what people really need, rather than to build a complex one-size-fits-all approach. So it will be interesting, at least for me, to see if that thought works out. Just for the record: EventConsolidator is commercial software, but a full-featured free trial is available from the EventConsolidator site.

Monday, February 02, 2009

I was introduced to a very subtle effect of the Heartland breach. Remember, card processor Heartland screwed up and, as some sources say, 100 million credit card numbers were stolen from them via a Trojan. That fact made big news and, among other things, started a discussion about whether PCI has been proven useless. But there seem to be additional effects: US customers seem to have lost a lot of credibility in international shopping.

In Adiscon's office, I heard today that we got a call from one of our card processors. Keep in mind that we are based in Germany. The card processor inquired about a recent transaction and asked us to check whether this could be credit card fraud. It was not, but he left us his phone number so that we could check with him in the future when we suspected fraud on transactions.

This is quite unusual and immediately drew my attention. I gave that guy a call. He explained that they are routinely checking US credit card transactions because some problems have been seen recently with US cards. He explained to me that the processor would like to protect merchants, because "if you ship the goods and the cardholder protests the charge ... weeks later ... you will be charged back but unable to recover the goods" (good point, btw). So I went ahead and asked if they were calling because of the Heartland breach. Not only that, he said, but it would be an example (I deciphered this as a "yes"). So then I asked if they had not blacklisted the affected card numbers. Some statements followed, which I deciphered to mean "no". So the cards are still active and seem to cause issues (why else would a card processor begin to call its merchants?).

I know that Heartland does not know exactly which card numbers have been stolen. But it is known that most probably any card processed within the past 10 months is highly suspect. So wouldn't it have been fair security practice to put these cards on the blacklist and issue new ones to the cardholders? Sure, that would be inconvenient (read: costly) and, probably more importantly, would have shown everyone that someone screwed up, but would that not be much better than putting both consumers and vendors at risk? Without automatic blacklisting, consumers need to pay much more attention to their credit card bills.

An interesting side-effect is that US customers seem to have lost credit outside of the US. For example, it was suggested to me that we check each US order in depth before delivering anything. If everyone else gets this advice, US customers will probably find shopping overseas quite inconvenient...

If you lose your credit card, you are legally required to call your card issuer and report that loss. As long as you do not notify them, you are liable. If, on the other hand, someone in the card industry loses your card (number), nobody seems to be liable: customers must check their statements and vendors must do in-depth checks (sigh) on their customers. Is this really good practice?

And what if a card is used to commit credit card fraud? No problem at all (for the card industry): either the cardholder does not notice it (and pays for the fraud) or the cardholder protests the charge, in which case the merchant needs to pay. The latter case involves some manual processing by the card industry: again, no problem! The merchant is charged a hefty protest fee. Looking at how hefty the fee is, it seems to be even profitable for the card industry if it takes that route.

Bottom line: who is responsible? Card industry (Heartland in this case). Who pays? Everyone else! Isn't that a nice business model? Where is the motivation to keep such a system really secure?

I think that really calls into question whether the card industry is interested in security. PCI may not have failed (I tend to agree with Anton Chuvakin here). But it smells a bit like PCI and whatever other efforts cannot succeed, because they are not deployed in an honestly security-aware environment but rather in one that needs good excuses for sloppy security. As long as the card industry does not do the right thing whenever it costs the card industry's own money, real security cannot be achieved.

I have now begun to roll out the rsyslog race patches. Before the weekend, I rolled out the patch for the debian_lenny and development branches, and today the beta branch followed. I am now looking forward to feedback from the field. The patch for v3-stable is ready (and available via git), but I'd like to get at least a bit more feedback before I do another stable release.