The Problem

Data was going in to my logging server and getting mangled somewhere along the line. To complicate matters, only the windows hosts were affected, and even then it was sporadic. The truncated data was in the middle of the string, which left me to believe logstash was trying (and failing) to parse it.

The Culprit

With the help up my trusty sounding board Will, “Logstash guru” thegreenrobot, and nxlog dev b0ti, I was able to figure out the root cause- the problem wasn’t with Logstash at all, it was broken when rsyslog handed it to logstash. Complicating matters was when rsyslog was munging it- when it wrote the entries to a temp file, they were fine, but when it passed them to logstash, it truncated them on the way out the door.

Why? Because they’re invalid in the first place!

NXLog (the logging forwarder on windows) was collecting eventlog data and shipping it out with the full program names despite it being too long. That’s right- Section 4.1.3 of RTF3164 states:

This has traditionally been a freeform message that
gives some detailed information of the event. The TAG is a string of
ABNF alphanumeric characters that MUST NOT exceed 32 characters.

While you could (and I still do) consider this a bug in nxlog, b0ti pointed out that fixing this bug could potentially break many stable implementations to fix an edge case. While I don’t fault B0ti for not fixing it, I still needed to work around it.

The Fix

We need to shrink down that program name to 32 characters or less before it leaves nxlog… but there’s a catch.

It’s not *just* the program name, but the PID as well that gets truncated. So we have to adjust accordingly. The msdn documentation says the PID is a DWORD, which turns out to be an unsigned 32-bit integer, which can potentially be 10 characters. Add on another two for brackets and one for a colon and you’re already losing potentially 13 characters of the 32 characters allowed.

To help stretch our remaining 19 characters, lets see what we can axe. Most of the Microsoft processes either begin with “Microsoft_” or “Microsoft-Windows-“, which we honestly don’t need on the program name. By removing those, we get down to the nitty gritty of the actual program name.

Since it may STILL be too long (I’m looking at you Microsoft-Windows-TerminalServices-RemoteConnectionManager[2272]:), we need to strip those off the front, THEN take the 19 character substring of what remains. The result is an eventlog input in nxlog that looks something like this: