
There are three log files that are scanned for attacks: the system log, the mail log, and the amavis log. They all have different formats, but each one contains the IP address of the machine contacting our server in the log line for a failed access attempt. The program looks for a different pattern depending on which file it is accessing.

We’ll start with the system log file. Common break-in attempts look like this in system.log:

The top line is an attempt to connect through VNC, the screen-sharing program built into the server. The bottom line is someone trying to log in to the system over SSH with a commonly used account name (Oracle is the name of a well-known database application).

As different as those lines look, they share two elements: the datetime and the IP address. The program scans each line with a regular expression (regex) to identify these attempts. If the regex matches the line, the IP address is captured at the same time. Here is the regex for the bottom line, along with an actual line it would match:

What that says is: look for the words “Invalid user”, followed by a space, any string, a space, the word “from”, and a set of four numbers separated by dots. The “.*” portion matches any string, which in this case is the username the attacker supplied while trying to break into the system. The four numbers separated by dots are commonly known as a dotted quad, which is the IP address the attempt originated from. And since the regex for the dotted quad is enclosed in parentheses, the program stores the dotted quad in a temporary variable whenever the line matches.
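To make that concrete, here is a minimal sketch of the pattern as described, written in Perl (assumed from the `$template` line later in the post). The regex is reconstructed from the prose, so the program’s actual pattern may differ; `\d{1,3}` stands in for “a number” in the dotted quad. The helper name `ssh_attempt_ip` and the sample log line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the attacker's IP if the line is an
# "Invalid user" SSH attempt, or undef otherwise.
sub ssh_attempt_ip {
    my ($line) = @_;
    # "Invalid user", a space, any string (the username), a space, "from",
    # a space, then a dotted quad. The parentheses capture the IP into $1.
    return $line =~ /Invalid user .* from (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/
        ? $1
        : undef;
}

# Hypothetical sample line, not taken from the post.
my $ip = ssh_attempt_ip('Jan 30 01:24:20 server sshd[1234]: Invalid user oracle from 203.0.113.45');
print defined $ip ? $ip : 'no match', "\n";   # prints 203.0.113.45
```

Because the match and the capture happen in one step, a non-matching line simply yields `undef` and can be skipped.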

The regex used to match the VNC line is a little more complicated, but not by much. You’ll see the same elements: specific words, any string, a dotted quad, and more strings and specific words. Note that the dotted quad is in parentheses again, so it can be stored in a temporary variable.

I also mentioned the datetime as one of the common elements in the line. It always appears at the beginning of the line. There is a small trade-off between using a slightly more complicated regex that captures both the datetime and the IP address and using one that captures just the IP address. Without getting into details, I prefer simpler regexes to match the portions of a line that vary, and a template to match the portions that are always the same. Since the datetime is always at the beginning of the line and I only care about it when the regex matches, I extract it as a separate step. This is just my preference, not a required way to parse the line. The template I use for matching the datetime is:

$template = "A3 x A2 x A2 x A2 x A2";
Jan 30 01:24:20

That template reads as: match three characters (A3), skip one character (x), match two, skip one, match two, skip one, match two, skip one, match two. So whenever a regex matches, I then use the template to grab the datetime. The year is missing from the log file, so we set it based on the day we are processing and finesse it at the year boundary between December and January. More on that in another post.
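The template is a Perl `unpack` template, so a minimal sketch of how it splits the datetime apart looks like this. The year handling is purely my assumption about one simple approach (a December entry read in January belongs to the previous year), not the method the post defers to a later entry. `parse_log_datetime`, `infer_year`, and `%MONTH_NUM` are hypothetical names.

```perl
use strict;
use warnings;

# The unpack template from the post: A3 grabs the month name, each A2 grabs
# a two-character field, and each x skips the space or colon between fields.
use constant DATETIME_TEMPLATE => 'A3 x A2 x A2 x A2 x A2';

my %MONTH_NUM = (
    Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
    Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: split the fixed-width datetime off the front of a line.
# Returns (month, day, hour, minute, second).
sub parse_log_datetime {
    my ($line) = @_;
    return unpack(DATETIME_TEMPLATE, $line);
}

# Hypothetical helper (an assumption, not the post's method): the log has no
# year, so use the year being processed, except that a December entry read
# in January is assigned to the previous year.
sub infer_year {
    my ($log_mon_name, $run_mon, $run_year) = @_;
    my $log_mon = $MONTH_NUM{$log_mon_name} or die "unknown month: $log_mon_name";
    return ($log_mon == 12 && $run_mon == 1) ? $run_year - 1 : $run_year;
}

my ($mon, $day, $hour, $min, $sec) = parse_log_datetime('Jan 30 01:24:20 server sshd[1234]: ...');
print join('|', $mon, $day, $hour, $min, $sec), "\n";   # prints Jan|30|01|24|20
print infer_year('Dec', 1, 2024), "\n";                   # prints 2023
```

Because the template is fixed-width, it never has to search the line; it just slices the first 15 characters, which is why it only needs to run after a regex has already matched.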

That covers the system log, so let’s look at mail.log very briefly. Here is a failed email password attempt:

Note the similarities to system.log. There is a string indicating an authentication failure, which includes the dotted-quad IP address, and the datetime it occurred. Here is the regex we use to determine whether this is a line we are going to process:
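The mail.log line and its regex aren’t reproduced here, so the following is only a sketch under an assumption: that the failures are Postfix SASL authentication failures, which log the client IP in square brackets, as in `warning: unknown[198.51.100.7]: SASL LOGIN authentication failed`. The helper `mail_failure_ip` and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. ASSUMES Postfix SASL failure lines such as:
#   Jan 30 03:12:45 mail postfix/smtpd[5678]: warning: unknown[198.51.100.7]: SASL LOGIN authentication failed: authentication failure
# The client's dotted quad sits in square brackets right before the
# "SASL <mechanism> authentication failed" text and is captured into $1.
sub mail_failure_ip {
    my ($line) = @_;
    return $line =~ /\[(\d{1,3}(?:\.\d{1,3}){3})\]: SASL \S+ authentication failed/
        ? $1
        : undef;
}

my $ip = mail_failure_ip('Jan 30 03:12:45 mail postfix/smtpd[5678]: warning: unknown[198.51.100.7]: SASL LOGIN authentication failed: authentication failure');
print defined $ip ? $ip : 'no match', "\n";   # prints 198.51.100.7
```

The `\S+` accepts any mechanism name (LOGIN, PLAIN, and so on), and the bracketed process ID `[5678]` is skipped because it contains no dots.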

Finally, here’s the log entry for amavis, our email spam-filtering program. This time there is a much longer line to parse. It is all on one line in the log file, but I’ve broken it into several lines here for your reading enjoyment.

While all of the additional information is interesting, we only need to know whether amavis identified the message as SPAM and what the IP address is. If that matches, we grab the datetime from the front of the line with the template mentioned earlier.
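Here is a minimal sketch of that two-step flow: the regex decides whether the line matters and captures the IP, and only then does the `unpack` template grab the datetime. It assumes an amavisd-new style line in which the word SPAM is followed by the client IP in square brackets; the actual amavis line isn’t reproduced here, so treat the pattern, the sample line, and the `amavis_spam_hit` helper as hypothetical.

```perl
use strict;
use warnings;

use constant DATETIME_TEMPLATE => 'A3 x A2 x A2 x A2 x A2';

# Hypothetical helper. ASSUMES a line where "SPAM" is followed by the client
# IP in square brackets, e.g.:
#   Jan 30 04:05:06 mail amavis[910]: (00910-01) Blocked SPAM, [203.0.113.99] [203.0.113.99] <a@example.com> -> <b@example.org>, ...
sub amavis_spam_hit {
    my ($line) = @_;
    # Step 1: the regex decides whether the line matters and captures the
    # first bracketed dotted quad after the word SPAM.
    return undef unless $line =~ /\bSPAM\b.*?\[(\d{1,3}(?:\.\d{1,3}){3})\]/;
    my $ip = $1;
    # Step 2: only on a match, the fixed-width template grabs the datetime.
    my ($mon, $day, $hour, $min, $sec) = unpack(DATETIME_TEMPLATE, $line);
    return { ip => $ip, when => "$mon $day $hour:$min:$sec" };
}

my $hit = amavis_spam_hit('Jan 30 04:05:06 mail amavis[910]: (00910-01) Blocked SPAM, [203.0.113.99] [203.0.113.99] <a@example.com> -> <b@example.org>');
print "$hit->{ip} at $hit->{when}\n" if $hit;   # prints 203.0.113.99 at Jan 30 04:05:06
```

Lines amavis passes as clean never reach the `unpack` step, which is the point of doing the datetime extraction separately.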

So, that covers our log files in a nutshell. We’ll look into the commands and configuration for the firewall in our next episode.