Wednesday, December 2, 2015

During a recent engagement, while hunting for
threats in a client's environment, I was tasked with analyzing over a
terabyte's worth of security (Security.evtx) event logs. A terabyte's worth of logs amounts to a lot of
logs. We are talking close to a thousand
logs, each containing approximately 400,000 events, from dozens of Windows
servers, including multiple domain controllers.
Did I say, a lot of logs?

Unfortunately, this wasn't the only task of
the engagement, so I needed to go through these logs and I needed to do it
quickly. I needed to do it quickly
because, as in most engagements, time was against me.

When I only have a few logs to look at, my
tool of choice on the Windows side is Event Log Explorer. Event Log Explorer is great. It is a robust, popular GUI tool with
excellent filtering capabilities. On the
Linux side, I have used Log2timeline to convert dozens of evtx files to CSV
and then filter the CSV file for the data that I was looking for. But this was another animal, a different
beast. This beast needed a tool that
could parse a very large amount of logs and filter for
specific events within the data. The
answer to the problem came in the form of a tiny tool simply called Log Parser.

Log Parser is a free tool designed by
Microsoft. You can download
the tool here. The documentation on the site
describes the tool in this manner: “Log Parser is a powerful, versatile tool
that provides universal query access to text-based data such as log files, XML
files and CSV files.” That one-liner
perfectly sums up why the tool is so powerful, yet not as popular as other
tools. Log Parser provides query access
to data. What does that mean? It means that if you want to parse data
with this tool, you have to be somewhat comfortable with the Structured Query
Language (SQL). The tool will only cough
up data if it is fed SQL-like queries.
The use of SQL-like queries for filtering data is what gives the tool
its power and control, while at the same time being a stopping point and a
deal breaker for anyone not comfortable with SQL.

The purpose of this article is to attempt to
explain the basic queries required to get you started with the tool and in the
process show the power of the tool and how it helped me make small rocks out of big rocks.

Installing the Tool:

The tool is downloaded from here in
the form of an msi. It installs through a
graphical installer, very much like many other tools. Once installed, the tool runs from the command
line only. For the purposes of this
article, I will be using a security log extracted from a Windows Server 2008 R2
Domain Controller that I own and use for testing such as this. If you want to follow along, you can extract
the Security.evtx log from a similar server or even your Windows 7
machine. The log is located under
\Windows\System32\winevt\Logs.
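If you want to follow along, copying the live log out to a working folder might look something like this (a sketch: run it from an elevated command prompt, and note that the "Test" folder on the Desktop is just my example destination, not a requirement):

```shell
copy C:\Windows\System32\winevt\Logs\Security.evtx "%USERPROFILE%\Desktop\Test\"
```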

The Test:

Log Parser is a command line only
utility. To get started open up a
command prompt and navigate to the Log Parser installation directory located
under C:\Program Files (x86)\Log Parser 2.2.
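On a 64-bit system with a default install, that navigation step is simply:

```shell
cd "C:\Program Files (x86)\Log Parser 2.2"
```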

The security log that I will be using for the
write-up is called LosDC.evtx. The log
contains exactly 5,731 entries. It is
not a large log, but it contains the data that we need to illustrate the usage
of the tool. I extracted the log and
placed it on my Windows 7 examination machine in a directory on the Desktop
called “Test.”

Now, the most basic SQL query that one can
run looks something like this. It is
called a select statement: “select *
from LosDC.evtx”. The ‘select’, as you
suspected, selects data that matches your criteria from the columns in your log. In this instance we are not doing any
matching yet; we are simply telling the tool to select everything, by using an
asterisk “*”, from the LosDC.evtx log.
The tool also needs to know what kind of file it is looking at. You tell the tool that it is reading data
from an event log with the -i:evt parameter, like so:
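A minimal form of that command looks like this (note that, by default, Log Parser's console output pauses after every 10 rows, which is controlled by the -rtp parameter; that is why only the first 10 lines appear at first):

```shell
logparser -i:evt "select * from LosDC.evtx"
```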

This query will send the first 10 lines of
the file to standard output. A lot of
data is going to be sent to the screen.
It is very difficult to make any use of this data at this point. The only positive that can come from this
command is that you can begin to see the names of the columns in the event log
like “TimeGenerated”, “EventID”, and so on.

An easier way to see the columns in the event
log is by using the datagrid output feature, which sends the data to a GUI,
like so:
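Using Log Parser's DATAGRID output format, that command might look like:

```shell
logparser -i:evt -o:datagrid "select * from LosDC.evtx"
```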

Thanks to the GUI it is now easier to see the
TimeGenerated and EventID columns. Also,
I want to point out the “Strings” column, which contains data that is very
valuable to us. The majority of the
important data that we are after is going to be contained in this column. So let us take a closer look at it.

If we build upon our last query and we now
replace the asterisk "*" with the name of a specific column, the tool will now send
the data matching our criteria to standard output, like so:
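Selecting just the Strings column might look like:

```shell
logparser -i:evt "select strings from LosDC.evtx"
```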

Notice that the tool is now displaying only
the information that is found in the strings column. The data is displayed in a delimited
format. The data is being delimited by
pipes. Field number 5 contains the
username of the account, field number 8 contains the Log-On type, and field
number 18 contains the source IP of the system that was used to authenticate
against the domain controller.

You have probably seen this data displayed in
a prettier manner by Event Log Explorer.

Yet it is in fact the same data, and Log Parser
has the ability to extract this data from hundreds of log files quickly and
efficiently. But to accomplish this we
have to continue adding to our query. In
my recent case I was looking for the username, the log-on type, and source IP of
all successful logins. As mentioned earlier,
this data was being stored in field 5, field 8, and field 18 of the Strings
column. To extract that data we need to craft
a query that can pull these specific fields from the Strings column. To accomplish that, we have to introduce a Log
Parser function called extract_token.
The extract_token function gives Log Parser the ability to extract data
from delimited columns like the Strings column. To extract the data from the fifth delimited field in the strings column
we need to add this to our query:

extract_token(strings,5,'|') AS User

Let me break this down. extract_token is the
function. We open a parenthesis, and inside
the parentheses we tell the function to go into the Strings column and pull
out the fifth field, as delimited by a pipe “|”, and then we close the
parenthesis. “AS User” is used so that
once the data is pulled out of the Strings column, it is displayed in a new column with the new name of “User”. It is
like telling the function “Hey, display this as 'User'.”

To pull the data from the eighth field in the
Strings column, we use this function:

extract_token(strings,8,'|') AS LogonType

And finally, to pull the data from the
eighteenth field in the Strings column, we use this function:

extract_token(strings,18,'|') AS SourceIP

Putting it all together, the full query becomes:

logparser "select TimeGenerated, EventID, extract_token(strings,5,'|') AS User,
extract_token(strings,8,'|') AS LogonType, extract_token(strings,18,'|') AS SourceIP
into C:\Users\carlos\Desktop\Test\LosDC_4624_logons.csv from
C:\Users\carlos\Desktop\Test\LosDC.evtx where eventid in (4624)" -i:evt
-o:csv

The select statement is now selecting the TimeGenerated and EventID columns, followed by the three
extract_token functions to pull the data from the Strings column. Into is an optional clause that specifies
that the data be redirected to a file named
LosDC_4624_logons.csv in the Test directory. From specifies the file to be queried, which
is the LosDC.evtx log. Where is also an
optional clause which specifies data values to be displayed based on the
criteria described. The criteria
described in this query is 4624 events contained in the eventid column. The -o:csv is another output format like the
datagrid, except this one sends the data to a csv file rather than a GUI.

This is an example of what you can gather
from the resulting CSV file. This is
what you would see if you were to sort the data in the CSV file by user.

Notice the times, and source IP that was used
by user “larry” when he used the RDP protocol (Logon Type 10) to remotely
log-in to his system.

Cool, Right?

I want to point out that this log only
contained 5,731 entries and that the data redirected to the CSV file consisted of
1,418 lines. That data was parsed and
redirected in less than 0.2 seconds.

That is another example of the power of
the tool. Keep in mind that when you are parsing gigabytes worth of logs, the resulting CSV files are going to
be enormous. Below is an explorer
screenshot displaying the amount of security event logs from one of the servers in
my case (server name has been removed to protect the innocent).

The sample data from that server was
40GB. It was made up of 138 files, each
containing approximately 416,000 records.

The tool parsed all of that data in only
23 minutes.

It searched 60 million records and created a CSV
file with over 700,000 lines. Although
you can certainly open a CSV file with 700,000 lines in Excel or LibreOffice Calc,
it is probably not a good idea. Don't
forget that you can search the CSV file directly from the command prompt with find. Here is an example of searching the CSV
file for user "larry" to quickly see which machines user "larry" used to authenticate on the Domain.
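Assuming the CSV created by the earlier query, that search might look like this (using the built-in Windows find command; /i makes the match case-insensitive):

```shell
find /i "larry" C:\Users\carlos\Desktop\Test\LosDC_4624_logons.csv
```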

And there you have it!

Conclusion:

This is a free and powerful tool that allows
you to query very large amounts of data for specific criteria contained within
the tables of your many event log files.
If this procedure helped your investigation, we would like to hear from
you. You can leave a comment or reach me
on Twitter: @carlos_cajigas
