Hunting with YARA rules and ClamAV

Did you know the open-source anti-virus ClamAV supports YARA rules? What benefits can this bring us? One of ClamAV’s important features is its file decomposition capability. Say that the file you want to analyze resides in an archive, or is a packed executable: ClamAV will then unarchive/unpack the file and run the YARA engine on the result.

Let’s start with a simple YARA rule to detect the string “!This program cannot be run in DOS mode”:
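A minimal sketch of such a rule, assuming it is saved as test1.yara (the string identifier $a is chosen here for illustration):

```yara
rule test1
{
    strings:
        // In the usual DOS stub the message is preceded by byte 0x21,
        // which is the ASCII code for '!'.
        $a = "!This program cannot be run in DOS mode"
    condition:
        $a
}
```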

When we scan the notepad.exe PE file with this YARA rule, the rule (test1) triggers.

We can do the same with clamscan:

With option -d (database), as in clamscan -d test1.yara notepad.exe, we bypass ClamAV’s default signature database and instruct clamscan to load only the YARA rule file test1.yara.

Using clamscan on the PE file notepad.exe also triggers the previously created YARA rule test1; the detection is reported as YARA.test1.UNOFFICIAL.

In this example we decided to use just one YARA rule for simplicity, but of course you can use several YARA rules together with ClamAV’s signature database. Just put your YARA rules (extension .yara or .yar) in the database folder.

As mentioned in the introduction, ClamAV can also look inside ZIP files and apply the YARA rules to all files found in archives.

This is something the standard YARA tool cannot do: it scans the archive as a single file and does not look at the files contained in it.

ClamAV’s YARA rule support does, however, have some limitations. You cannot use modules (like the PE file module), or use YARA rule sets that contain external variables, tags, private rules, global rules, … Every rule must also have strings to search for (at least 2 bytes long); rules with a condition but without strings are not supported.

Let us take a look at a rule to detect if a file is a PE file (see appendix for the details of the rule):
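A minimal sketch of such a rule, built from the condition given in the appendix (the rule name is_pe_file is an assumption made for illustration):

```yara
rule is_pe_file
{
    // No strings: section, only a condition on values read from the file.
    condition:
        uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550
}
```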

We get a warning from ClamAV: “yara rule contains no supported string”.

Because ClamAV does not support rules without a strings: section, we must add a string to search for, even if the rule logic itself does not need it. Since a PE file contains the string MZ, let’s search for that:
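One way this could look, as a sketch: the rule name and the identifier $mz are hypothetical, and the original condition from the appendix is kept unchanged next to the new string.

```yara
rule is_pe_file_with_string
{
    strings:
        // Two bytes long, which meets ClamAV's minimum string length.
        $mz = "MZ"
    condition:
        $mz and uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550
}
```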

This time the rule triggers.

Now, a tricky case: how do we design a rule when we have no single string to search for? The ClamAV developers offer a work-around for such cases: search for any string, and add a condition checking for the presence OR absence of the string. Like this:
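A minimal sketch of the work-around, reusing the PE check from the appendix as the actual rule logic (the rule name is hypothetical):

```yara
rule is_pe_file_any_string
{
    strings:
        // Placeholder string: it only exists to satisfy ClamAV's requirement.
        $a = "any string will do"
    condition:
        // ($a or not $a) is always true, so the result depends only on the PE check.
        ($a or not $a) and uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550
}
```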

We search for string $a = “any string will do”, and we add condition ($a or not $a). It’s a bit of a hack, but it works.

ClamAV’s file decomposition features bring a lot to the table when it comes to YARA scanning, but in some cases they can be a bit too much. For example, ClamAV decompresses the VBA macro streams in Office documents for scanning. This means that we can use YARA rules to scan VBA source code. A simple rule searching for the words AutoOpen and Declare would trigger on all Word documents with macros that run automatically and use the Windows API, which is very useful for detecting potential maldocs. However, ClamAV will apply this YARA rule to all files and to all decomposed/contained files. So if we feed ClamAV all kinds of files (not only MS Office files), then the rule could also trigger, for example, on text files or e-mails that contain the words AutoOpen and Declare.
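The macro rule described above could be sketched as follows. The rule name and string identifiers are assumptions, and the strings are matched case-sensitively to keep the rule as plain as possible.

```yara
rule vba_autoopen_declare
{
    strings:
        $autoopen = "AutoOpen"
        $declare = "Declare"
    condition:
        // Both words must be present in the scanned data.
        $autoopen and $declare
}
```

Nothing in this rule ties it to Office documents, which is why any scanned object containing both words can match.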

If we could limit the scope of selected YARA rules to certain file types, this would help. ClamAV currently supports signatures that are only applied to given file types (PE files, OLE files, …); unfortunately, this is not supported for YARA rules.

ClamAV is an interesting engine on which to run our YARA rules, as an alternative to the standard YARA engine. It has some limitations, however, and these can generate false positives if we are not careful with the rules we use or design.

Deconstructing the YARA rule

Our example rule to detect a PE file contains just a condition:

uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550

This rule does not use string searches. Instead, it checks a couple of values to determine if a file is a PE file. It performs two checks: the file starts with an MZ header, and it contains a PE header.

First check: the first 2 bytes of the file are equal to MZ. uint16(0) == 0x5A4D.

Second check: the field (32-bit integer) at position 0x3C contains a pointer to a PE header. A PE header starts with bytes PE followed by 2 NULL bytes. uint32(uint32(0x3C)) == 0x00004550.

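Functions uint16 and uint32 read little-endian integers, so we have to write the bytes in reverse order compared to how they appear in the file: the bytes M (0x4D) and Z (0x5A) become 0x5A4D, and the bytes P (0x50), E (0x45), 0x00, 0x00 become 0x00004550.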
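To make the byte order concrete, the sketch below expresses the MZ check twice: once as a hex string, which lists bytes in file order, and once with uint16. It is written against the standard YARA syntax; the rule name is hypothetical, and whether ClamAV accepts the at operator has not been verified here.

```yara
rule mz_header_two_ways
{
    strings:
        // Hex strings list the bytes in file order: 'M' (0x4D), then 'Z' (0x5A).
        $mz = { 4D 5A }
    condition:
        // uint16 reads the same two bytes as a little-endian integer,
        // so the byte at the lower offset (0x4D) becomes the low-order byte.
        $mz at 0 and uint16(0) == 0x5A4D
}
```

Both halves of the condition test the same two bytes, so either one alone would be enough in the standard YARA tool.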