Scope of the database

Malware definition: any program (source code or binary) that could be used or triggered for an unknown purpose, or for a known purpose that is often malicious.

When playing with honeynets or compromised systems, you soon build up a nice collection of malware, malicious scripts and rootkits. The bigger the collection, the more difficult it is to find a past malware sample. The idea for solving this problem is to create a database containing a fingerprint of every malware sample along with the sample itself. With this we can correlate information about malware and malicious activities on a system. The proposal goes a little further than a plain database in two ways:

Using a simple free tagging approach (implemented), with some known prefixes such as CVEXXXX or RFCXXXX that can be interpreted when needed,

Using an additional table to cross-reference parts of files (to be implemented).

States can be used to record the status of the malware itself (existing, false positive) and also its relation to a container (part of, integrated in binary, …). The basic set of states should be defined as early as possible.

Designing an efficient structure is far from easy. Unique ids assigned at submission time would limit the possibility of a decentralized database: merging databases can be difficult because ids are often sequential within each database. We would rather use a hash of each binary as its id. The drawback is that hash values are evenly distributed and effectively random, which can be an issue for some databases. On the other hand, the id (hash) can easily be recomputed whenever a specific binary has to be looked up.
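As a rough illustration of the hash-as-id idea, the sketch below computes a fingerprint for a file. It is not the project's own code: the function name `fingerprint`, the default of SHA-256 and the chunk size are assumptions made for the example.

```python
import hashlib

def fingerprint(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    Hypothetical helper: the algorithm and chunk size are illustrative
    defaults, not values taken from the malware database itself.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so that large binaries are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the id depends only on the content of the binary, two independent databases assign the same id to the same sample, so merging them does not require renumbering.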

Tagging Approach

We use a free tagging approach, but some tags are interpreted by default, such as CVE:NNNNN, RFCXXXXX and the like. An object can have no tags or several; there is no limit.
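One way the interpreted prefixes might be recognized is sketched below. The exact tag syntax is not specified here, so the patterns, the `KNOWN_PREFIXES` table and the `interpret_tag` function are assumptions for illustration only; any tag that does not match is kept as a free tag.

```python
import re

# Hypothetical patterns: an optional ":" or "-" separator is accepted after
# the prefix, since the precise syntax of interpreted tags is an assumption.
KNOWN_PREFIXES = {
    "CVE": re.compile(r"^CVE[:-]?(\d{4}-\d+|\d+)$", re.IGNORECASE),
    "RFC": re.compile(r"^RFC[:-]?(\d+)$", re.IGNORECASE),
}

def interpret_tag(tag):
    """Return (kind, value) for an interpreted tag, or ("free", tag) otherwise."""
    cleaned = tag.strip()
    for kind, pattern in KNOWN_PREFIXES.items():
        match = pattern.match(cleaned)
        if match:
            return kind, match.group(1)
    return "free", cleaned

def interpret_tags(tags):
    """Interpret every tag of an object; an empty list is valid."""
    return [interpret_tag(tag) for tag in tags]
```

An interpreted tag could then be turned into a link to the matching advisory or RFC, while free tags are displayed as they are.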

Software

(console) add.pl <filename> The interface to add malware to the database. Remember that the filename is recorded in the database, including the full path, which is sometimes needed for files taken from compromised systems.

(console) add-tag.pl <fingerprint> <tag> Add a free tag to a known fingerprint.

(web) index.pl The simple CGI interface to view the data in the malware database. There is a special "admin" interface where users can update information related to malware.

DNS Query to the malware database

You can query the database to check whether a given malware is present. The purpose is to offer a common and easy way to get the information. DNS queries are a very common way to check or retrieve information: RBLs work this way to publish blacklists of spammers' IP addresses.

Many applications could benefit from checking hashes against a malware database. If a hash matches an entry in the database, this gives information about the security "state" of a file.

The malware DNS server mimics a DNS server but only answers "TXT" queries. If the record exists, the server replies with the "origfilename" as a TXT record with a NOERROR status. For all other queries, the NXDOMAIN status code is returned. In a DNS request, a label cannot exceed 63 bytes. To work around this limit (a SHA-2 hash in hex format is longer), the request has to be split across subdomains. The query can be split anywhere you want; by default the server rebuilds the full hash (check the example below). The server always sets the AA (Authoritative Answer) flag.
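The label splitting can be sketched on the client side as follows. This is only an illustration: the zone name `malware.example` and both function names are hypothetical, and the second function assumes that the server rebuilds the hash by concatenating the labels in the order they appear.

```python
MAX_LABEL = 63  # maximum size of a single DNS label, in bytes

def hash_to_qname(hex_hash, zone, label_size=MAX_LABEL):
    """Split a hex hash into labels of at most `label_size` bytes and append the zone."""
    if not 1 <= label_size <= MAX_LABEL:
        raise ValueError("label_size must be between 1 and 63")
    labels = [hex_hash[i:i + label_size]
              for i in range(0, len(hex_hash), label_size)]
    return ".".join(labels + [zone])

def qname_to_hash(qname, zone):
    """Drop the zone suffix and join the remaining labels (assumed server behaviour)."""
    suffix = "." + zone
    if not qname.endswith(suffix):
        raise ValueError("query is not inside the expected zone")
    return qname[:-len(suffix)].replace(".", "")

if __name__ == "__main__":
    # Placeholder 64-character hex string standing in for a real hash.
    sample = "0123456789abcdef" * 4
    name = hash_to_qname(sample, "malware.example", label_size=32)
    print(name)
    print(qname_to_hash(name, "malware.example") == sample)
```

Since the split point is free, a client can use any label size up to 63 bytes; the resulting name is then sent as an ordinary TXT query.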