1. Introduction

CME identifiers will identify "malware threats". At the most basic level, a "malware threat" is anything that has the potential to damage a computer system or network. Furthermore:

A malware threat can be identified by a signature.

A malware threat may or may not exploit a vulnerability.

A malware threat may or may not rely on user action in order to be effective.

The CME initiative assumes that it is possible to protect against a malware threat. Examples of malware threats include viruses, worms, and Trojan horses.

Malware threats will be represented by a collection of one or more "samples." A malware threat sample will likely contain multiple files rather than a single executable binary file. A CME identifier will be associated with one or more representative samples. Each sample in a CME identifier's sample set should be equivalent with respect to deconfliction (see Section 5) but should illustrate an aspect of the malware threat not illustrated by any other sample.

As this paper will discuss, it may not be possible to define malware threat attributes precisely enough for someone holding their own threat sample to find the CME identifier associated with it. More likely, the value of a CME identifier will lie in coordinating among different security devices (e.g., host-based anti-virus (AV) products, gateway devices).

This document addresses operational aspects of the CME initiative, including
the purpose and scope of CME, how CME identifiers are assigned, and the initial
process for the deconfliction of malware threat samples.
The initiative expects this document to evolve as a result of discussions among
the CME Editorial Board and as additional identifiers are assigned over time.

2. Scope

The objective of the CME initiative is to provide common identifiers to those malware threats that are of primary significance from the perspective of anti-virus vendors, IT security managers, and the general public. CME identifiers will not be assigned to all malware threats, but a CME identifier should be assigned to any malware threat for which one or more of the following statements is true:

3. Identifiers

Initially, CME identifiers will be in the format CME-N where N is an integer
between 1 and 999, such as "CME-123". Digits will be added when
the remaining unused identifier space becomes too small.

Furthermore:

When necessary, CME IDs can be abbreviated (e.g., M123), but the official format (e.g., CME-123) should be used in places such as Web pages, alerts, and encyclopedias.

To allow reliable text-based comparisons, leading zeros will always be omitted from an identifier. For example, CME-00123 will always be written as CME-123.

Identifiers will be randomly generated within each size range (e.g., CME-439 might be issued before CME-28). This way, no one can assign themselves an identifier by guessing the next one in sequence.
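The identifier spelling rules above (official and abbreviated forms, no leading zeros) can be sketched as a small normalization routine. This is a minimal Python illustration, not part of any CME tooling: the function name normalize_cme_id, the acceptance of lowercase input, and the max_digits parameter (3 for the initial 1 to 999 range) are all assumptions.

```python
import re

# Official form "CME-123" or abbreviated form "M123" (hypothetical parser).
_CME_PATTERN = re.compile(r"^(?:CME-|M)(\d+)$")


def normalize_cme_id(text: str, max_digits: int = 3) -> str:
    """Return the official spelling "CME-N" with leading zeros removed.

    max_digits is an assumption: 3 for the initial 1..999 range, raised
    as the Editorial Board adds digits (up to 7).
    """
    match = _CME_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a CME identifier: {text!r}")
    number = int(match.group(1))  # int() discards leading zeros
    if not 1 <= number < 10 ** max_digits:
        raise ValueError(f"{text!r} is outside the current identifier range")
    return f"CME-{number}"


print(normalize_cme_id("CME-00123"))  # CME-123
print(normalize_cme_id("M123"))       # CME-123
```

Normalizing before storing or comparing means "CME-00123", "M123", and "CME-123" all compare equal as plain text.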

When more identifiers are needed, the CME
Editorial Board will decide how
many digits to add. Eventually, CME will use up to seven digits.
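Random issuance within a size range, with digits added as the range fills up, might look like the following sketch. The class name IdentifierPool, the low_water threshold that triggers widening, adding exactly one digit at a time, and not reusing leftover numbers from a smaller range are all hypothetical choices; the text leaves these decisions to the Editorial Board.

```python
import random
from typing import Optional, Set


class IdentifierPool:
    """Hypothetical issuer that draws CME numbers at random within a size range."""

    def __init__(self, digits: int = 3, max_digits: int = 7,
                 low_water: int = 100, rng: Optional[random.Random] = None):
        self.digits = digits
        self.max_digits = max_digits
        self.low_water = low_water  # widen when this few remain (assumed policy)
        self.rng = rng if rng is not None else random.SystemRandom()
        self.low, self.high = 1, 10 ** digits - 1  # initial range, e.g. 1..999
        self.issued: Set[int] = set()
        self.in_range = 0  # numbers issued from the current range

    def _remaining(self) -> int:
        return (self.high - self.low + 1) - self.in_range

    def _widen(self) -> None:
        # Assumed: add one digit; the new range is e.g. 1000..9999, and any
        # unused numbers left in the smaller range are not issued afterwards.
        self.low = 10 ** self.digits
        self.digits += 1
        self.high = 10 ** self.digits - 1
        self.in_range = 0

    def next_id(self) -> str:
        if self._remaining() <= self.low_water and self.digits < self.max_digits:
            self._widen()
        if self._remaining() == 0:
            raise RuntimeError("identifier space exhausted")
        while True:  # rejection sampling; cheap while plenty of numbers remain
            number = self.rng.randint(self.low, self.high)
            if number not in self.issued:
                break
        self.issued.add(number)
        self.in_range += 1
        return f"CME-{number}"
```

SystemRandom is the default because the purpose of random issuance is that the next number cannot be predicted; a seeded random.Random is accepted only to make behavior reproducible in tests.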

4. Identifier Assignment Process

The secure CME Submission Server, which is used only by authorized
members of the CME Sample Redistribution Group, went online in April 2005 for
assigning CME identifiers. Highlights of this portion of the process include:

CME identifiers are assigned to "malware threats" and not to individual threat components.

CME identifier distribution is largely automated.

Samples from participants should be submitted as close to signature generation time as possible.

Deconfliction will be done by the existing 24x7 malware analysis teams of
participating vendor organizations. Final deconfliction decisions will be made
by consensus of the CME Sample Redistribution Group. Note that deconfliction
is the most difficult aspect of the CME identifier assignment process; see
Section 5 for a detailed discussion.

Samples will be shared only among trusted CME Participants. A submitter will bundle all files of the sample into a zip archive, which will be encrypted to every key on the CME PGP key ring so that all members of the Sample Redistribution Group can access the sample. Samples are not stored on the submission server; they are immediately redistributed to the Sample Redistribution Group. Initially, Sample Redistribution Group membership will overlap with the CME Editorial Board.
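The bundling step can be sketched with Python's standard zipfile module. Encryption is shown only as the GnuPG command line that would encrypt the archive to each key on the key ring; the helper names and the key IDs passed in are hypothetical, and the actual tooling used by the Sample Redistribution Group is not described here.

```python
import io
import zipfile
from typing import Dict, List


def bundle_sample(files: Dict[str, bytes]) -> bytes:
    """Pack every file of one sample into a single in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in sorted(files.items()):
            archive.writestr(name, data)
    return buffer.getvalue()


def gpg_encrypt_command(archive_path: str, output_path: str,
                        key_ids: List[str]) -> List[str]:
    """Build a GnuPG command that encrypts the archive to every listed key.

    One --recipient per key means any holder of a matching private key
    can decrypt; key_ids stands in for the CME key ring (hypothetical).
    """
    if not key_ids:
        raise ValueError("at least one recipient key is required")
    command = ["gpg", "--batch", "--yes", "--output", output_path, "--encrypt"]
    for key_id in key_ids:
        command += ["--recipient", key_id]
    command.append(archive_path)
    return command
```

The returned list could be passed to subprocess.run(command, check=True) on a machine where gpg and the key ring are installed; building the command separately keeps the sketch testable without GnuPG.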

The process for a CME Participant to acquire a CME identifier is:

A participant identifies a sample that (a) is critical in nature and (b) does not yet have a CME identifier. (The second condition is difficult to establish and must be verified as part of the deconfliction process.)

The participant requests a CME identifier through an automated, Web-based interface. The request includes the sample and any available supporting data.

If there have been no other requests for CME identifiers within the preceding 2-hour window, then the automated program responds with a CME identifier and sends an email notification to all CME participants.

If there have been earlier requests within the 2-hour window, then a "moratorium"
period is entered. The automated program does not provide a CME identifier;
instead, it lists all recent requests. The current Sample Redistribution Group
must review the recent requests to determine whether the new sample duplicates
an earlier request (see Section 5 for more on deconfliction). If the
deconfliction process indicates that the sample is equivalent to the earlier
request, then the CME identifier for the original request is used. If the
sample is a new threat, then the current requester "overrides" the moratorium and forces a new CME identifier to be issued. To prevent abuse of this system, only trusted users can override the moratorium.
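The request flow above, including the 2-hour moratorium and the trusted override, can be sketched as a minimal in-memory model. The SubmissionServer class, its method names, and the injected issue_id function are hypothetical; email notification and the deconfliction review itself are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Set, Tuple

WINDOW = timedelta(hours=2)  # the 2-hour window described above


@dataclass
class Request:
    requester: str
    sample: str                   # hypothetical reference to the uploaded sample
    time: datetime
    cme_id: Optional[str] = None  # None while the request is under moratorium


class SubmissionServer:
    """Hypothetical in-memory model of the identifier request flow."""

    def __init__(self, issue_id: Callable[[], str], trusted: Set[str]):
        self.issue_id = issue_id  # e.g. a random identifier generator
        self.trusted = trusted    # users allowed to override a moratorium
        self.requests: List[Request] = []

    def submit(self, requester: str, sample: str,
               now: datetime) -> Tuple[Request, List[Request]]:
        """Issue an ID at once, or return the recent requests to review."""
        recent = [r for r in self.requests if now - r.time < WINDOW]
        request = Request(requester, sample, now)
        self.requests.append(request)
        if not recent:
            request.cme_id = self.issue_id()
            # A real server would also email all CME participants here.
        return request, recent

    def mark_duplicate(self, request: Request, original: Request) -> str:
        """Deconfliction found the same threat: reuse the original ID."""
        if original.cme_id is None:
            raise ValueError("the original request has no identifier yet")
        request.cme_id = original.cme_id
        return request.cme_id

    def override(self, user: str, request: Request) -> str:
        """A trusted user forces a new ID for a genuinely new threat."""
        if user not in self.trusted:
            raise PermissionError(f"{user} may not override the moratorium")
        request.cme_id = self.issue_id()
        return request.cme_id
```

A request that arrives during a moratorium stays without an identifier until either mark_duplicate or override resolves it, mirroring the two outcomes of the review.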

At present, all sample submission is performed by members of the CME Sample Redistribution Group.

5. Deconfliction

The deconfliction process answers the question: when are two malware threats equivalent? If one malware threat is equivalent to another malware threat, they will both have the same CME identifier. If they are not equivalent, they will have different CME identifiers.

Deconfliction is decided by group consensus of the Board, following the current technical CME Deconfliction Guidelines in Section 5.1. When no applicable guideline exists yet, the Board will formally define one when possible.

The general deconfliction process for CME is as follows:

The submitter provides a malware threat sample and as much analysis information as possible. The analysis information should explain why the submitter thinks the threat differs from other threats that already have CME identifiers. Specific guidelines (Section 5.1) should be referenced when possible.

Members of the CME Editorial Board will evaluate the deconfliction information and will submit any information that indicates whether or not the sample is equivalent to a previously identified malware threat.

If there is disagreement over whether a sample requires a new CME identifier, email will be exchanged or a teleconference will be held so that consensus can be reached.

When multiple outbreaks are underway, it may take time for samples to be submitted. It will be crucial for the submitter, as well as others on the Board, to modify the request (e.g., add additional samples, add new supporting files, modify previously submitted analysis notes) to ensure deconfliction is completed accurately.
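A deconfliction request that can be amended while an outbreak unfolds, and that tracks Board opinions until they converge, might be modeled as follows. This sketch assumes that consensus means every recorded opinion agrees; the DeconflictionCase class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeconflictionCase:
    """Hypothetical record of one request under deconfliction."""
    samples: List[str]
    analysis: List[str] = field(default_factory=list)
    guidelines: List[str] = field(default_factory=list)  # e.g. ["G9"]
    # Board member -> CME ID the sample matches, or None for "new threat".
    opinions: Dict[str, Optional[str]] = field(default_factory=dict)

    def amend(self, sample: Optional[str] = None, note: Optional[str] = None) -> None:
        """Add samples or analysis notes while the outbreak unfolds."""
        if sample is not None:
            self.samples.append(sample)
        if note is not None:
            self.analysis.append(note)

    def record_opinion(self, member: str, equivalent_to: Optional[str]) -> None:
        """Store (or replace) one Board member's current view."""
        self.opinions[member] = equivalent_to

    def outcome(self) -> str:
        """Return "pending", "discuss", "new", or the matching CME ID."""
        if not self.opinions:
            return "pending"
        verdicts = set(self.opinions.values())
        if len(verdicts) > 1:
            return "discuss"  # disagreement: email or teleconference
        (verdict,) = verdicts
        return "new" if verdict is None else verdict
```

Because record_opinion replaces a member's earlier view, a "discuss" outcome can turn into agreement after the email exchange or teleconference.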

5.1 CME Deconfliction Guidelines

This is the initial list of guidelines for deconfliction as defined by the
Board. Additional guidelines will be identified based on operational content
decisions of the
Sample Redistribution Group
as additional CME identifiers are assigned. Ideally, the deconfliction process
will evolve to rely increasingly on technical characteristics of the malware
threat samples and on explicit guidelines.

For some guidelines, example cases are provided that refer to malware threats by name. After each name, the vendor(s) using that name are given in parentheses.

GUIDELINES:

G1. Every file or component of a malware threat will be assigned the same CME identifier.

If a new outbreak downloads additional files from an external website, the downloaded files will receive the same CME identifier as the file that initiated the download. Where there is more than one downloaded file, or when the downloaded file changes (e.g., a modified version is uploaded), the CME identifier is assigned to all the additional files.

This means that a single file (e.g., a file downloaded by multiple threats) might be associated with more than one CME identifier. A vendor's product may only be able to report the first CME identifier assigned to a file; however, the other assigned CME identifiers should be listed in the vendor's encyclopedia.

Some files that are associated with a malware threat are excluded from CME identifier coverage. For example, valid and harmless .com files that might be used as part of a malware threat should not be assigned a CME identifier. See other guidelines for specific details.

Example case: The Bagle.BE (Trend) outbreak in February 2005 arrived as a downloader file, which downloaded additional files from several URLs embedded in the malware code. The downloader file and all of the downloaded files would have the same CME identifier.
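Because G1 lets one file carry several CME identifiers, a vendor-side index needs to keep all of them in assignment order. The sketch below is a hypothetical FileIndex keyed by file hash; it shows the difference between the single identifier a limited product might report and the full list an encyclopedia should show.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Optional


class FileIndex:
    """Hypothetical map from file hash to every CME ID it carries (G1)."""

    def __init__(self) -> None:
        self._ids: DefaultDict[str, List[str]] = defaultdict(list)

    def assign(self, cme_id: str, file_hashes: Iterable[str]) -> None:
        """Give every file of one malware threat that threat's CME ID."""
        for file_hash in file_hashes:
            if cme_id not in self._ids[file_hash]:
                self._ids[file_hash].append(cme_id)  # keeps assignment order

    def product_id(self, file_hash: str) -> Optional[str]:
        """The single ID a limited product would report: the first one."""
        ids = self._ids.get(file_hash)
        return ids[0] if ids else None

    def encyclopedia_ids(self, file_hash: str) -> List[str]:
        """Every ID assigned to the file, for the vendor encyclopedia."""
        return list(self._ids.get(file_hash, []))
```

For a downloader outbreak like Bagle.BE, assign would be called once with the downloader and every downloaded file; a payload later fetched by a second threat simply gains a second entry.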

G2. New files uploaded to a download site more than 48 hours after an initial outbreak will not be associated with any CME identifier.

This cutoff is needed because there must be a limit on the number of files associated with a CME identifier.

Example case: None
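The G2 cutoff reduces to a timestamp comparison. This sketch assumes the outbreak start is a known timestamp (for example, the time the first sample was received) and treats a file uploaded exactly 48 hours later as still covered, since the guideline excludes only files uploaded more than 48 hours after the outbreak; the function name is hypothetical.

```python
from datetime import datetime, timedelta

CUTOFF = timedelta(hours=48)  # G2


def shares_outbreak_id(outbreak_start: datetime, uploaded_at: datetime) -> bool:
    """True if a file uploaded to the download site still gets the outbreak's ID.

    Only uploads more than 48 hours after the start are excluded. The
    outbreak start time is an assumed input (e.g. first sample received).
    """
    return uploaded_at - outbreak_start <= CUTOFF
```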

G3. Log files generated by a malware threat and stored on the victim hard drive are not associated with the CME identifier.

A description of the log file would be an attribute contained in the CME identifier profile.

G4. A system file that is modified by a malware threat is not assigned a CME identifier directly. Rather, the fact that the system file is modified is an attribute of the CME-identified malware threat.

A description of the modification would be an attribute contained in the CME identifier profile.

Example case: The Matcher.A (Trend) outbreak in July 2001 made a harmless modification to autoexec.bat. That file would not be assigned the CME identifier.

G5. Any file that is dropped by a malware threat is associated with the CME identifier, whether or not the file is malicious (subject to guideline G6).
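Guidelines G1 and G3 through G5 sort the artifacts of an infection into files that carry the CME identifier and facts that belong in the identifier's profile. A minimal sketch, assuming each artifact has already been labeled with a role by an analyst; the Role names are hypothetical, and the exclusions for harmless files mentioned under G1 are not modeled.

```python
from enum import Enum
from typing import List, Tuple


class Role(Enum):
    ORIGINAL = "original"         # file that started the infection
    DOWNLOADED = "downloaded"     # fetched from an external site (G1)
    DROPPED = "dropped"           # written by the threat, malicious or not (G5)
    LOG = "log"                   # log file on the victim disk (G3)
    MODIFIED_SYSTEM = "modified"  # existing system file changed (G4)


CARRIES_ID = {Role.ORIGINAL, Role.DOWNLOADED, Role.DROPPED}


def split_artifacts(artifacts: List[Tuple[str, Role]]) -> Tuple[List[str], List[str]]:
    """Separate files that get the CME ID from profile attributes."""
    files: List[str] = []
    attributes: List[str] = []
    for name, role in artifacts:
        if role in CARRIES_ID:
            files.append(name)
        else:
            attributes.append(f"{role.value}: {name}")
    return files, attributes
```

In the Matcher.A example, autoexec.bat would be labeled MODIFIED_SYSTEM and therefore end up as a profile attribute rather than an identified file.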

G6. Code that exploits a vulnerability that can be detected with a scanner
will be assigned a CME identifier, along with any related files.

Example case: The Nimda.A (Trend) outbreak in September 2001 arrived as an email attachment, dropped several files on the hard drive, infected files, and spread as a network worm. A CME identifier would be assigned to the byte sequence captured by a scanner, as well as to the email attachment, dropped files, infected files, and downloaded files.

G7. Memory dumps will be assigned a CME identifier, along with any related
files.

Example case: The CodeRed outbreak in July 2001 exploited a buffer overflow and never wrote any files to the hard drive. The memory dump of CodeRed would be assigned a CME identifier.

G8. Some tangible file (e.g., a packet capture) is required before a CME
identifier can be assigned.

Example case: The Slammer worm (outbreak January 2003) was contained in a single UDP packet. Until a packet capture was available, a CME identifier could not be assigned.
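G6 through G8 share a precondition: some tangible artifact must exist before an identifier is issued. The check below is a minimal sketch; the artifact-kind labels are hypothetical names chosen to mirror the examples above (scanner-detectable exploit bytes, a memory dump, a packet capture).

```python
from typing import Iterable

# Hypothetical labels for evidence that counts as tangible under G6-G8.
TANGIBLE = {"file", "exploit_bytes", "memory_dump", "packet_capture"}


def can_assign_id(evidence_kinds: Iterable[str]) -> bool:
    """G8: at least one tangible artifact must accompany the request."""
    return any(kind in TANGIBLE for kind in evidence_kinds)


print(can_assign_id(["written_report"]))                    # False
print(can_assign_id(["written_report", "packet_capture"]))  # True
```

For Slammer, a request backed only by a written report would be held until a packet capture was attached; for CodeRed, the memory dump alone would suffice.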

G9. Malware threats that have functional differences will be assigned different
CME identifiers.

A functional difference is defined as any byte difference in the code, such as a different port number or email subject line. Vendors do not always report all functionally different malware threats to customers, choosing instead to associate multiple threats with a single name. In these cases, the single name would be associated in the vendor encyclopedia with multiple CME identifiers.

Example case: Many files were associated with Bagle activity on 3/1/05. Because of string differences and differences in the downloaded files, five different CME identifiers would have been assigned.

G10. A difference of attributes that are randomly generated by a malware
threat (e.g., randomly generated email subject lines) does not constitute a
functional difference.

G11. The packing method of a malware threat does not constitute a functional
difference.

G12. Malware threats created with the same malware "construction kit" will be
given separate CME identifiers if they are functionally different.
A separate CME identifier will be assigned to the construction kit itself.
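Taken together, G9 through G12 suggest treating two samples as equivalent only when their code is byte-identical once packing is removed and randomly generated regions are ignored. The sketch below assumes an analyst has already done both steps: it takes unpacked code plus the byte ranges that hold random data, zeroes those ranges, and groups samples by the resulting SHA-256 digest. It is a simplification under those assumptions and cannot handle random regions whose length differs between samples.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) byte offsets of randomly generated data


def canonical_digest(unpacked_code: bytes, random_spans: Sequence[Span]) -> str:
    """Hash code bytes with randomly generated regions zeroed out (G10).

    Assumes any packer has already been removed (G11).
    """
    buf = bytearray(unpacked_code)
    for start, end in random_spans:
        end = min(end, len(buf))  # never grow the buffer
        if start < end:
            buf[start:end] = bytes(end - start)
    return hashlib.sha256(buf).hexdigest()


def group_by_function(samples: Dict[str, Tuple[bytes, Sequence[Span]]]) -> List[List[str]]:
    """Group samples whose remaining bytes match exactly (G9)."""
    groups: Dict[str, List[str]] = {}
    for name, (code, spans) in samples.items():
        groups.setdefault(canonical_digest(code, spans), []).append(name)
    return list(groups.values())
```

Each resulting group would correspond to one CME identifier; under G12, variants produced by one construction kit fall into separate groups whenever their remaining bytes differ.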