Online sandbox services are an interesting concept: individuals who suspect that a file or URL may be malicious can submit it to the portal of a malware analysis service and, in short order, receive an answer. Anubis, Malwr (Figure A), and VirusTotal are examples of such services.

"We inspected the samples submitted to the Anubis sandbox. These binaries are voluntarily submitted to the sandbox by users who want more information about the behavior of Windows PE executables. This data set contains over 30 million samples collected over a period of six years."

The team used sophisticated analytics to make sense of the data. The first step was to reduce the data set from roughly 32 million samples to 12,000. Next the team:

clustered data sets via binary similarity and submission metadata;

used binary analysis techniques to inspect samples in the clusters;

extracted interesting features from the samples;

and trained a classifier to automatically discover malware.
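The four steps above can be sketched in miniature. The toy Python below is an illustration only, not the researchers' actual tooling: the prefix-based similarity function, the 0.8 threshold, and the "many samples from one submitter" feature are all invented stand-ins for the real binary-similarity metrics and features the paper would use.

```python
def similarity(a: bytes, b: bytes) -> float:
    """Crude byte-level similarity (a stand-in for real binary-similarity
    techniques such as fuzzy hashing): fraction of matching bytes."""
    n = min(len(a), len(b))
    same = sum(1 for i in range(n) if a[i] == b[i])
    return same / max(len(a), len(b))

def cluster(samples: dict[str, bytes], threshold: float = 0.8) -> list[set[str]]:
    """Step 1: group samples whose similarity to a cluster representative
    exceeds the threshold."""
    clusters: list[set[str]] = []
    for name in samples:
        for c in clusters:
            rep = next(iter(c))  # any member serves as the representative
            if similarity(samples[name], samples[rep]) >= threshold:
                c.add(name)
                break
        else:
            clusters.append({name})
    return clusters

def extract_features(cluster_names, metadata):
    """Steps 2-3: derive simple per-cluster features, e.g. how many
    distinct submitters the cluster's samples came from (hypothetical)."""
    submitters = {metadata[n]["submitter"] for n in cluster_names}
    return {"size": len(cluster_names), "submitters": len(submitters)}

def classify(features) -> bool:
    """Step 4 (toy classifier): flag clusters of several near-identical
    binaries all submitted by a single source."""
    return features["size"] >= 3 and features["submitters"] == 1
```

A production system would replace the prefix comparison with something like context-triggered piecewise hashing and train a real classifier over many features, but the shape of the pipeline is the same.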

The researchers found something of note: malware used in several high-profile attack campaigns turned up in the databases being studied. That alone is not unusual; what stood out was when the malware had been submitted relative to each campaign's public disclosure (the "time before public disclosure"). Some of their findings are shown in Figure B.
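The metric behind that correlation is simple date arithmetic: how long before an attack became public knowledge was its malware already sitting in the sandbox? The dates in this sketch are invented examples, not figures from the paper.

```python
from datetime import date

def days_before_disclosure(first_submission: date, disclosure: date) -> int:
    """A positive result means the sample was in the sandbox's database
    before the campaign was publicly disclosed."""
    return (disclosure - first_submission).days

# Hypothetical example: a sample submitted on 2012-01-10 for a campaign
# disclosed on 2012-05-30 sat in the database for 141 days beforehand.
lead = days_before_disclosure(date(2012, 1, 10), date(2012, 5, 30))
```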

Tracing malware development

The research highlights the ongoing and important challenges associated with malware that is caught but mislabeled, and therefore not properly associated with advanced persistent threat (APT) campaigns. To that end, the researchers focused on the detection of what they call malware development — seeing if it's possible to identify the activity of malware developers and get the word out proactively.

"We use the term 'development' in a broad sense, to include anything that is submitted by the author of the file itself," mentions the report...."Our main goal is to automatically detect suspicious submissions that are likely related to malware development or to a misuse of the public sandbox. We also want to use the collected information for malware intelligence."

To accomplish their goal, the researchers worked out how to distinguish malware-development samples from ordinary malware samples. Although not perfect, the team's prototype implementation was able to mine the data sets and collect substantial evidence of malware development.
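One hedged heuristic for telling "development" apart from ordinary submissions (my illustration, not the paper's exact feature set): an author iterating on a sample tends to resubmit near-identical binaries whose antivirus detection count drifts downward as the evasion improves, whereas a victim typically submits a file once.

```python
def looks_like_development(detections: list[int], min_submissions: int = 3) -> bool:
    """detections: AV hits per successive submission of near-identical
    binaries from one submitter, in chronological order. Flags a strictly
    non-increasing sequence that ends lower than it started."""
    if len(detections) < min_submissions:
        return False
    non_increasing = all(a >= b for a, b in zip(detections, detections[1:]))
    return non_increasing and detections[-1] < detections[0]
```

For example, a submitter whose successive variants score 12, then 7, then 3 AV detections looks far more like an author testing evasion than a victim seeking a verdict.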

"Our system automatically detected the development of a diversified group of real-world malware, ranging from generic trojans to advanced rootkits," adds the USENIX report. "To better understand the distribution of the different malware families, we verified the AV labels assigned to each reported cluster."

Listed below are the types of malware the team's automated tool found within the 1,474 clusters tested:

45 botnets

1,082 trojans

83 backdoors

4 keyloggers

65 worms

21 malware development tools

When asked what this all means, Graziano explains, "The system can be deployed as an early-warning system to flag suspicious submissions. This system can be attached transparently to any sandbox and we expect similar results from other data sets."

It was suggested that the bad guys would then just stop using the online malware analysis services. "Mistakenly, people think the proposed system would stop these suspicious submissions," writes Graziano. "But, the truth is the bad guys have to interact with sandboxes and with security products in general to learn how they work in order to devise and test evasion techniques."

Graziano adds, "We believe the key message of the paper is that malware authors are abusing public sandboxes to test their code, and we do not need very sophisticated analysis tools to find them."

Note: Since the paper and USENIX Symposium, Mariano Graziano has become a security researcher in Cisco's Talos Security Intelligence and Research Group.