Enumerating Anti-Sandboxing Techniques

Fighting and writing malware is very much a cat-and-mouse game. One of several techniques used by antivirus/EDR solutions is to detonate payloads in a sandbox and watch what happens. To combat this, malware writers (and pentesters) include checks in their payloads that identify when they are running in a sandbox, so they can evade detection. However, these evasion techniques are themselves often caught. So I wondered whether there might be a way to enumerate which anti-sandboxing checks are stealthier than others.

To test this, I created several benign payloads, each with a different anti-sandbox check, and uploaded them to VirusTotal, Hybrid Analysis, and MetaDefender. Recording the results shows which anti-sandboxing checks are most easily detected, and by whom. However, before we get started, a few notes:

Not all anti-sandboxing techniques are the same. For example, one technique, in which the payload counts the number of CPUs and only detonates if there are more than 3, can be coded in several different ways. For these tests, I went with the most basic approach. If a technique is caught, then we know which approach needs more investigation, and which should simply be avoided altogether.

Security solutions on VirusTotal and MetaDefender are configured by the vendor and may be configured differently in your environment.

There are plenty of anti-sandboxing techniques to choose from; I selected ten for convenience.

Rules and definitions are updated constantly, so these results might differ a month from now. All testing was performed in June 2018.

Step 1: Create a Baseline

For this step, I created a simple program that, when executed, displays a random fact about cats.

#include <stdio.h>
#include <string.h>   /* strcpy */
#include <time.h>     /* time */
#include <windows.h>

/* Build as a GUI app (no console window) while keeping main() as the entry point. */
#pragma comment(linker, "/SUBSYSTEM:windows /ENTRY:mainCRTStartup")

/* Show a single fact in a message box. MessageBoxA is the ANSI (char *) variant. */
void send_pop_up(const char *fact)
{
    MessageBoxA(NULL, fact, "Test", MB_OK);
}

/* Pick one of the 20 facts based on the current time and display it. */
void getCatFact(void)
{
    char catFacts[20][100];
    strcpy(catFacts[0], "There are cats who have survived falls from over 32 stories (320 meters) onto concrete.");
    strcpy(catFacts[1], "A group of cats is called a clowder.");
    strcpy(catFacts[2], "Cats have over 20 muscles that control their ears.");
    strcpy(catFacts[3], "Cats sleep 70% of their lives.");
    strcpy(catFacts[4], "A cat has been mayor of Talkeetna, Alaska, for 15 years. His name is Stubbs.");
    strcpy(catFacts[5], "Cats can not taste sweetness.");
    strcpy(catFacts[6], "Owning a cat can reduce the risk of stroke and heart attack by a third.");
    strcpy(catFacts[7], "Wikipedia has a recording of a cat meowing because why not?");
    strcpy(catFacts[8], "The world's largest cat measured 48.5 inches long.");
    strcpy(catFacts[9], "Adult cats only meow to communicate with humans.");
    strcpy(catFacts[10], "A cat usually has about 12 whiskers on each side of its face.");
    strcpy(catFacts[11], "All cats have claws, and all except the cheetah sheath them when at rest.");
    strcpy(catFacts[12], "Approximately 1/3 of cat owners think their pets are able to read their minds.");
    strcpy(catFacts[13], "In the 1750s, Europeans introduced cats into the Americas to control pests.");
    strcpy(catFacts[14], "A 2007 Gallup poll revealed that both men and women were equally likely to own a cat.");
    strcpy(catFacts[15], "A cats heart beats nearly twice as fast as a human heart, at 110 to 140 beats a minute.");
    strcpy(catFacts[16], "Cats spend nearly 1/3 of their waking hours cleaning themselves.");
    strcpy(catFacts[17], "A female cat is called a queen or a molly.");
    strcpy(catFacts[18], "Rome has more homeless cats per square mile than any other city in the world.");
    strcpy(catFacts[19], "The richest cat is Blackie who was left 15 million by his owner, Ben Rea.");

    /* The current time in seconds, modulo 20, serves as a cheap "random" index. */
    time_t result = time(NULL);
    send_pop_up(catFacts[result % 20]);
}

int main(void)
{
    getCatFact();
    return 0;
}

I submitted the baseline app to VirusTotal and, surprisingly, 5 out of 67 AV vendors think it’s malicious (or they just straight-up hate cats). Hybrid Analysis gives it a score of 36/100 and MetaDefender gives it a 2/31. While this is less than ideal for a simple, harmless app, it does provide a baseline for our next steps.

Step 2: Add In Sandbox Detection Checks

The following checks were tested:

For each check, a separate executable was generated and submitted to VirusTotal, Hybrid Analysis, and MetaDefender. These sites were chosen because they provide a wide array of AV/EDR products to test against.

Number of CPUs:

This checks the number of processors on the system. If there are fewer than two processors, the payload exits.
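A minimal sketch of what such a check might look like is shown below. It is not the code used in the tests. It assumes the Win32 call GetSystemInfo and its dwNumberOfProcessors field as the source of the count; the names MIN_CPUS, get_cpu_count, and cpu_check_passes are made up for illustration, and the non-Windows branch exists only so the sketch can be compiled elsewhere.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>   /* fallback only, so the sketch builds outside Windows */
#endif

/* Hypothetical threshold matching the description: fewer than two means exit. */
#define MIN_CPUS 2UL

/* Returns the number of logical processors the OS reports (at least 1). */
unsigned long get_cpu_count(void)
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return (unsigned long)info.dwNumberOfProcessors;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (unsigned long)n : 1UL;
#endif
}

/* Pure decision: 1 if the program should continue, 0 if it should exit. */
int cpu_check_passes(unsigned long cpu_count, unsigned long minimum)
{
    return cpu_count >= minimum;
}
```

In the baseline program, main could call cpu_check_passes(get_cpu_count(), MIN_CPUS) and return early when it yields 0, before getCatFact() runs. Keeping the decision separate from the system call makes the threshold easy to change and to test on its own.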

Overall Results:

[Result tables not reproduced here: VirusTotal (table parts 1/2 and 2/2), Hybrid Analysis, MetaDefender, and Total Test.]

Conclusions:

First off, we can tell that there are many products out there that do a great job of detecting anti-sandbox techniques. Granted, many of these techniques on their own are not malicious, so every instance where our payload is flagged as bad is really a false positive. That said, this information does allow us to narrow down which techniques, in their simplest form, are effective at reducing detections. Using the information above, I can see that using sleep, AV/EDR process names, and disk sizes by themselves will most likely get us flagged. As an attacker, I could spend time focusing on reducing the detection rate there, but instead, why not focus on something like domain membership, uptime, or RAM size? If you’re interested in testing your own anti-sandbox techniques, I’d recommend checking out the examples at CheckPlease.
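As one illustration of the RAM size check mentioned above, here is a rough sketch in the same basic style. It is not the code from the tests. It assumes the Win32 call GlobalMemoryStatusEx for reading total physical memory; the 2048 MB threshold and the names MIN_RAM_MB, get_ram_mb, and ram_check_passes are assumptions chosen for the example, and the non-Windows branch is there only so the sketch compiles elsewhere.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>   /* fallback only, so the sketch builds outside Windows */
#endif

/* Hypothetical threshold; this value does not come from the test results. */
#define MIN_RAM_MB 2048ULL

/* Total physical memory in megabytes, or 0 if it cannot be read. */
unsigned long long get_ram_mb(void)
{
#ifdef _WIN32
    MEMORYSTATUSEX status;
    status.dwLength = sizeof(status);   /* must be set before the call */
    if (!GlobalMemoryStatusEx(&status))
        return 0;
    return status.ullTotalPhys / (1024ULL * 1024ULL);
#else
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0)
        return 0;
    return (unsigned long long)pages * (unsigned long long)page_size
           / (1024ULL * 1024ULL);
#endif
}

/* Pure decision: 1 if the program should continue, 0 if it should exit. */
int ram_check_passes(unsigned long long ram_mb, unsigned long long minimum_mb)
{
    return ram_mb >= minimum_mb;
}
```

A failed read returns 0, so with this sketch an unreadable value is treated the same as a small sandbox and the program exits. Whether that is the right default is a design choice left to the reader.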

With over a decade of industry experience, Hans Lakhan (@jarsnah12) has worked in both offensive and defensive roles. Before switching to red teaming, he spent 5 years working as a Technical Security Analyst for a Fortune 500 telecommunications company, specializing in networking, firewalls, vulnerability management, and VPNs.