Discovering evasive code in malicious websites

A growing number of malicious websites use sophisticated techniques to evade detection by traditional methods. Attackers’ objectives include malware distribution, data breaches, website defacement and bitcoin mining. Using two types of decoy systems (‘honeyclients’), our team at NTT Secure Platform Laboratories has discovered five previously unknown evasion techniques that abuse the differences between JavaScript implementations.

Figure 1: An attack by a malicious website on a vulnerable browser.

NTT works to detect and monitor these websites using a combination of high and low-interaction honeyclients. A high-interaction honeyclient is a real browser that can precisely detect browser exploits and malware downloads. A low-interaction honeyclient is a browser emulator, which can emulate many different client profiles, trace complicated redirections and hook code executions in detail. These two methods are complementary and improve our overall analysis capabilities.

We usually detect malicious websites and confirm the evidence of maliciousness using the results from both types of honeyclient. However, attackers are developing more sophisticated techniques to evade our honeyclient analysis. They craft JavaScript code that decides whether to redirect clients to malicious URLs by abusing the differences among client environments. This evasive code is widely distributed through exploit kits, so the need for a countermeasure is urgent.

Typical redirection abuses the browser fingerprint, conditionally redirecting vulnerable clients based on the browser or OS type in the User-Agent string. Evasive code, however, tests for known differences in JavaScript implementations, inferring whether a browser is vulnerable based on the response to the executed code. By using different JavaScript implementations in our high-interaction and low-interaction honeyclients, we were able to observe the evasive nature of the code and analyse the differences between code gathered by each honeyclient type.
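As a rough illustration of the fingerprint-based case, the Python sketch below models a redirector that keys on the User-Agent string, and a probe that replays several client profiles against it. The profile names, User-Agent strings and URL are hypothetical and are not taken from our data.

```python
from typing import Callable, Dict, Optional

# Hypothetical client profiles; the User-Agent strings are illustrative only.
PROFILES: Dict[str, str] = {
    "ie8_winxp": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)",
    "firefox_linux": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
}


def fingerprint_redirect(user_agent: str) -> Optional[str]:
    """Toy model of a User-Agent-based redirector (not real attack code)."""
    if "MSIE" in user_agent and "Windows NT 5.1" in user_agent:
        return "http://exploit.example/landing"  # placeholder URL
    return None  # every other client sees no redirect


def probe_profiles(redirector: Callable[[str], Optional[str]]) -> Dict[str, Optional[str]]:
    """Replay every profile against the redirector and record the outcome."""
    return {name: redirector(ua) for name, ua in PROFILES.items()}


results = probe_profiles(fingerprint_redirect)
for name, target in results.items():
    print(name, "->", target)
```

Probing with several profiles exposes this kind of redirection. It does not expose evasive code, because spoofing the User-Agent string does not change how the emulator's JavaScript engine actually behaves.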

To observe evasive code, we constructed redirect graphs and performed differential analysis on them. We then performed further manual analysis on the code to classify and identify particular evasion techniques based on code similarity.
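A minimal sketch of the differential step, assuming each redirect graph is stored as a mapping from a source URL to the URLs it redirected to. This representation, the function names and the example URLs are assumptions for illustration, not the actual format used by our honeyclients.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # source URL -> URLs it redirected to
Edge = Tuple[str, str]        # (source URL, destination URL)


def to_edges(graph: Graph) -> Set[Edge]:
    """Flatten an adjacency mapping into a set of redirect edges."""
    return {(src, dst) for src, dsts in graph.items() for dst in dsts}


def nodes(edges: Set[Edge]) -> Set[str]:
    """All URLs that appear as a source or a destination."""
    return {url for edge in edges for url in edge}


def divergence_points(high: Graph, low: Graph) -> Set[str]:
    """URLs visited by both clients from which only one client was redirected.

    Content served at these URLs is a candidate for manual analysis.
    """
    high_edges, low_edges = to_edges(high), to_edges(low)
    one_sided = high_edges ^ low_edges  # edges seen by exactly one client
    visited_by_both = nodes(high_edges) & nodes(low_edges)
    return {src for src, _ in one_sided if src in visited_by_both}


# Hypothetical traffic pair for one analysis target.
high_graph = {
    "http://compromised.example/": ["http://redirector.example/a.js"],
    "http://redirector.example/a.js": ["http://exploit.example/landing"],
}
low_graph = {
    "http://compromised.example/": ["http://redirector.example/a.js"],
}

suspects = divergence_points(high_graph, low_graph)
print(suspects)
```

Restricting the result to URLs that both clients visited keeps the focus on the first point where the two graphs part ways, rather than on every later hop that only one client reached.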

Figure 2: Differential analysis was performed on redirect graphs from high and low-interaction honeyclients.

Over the course of four years, we investigated 8,500 JavaScript samples collected from captured HTTP transactions with 20,272 malicious websites. Our differential analysis extracted 2,410 pieces of JavaScript code from 1,166 collected traffic pairs of analysis targets. Clustering these code pieces produced 57 clusters and 224 noise points (code pieces that fit no cluster). This analysis identified five new evasion techniques that abuse differences between JavaScript implementations.
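Grouping code into clusters plus leftover noise is what a density-based algorithm such as DBSCAN produces. The sketch below illustrates that idea in Python, under the assumption that similarity is measured with the standard library's `difflib.SequenceMatcher`. The algorithm choice, the similarity measure, the thresholds and the toy snippets are all illustrative and are not necessarily the ones used in this study.

```python
from difflib import SequenceMatcher
from typing import List, Optional

NOISE = -1  # label for samples that belong to no cluster


def distance(a: str, b: str) -> float:
    """1 - similarity ratio; 0.0 means identical text."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def dbscan(samples: List[str], eps: float = 0.3, min_pts: int = 2) -> List[int]:
    """Small DBSCAN over strings. Returns one cluster label per sample."""
    n = len(samples)
    # Neighbourhood of each sample (includes the sample itself).
    neighbours = [
        [j for j in range(n) if distance(samples[i], samples[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbours[i] if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # core point: keep expanding
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


# Toy snippets standing in for extracted JavaScript code (hypothetical).
snippets = [
    "var target = 'http://a.example/'; go(target);",
    "var target = 'http://b.example/'; go(target);",
    "document.write(unescape(payload));",
]
print(dbscan(snippets))
```

One representative per cluster can then be picked for manual analysis, while noise points are reviewed individually or set aside.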

We found the following evasive code by manually analysing one representative sample from each cluster.

Figure 3: Identified evasion techniques.

To determine whether these evasion techniques could be used as Indicators of Compromise (IoCs), we investigated more than 860,000 URLs under Alexa top-ranked domain names. The setTimeout() evasive code was detected in 26 URLs, all of which were used in compromised websites. The other evasion techniques were either used unintentionally in benign websites or were no longer in use.

Figure 4: Differences in browser responses to the setTimeout() function.

We hope these findings help incident responders understand and analyse modern malicious websites, and contribute to improving the analysis capabilities of conventional honeyclients.

This article is based on a presentation given at the 30th Annual FIRST Conference in Kuala Lumpur, Malaysia. The slides can be found here [PDF].

Dr. Yuta Takata is a researcher at NTT R&D and has been a member of NTT-CERT in Japan since 2013. He focuses on developing honeyclients that effectively analyse websites and exhaustively extract malicious behaviours such as browser exploits and malware infections.

Thank you for your question.
The high-interaction honeyclient is IE-based and the low-interaction one is HtmlUnit-based. Of course, we have extended our honeyclients with additional functions, such as logging and network tracing. But you can follow our experiments using the following tools.