Monday, November 24. 2008

Imagine a warm and bright Saturday afternoon the summer you were just eight years old. Can you remember playing hide and seek with other children from the neighbourhood or from school? Everybody likes to be the one hiding somewhere. You choose a seemingly hidden spot and wait. After a while it would become boring, if the spot is just too well concealed, so you declare a time-out and win the round. However once a hideout is known to any of the seekers, you will be found eventually. From the seeker's perspective, most likely hiding spots are being searched first, depending on where the seeker would hide if he were on the opposing team. Most likely, some even obvious spots will be missed during the first round, like right on top of you inside the trees. Seekers learn and will check there first in the next round. However the hiding party is learning as well, always coming up with tons of new hideouts and ideas to conceal themselves even better. But they will all be found eventually.
It is not surprising that discovering at least one person is rather easy if most of the group are hiding and few are seeking, so we'll assume the opposite: Many are searching, few are hiding.

The same game may be applied to Flash/SWF. An attacker wants to execute fraudulent code on a victim's machine. In this case, it should be sufficient to execute arbitrary code inside someone's flash player. The "seeker's" objective is (1) to recognise an attack, preferably before execution, and (2) to know the threat in detail. Obviously, the attacker's role in this game consists of suitable counterparts: (1) Hide the existence of an attack, at least until the code is being executed without being found before and (2) obfuscate the code to discourage easy analysis.

You may see certain similarities to the game, virus writers and the antivirus industry have been playing for some time now. The word 'virus' in this context may stand for trojan horses, spyware, malware or any kind of unwanted software. The dominating virus detection technique - at least referring to static analysis - is a signature match against a dictionary of known viruses (see antivirus software). Once a virus has been identified, a fingerprint of its program code or parts of the code results in a new signature for the dictionary. Round one for the hide and seek goes to the seekers. The natural response to avoid signature detection is a self-modifying code, otherwise known as polymorphic or metamorphic code (see computer virus).

Once again, applied to Flash, a signature approach seems appropriate. Flash code can not (easily) alter and strore itself. Even though Flash files are usually not stored locally for constant analysis by virus checkers, the static nature can be observed in the wild. But there is no reason, why a server should not be able to recreate a different version of the same SWF for each request, which is somewhat like an outsourced metamorphism. So, attackers score round two.

For the analysis of non-static code with static function range, heuristic approaches come to mind. [ Georgia Frantzeskou, Efstathios Stamatatos, and Stefanos Gritzalis - Suppοrting the Cybercrime Investigation Process: Effective Discrimination of Source Code Authors Based on Byte-Level Information - 2007] suggested a statistical classification method based on n-grams (see Ngram). The front row application for n-grams is language detection of written text. The occurrence of every N successive characters (including whitespace) of a text is counted and then compared relatively to a reference count of known language.

statistical classification - n-gram over bytecode ops

This classification method can be applied to Flash as well. Instead of N characters of a text, we'll take a sequence of N ABC OP-codes (aka. AVM2 bytecode). The figure shows a graph representation of several arbitrarily chosen SWF9/10 files and their distance based upon the n-gram analysis. (n=3, hidden edges by distance threshold).

Three clusters become apparent: {9,10}, {17,16,12,15,20} and {3,13,19,11,18,5,21,7}. Clustering is an expression of similarity between the SWF's bytecode. N-gram profiles contain characteristics of the compiler or IDE, std. libraries and the code's author(s), each with different intensity. I'd say, that's another point for the seekers.

Now, in order to defy heuristics, two ways pop into consciousness. Either imitate another profile's appearance by adding NOPs and dead code, or hide the bytecode entirely. While imitation techniques can be matched up by even more advanced filtering and statistical methods, we are going to explore more hiding and obfuscation on the byte level. The AVM2 incorporates a byte loader, which can be used to load and evaluate ABC bytecode during runtime. Consider that we can hide our code anywhere inside the SWF or load seemingly unsuspicious data from external sources - e.g. a picture, sound file, timing data or even data encoded as fake dead code. The data would then be transformed back to our original payload and handed over to the byte loader. Of course, our transformation algorithm and the byte loader itself must undergo a procedure, too, in order to look as harmless as possible. Fortunately there are numerous everyday tasks to be solved by data conversion and loading algorithms, so that our few lines of code can be fingerprinted heuristically without arising any suspicion.

With alchemy Adobe hands out a toolkit for fast ByteArray manipulation free of charge, which happens to coincide with bytecode obfuscation/deobfuscation as described. Hence Adobe scores yet another point for the attackers - yay.

All these elaborations are by no means only of theoretical nature. erlswf has been specifically designed to match the needs of SWF bytecode analysis up to this point in this train of thoughts.

A few concluding remarks: The game of hide and seek goes on forever. If anyone wondered what our current state of the game was, my personal guess would put it somewhere near the end of round one. That means, there is more to look forward to and much more to come.

Monday, February 18. 2008

Using the Erlang bit syntax it's an easy task to unpack the tags of an SWF file. With this thought in mind erlswf has been specifically designed to analyse SWF Tags and ActionScript ByteCode for security issues such as the previously mentioned oversized branch offset or pattern matching against URLs loaded during runtime. The toolkit could also be used to implement a transparent proxy filter for exchanging pictures inside Flash files on the fly. Or if you had no choice but to accept prebuilt SWFs from a third party (e.g. ad hosters), it would still be possible to check for arbitrary conditions or restrictions respectively prior to delivery.
The other pure erlang SWF library eswf places emphasis on SWF construction and related data formats (AMF, ABC).

Thursday, January 24. 2008

SWF - or otherwise known as the flash file format - recently caught my attention while discussing web security issues. It can be played on virtually any platform's browser nowadays, which makes it a perfect environment for cross-platform applications (including malware). But before getting into exploring our options of how to exploit the format, let's just get a brief insight into the binary structure of SWF.
The file starts with the string FWS or CWS, followed by an 8-bit version number and 32-bit file length field. In case of CWS all the remaining file contents are zlib compressed:

As indicated, the last tag is a tag with tag type 0 and length 0 hence resulting in a 16 bit representation of 0.
If we wanted to analyze an SWF file, it would be best to uncompress where needed, parse the header and then break down each tag by its code first. When doing so with real world data we may encounter undocumented or unknown codes. There can be several reasons for these mysterious tag codes, for example the file could be corrupted or our parser could be incomplete. More likely, however, is either that a commonly used - yet undocumented - tag was used correctly according to the programmer's point of view (tag type IDs 16, 29, 31, 38, 40, 42, 47, 49, 50, 51,52, 63, 72), OR the tag was deliberately marked with an unknown code in order to hide bytecode or other data.
We'll go along with the latter case, so let's assume - just for a moment - that we are programming a malware flash file. As such our code needs to avoid detection and should be obfuscated as well. The actionscript2 bytecode as located inside doAction tags can issue a branch action (aka. jump or goto) which is ordinarily being used for loops and conditions. Each branch action comes with a relative address of the next action. Example:

0x00: action 1
0x01: some actions...
...
0x10: jump -0x10

Ominously the branch offset (here -0x16) is not restricted to the current code block, but could jump to an entirely different tag instead, where the code is being executed as if it were a code block. Example:

This way the code inside tag1 is hidden from ordinary SWF analyzer tools and can still be executed. In order to make it even harder to find the hidden code, random bytecode could be inserted in between actual bytecode, or dormant bytecode (which is never executed) could be used as distraction. Fortunately this technique is also really easy to detect since a checker only needs to be able to check for uncommon branch offsets, however most disassemblers (such as flare) can be fooled.
Another interesting way to hide code, which is by far not the last one, would be a base64 encoded SWF file ebmedded in an image of another swf file, such as

<img src="data:application/x-shockwave-flash;base64,..."/>

In the end it does not really matter, which way your code is protected or even if it is hidden at all, because there is no security or malware check anywhere within a flash advertising deployment process. An evil attacker could simply buy ad space from an ad broker, the delivered ad is then quickly checked (possibly manually) for style guidelines such as size or close buttons, and finally delivered to their ad servers. That's the end of the (slightly simplified) deployment process.

Let's explore a few technical possibilities on how to protect yourself from flash malware. (Non-technical solutions such as contract fines or national law are not applicable for the anonymous evil hacker.) Java applets - for example - can have signatures. Since there is no way specified to embed cryptographic signatures in SWF files, and by the way only few people would grasp the signature's relevance anyway, this is not a viable option here. Then there is a sort of capability whitelisting: The SWF file could be checked against allowed capabilities, which include having obfuscated code hidden in unknown tags as described above. The check could be done automatically on client side (e.g. by a browser plugin) or by a proxy either intermediately or on server side. But such a capability filter is yet to be written.