11.4. Code Emulation

Code emulation is an extremely powerful virus detection technique. A virtual machine is implemented to simulate the CPU and memory management systems to mimic the code execution. Thus malicious code is simulated in the virtual machine of the scanner, and no actual virus code is executed by the real processor.

Some early methods of "code-emulation" used debugger interfaces to trace the code using the processor. However, such a solution is not safe enough because the virus code can jump out of the "emulated" environment during analysis.

Among the first antivirus programs was Skulason's F-PROT, which used software-based emulation for heuristic analysis. The third generation of F-PROT integrated the emulator and the scanning components to apply emulation to all computer virusesparticularly the difficult polymorphic viruses.

As an example, the registers and flags of a 16-bit Intel CPU can be defined with the following structures in C language:

The point of the code emulation is to mimic the instruction set of the CPU using virtual registers and flags. It is also important to define memory access functions to fetch 8-bit, 16-bit, and 32-bit data (and so on). Furthermore, the functionality of the operating system must be emulated to create a virtualized system that supports system APIs, file and memory management, and so on.

To mimic the execution of programs, the data from executable files is first fetched into memory buffers. Then a giant switch() statement of the emulator can analyze each instruction opcode, one by one. The current instruction, pointed by the virtual register IP (instruction pointer) is decoded, and the related virtual function for each instruction is executed. This changes the content of the virtual machine's memory, as well as the virtual registers and flags. The instruction pointer register IP is incremented after each executed instruction, and the iterations are counted.

Consider the code snippet of a 16-bit CPU emulator shown in Listing 11.8. First, the code selects the next instruction for execution with an internal read_mem() function that will access the already fetched buffers according to CS (code segment). Next, a while loop executes instructions according to preset conditions, such as the number of iterations. The execution also stops if the emulator experiences a fatal emulation error.

This example illustrates how the CPU emulator encounters a NOP and INT instruction during emulation. When a NOP (no operation) is executed, the IP register needs to be incremented. When an INT instruction is executed, the code in Listing 11.8 demonstrates what the emulator does when the DOS get version call is executed.

First, the state of the CPU stack is set, given that the INT instruction sets a return address on the top of the stack. The emu_int() function should normally handle most interrupt calls, but in this example only the DOS version check is handled and the false 3.38 DOS version is returned to the caller of the interrupt. As a result, a program executed in the virtual machine will receive the false version numbers when running in the virtual system. This illustrates that everything is under the emulator's control. Depending how well the emulator can mimic the real system functionality, the code has more or fewer chances to detect the fact that it is running in a virtual environment. Of course, the preceding code is over-simplified, but it demonstrates the typical structure of a generic CPU emulator. The 32-bit emulators differ only in complexity.

Polymorphic virus detection can be done by examining the content of the virtual machine's memory after a predefined number of maximum iterations, or whenever other stop conditions are met. Because polymorphic viruses decrypt themselves, the virus will readily present itself in the virtual machine's memory if emulated long enough. The question arises of how to decide when to stop the emulator. The following common methods are used:

Tracking of active instructions:
Active instructions are those instructions that change an 8-bit, 16-bit, or 32-bit value in the virtual machine's memory. The instruction becomes active when two memory modifications occur next to each other in memory. This is a typical decryption pattern in memory. Although not all decryptors can be profiled with this technique, it covers the most common cases. The emulator can execute instructions up to a predefined number of iterations, such as a quarter of a million, half a million, or even a million iterations, but only if the code continuously generates active instructions. Short decryptors typically generate a lot of active instructions, whereas longer decryptors with a lot of inserted junk will use active instructions less frequently. For example, this technique was used in the IBM Antivirus.

Tracking of decryptor using profiles:
This method takes advantage of the exact profile of each polymorphic decryptor. Most polymorphic engines use only a few instructions in their decryptor. Thus the first time an instruction is executed that is not in the profile, the first decrypted instruction is executed. This moment can be used to stop the emulator and attempt virus detection.

Stopping with break points:
Several predefined break points can be set for the emulator as conditions. For example, an instruction or a hash of a few instructions can be used from each polymorphic virus to stop the execution of the emulator whenever the decrypted virus body is likely to be executed. Other conditions can include the first interrupt or API call because polymorphic viruses typically do not use them in their decryptors (but some antiemulation viruses do use them).

First, the location of emulations need to be identified. For example, each known entry point of a program can be emulated. Moreover, each possible decryptor location is identified (this method can assume false decryptor detection because the detection of the decryptor itself will not produce a virus warning). Then the decryptor is executed for an efficient number of iterations, and the virus code is identified in the virtual machine of the scanner by checking for search strings (or by using other, previously discussed methods) in the "dirty" pages of the virtual machine's memory.

Note

A memory page becomes dirty when it is modified. Each modified page has a dirty flag, which is set the first time a change occurs in the page.

Such detection can be much faster than X-RAY-based scanning. However, it depends on the actual iterations of the decryptor loop. With short decryptors, the method will be fast enough to be useful. In the case of longer decryption loops (which have a lot of garbage instructions), even partial decryption of the virus code might not be fast enough because the number of necessary iterations can be extremely high, so the decryption of the virus would take more than several minutes in the virtual machine. This problem is also the greatest challenge for emulator-based virus heuristics.

11.4.1. Encrypted and Polymorphic Virus Detection Using Emulation

Consider the example in Listing 11.9, which shows an emulation of an encrypted virus { W95,W97M} /Fabi.9608 in a PE file. This virus places itself at the entry point of infected files. Emulation of the entry-point code will result in a quick decryption of virus code in the virtual machine's memory. Although this virus is not polymorphic, the basic principle of polymorphic virus detection is the same.

Fabi initializes the ESI pointer to the start of the encrypted virus body. The decryption loop decrypts each 32-bit word of the body with a 32-bit key. This key is set randomly but is also shifted in each decryption loop. At iteration 12, the virus generates an active instruction because two 32-bit words in memory are changed next to each other as the XOR instruction decrypts them. This can signal an emulator to continue emulation of the decryption loop, which will stop after about 38,000 iterations.

Note

Detection of the virus is possible before the constant virus body is completely decrypted. However, exact identification of this virus using emulation would require this many iterations to be executed in the virtual machine.

After a few thousand iterations, the virus is decrypted. The scanner can use any of the previously mentioned techniques to detect or identify the virus easily in the virtual machine's dirty pages. String scans are typically done periodically after a number of predefined iterations during emulation, so the complete decryption of the virus does not need to happen. Then the emulation can be further extended for exact identification.

The emulation of Fabi reveals the name of its creator, Vecna, with a Portuguese message that translates to "My poetry" (see Listing 11.11).

This technique is the most powerful detection method for polymorphic viruses. It can be applied to metamorphic viruses that use encryption layers. An antivirus product that does not support code emulation is ineffective against most polymorphic viruses because response time is seriously affected by complex polymorphic viruses. This is a danger to those scanners that do not implement emulation because polymorphic computer worms leave little room for prolonged response. Indeed, there are wildly adopted scanners that do not support emulation technology that will not be able to respond to complex threats.

The key idea in using the emulator relies on the trial-and-error detection approach. A computer file might be emulated from more than a hundred possible entry points, one after another, attempting to find viruses. Such detections are not cost effective in the long run, and they can only survive if the average computer's CPU performance can keep up with the increasing performance hunger of the code emulation. As scanners evolve, the effectiveness of the scanner on old platforms decreases. For example, emulation of Pentium CPUs on 8086 or even 286 processors is too slow to be effective nowadays. In addition, handheld devices have very limited CPU power, and thus complex mobile threats will be more challenging to detect and repair on the native platform. (For example, imagine a polymorphic virus running on a Pocket PC using an ARM processor. Such a virus would be slow, but the antivirus would be even slower.)

11.4.2. Dynamic Decryptor Detection

A relatively new scanning technique uses a combination of emulation and decryptor detection. For viruses with longer loops, such as W32/Dengue, virus emulation cannot perform very fast. The possible entry point of the virus decryptor can be identified in a virus-specific manner. During emulation, for example, specific algorithmic detection can check which areas of the virtual machine's memory have been changed. If there is a suspicious change, additional scanning code can check which instructions were executed during a limited number of iterations. Furthermore, the executed instructions can be profiled and the essential set of decryptor instructions can be identified. For example, a virus that always uses XOR decryption will execute a lot of XOR instructions in its decryptor. At the same time, certain instructions will never be executed, which can be used as exclusions. The combination of inclusions and exclusions will yield an excellent profile of polymorphic decryptors. This can be used by itself to detect the virus with enough filtering in place to keep false positives to the minimum, making the decryptor detection more fully proved and quick enough.

This technique cannot be used to repair the virus because the virus code must be emulated for a longer time (up to several minutes in bad cases) to decrypt the virus code completely using the emulator.

It appears that entry pointobscuring polymorphic viruses are a big problem for all kinds of scanning techniques that seek to be time and cost effective. Heuristic methods are not completely useless against such viruses. Modern emulation-based heuristics16 have a chance to detect such viruses because the decryptor of the virus is often represented at or near the entry point. Therefore, the decryptor will often be reached via emulation.

Note

Complete control of the scanning engine and emulator is mandatory to detect difficult polymorphic viruses effectively. If the actual scanning is not data-driven (p-code interpretation or some sort of executable object as part of the database) so that standalone code must be updated for detection, the actual scanner will not meet expectations because it cannot be updated fast enough. In that case, the virus researcher is in great troubleand customers will be in trouble, too17.

A relatively new dynamic technique to detect polymorphic viruses is done by using code optimalization techniques18 in an attempt to reduce the polymorphic decryptor to a certain core set of instructions, removing the junk. This technique works very effectively with simple polymorphic viruses. Suppose that emulation is used to detect a virus that has many jump instructions inserted into its polymorphic decryptor. Suppose further that the jump instructions are the only "garbage" instructions in the polymorphic decryptor. The code flow of the decryptor can be optimized in each loop of the decryptor as the emulator executes it. Each jump that points to another jump is identified as nonessential in the code, so the jumps can be removed one by one. Table 11.1 demonstrates a pseudo-decryption loop that uses jump garbage instructions J1..Jn and essential instructions I1..In. A possible optimalization of such loops is the removal of each jump instruction that points to another jump.

Table 11.1. A Pseudo-Decryption Loop and Optimization Phases

Loop 1

Loop 2

Loop 3

Loop 4

Loop 5

L:

I1:

L:

I1

L:

I1

L:

I1

L:

I1

J L1

J L2

J L3

J L3

J L3

L1:

J L2

L2:

J L3

L3:

I2

L3:

I2

L3:

I2

L2:

J L3

L3:

I2

I3

I3

I3

L3:

I2

I3

J L4

J L5

J L6

I3

J L4

L4:

J L5

L5:

J L6

L6:

I4

J L4

L4:

J L5

L5:

J L6

L6:

I4

LOOP L

L4:

J L5

L5:

J L6

L6:

I4

LOOP L

L5:

J L6

L6:

I4

LOOP L

L6:

I4

LOOP L

LOOP L

Similarly, the code of other junk instructions that do not play a role in changing state can be removed from the code flow, which makes emulation of the polymorphic code faster and provides a profile of the decryptor for identification.

Of course, just like any other method, code optimalizationbased decryptor detection has its limitations and cannot be applied universally. For example, the complex polymorphic garbage of the MtE mutation engine (as discussed in Chapter 7) cannot be optimized effectively. Other problems appear when multiple encryptions are used on top of each other, dependent on one another.