ΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝ
ADVANCED POLYMORPHISM
PRIMER
PART THE FIRST
ΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝ
By Dark Angel
Phalcon/Skism
ΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝΝ
With the recent proliferation of virus encryption "engines," I was
inspired to write my own. In a few short weeks, I was able to construct one
such routine which can hold its own. A polymorphic encryption routine is
nothing more than a complex code generator. Writing such a routine, while
not incredibly difficult, requires careful planning and perhaps more than a
few false starts.
The utility of true polymorphism is, by now, an accepted fact.
Scanning for the majority of viruses is a trivial task, involving merely the
identification of a specific pattern of bytes in executable files. This
approach is quick and may be used to detect nearly all known viruses.
However, polymorphism throws a monkey wrench into the works. Polymorphic
viruses encode each copy of the virus with a different decryption routine.
Since (theoretically) no bytes remain constant in each generated decryption
routine, virus detectors cannot rely on a simple pattern match to locate
these viruses. Instead, they are forced to use an algorithmic appproach
susceptible to "false positives," misleading reports of the existence of the
virus where it is not truly present. Creating a reliable algorithm to
detect the polymorphic routine takes far more effort than isolating a usable
scan string. Additionally, if a virus detector fails to find even one
instance of the virus, then that single instance will remain undetected and
spawn many more generations of the virus. Survival, of course, is the
ultimate goal of the virus.
Before attempting to write a polymorphic routine, it is necessary to
obtain a manual detailing the 80x86 instruction set. Without bit-level
manipulation of the opcodes, any polymorphic routine will be of limited
scope. The nice rigid structure of the 80x86 instruction set will be
readily apparent after a simple perusal of the opcodes. Exploitation of
this structured instruction set allows for the compact code generation
routines which lie at the heart of every significant polymorphic routine.
After examining the structure of the opcodes, the basic organisation of
the polymorphic routine should be laid out. Here, an understanding of the
basics behind such routines is required. The traditional approach treats
the decryption routine as a simple executable string, such as
"BB1301B900022E8137123483C302E2F6." A true (advanced) polymorphic routine,
by contrast, views the decryption routine as a conceptual algorithm, such
as, "Set up a 'pointer' register, that is, the register whose contents hold
a pointer to the memory to be decrypted. Set up a counter register. Use
the pointer register to decrypt one byte. Update the pointer register.
Decrement the count register, looping if it is not zero." Two routines
which fit this algorithm follow:
Sample Encryption 1
------ ---------- -
mov bx,offset startencrypt ; here, bx is the 'pointer' register
mov cx,viruslength / 2 ; and cx holds the # of iterations
decrypt_loop:
xor word ptr [bx],12h ; decrypt one word at a time
inc bx ; update the pointer register to
inc bx ; point to the next word
loop decrypt_loop ; and continue the decryption
startencrypt:
Sample Encryption 2
------ ---------- -
start:
mov bx,viruslength ; now bx holds the decryption length
mov bp,offset start ; bp is the 'pointer' register
decrypt_loop:
add byte ptr [bp+0Ch],33h ; bp+0Ch -> memory location to be
; decrypted at each iteration
inc bp ; update the pointer register
dec bx ; and the count register
jnz decrypt_loop ; loop if still more to decrypt
The number of possibilities is essentially infinite. Naturally,
treating the decryption as an algorithm rather than as an executable string
greatly increases the flexibility in creating the actual routine. Various
portions of the decryption algorithm may be tinkered with, allowing for
further variations. Using the example above, one possible variation is to
swap the order of the setup of the registers, i.e.
mov cx,viruslength
mov bx,offset startencrypt
in lieu of
mov bx,offset startencrypt
mov cx,viruslength
It is up to the individual to decide upon the specific variations which
should be included in the polymorphic routine. Depending upon the nature of
the variations and the structure of the polymorphic routine, each increase
in power may be accompanied with only a minimal sacrifice in code length.
The goal is for the routine to be capable of generating the greatest number
of variations in the least amount of code. It is therefore desirable to
write the polymorphic routine in a manner such that additional variations
may be easily accommodated. Modularity is helpful in this respect, as the
modest overhead is rapidly offset by substantial space savings.
The first step most polymorphic routines undergo is the determination
of the precise variation which is to be encoded. For example, a polymorphic
routine may decide that the decryption routine is to use word-length xor
encryption with bx as the pointer register, dx as a container for the
encryption value, and cx as the counter register. Once this information is
known, the routine should be able to calculate the initial value of each
variable. For example, if cx is the counter register for a byte-length
encryption, then it should hold the virus length. To increase variability,
the length of the encryption can be increased by a small, random amount.
Note that some variables, in particular the pointer register, may not be
known before encoding the rest of the routine. This detail is discussed
below.
Of course, selecting the variables and registers will not in and of
itself yield a valid decryption routine; the polymorphic routine must also
encode the actual instructions to perform the job! The cheesiest
polymorphic routines encode a single "mov" instruction for the assignment of
a value to a register. The more complex routines encode a series of
instructions which are functionally equivalent to the simple three byte
"mov" statement yet far different in form. For example,
mov ax, 808h
could be replaced with
mov ax, 303h ; ax = 303h
mov bx, 101h ; bx = 101h
add ax, bx ; ax = 404h
shl ax, 1 ; ax = 808h
Recall that the registers should be encoded in a random order. The
counter variable, for example, should not always be the first to be encoded.
Predictability, the bane of polymorphic routines, must be avoided at all
costs.
After the registers are encoded, the actual decryption loop should then
be encoded. The loop can perform a number of actions, the most significant
of which should be to manipulate the memory location, i.e. the actual
decryption instruction, and to update the pointer register, if necessary.
Finally, the loop instruction itself should be encoded. This can take many
forms, including "loop," "loopnz," "jnz," etc. Possible variations include
altering the decryption value register and the counter register during each
iteration.
This is the general pattern of encoding. By placing garbling, or "do-
nothing," instructions between the essential pieces of code, further
variability may be ensured. These instructions may take many forms. If the
encoding routines are well-designed, the garbler can take advantage of the
pre-existing code to generate null instructions, such as assignments to
unused registers.
Once the decryption routine has been written, it is necessary to
encrypt the virus code. The traditional approach gives the polymorphic
routine the job of encrypting the code. The polymorphic routine should
therefore "remember" how the precise variation used by the decryptor and
adjust the encryption routine in a complementary fashion. An alternate
approach is for the polymorphic routine to simultaneously encode both the
encryption and decryption routines. Although it adds overhead to the code,
it is an extremely flexible approach that easily accommodates variations
which may be later introduced into the polymorphic routine.
Variable-length decryptors come at a significant trade-off; the exact
start of the decryption cannot be known before encoding the decryptor.
There are two approaches to working around this limitation. The first is to
encode the pointer register in a single instruction, i.e. mov bx,185h and to
patch the initial value once it is known. This is simplistic, though
undesirable, as it decreases the variability of the routine. An alternate
approach is to encode the encryption instruction in the form xor word ptr
[bx+185h], cx (as in Sample Encryption 2, above) instead of xor word ptr
[bx], cx (as in Sample Encryption 1). This increases the flexibility of the
routine, as the initial value of the pointer register need not be any fixed
value; correct decryption may be assured by adjusting the offset in the
decryption instruction. It is then possible to encode the pointer register
with multiple instructions, increasing flexibility. However, using either
method alone increases the predictability of the generated code. A better
approach would be to incorporate both methods into a single polymorphic
routine and randomly selecting one during each run.
As an example of a polymorphic routine, I present DAME, Dark Angel's
Multiple Encryptor and a simple virus which utilises it. They appear in the
following article. DAME uses a variety of powerful techniques to achieve
full polymorphism. Additionally, it is easy to enhance; both the encoding
routines and the garblers can be extended algorithmically with minimal
effort. In the next issue, I will thoroughly comment and explain the
various parts of DAME.