Frankenstein: Stitching Malware from Benign Binaries

1.
Frankenstein: Stitching Malware from Benign Binaries

Vishwath Mohan and Kevin W. Hamlen
School of Electrical and Computer Science
University of Texas at Dallas

Abstract: This paper proposes a new self-camouflaging malware propagation system, Frankenstein, that overcomes shortcomings in the current generation of metamorphic malware. Specifically, although mutants produced by current state-of-the-art metamorphic engines are diverse, they still contain many characteristic binary features that reliably distinguish them from benign software.

Frankenstein forgoes the concept of a metamorphic engine and instead creates mutants by stitching together instructions from non-malicious programs that have been classified as benign by local defenses. This makes it more difficult for feature-based malware detectors to reliably use those byte sequences as a signature to detect the malware. The instruction sequence harvesting process leverages recent advances in gadget discovery for return-oriented programming. Preliminary tests show that mining just a few local programs is sufficient to provide enough gadgets to implement arbitrary functionality.

I. INTRODUCTION

The underground economy associated with malware has grown rapidly in the last few years. Recent studies demonstrate that malware authors no longer need concern themselves with the distribution of their creations to end user systems; they can leave that task to specialized pay-per-install services [1]. In such an environment, the primary concern of malware is evading detection on infected machines as it carries out its malicious task.

End-user machines are protected by real-time detection systems that rely heavily on static analysis. Static analysis is favored because it is faster and consumes fewer resources relative to dynamic methods [2]–[6]. Resilience against static analyses is therefore a high priority for malware obfuscation technologies.

Oligomorphism, polymorphism, virtualization-based obfuscation, and metamorphism are the main techniques used to evade static analyses. Oligomorphism uses simple invertible operations, such as XOR, to transform the malicious code and hide distinguishing features. The code is then recovered by inverting the operation to deploy the obfuscated payload. Polymorphism is an advancement of the same concept that encrypts most of the malicious code before propagation, leaving only a decryption routine, which unpacks the malicious code before execution.

Both oligomorphism and polymorphism are statically detectable with high probability using statistical or semantic techniques. Encrypting or otherwise transforming the code significantly changes statistical characteristics of the program, such as byte frequency [7], [8] and entropy [9], prompting defenses to classify such programs as suspicious. Subsequent, more computationally expensive analyses can then be judiciously applied to these suspicious binaries to identify malware.

More advanced polymorphic techniques, such as polymorphic blending, try to overcome this weakness by modifying statistical information of binaries via byte padding or substitution [10]. However, the malware's decryption routine (which must remain unencrypted) is often sufficiently unique that it can be used as a signature to detect an entire family of polymorphic malware. Semantic analysis techniques can therefore single out and identify the unpacker to detect malware family members [11]. Virtualization-based obfuscators express malware as bytecode that is interpreted at runtime by a custom VM. However, this shifts the obfuscation burden to concealing the (usually large) in-lined, custom VM.

Metamorphism is a more advanced approach to obfuscation that, in lieu of encryption, replaces its malicious code sequences with semantically equivalent code during propagation. This is accomplished using a metamorphic engine that processes binary code and modifies it to output a structurally different but semantically identical copy. Since the mutations all consist of purely non-encrypted, plaintext code, they tend to exhibit statistical properties indistinguishable from other non-encrypted, benign software.

Simple metamorphic engines mutate by adding padding (e.g., dead code), permuting registers, or inserting semantic no-ops consisting of state-preserving instructions and loops. More advanced engines additionally perform function reordering, control flow modification, or data structure modification.

Current metamorphic engines focus on achieving a high diversity of mutants in an effort to decrease the probability that the mutants share any features that can serve as a basis for signature-based detection. However, diversity does not necessarily lead to indistinguishability. For example, malware signatures that whitelist features (i.e., those that classify binaries as suspicious if they do not contain certain features) actually become more effective as mutant diversity increases. Similarly, reverse-engineering current metamorphic engines often reveals patterns that can be exploited to derive a suitable signature for detection.

Our system, Frankenstein, therefore adopts a different approach to metamorphism that is inspired by recent advances in return-oriented programming. Return-oriented programming searches the address spaces of victim binaries for gadgets, that is, instruction sequences that end with the return instruction [12].
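The statistical detection mentioned in the introduction (byte frequency and entropy) can be illustrated with a minimal sketch. The function name byte_entropy and both sample inputs are assumptions made for illustration; random bytes merely stand in for encrypted content, and nothing here is taken from the detectors cited in [7]–[9].

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical inputs: a short repeated byte pattern standing in for
# ordinary plaintext code, and random bytes standing in for encrypted code.
structured = bytes([0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x10, 0x5D, 0xC3]) * 512
random_like = os.urandom(4096)

print(f"structured:  {byte_entropy(structured):.2f} bits/byte")
print(f"random-like: {byte_entropy(random_like):.2f} bits/byte")
```

A detector along these lines would flag inputs whose entropy sits close to the 8 bits/byte ceiling, which is why the approaches discussed here try to keep mutants in plaintext form.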

2.
Previous work has shown that a sufficiently large code base (such as the standard C library libc) suffices to find a Turing-complete set of gadgets that can be used to exploit known vulnerabilities [13]. Others have also shown that searching for gadgets is an automatable task [14].

We apply the idea of harvesting instructions to obfuscate malicious code. Rather than using a metamorphic engine to mutate, we stitch together harvested code sequences from benign files on the infected system to create a semantically equivalent binary. By composing the new binary entirely out of byte sequences common to benign-classified binaries, the resulting mutants are less likely to match signatures that include both whitelisting and blacklisting of binary features.

Our main contribution is a new method to obfuscate malware that works by synthesizing copies entirely from byte sequences that have already been classified as benign by local defenses. In doing so, we demonstrate a heretofore unrecognized synergy between research on metamorphic obfuscation and that on return-oriented programming.

As a proof-of-concept, we present a toy implementation consisting of a binary obfuscator that generates stand-alone x86 native code mutants from a specification (described in Section II), but that does not self-propagate. Experiments focus on obfuscating code whose size and functionality is representative of the small, unencryptable portion of malware (e.g., unpackers) rather than full malware payloads, to which alternative approaches are usually applicable (e.g., mimimorphism [15]). Thus, we envision our approach as a complement to these alternatives rather than a replacement.

In Section II we present a high-level overview of Frankenstein and discuss its constituent components. Section III contains the details of our prototype implementation, and Section IV reports experimental results. Related work is discussed in Section V. Finally, Section VI summarizes and discusses opportunities for future work.

II. DESIGN

Frankenstein searches programs on the local machine for gadgets, which it composes to form a semantically identical copy of itself. The different components of Frankenstein are diagrammatically represented in Figure 1. Below, we describe our notion of a gadget, which differs from its usual definition in the context of return-oriented programming, and then discuss each of the components in more detail.

[Figure 1: block diagram with boxes labeled Benign Binaries, Semantic Blueprint, PE Template, Gadget Discovery, Gadget Arrangement, Gadget Assignment, Code Generation, Code Injector, Executable Synthesis, and Obfuscated Copies.]
Fig. 1. High-level architecture of Frankenstein

A. Gadgets

Our definition of a gadget is a more relaxed version of that needed for return-oriented exploits [12], [14]. Return-oriented programming, which is a form of control-flow hijacking, relies on the ret instruction to transfer control from one sequence of instructions to the next. Thus, only sequences that end with a ret constitute viable gadgets. Since we statically stitch gadgets together, we are not bound by this constraint. For our purposes, a gadget is any sequence of bytes that is interpretable as valid x86 instructions.

A second difference from return-oriented programming is that, as a purely static approach, we need not restrict the search to the address space of a currently running process. Gadgets are therefore harvested from executable file images on victim systems. Both these facts afford us a much larger pool of potential gadgets from which to construct mutants. This is advantageous since we would like each copy of Frankenstein to differ as much as possible from all other copies.

Gadgets are categorized by their type (a semantic abstraction of the kind of task they perform) and by a set of parameters (instantiated with values that specialize the gadget to a particular task). For example, the MovReg type represents any gadget that moves a value from one register to another. All MovReg gadgets have two parameters, InReg and OutReg, that represent the operation OutReg ← InReg.
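The split between a gadget's type and its parameters can be pictured as a small record. The sketch below is only an illustration: the Gadget class, the mov_reg helper, and apply_gadget are hypothetical names, and registers are modeled as a plain dictionary rather than real machine state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gadget:
    """A gadget summarized by its type and its instantiated parameters."""
    gadget_type: str
    params: tuple  # (parameter name, value) pairs


def mov_reg(out_reg: str, in_reg: str) -> Gadget:
    """Build a MovReg gadget representing OutReg <- InReg."""
    return Gadget("MovReg", (("InReg", in_reg), ("OutReg", out_reg)))


def apply_gadget(gadget: Gadget, regs: dict) -> dict:
    """Return a copy of the register file after the gadget's effect."""
    if gadget.gadget_type != "MovReg":
        raise NotImplementedError(gadget.gadget_type)
    p = dict(gadget.params)
    updated = dict(regs)
    updated[p["OutReg"]] = regs[p["InReg"]]
    return updated


g = mov_reg("ecx", "eax")
print(apply_gadget(g, {"eax": 7, "ecx": 0}))
```

Because the record is frozen and compares by value, two instruction sequences that instantiate the same type with the same parameters compare equal, which is the property the type/parameter abstraction is meant to capture.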

3.
A complete set of gadget types, their associated parameters, and the semantic task that each encodes is given in Table I. The symbols cmp and aop represent integer comparison and modular arithmetic operations, respectively. The collection is Turing-complete, and therefore suffices to build arbitrary computations. In contrast to gadget types for return-oriented programming [14], our collection includes types for conditional and non-conditional branches. These are unnecessary for return-oriented programming since in that context every gadget is reached via an appropriate return instruction injected into the stack. To better resemble benign software, Frankenstein uses more conventional control-flows that include standard branching instructions.

TABLE I. GADGET TYPES

Gadget Type (t) | Input | Parameters (p) | Semantic Definition
NoOp | - | - | No change to memory or registers
DirectBranch | Offset | - | EIP ← EIP + Offset
DirectConditionalBranch | Offset | cmp, Reg1, Reg2 | EIP ← EIP + Offset if Reg1 cmp Reg2
LoadReg | OutReg, InReg | - | OutReg ← InReg
LoadConst | OutReg, Value | - | OutReg ← Value
LoadMemAddr | OutReg, Addr | - | OutReg ← [Addr]
LoadMemReg | OutReg, AddrReg | Scale, Disp | OutReg ← [AddrReg ∗ Scale + Disp]
StoreMemAddr | InReg, Addr | - | [Addr] ← InReg
StoreMemReg | InReg, AddrReg | Scale, Disp | [AddrReg ∗ Scale + Disp] ← InReg
Arithmetic | OutReg, InReg1, InReg2 | aop | OutReg ← InReg1 aop InReg2

Every gadget is also associated with a clobber list, which represents secondary register and memory locations whose values the gadget modifies. The clobber list is used to find a sequence of gadgets that do not interfere with one another.

The types defined in the table are sufficient to carry out our initial experiments, but future work should consider extending the table to support obfuscation of more complex tasks.

B. Semantic Blueprint

Typical metamorphic malware recompiles itself from a bytecode intermediate language during propagation. Each mutant carries a freshly obfuscated intermediate form of itself for this purpose. (The intermediate form is data, which is easier to obfuscate than code.) In contrast, Frankenstein propagates by re-synthesizing itself from a more abstract semantic blueprint. The semantic blueprint is a sequence of abstract machine states, where each step in the sequence is represented as a logical predicate. Each predicate is a combination of an atomic term and zero or more locations. A location can be a specific register, memory address, immediate value, or a variable that refers to an arbitrary register or memory location. The jump predicate instead has arguments consisting of a relative offset into the blueprint's list of states and an optional condition.

A subset of logical predicates used by Frankenstein is shown in Table II. For example, the move predicate is an abstraction of the movement of a value from one location L2 to another L1. Thus, depending on the values of its locations, a move predicate might be satisfiable by any of the Load* or Store* gadget types. The flexibility of a predicate to match multiple gadget types allows the semantic blueprint to more abstractly encode what is computed rather than how the computation is carried out. This in turn allows for a diverse set of gadgets to match a given portion of the semantic blueprint.

TABLE II. EXAMPLES OF LOGICAL PREDICATES

Predicate | Semantic Definition | Suitable Gadgets
noop | - | NoOp
move(L1, L2) | L1 ← L2 | All Loads/Stores
add(L1, L2, L3) | L1 ← L2 + L3 | Arithmetic
sub(L1, L2, L3) | L1 ← L2 − L3 | Arithmetic
jump(n, Why) | Jump n blueprint steps if Why holds | DirectBranch, ConditionalBranch

Figure 2 shows how predicates can be chained together to form a clause. In the example, the memory locations 0x401248 and 0x40124C contain the lengths of the two shorter sides of a right-angled triangle, and the computed square of the hypotenuse is stored in the EAX register. Variables L1–L5 can refer to memory locations or registers.

    hypotenuse_squared :-
        mov(L1, '[0x401248]'),
        mov(L2, '[0x40124C]'),
        mul(L3, L1, L1),
        mul(L4, L2, L2),
        add(L5, L3, L4),
        mov('EAX', L5).

Fig. 2. A semantic blueprint to compute the square of a triangle's hypotenuse

The level of abstraction (and with it the diversity of mutants) can be tuned by adjusting the granularity of the predicates in the blueprint. It is possible to create layers of predicates that can each be expressed as clauses of lower-layer predicates, each layer effectively abstracting higher-level operations. For example, two consecutive predicates that increment register r and then multiply it by 2 could be replaced by a single predicate that computes 2(r+1). The resulting predicate would be satisfied by new gadget sequences, such as one that first multiplies by 2 and then adds 2. The trade-off is the greater search time required to discover implementations of more abstract gadgets. If the predicates are too abstract, the search becomes intractable.

Since gadget discovery is based on search, our proof-of-concept implementation expresses semantic blueprints as predicates written in a logic programming language (Prolog).
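To make the meaning of a blueprint concrete, the following sketch interprets a list of predicates over a small environment. It is an illustration under stated assumptions, not the prototype's Prolog: run_blueprint, the tuple encoding of steps, the ("mem", address) operand form, and 32-bit wraparound are all hypothetical choices, mul is borrowed from Fig. 2, and jump offsets are assumed to be relative to the jump's own step.

```python
import operator

MASK = 0xFFFFFFFF  # assumed 32-bit modular arithmetic
ARITH = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_blueprint(steps, env, memory):
    """Interpret blueprint steps; returns the final environment.

    Operands are names looked up in env, ints used as immediates,
    or ("mem", address) pairs that read or write the memory dict.
    """
    def read(x):
        if isinstance(x, tuple):
            return memory[x[1]]
        return env[x] if isinstance(x, str) else x

    def write(dst, value):
        if isinstance(dst, tuple):
            memory[dst[1]] = value
        else:
            env[dst] = value

    pc = 0
    while 0 <= pc < len(steps):
        name, *args = steps[pc]
        if name == "move":
            write(args[0], read(args[1]))
        elif name in ARITH:
            write(args[0], ARITH[name](read(args[1]), read(args[2])) & MASK)
        elif name == "jump":
            offset, condition = args
            if condition is None or condition(env):
                pc += offset  # relative to this step (an assumption)
                continue
        elif name != "noop":
            raise ValueError(f"unknown predicate: {name}")
        pc += 1
    return env


# The clause of Fig. 2, with the two side lengths stored in memory.
HYPOTENUSE_SQUARED = [
    ("move", "L1", ("mem", 0x401248)),
    ("move", "L2", ("mem", 0x40124C)),
    ("mul", "L3", "L1", "L1"),
    ("mul", "L4", "L2", "L2"),
    ("add", "L5", "L3", "L4"),
    ("move", "EAX", "L5"),
]

# Two lower-level realizations of the more abstract predicate 2(r+1).
INC_THEN_DOUBLE = [("add", "r", "r", 1), ("mul", "r", "r", 2)]
DOUBLE_THEN_ADD2 = [("mul", "r", "r", 2), ("add", "r", "r", 2)]

# A small loop using jump: acc <- n + (n-1) + ... + 1.
SUM_TO_N = [
    ("move", "acc", 0),
    ("jump", 4, lambda e: e["n"] == 0),
    ("add", "acc", "acc", "n"),
    ("sub", "n", "n", 1),
    ("jump", -3, None),
]

print(run_blueprint(HYPOTENUSE_SQUARED, {}, {0x401248: 3, 0x40124C: 4})["EAX"])
```

Running INC_THEN_DOUBLE and DOUBLE_THEN_ADD2 on the same starting value of r yields the same result, which is the sense in which a more abstract predicate admits several distinct lower-level realizations.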

4.
While a full Prolog search engine is obviously too heavyweight for inclusion in real malware, effective gadget search does not require the full capabilities of logic programming. We expect a combination of unification and simple depth-first search to suffice, and consequently believe that a much slimmer implementation is possible.

C. Gadget Discovery

The majority of the obfuscation process involves finding a suitable set of gadgets that can be used to implement the semantic blueprint. The search process proceeds similarly to gadget searches for return-oriented programming [14], but with several variations reflecting our different focus (obfuscation as opposed to finding one viable sequence from a limited code base).

In the discovery phase, Frankenstein searches the local file system for binaries. In our experiments, 2–3 binaries from the system32 folder suffice to provide a code base from which to harvest a diverse, Turing-complete set of gadgets on Microsoft Windows systems. Frankenstein starts by collecting byte sequences from the code sections of these binaries using a variable-length sliding window. The sliding window approach is simpler than implementing (and obfuscating) a full disassembler, and it increases the pool of available gadgets by including for consideration the many misaligned instruction sequences that all benign programs contain (but rarely execute).

Each byte sequence is passed through an instruction decoder to produce an instruction sequence. Sequences containing invalid op-codes or undesirable branches (such as calls or returns) are discarded, and the remaining sequences are tested for gadget viability using Algorithm 1.

Algorithm 1 Gadget discovery
    Input: σ_0 (initial symbolic machine state), and
           [i_1, ..., i_n] (instruction sequence)
    Output: G ⊆ T × Φ (matching gadget types)
    for j = 1 to n do
        σ_j ← E[[i_j]] σ_{j−1}
    end for
    G ← ∅
    for all t ∈ T do
        if U(t, σ_n) is defined then
            φ ← U(t, σ_n)
            G ← G ∪ {(t, φ)}
        end if
    end for
    return G

Frankenstein performs gadget discovery with the aid of a small abstract evaluator E : I → Σ → Σ that defines the effect of an instruction i ∈ I upon a symbolic machine state σ ∈ Σ. Notation E[[i]]σ denotes the resulting symbolic state, where states σ map locations (viz., registers, flags, and memory addresses) to symbolic expressions e. For example, each register's initial content is encoded as a fresh symbol in the initial abstract state: σ(eax) = EAX, and so on.

After composing the effects of all instructions in a candidate sequence, the final symbolic output state is unified with each possible gadget type t ∈ T. A gadget type t is conceptually a state predicate, possibly containing uninstantiated parameters p. Unification U(t, σ) succeeds if there exists an instantiation φ : Φ = p → ParamVals of the parameters such that substituting t according to φ yields a concrete predicate satisfied by symbolic state σ. In that case the unification returns the parameter instantiation φ. Otherwise U(t, σ) is undefined (and the search continues). Two instruction sequences that match a particular gadget type are considered equivalent if they have identical instantiations of all parameters, excluding the clobber list.

It is common for a large instruction sequence to be recognized as multiple valid gadget types when considering its effect on different machine state variables. For example, the instruction sequence

    mov ebx, dword ptr [eax*4 + 0xc]
    mov ecx, eax
    inc ecx

can be used as a LoadReg gadget representing ecx ← ebx, a LoadMemReg gadget representing ebx ← [eax ∗ 4 + 0xc], or as an Arithmetic gadget representing ecx ← ecx + 1. In each case, the clobber list includes all other state variables that are modified. We also include additional constraints for sequences where memory indirection is involved. In our example gadget above, the value of eax ∗ 4 must be a valid memory address to ensure that it does not cause a crash when executed. These constraints are expressed in the form of logical clauses as part of the arrangement layer (described below), which ensures that we only find valid solutions.

D. Gadget Arrangement

The next step involves finding a suitable combination of gadget types that match the semantic blueprint. In Frankenstein, gadget arrangement is a natural consequence of the way that the semantic blueprint is defined. The logical predicates that we define in Table II also happen to be the lowest level in the layered approach to constructing predicates described previously. We call this the arrangement level because all possible gadget arrangements are expressed in terms of these predicates.

Given a clause defined in terms of higher-level predicates, logic programming can be used to reduce it to multiple clauses composed entirely of arrangement layer predicates, such that the definition of each clause in turn represents one or more potential gadget arrangements. We assume that malware authors have access to arbitrary high-level representations of the code, including requirements, design, and implementation of the payload. They can therefore use this information to express the malware in terms of the higher-level predicates.

At present, our prototype of Frankenstein does not have support for higher level predicates, and expresses blueprints

5.
using predicates from the arrangement level only. However, adding higher-level predicates is not conceptually difficult, and we plan to include this feature in future versions of our system.

E. Gadget Assignment

In the last phase, we use the discovered gadgets to find satisfiable assignments for each generated gadget arrangement. We leverage the unification process of logic programming for this purpose, which is well-suited to this problem. Frankenstein begins by converting each discovered gadget into an extended version of one of the predicates defined in Table II. The extension adds two terms to each predicate: a list of clobbered locations and an identification number. The identification number associates each of these predicates with the instruction sequence that the gadget represents, while the clobber list facilitates discovery of sequences of non-interfering gadgets.

Next, the predicates that form the definition of each of the reduced clauses obtained in the gadget arrangement phase above are also extended to include variables that represent a clobber list and identification numbers. To each definition, we also add a generated list of constraints that prevent the parameters of predicates from interfering with one another. Finally, we use constraint logic programming to solve each clause. The full set of solutions obtained represents all the possible gadget assignments that implement the original semantic blueprint.

F. Executable Synthesis

For each successful gadget assignment, Frankenstein masks all external calls in the code by converting them into computed jumps. As a result, Frankenstein's mutants have no noteworthy system calls in their import address tables, concealing them from detectors that rely upon such features for fingerprinting.

The last step is injecting the finished code into a correctly formatted binary so it can be propagated. Frankenstein has a binary parsing component and a seed binary that it uses as a template. For each mutant, it injects the code into the template file and updates all relevant metadata in the header. At this point the new mutants are natively executable programs.

III. IMPLEMENTATION

To test the viability of our approach, we created a prototype stand-alone obfuscator that takes a gadget arrangement as input and produces a working portable executable (PE) file as output. The prototype searches the local system for programs, mines them to discover gadgets, finds a suitable gadget assignment, and realizes it as a PE file. The prototype was implemented in a combination of Python and Prolog. The experiments were performed on a quad-core virtual machine with 3 GB RAM running 64-bit Windows 7. The host machine is an Intel i7 Q6500 quad-core laptop running 64-bit Windows 7.

The gadget discovery, gadget assignment, and duplication phases implement the algorithms described in the previous section. However, the abstract evaluator that analyzes and discovers gadgets currently supports only a limited subset of instructions: about 8 different instructions excluding branches. Even though this greatly reduces the number of gadgets available for incorporation into mutants, it nevertheless suffices to find more than enough gadgets to implement our sample programs.

The discovery module, implemented in Python, takes a set of binaries and a semantic blueprint as input. It outputs a series of Prolog predicates that define each discovered gadget, as well as a Prolog query that represents a viable combination of the gadget types specified in the arrangement. This is delivered to the assignment module, implemented in Prolog, which outputs all discovered solutions to the query. Each solution is then converted into its equivalent byte sequence by the executable synthesis module, implemented in Python, which injects the byte code into a template PE file. For ease of testing, the duplication module contains pre-fabricated templates for function prologues and epilogues, which are used to modularize the synthesized byte sequence as a stand-alone function. We note that function prologues and epilogues could easily be synthesized using the gadget discovery mechanism instead, if so desired.

IV. EXPERIMENTAL RESULTS

We tested our prototype by discovering gadgets in some common Windows binaries. For our results, we only chose gadgets that contained 2–6 instructions. Our results are tabulated in Table III. We recorded the number of gadgets found and time taken both with and without using the sliding window protocol discussed in Section II. Surprisingly, we found that using the sliding window protocol to discover misaligned sequences increased the gadget count by only 34% on average but increased discovery time by 794%, a trade-off that does not seem worthwhile. We conjecture that increasing the number of instructions supported by the abstract evaluator will help balance these ratios somewhat, but that a better strategy is likely to be one that searches for gadgets using a simple fall-through disassembly while increasing the number of binaries mined. All results we discuss hereafter are based on the numbers for the non-sliding window algorithm.

The results show that even with the limited capacity of our prototype, 2–3 binaries are sufficient to bring the number of gadgets above 100,000. On average we discovered about 46 gadgets per KB of code, finding approximately 2338 gadgets per second.

Next, we tested the prototype's ability to synthesize working code. We chose two algorithms for our experiments: insertion sort and a loop that XORs an array of bytes using a one-time pad. Both programs contain operations commonly found within the packers used by conventional malware. The semantic blueprints for these programs are shown in Figures 3 and 4. The semantic blueprints were reproduced as Prolog queries, with extensions to predicates and added constraints to ensure non-interference between gadgets as detailed previously. In both cases, only gadgets harvested from explorer.exe were used. The queries produced over 10,000 viable gadget assignments each, with an average speed of 3 assignments per second.
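The assignment phase of Section II-E (unification plus non-interference constraints over clobber lists) can be sketched as a small depth-first search. This is a toy model under loud assumptions: the Gadget record, the L-prefixed variable convention, the gadget pool, and the simplified liveness rule are all hypothetical, and the prototype itself delegates this step to Prolog constraint solving.

```python
from typing import NamedTuple


class Gadget(NamedTuple):
    ident: int            # identification number
    pred: str             # arrangement-level predicate it implements
    args: tuple           # concrete locations: (destination, sources...)
    clobbers: frozenset   # other locations the gadget modifies


def is_var(term: str) -> bool:
    """Blueprint variables are written L1, L2, ... in this sketch."""
    return term.startswith("L") and term[1:].isdigit()


def unify(pattern, concrete, binding):
    """Extend binding so that pattern matches concrete, else None."""
    if len(pattern) != len(concrete):
        return None
    new = dict(binding)
    for term, loc in zip(pattern, concrete):
        if not is_var(term):
            if term != loc:
                return None
        elif term in new:
            if new[term] != loc:
                return None
        elif loc in new.values():
            return None  # distinct variables must use distinct locations
        else:
            new[term] = loc
    return new


def assignments(blueprint, pool):
    """Yield lists of gadget ids that implement the blueprint."""
    def live_after(k):
        # Variables that a later step still reads as a source.
        return {t for _, args in blueprint[k + 1:] for t in args[1:] if is_var(t)}

    def search(k, binding, chosen):
        if k == len(blueprint):
            yield chosen
            return
        pred, pattern = blueprint[k]
        for g in pool:
            if g.pred != pred:
                continue
            new = unify(pattern, g.args, binding)
            if new is None:
                continue
            protected = {new[v] for v in live_after(k) if v in new}
            if g.clobbers & protected:
                continue  # the gadget would destroy a value needed later
            yield from search(k + 1, new, chosen + [g.ident])

    yield from search(0, {}, [])


# Hypothetical blueprint: L1 <- esi, L2 <- edi, eax <- L1 + L2.
BLUEPRINT = [("move", ("L1", "esi")), ("move", ("L2", "edi")),
             ("add", ("eax", "L1", "L2"))]

POOL = [
    Gadget(1, "move", ("ebx", "esi"), frozenset()),
    Gadget(2, "move", ("ecx", "esi"), frozenset({"edx"})),
    Gadget(3, "move", ("edx", "edi"), frozenset()),
    Gadget(4, "move", ("ebx", "edi"), frozenset({"ecx"})),
    Gadget(5, "add", ("eax", "ebx", "edx"), frozenset()),
    Gadget(6, "add", ("eax", "ecx", "edx"), frozenset({"ebx"})),
    Gadget(7, "add", ("eax", "ecx", "ebx"), frozenset()),
]

for solution in assignments(BLUEPRINT, POOL):
    print(solution)
```

In this pool, gadget 4 is rejected after gadget 2 because it clobbers ecx, which still holds L1. With many interchangeable gadgets per step, the number of surviving combinations grows multiplicatively, which is one way to read the large assignment counts reported above.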

6.
This high diversity can be attributed to multiple satisfiable sub-arrangements of gadgets, which can each be combined with every variation of all other sub-arrangements, leading to a combinatorially high number of unique overall arrangements. Although this might appear to produce a large number of similar variants, diversity can be ensured by harvesting gadgets from different sets of binaries and additionally by only selecting assignments that have no gadgets in common with each other.

TABLE III. GADGET DISCOVERY STATISTICS FOR SOME WINDOWS BINARIES

Binary Name | File Size (KB) | Gadgets Found (without sliding window) | Time Taken (s) (without sliding window) | Gadgets Found (with sliding window) | Time Taken (s) (with sliding window)
gcc.exe | 1327 | 82885 | 29.70 | 97163 | 172.24
calc.exe | 758 | 41914 | 22.09 | 60390 | 189.86
explorer.exe | 2555 | 89617 | 40.31 | 127859 | 429.56
cmd.exe | 295 | 17514 | 7.17 | 25008 | 88.34
notepad.exe | 175 | 4512 | 1.82 | 6974 | 24.39

    Input:
        L1 = address of data,
        L2 = address of one-time pad,
        L3 = array length,
        L4 = address of encrypted output
    Blueprint:
        xor_encryption :-
            move(L5, 0),
            jump(7, L5 = (L3-1)),
            move(L6, [L1+L5*4]),
            move(L7, [L2+L5*4]),
            xor(L8, L6, L7),
            move([L4+L5*4], L8),
            add(L4, L4, 1),
            add(L5, L5, 1),
            jump(-7, always).

Fig. 3. Semantic blueprint for a simple XOR oligomorphism

    Input:
        L1 = address of array
        L2 = length of array
    Blueprint:
        insertion_sort :-
            move(L3, 1),
            jump(14, L3 = L2),
            move(L4, L3),
            move(L5, [L4*4+L1]),
            jump(8, L4 = 0),
            jump(7, [L1+(L4-1)*4] < L5),
            sub(L4, L4, 1),
            move(L6, [L4*4+L1]),
            add(L4, L4, 1),
            move([L4*4+L1], L6),
            sub(L4, L4, 1),
            jump(-7, always),
            move([L4*4+L1], L5),
            add(L3, L3, 1),
            jump(-13, always).

Fig. 4. Semantic blueprint for insertion sort

To better understand the size increase induced by our approach, we compared the sizes of 100 mutants generated by Frankenstein for the one-time pad XOR algorithm against its corresponding compiler-generated code. The compiled code was generated with C++ using Visual Studio 2010 with basic security checks turned on and optimization set to full. The mean of the sizes of the generated mutants was 48 bytes compared to the 25 bytes produced by Visual Studio. The variance in size between the generated samples was 16. This shows that the size of a mutant can be expected to be slightly less than double the size of its optimized compiler-generated version, an increase that we feel is an acceptable cost for the benefit of obfuscation.

To assess the binary distribution of the generated mutants, we generated 20 implementations of the XOR blueprint after mining donor program explorer.exe for gadgets, and counted the number of n-grams that do not appear in the donor program and were shared by at least m mutants. Our results are tabulated in Table IV. Only about 20 such n-grams are common across 25% of our mutant population, and no n-grams are common across more than 35% of the population. In addition, all the common n-grams are relatively short; no n-grams of length n ≥ 11 were shared. These are encouraging results because they indicate that few binary n-gram features are relevant for distinguishing malware instances from the benign programs used for gadget harvesting.

Our experimental results are promising, and suggest that developing a more comprehensive Frankenstein tool is a worthwhile endeavor. Specifically, we conjecture that a more comprehensive abstract evaluator that can analyze a greater number of instructions can potentially find far more gadgets, and thus produce mutants that exhibit even greater diversity.

V. RELATED WORK

Gadget-based obfuscation is related to past and present research in the areas of software security and compiler optimization. Below we describe some related work.
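The n-gram measurement reported in Section IV (n-grams absent from the donor program but shared by at least m mutants) can be sketched as follows. The function names and the toy byte strings are assumptions for illustration; they are not the experiment's data, and the counting convention (each mutant contributes an n-gram at most once) is one plausible reading of the text.

```python
from collections import Counter


def ngrams(data: bytes, n: int) -> set:
    """All distinct length-n byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def fresh_shared_ngrams(donor: bytes, mutants: list, n: int, m: int) -> set:
    """n-grams absent from the donor that occur in at least m mutants."""
    donor_grams = ngrams(donor, n)
    counts = Counter()
    for mutant in mutants:
        # Each mutant contributes a given n-gram at most once.
        counts.update(ngrams(mutant, n) - donor_grams)
    return {gram for gram, c in counts.items() if c >= m}


# Toy stand-ins for a donor program and three mutants.
donor = b"ABCDEFGH"
mutants = [b"ABCXDEF", b"CDXDEFG", b"GHXDEAB"]

for m in (1, 2, 3):
    shared = fresh_shared_ngrams(donor, mutants, n=3, m=m)
    print(m, sorted(shared))
```

A small result set for larger m means few byte-level features separate the mutants, as a group, from the donor, which is the property the experiment in Section IV sets out to measure.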

7.
TABLE IV. THE NUMBER OF FRESH n-GRAMS SHARED BY AT LEAST m MUTANTS

Columns give the mutant subset size (m out of 20).

n | m = 3 | m = 4 | m = 5 | m = 6 | m = 7
2 | 0 | 0 | 0 | 0 | 0
3 | 5 | 4 | 4 | 2 | 0
4 | 14 | 5 | 4 | 2 | 0
5 | 19 | 8 | 4 | 2 | 0
6 | 23 | 11 | 4 | 1 | 0
7 | 26 | 12 | 3 | 1 | 0
8 | 26 | 9 | 1 | 1 | 0
9 | 24 | 9 | 1 | 1 | 0
10 | 23 | 9 | 1 | 1 | 0
11 | 0 | 0 | 0 | 0 | 0
total | 160 (2.3%) | 67 (1.0%) | 22 (0.3%) | 11 (0.2%) | 0 (0%)

A. Return-Oriented Programming

As mentioned previously, Frankenstein borrows the idea of gadgets from return-oriented programming (RoP). RoP is the latest in the evolution of code injection attacks. Such attacks find bytes within a binary's address space that correspond (either intentionally or unintentionally) to a sequence of instructions that perform a specific computation and end with the return instruction. Each such sequence forms a gadget. By searching through a binary for carefully chosen gadgets, it is possible to chain them together to perform arbitrary Turing-complete computations [12].

This chaining of gadgets is achieved by loading the stack with the starting address of each gadget in the chain and transferring control to the first gadget. Every subsequent return instruction then transfers control to the next gadget in the chain. RoP is thus a form of control-flow hijacking, and depends on a known exploit for a given binary in order to smash its stack and fill it with the appropriate addresses.

B. Metamorphic Engines

Metamorphic malware changes the structure of its payload with each generation to evade discovery. Metamorphic engines typically do this using a bottom-up approach: starting with a disassembler to recover assembly code for the payload, they perform a series of obfuscation phases, followed by application of an assembler to generate mutated native code. Most engines use a combination of the following five phases to obfuscate their payloads [16]: Garbage insertion adds unreachable code to the original code. Code substitution replaces opcodes with functionally equivalent but structurally different opcodes. Code insertion in-lines semantically ineffectual code sequences or harmless computations. Register swapping reallocates registers, and control flow scrambling adds jumps and reorders function calls.

Frankenstein's gadget-based obfuscation is a more principled approach to metamorphism. It both widens the pool of possible mutations for greater diversity and tailors its mutations to local defenses for more targeted attacks. For example, code substitution in a metamorphic engine is typically performed by comparing instruction opcodes against a fixed table of alternative sequences and then randomly choosing one from amongst them. This induces a degree of randomness with respect to generated code sequences, but does not ensure that the generated sequences are vastly different, nor that they do not contain features widely recognized as malicious.

Frankenstein uses a more top-down approach. By starting with a high-level representation of the payload logic and searching benign files for viable gadgets, it implicitly combines all five phases described above. This combination gives it the ability to create mutants with a greater diversity than standard bottom-up approaches.

C. Program Equivalence

Reasoning about program equivalence arises in connection with translation validators and certifying compilers. A translation validator shows that compiler optimizations are semantics-preserving by proving the semantic equivalence of the original program and its compiler-optimized counterpart. Approaches include instrumenting the compiler [17], verifying a simulation relationship between the two programs [18], and constructing value-graphs of the two programs and proving their syntactic equivalence [19], [20].

Certifying compilers prove that object code respects the semantics of the higher-level source code whence it was generated. Most certifying compilers do not prove full program equivalence but instead reduce the complexity by considering only a subset of verifiable properties, such as type-safety [21], [22]. Certifying compilers output object code, type specifications, and code annotations. The annotations and type specifications can then be fed into a certifier which either outputs a proof of correctness or a counterexample that violates type safety.

Our work differs from these related fields in that it does not attempt to provide any formal evidence of semantic equivalence for mutants. That is, although all mutants satisfy the abstract specification whence they were generated, the mutator is under no obligation to provide any evidence of semantic preservation or equivalence. Thus, there is neither validation nor certification. Frankenstein does, however, leverage many theoretical foundations underlying this past research, including pre- and post-conditions for semantic blueprint specification, abstract (symbolic) interpretation for gadget discovery, and abstract machine semantics for gadget analysis and arrangement.

D. Superoptimizing Compilers

Superoptimization refers to the transformation of a loop-free code sequence into the optimal set of assembly-level instructions. Optimality in this context is decided by the speed of the generated sequence, and hence superoptimizing compilers attempt to find the fastest sequence of assembly instructions that is equivalent to the input code. Such compilers use a lookup table populated with parametrized replacement rules to perform their optimizations. The lookup table can be generated manually, as is the case with peephole optimizers, or generated automatically based on a training set of binaries [23].
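The lookup-table mechanism described above can be sketched as a tiny peephole pass with a hand-written table. Everything here is a hypothetical illustration: the tuple encoding of instructions, the "?name" placeholder convention, and the three rules are assumptions, flag effects are ignored, and no search for the fastest sequence is attempted.

```python
def match(pattern, instr, binding):
    """Match one instruction against a pattern; "?x" terms are parameters."""
    if len(pattern) != len(instr):
        return None
    new = dict(binding)
    for p, v in zip(pattern, instr):
        if isinstance(p, str) and p.startswith("?"):
            if new.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return new


# Hand-written table of (pattern sequence, replacement sequence) rules.
RULES = [
    ((("mul", "?r", 2),), (("shl", "?r", 1),)),   # multiply by 2 -> shift
    ((("add", "?r", 0),), ()),                    # adding zero -> nothing
    ((("mov", "?a", "?b"), ("mov", "?b", "?a")),  # redundant copy back
     (("mov", "?a", "?b"),)),
]


def optimize(code):
    """Single left-to-right pass applying the first matching rule."""
    out, i = [], 0
    while i < len(code):
        for pattern, replacement in RULES:
            window = code[i:i + len(pattern)]
            if len(window) < len(pattern):
                continue
            binding = {}
            for p, instr in zip(pattern, window):
                binding = match(p, instr, binding)
                if binding is None:
                    break
            else:
                for r in replacement:
                    out.append(tuple(binding.get(t, t) if isinstance(t, str) else t
                                     for t in r))
                i += len(pattern)
                break
        else:
            out.append(code[i])
            i += 1
    return out


program = [("mov", "eax", "ebx"), ("mov", "ebx", "eax"),
           ("mul", "eax", 2), ("add", "ecx", 0), ("sub", "edx", 1)]
print(optimize(program))
```

The pass is not iterated to a fixed point, so a rewrite that exposes a further match is left for a second call. A superoptimizer would instead populate or search such a table with speed as the selection criterion.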