Usage: translate.py [options] [file-in] [file-out] command [script]
Translate bytes according to a Python expression
Example: translate.py -o svchost.exe.dec svchost.exe 'byte ^ 0x10'
"byte" is the current byte in the file, 'byte ^ 0x10' does an XOR 0x10
Extra functions:
rol(byte, count)
ror(byte, count)
IFF(expression, valueTrue, valueFalse)
Sani1(byte)
Sani2(byte)
Variable "position" is an index into the input file, starting at 0
Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk
https://DidierStevens.com
Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -o OUTPUT, --output=OUTPUT
                        Output file (default is stdout)
  -s SCRIPT, --script=SCRIPT
                        Script with definitions to include
  -f, --fullread        Full read of the file
  -r REGEX, --regex=REGEX
                        Regex to search input file for and apply function to
  -R FILTERREGEX, --filterregex=FILTERREGEX
                        Regex to filter input file for and apply function to
  -e EXECUTE, --execute=EXECUTE
                        Commands to execute
  -2 SECONDBYTESTREAM, --secondbytestream=SECONDBYTESTREAM
                        Second bytestream
  -m, --man             print manual
Manual:
Translate.py is a Python script to perform bitwise operations on files
(like XOR, ROL/ROR, ...). You specify the bitwise operation to perform
as a Python expression, and pass it as a command-line argument.
translate.py malware -o malware.decoded "byte ^ 0x10"
This will read file malware, perform XOR 0x10 on each byte (expressed
in Python as byte ^ 0x10), and write the result to file
malware.decoded.
byte is a variable containing the current byte from the input file.
Your expression has to evaluate to the modified byte. For complex
manipulation, you can define your own functions in a script file and
load this with translate.py, like this:
translate.py malware -o malware.decoded "Process(byte)" process.py
process.py must contain the definition of function Process. Function
Process must return the modified byte.
Another variable is also available: position. This variable contains
the position of the current byte in the input file, starting from 0.
If only part of the file has to be manipulated, while leaving the rest
unchanged, you can do it like this:
def Process(byte):
    if position >= 0x10 and position < 0x20:
        return byte ^ 0x10
    else:
        return byte
This example will perform an XOR 0x10 operation from the 17th byte
till the 32nd byte included. All other bytes remain unchanged.
Because Python has built-in shift operators (<< and >>) but no rotate
operators, I've defined 2 rotate functions that operate on a byte: rol
(rotate left) and ror (rotate right). They accept 2 arguments: the
byte to rotate and the number of bit positions to rotate. For example,
rol(0x01, 2) gives 0x04.
translate.py malware -o malware.decoded "rol(byte, 2)"
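As a rough illustration of what a byte rotate does, here is a minimal loop-based sketch. It is a hypothetical re-implementation written for this explanation, not the script's actual source, and it assumes the rotate count is taken modulo 8:

```python
# Hypothetical re-implementation for illustration only;
# translate.py's own rol/ror may be written differently.

def rol(byte, count):
    """Rotate an 8-bit value left by count bit positions."""
    for _ in range(count % 8):
        # Shift left by one bit; the bit leaving the top re-enters at bit 0.
        byte = ((byte << 1) | (byte >> 7)) & 0xFF
    return byte


def ror(byte, count):
    """Rotate an 8-bit value right by count bit positions."""
    for _ in range(count % 8):
        # Shift right by one bit; the bit leaving the bottom re-enters at bit 7.
        byte = ((byte >> 1) | (byte << 7)) & 0xFF
    return byte


print(hex(rol(0x01, 2)))  # 0x4
```

Rotating left and then right by the same count returns the original byte, which makes ror a convenient way to undo a rol-encoded file.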
Another function I defined is IFF (the IF Function): IFF(expression,
valueTrue, valueFalse). This function allows you to write conditional
code without an if statement. When expression evaluates to True, IFF
returns valueTrue, otherwise it returns valueFalse.
And yet 2 other functions I defined are Sani1 and Sani2. They can help
you with input/output sanitization: Sani1 accepts a byte as input and
returns the same byte, except if it is a control character. All
control characters (except HT, LF and CR) are replaced by a space
character (0x20). Sani2 is like Sani1, but sanitizes even more bytes:
it sanitizes control characters like Sani1, and also all bytes equal
to 0x80 and higher.
translate.py malware -o malware.decoded "IFF(position >= 0x10 and position < 0x20, byte ^ 0x10, byte)"
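The IFF expression above can be tried outside translate.py with a small stand-in. This is a sketch under assumptions: the IFF definition below is a hypothetical equivalent based on the description, and the loop that supplies byte and position is only an approximation of what the script does:

```python
# Hypothetical stand-in for IFF, based on its description in the manual.
def IFF(expression, valueTrue, valueFalse):
    if expression:
        return valueTrue
    return valueFalse


# Sample input: 0x30 bytes with values 0x00..0x2F.
data = bytes(range(0x30))

# Apply the expression per byte; enumerate() plays the role of "position".
decoded = bytes(
    IFF(position >= 0x10 and position < 0x20, byte ^ 0x10, byte)
    for position, byte in enumerate(data)
)

print(decoded[0x10:0x20].hex())
```

Note that, because IFF is an ordinary function, Python evaluates both valueTrue and valueFalse before the call, whichever one is returned.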
By default this program translates individual bytes via the provided
Python expression. With option -f (fullread), translate.py reads the
input file as one byte sequence and passes it to the function
specified by the expression. This function needs to take one string as
an argument and return one string (the translated file).
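A function suitable for option -f could look like the following sketch. The function name ReverseAndXor and its key parameter are hypothetical, and the sketch assumes the file content arrives as a bytes object (under Python 2 this would be a str):

```python
# Hypothetical full-read function; name and behavior are illustrative only.
def ReverseAndXor(data, key=0x10):
    """Receive the whole file content at once and return the translated content."""
    # Reverse the byte order, then XOR every byte with the key.
    return bytes(b ^ key for b in reversed(data))


print(ReverseAndXor(b"\x00\x01\x02").hex())  # 121110
```

If such a function were saved in a script file, it could presumably be loaded in the same way as process.py in the earlier example and named as the command.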
Option -r (regex) uses a regular expression to search through the file
and then calls the provided function with a match argument for each
matched string. The return value of the function (a string) is used to
replace the matched string.
Option -R (filterregex) is similar to option -r (regex), except that
it does not operate on the complete file, but on the file filtered for
the regex.
Here are 2 examples with a regex. The input file (test-ah.txt)
contains the following: 1234&H41&H42&H43&H444321
The first command will search for strings &Hxx and replace them with
the character represented in ASCII by hexadecimal number xx:
translate.py -r "&H(..)" test-ah.txt "lambda m: chr(int(m.groups()[0], 16))"
Output: 1234ABCD4321
The second command is exactly the same as the first command, except
that it uses option -R instead of -r:
translate.py -R "&H(..)" test-ah.txt "lambda m: chr(int(m.groups()[0], 16))"
Output: ABCD
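The difference between the two options can be reproduced with Python's re module. This sketch only mimics the behavior described above on the sample input; it is not taken from the script and may differ from how translate.py implements -r and -R internally:

```python
import re

content = "1234&H41&H42&H43&H444321"
pattern = r"&H(..)"

# The same function that is passed on the command line.
decode = lambda m: chr(int(m.groups()[0], 16))

# Like -r: replace each match in place, keep everything else.
replaced = re.sub(pattern, decode, content)

# Like -R: keep only the matches, each translated by the function.
filtered = "".join(decode(m) for m in re.finditer(pattern, content))

print(replaced)  # 1234ABCD4321
print(filtered)  # ABCD
```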
Option -e (execute) is used to execute Python commands before the
command is executed. This can, for example, be used to import modules.
Here is an example to decompress a Flash file (.swf):
translate.py -f -e "import zlib" sample.swf "lambda b: zlib.decompress(b[8:])"
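The lambda skips the first 8 bytes because a zlib-compressed Flash file starts with an 8-byte uncompressed header, followed by the zlib stream. The sketch below builds a synthetic stand-in (not a real .swf file; the body content is made up) and applies the same lambda to it:

```python
import zlib

# Synthetic stand-in for a compressed Flash file: NOT a valid .swf,
# just an 8-byte header followed by zlib-compressed data.
body = b"example payload " * 4
header = b"CWS" + b"\x0a" + (len(body) + 8).to_bytes(4, "little")
sample = header + zlib.compress(body)

# The same function as in the command above.
decompress = lambda b: zlib.decompress(b[8:])

print(decompress(sample) == body)  # True
```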
A second file can be used as input with option -2. The value of the
current byte of the second input file is stored in variable byte2
(this too advances byte per byte together with the primary input
file).
Example:
translate.py -2 #021230 #Scbpbt "byte + byte2 - 0x30"
Output:
Secret
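The arithmetic of this example can be checked with a short sketch. It pairs the two inputs byte by byte with zip(); the mask with 0xFF and the behavior when the inputs differ in length are assumptions made for this illustration, not documented behavior of the script:

```python
primary = b"Scbpbt"   # content of the primary input
second = b"021230"    # content of the second bytestream (-2)

# byte comes from the primary input, byte2 from the second one.
# The & 0xFF keeps each result in byte range (an assumption of this sketch).
result = bytes(
    (byte + byte2 - 0x30) & 0xFF
    for byte, byte2 in zip(primary, second)
)

print(result.decode())  # Secret
```

Each digit in the second stream, minus 0x30 (ASCII "0"), is the offset added to the corresponding byte of the primary input.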
Instead of using an input filename, the content can also be passed in
the argument. To achieve this, precede the text with character #.
If the text to pass via the argument contains control characters or
non-printable characters, hexadecimal (#h#) or base64 (#b#) can be
used.
Example:
translate.py #h#89B5B4AEFDB4AEFDBCFDAEB8BEAFB8A9FC "byte ^0xDD"
Output:
This is a secret!
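To prepare or verify such an argument, the hexadecimal text can be decoded and translated in a few lines of Python. This sketch reproduces the example above; the base64 round trip at the end only shows how a value for #b# could be produced and is not output taken from the tool:

```python
import base64
import binascii

# The hexadecimal text that follows #h# in the example.
encoded = binascii.unhexlify("89B5B4AEFDB4AEFDBCFDAEB8BEAFB8A9FC")

# Apply the same expression as the command: byte ^ 0xDD.
decoded = bytes(byte ^ 0xDD for byte in encoded)
print(decoded.decode())  # This is a secret!

# The same bytes as base64 text, which could be passed after #b#.
as_base64 = base64.b64encode(encoded).decode()
print(as_base64)
```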

You’ll be surprised by the gain in performance: about 8%.
Translating a 3MB file with the original ROL (rolling 4 bits) takes 168 seconds; translating the same file with the faster ROL takes 155 seconds.

There is a huge overhead in the translation of each byte by the eval function:
outbyte = eval(command)

For every byte, Python has to parse, compile and execute the command. Parsing and compiling take much more time than the loop in the original ROL command. This is _very_ inefficient, but _very_ flexible. You can provide your own Python expression without having to edit the translate program.
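
One common way to reduce this overhead while keeping the flexibility is to compile the expression once and evaluate the resulting code object for each byte. The sketch below compares the two approaches; it is a suggestion with hypothetical function names, not how translate.py works, and the timings depend entirely on the machine:

```python
import timeit

command = "byte ^ 0x10"
data = bytes(range(256)) * 4


def translate_eval_string(data):
    # Parses and compiles the command string again for every byte.
    return bytes(eval(command, {}, {"byte": byte}) for byte in data)


# Compile once, up front.
code = compile(command, "<command>", "eval")


def translate_precompiled(data):
    # Only executes the already compiled code object for every byte.
    return bytes(eval(code, {}, {"byte": byte}) for byte in data)


# Both approaches must produce identical output.
assert translate_eval_string(data) == translate_precompiled(data)

print("eval(string):", timeit.timeit(lambda: translate_eval_string(data), number=5))
print("eval(code):  ", timeit.timeit(lambda: translate_precompiled(data), number=5))
```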

I used a loop in the ROL and ROR commands for didactic reasons. Manipulating bits is very foreign to most people, even programmers. I believe my version is more readable and understandable, and thus extendable by other people.

But you’re right, removing inner loops adds to the performance. In this specific case, though, most CPU cycles go to the eval function, and not to the loop.
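
The loop-free code under discussion is not shown in this thread. For reference, a common way to rotate a byte without a loop is sketched below; the names rol_fast and ror_fast are hypothetical, and this is not necessarily the commenter's code:

```python
# A common loop-free byte rotate; hypothetical names, shown for reference only.

def rol_fast(byte, count):
    """Rotate an 8-bit value left by count bit positions without a loop."""
    count %= 8
    # Bits shifted out on the left are brought back in on the right.
    return ((byte << count) | (byte >> (8 - count))) & 0xFF


def ror_fast(byte, count):
    """Rotate an 8-bit value right by count bit positions without a loop."""
    count %= 8
    # Bits shifted out on the right are brought back in on the left.
    return ((byte >> count) | (byte << (8 - count))) & 0xFF


print(hex(rol_fast(0x81, 4)))  # 0x18
```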

Anyways, thanks for your comment, I’ll have to think about how to include your code. Maybe I can leave the original ROL and use your code for the ROR 😉

I think the posting process somehow managed to steal some of my text (especially since the first function is copy&pasted from your post), but yes, that’s what I wanted to write. 🙂

And _especially_ for didactic reasons I think the code should be as good as possible, since other people are learning from it. The folks who don’t understand bit operations should probably stay away from decryption & malware analysis altogether… might do more harm than good. 😉

As for the optimality of the rest of the code, I’ve only skimmed it, I’m afraid. I was actually looking for an efficient way of doing ROL/ROR in .py, and that’s how I stumbled over your code. I have plenty of experience with Python, and by coincidence I work in the malware analysis industry myself. 🙂 Getting back to the efficiency issue, I’m probably going to write a ROL/ROR module in C/asm to make it efficient enough. That code might even be worth including…