July 30, 2014

This is my list of practical suggestions for people developing a pintool. I have dealt with these issues before, so I thought I would jot them down to help others. Applying them should bring you somewhat closer to keeping your pintool from terminating unexpectedly.

Start from scratch. You may be using a sample pintool as the basis for your own. Rather than modifying the sample, start with an empty project and gradually build it up by taking elements from the sample.

Simplicity. Keep the code-base small and easy to understand.

Testing. As part of development, aim to verify that all blocks have been exercised. Refrain from adding unreachable blocks.

Errors. Check for errors as early as possible, especially when returning from a Pin API call.

Safe memory dereference. Whenever you have to dereference the target's memory, use PIN_SafeCopy. Use this function even to read a single integer, rather than the dereference operator "*".

Thread safety. Be aware that the target may be running multiple threads, so you probably want your pintool to be thread safe.

Multi-threading. Sometimes you want variables to be stored per thread so that the analysis can distinguish between threads. In that case the sample inscount_tls.cpp is a good place to start.

Probe mode. Probe mode is preferred where it is sufficient, as it gives better performance. However, only a limited set of Pin facilities is available in probe mode.

Limit instrumentation. Consider restricting the instrumentation to particular routines or libraries; you can even skip the instrumentation of shared libraries to get better performance.

Standard library. It's a good idea to use the C++ standard library in a pintool, as it provides the most frequently needed data structures.

Visual Studio. A Visual C++ project file is available with the Pin framework in the MyPinTool folder. Alternatively, you can create one yourself after looking at an earlier post.

Trace vs Ins. Instruction instrumentation is practically the same as trace instrumentation. You can do instruction analysis from the trace by iterating through the instructions.

Output. Having the output routines in Fini makes the application run faster than having them in analysis functions. However, if the application terminates unexpectedly and Fini is not called, no results are shown. Having the output routines in analysis functions makes the application run slower, but if it terminates unexpectedly, partial results may still be shown.

July 26, 2014

SAR stands for Shift Arithmetic Right, and the instruction performs an arithmetic shift. It preserves the sign of the value being shifted, so the vacated bits are filled with copies of the sign bit.

Compilers typically generate a SAR instruction when the right shift operator ">>" is applied to a signed integer.

The use of the SAR instruction can lead to a signedness bug if the shift is assumed to be unsigned.

Consider the following simplified example.

char retItem(char* arr, int value){ return arr[value>>24];}

If value is non-negative, the code works as expected. However, if value is negative, the shift yields a negative index and the program can read outside the bounds of arr.
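
The index computation can be isolated in a minimal sketch. It assumes a 32-bit value and a compiler that implements ">>" on signed integers as an arithmetic shift (guaranteed since C++20, implementation-defined before that, though common x86 compilers emit SAR). The helper names indexSigned and indexUnsigned are illustrative and not part of the original example.

```cpp
#include <cstdint>

// Index computed the way retItem does it: a signed shift (SAR on x86).
// For negative values the sign bit is replicated, so the result is negative.
int indexSigned(std::int32_t value) {
    return value >> 24;
}

// Same bits, but shifted as an unsigned value (SHR on x86).
// The result is always in the range 0..255.
int indexUnsigned(std::int32_t value) {
    return static_cast<int>(static_cast<std::uint32_t>(value) >> 24);
}
```

For value = -16777216 (0xFF000000), indexSigned returns -1, so retItem would read arr[-1], while indexUnsigned returns 255.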

Another example would be comparing the signed value after the shift to an unsigned value; the implicit conversion that results may trigger a bug.
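
A minimal sketch of such a comparison follows. The function names and the idea of a limit parameter are assumptions made for illustration; the point is only that the negative result of the shift is converted to a large unsigned number before the comparison takes place.

```cpp
#include <cstdint>

// Check that mixes signed and unsigned operands. The signed result of the
// shift is implicitly converted to unsigned before the comparison, so -1
// becomes 0xFFFFFFFF. GCC and Clang usually warn here (-Wsign-compare).
bool exceedsLimitMixed(std::int32_t value, std::uint32_t limit) {
    std::int32_t shifted = value >> 24;  // arithmetic shift, may be negative
    return shifted > limit;              // shifted is converted to uint32_t
}

// The same check done in a signed 64-bit type, which can represent both
// operands, so a negative shifted value stays negative.
bool exceedsLimitSigned(std::int32_t value, std::uint32_t limit) {
    std::int64_t shifted = value >> 24;
    return shifted > static_cast<std::int64_t>(limit);
}
```

With value = -16777216 and limit = 16, the mixed version reports that the limit is exceeded, while the signed version does not; code written with one behaviour in mind may misbehave when it gets the other.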

In my experiments I have seen several cases where a memory dereference involves a SAR instruction. These places may be worth examining for bugs, especially if the value being shifted comes from user input or is otherwise controllable.

An unsigned conditional jump followed by a signed shift could be another place to look for bugs.

Regular expressions or scripts can be used to search for occurrences of SAR instructions. When it's not feasible to review all of them, a pintool may be used to highlight which SAR instructions have actually been executed, so the review can focus on those.
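
As a rough sketch of the regular expression approach, the function below filters a textual disassembly listing for SAR instructions. The listing format (one instruction per line, with the mnemonic surrounded by whitespace) and the name findSarLines are assumptions; real disassembler output may need a different pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the lines of a textual disassembly listing that contain a SAR
// instruction. Requiring whitespace after "sar" avoids matching other
// mnemonics that merely start with the same letters (for example SARX).
std::vector<std::string> findSarLines(const std::vector<std::string>& listing) {
    static const std::regex sarPattern(R"((^|\s)sar\s)", std::regex::icase);
    std::vector<std::string> hits;
    for (const std::string& line : listing) {
        if (std::regex_search(line, sarPattern)) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

The returned lines can then be reviewed by hand, for instance by checking whether a memory access that uses the shifted register follows shortly after.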

July 22, 2014

Earlier this year a post was published about examining a data format without using the program that reads the format. That post discusses patterns to look for in order to identify certain constructs. This post focuses on static methods of examining code, which can be the complete code section of a file, a memory dump, or just a fragment. It also describes selected ideas about what patterns to look for when examining a given piece of code.

One reason to look for patterns in code is to locate certain functionality or to get a high-level understanding of what the code does. Another is to find a particular construct that may be a key part of the program from a security point of view.

It's true to say one can expect this to be a rapid method compared to other methods such as line-by-line instruction analysis.

But it's always good to read the documentation, if any is available, to get an overview of what to expect.

Some methods are more effective when performed on a small region. Narrowing the scope of the search is therefore a good thing to do at the beginning of the analysis, although it's not always feasible to do so with enough certainty. In any case, one can always widen the search region at a later stage if required.

Compilers tend to produce executable files with a particular layout. Some have the library code at the beginning of the code section, while others have it at the end.

If there is no information about the compiler or about the layout, there are other ways to locate the library code.

You may look for library function calls, which can be visible in the disassembler. Library code may be shown in a distinct color in the disassembler.

Library/runtime code often has multiple implementations of a function to take advantage of the latest hardware; the MSVC runtime is an example. SSE instructions may therefore indicate the presence of library/runtime code.

Library code can be spotted by looking for strings that can be associated with particular libraries.

Library/runtime code can also be spotted by looking for constant values that can be associated with particular libraries; cryptographic libraries, for instance, tend to have many constants.

It is possible to guess the compiler that generated the code by analyzing the library/runtime code.

If the code is just a fragment of user code, you may consider examining how the instructions are encoded. Intel encodings are redundant, and one instruction can have multiple encodings. Which encoding was chosen can help to guess what compiler was used.
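
To illustrate the redundancy, here is a small sketch that decodes a register-to-register MOV in its two forms: opcode 0x89 (MOV r/m32, r32) and opcode 0x8B (MOV r32, r/m32). It assumes 32-bit operand size and no prefixes, and the function name decodeMovRegReg is made up for this example; it is not a general disassembler.

```cpp
#include <cstdint>
#include <string>

// Decodes a two-byte register-to-register MOV into text. Only opcodes 0x89
// and 0x8B with ModRM.mod == 3 are handled; anything else yields "".
std::string decodeMovRegReg(std::uint8_t opcode, std::uint8_t modrm) {
    static const char* const regs[8] = {"eax", "ecx", "edx", "ebx",
                                        "esp", "ebp", "esi", "edi"};
    if ((modrm >> 6) != 3) return "";          // not a register operand
    const char* reg = regs[(modrm >> 3) & 7];  // ModRM.reg field
    const char* rm = regs[modrm & 7];          // ModRM.rm field
    if (opcode == 0x89) return std::string("mov ") + rm + ", " + reg;
    if (opcode == 0x8B) return std::string("mov ") + reg + ", " + rm;
    return "";
}
```

Both the byte pair 89 D8 and the byte pair 8B C3 decode to mov eax, ebx. An assembler or compiler back end has to pick one of them, and that choice is the kind of detail that may differ between toolchains.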

If multiple encodings of the same instruction are found in a binary, the code could have been generated with a polymorphic encoder.

Code also has other characteristics that may differ between compilers, such as padding and stack allocation.

Imports and exports as well as strings can tell a lot. You may check where they are referenced in the code.

Debugging symbols can help an awful lot if the disassembler can handle them. Sometimes they are available, sometimes they are not.

No matter what code you're looking at, it most likely deals with input data. It may get the data from a file or from the network via standard API calls. These are valuable areas to audit for security problems, and it's possible to follow how the data returned by these APIs is used. This may require analyzing the caller functions, as these APIs are usually wrapped in several layers of calls before the input is used.

Just as it reads data, the code may write or send data via standard API calls. These areas may be security-sensitive, too.

Programs have centralized, well-established functions. These functions, for example, read dword values, read data into structures, and populate other internal storage. Discovering these functions is not considered hard: they are normally small and consist of memory read and write instructions. By looking at where they are referenced from, we can find good attack surfaces.

It's good to keep in mind that code sections can contain data besides code, although data is normally stored in the data section. In the disassembler it's convenient to see how the data is referenced, and from that to decide whether there is an attack surface nearby.

CRC and hash constants may indicate that some data is being CRC'd or hashed. You may figure out where that data comes from and how you can perform security testing around it.
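
As an illustration of why such constants are recognizable, the sketch below builds the lookup table of the common reflected CRC-32 (polynomial 0xEDB88320, the variant used by zlib and PNG). Choosing this particular CRC is an assumption made for the example; other CRCs and hashes have their own characteristic constants.

```cpp
#include <array>
#include <cstdint>

// Builds the 256-entry lookup table for the reflected CRC-32 with
// polynomial 0xEDB88320. Implementations either embed this table in the
// binary or embed the polynomial and generate the table at run time.
std::array<std::uint32_t, 256> makeCrc32Table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t c = i;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        table[i] = c;
    }
    return table;
}
```

The second table entry is 0x77073096 and entry 128 is the polynomial itself, so searching a binary for dwords such as 0x77073096 or 0xEDB88320 is a quick way to find candidate CRC-32 code or data.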

When a library uses a hardcoded parameter, it is often encoded as part of the instruction rather than stored in the data section of the executable. Example instructions look like mov eax, <param> or mov al, <param>.

When a data format is parsed, a magic value is often tested. Looking for instructions like cmp reg, <magic> or cmp dword ptr [addr], <magic> can help to locate attack surfaces.
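
A minimal sketch of such a search is shown below for one specific form only: cmp eax, imm32, which is encoded as opcode 0x3D followed by the little-endian immediate. The function name findCmpEaxImm32 is illustrative. Since this is a raw byte scan rather than a disassembly, it can report false positives where the byte pattern happens to occur inside another instruction or in data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the offsets at which the bytes look like "cmp eax, imm32"
// (opcode 0x3D) with the given 32-bit immediate.
std::vector<std::size_t> findCmpEaxImm32(const std::vector<std::uint8_t>& code,
                                         std::uint32_t magic) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 5 <= code.size(); ++i) {
        if (code[i] != 0x3D) continue;
        // Assemble the little-endian immediate that follows the opcode.
        std::uint32_t imm = static_cast<std::uint32_t>(code[i + 1]) |
                            (static_cast<std::uint32_t>(code[i + 2]) << 8) |
                            (static_cast<std::uint32_t>(code[i + 3]) << 16) |
                            (static_cast<std::uint32_t>(code[i + 4]) << 24);
        if (imm == magic) offsets.push_back(i);
    }
    return offsets;
}
```

For example, the first four bytes of a PNG file (89 50 4E 47) read as a little-endian dword give 0x474E5089, so that is the value to pass as magic when looking for a PNG signature check of this form.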

Longer strings may be broken into immediate values and compared with multiple cmp instructions.
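
To know which immediates to search for, the string can be split into dwords the same way. The sketch below does this in little-endian order; the function name toDwordImmediates is illustrative, and zero-padding the trailing chunk is a simplification, since a compiler may instead compare the remainder with a 16-bit or 8-bit cmp.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Splits a string into little-endian 32-bit immediates, four bytes at a
// time. A trailing partial chunk is zero-padded.
std::vector<std::uint32_t> toDwordImmediates(const std::string& text) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < text.size(); i += 4) {
        std::uint32_t imm = 0;
        for (std::size_t j = 0; j < 4 && i + j < text.size(); ++j) {
            const std::uint32_t byte = static_cast<unsigned char>(text[i + j]);
            imm |= byte << (8 * j);
        }
        out.push_back(imm);
    }
    return out;
}
```

For the string GIF89a this gives 0x38464947 for the first four characters and 0x00006139 for the remaining two.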

Looking for strcmp function calls is a good idea if you want to find code that tests for a data format, as strcmp is often used for this purpose.

If the code is optimized for speed, there are many ways to confirm it. Normally the readability of the code is poor, for example where it performs division or uses the same memory address for multiple variables. If the EBP register is used in arithmetic, or for anything other than storing the stack base address, that could indicate the code is optimized.

Perhaps there are circumstances when looking at the frequency of instructions, or looking for undocumented instructions, rare instructions, or instructions that are absent, can give us valuable clues that help the examination.
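
A simple way to get instruction frequencies is to count mnemonics in a textual listing, as sketched below. The sketch assumes each line starts with the mnemonic (no address or byte columns), and the name countMnemonics is illustrative.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Counts how often each mnemonic occurs. The first whitespace-delimited
// token of every non-empty line is taken to be the mnemonic.
std::map<std::string, int> countMnemonics(const std::vector<std::string>& lines) {
    std::map<std::string, int> counts;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string mnemonic;
        if (in >> mnemonic) {
            ++counts[mnemonic];
        }
    }
    return counts;
}
```

Mnemonics with a count of one, or mnemonics expected for a given compiler that do not appear in the map at all, are then easy to pick out.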

Intuitively going through the code and looking for undefined patterns can be a good idea if the more scientific approaches have been exhausted.

July 14, 2014

A few months ago I encountered a bug that occurs when a fuzzed flash file is rendered by Flash Player in Firefox. The bug can be reached only in the non-default configuration described below, so it's very unlikely that you are affected by it.

To trigger the bug, the Flash Player module has to be loaded into Firefox's virtual address space. This can be achieved if Flash Player protected mode is disabled and the Firefox plugin container process is disabled, too.

I reported this bug to Adobe and they opened case PSIRT-2707 on 14/April/2014, but so far Adobe hasn't confirmed whether or not it was able to reproduce the bug or the exception state reported.

Again, the bug doesn't affect the default configuration, so it's very unlikely you're affected by it. However, users running Firefox with plugin-container disabled and the Flash Player plugin with protected mode disabled are affected by this issue.

The original report is about Flash Player 13_0_0_182 and Firefox 28.0, but the testcase also fails with Flash Player 14_0_0_145 and Firefox 30.0 (the latest versions available as of today).

These are the steps to reproduce the bug.

Edit mms.cfg to have ProtectedMode=0 to disable protected mode in Flash Player

The settings above are required to get the Flash Player plugin loaded into firefox.exe's address space.

Start Firefox from the command prompt opened previously

Open fuzzed.swf in Firefox (drag and drop should work)

Attach WinDbg to the firefox.exe process when you notice that Firefox is hanging

The exception should occur in a few seconds. If you see the out-of-memory error in the debugger log without an exception, you may restart the browser and try again.

The fuzzed flash file has the following changes compared to the template file. The value of the first item in the integer pool has been changed to a large value. TagLength of the DoAbc tag and FileSize of the main header have therefore been updated to maintain the integrity of the flash file.