December 28, 2011

It is often said that the design of a graphical user interface should not introduce new functionality into the application. The GUI and the functionality should be separated and implemented on different levels, which seems logical to me, but I'd add something important to this general claim.

Some people, including me, are more of the visual type. Visual things excite my imagination, and when I represent things visually, I can create new things from them that have some kind of value.

It has happened to me numerous times that I got new ideas for awesome functionality while designing a GUI. These ideas didn't come to mind when I was focusing exclusively on functionality design.

Freedom inhibits creativity. When you design a GUI, there are many restrictions that you can see, and these stimulate the brain to come up with new ideas.

I've imagined an application that consists of multiple functions. It is actually a mix and improvement of some tools I wrote earlier. By putting these small tools into one application, it becomes possible to connect their functions so that they work together and take advantage of each other. I also have some ideas for functionality I've seen in other applications, although those would need to be changed to match what I imagine.

The basic idea is as follows. The input is an unknown binary file. You can browse the file, discover its structure, display distribution diagrams, and so on. You can also apply particular algorithms to the data, such as decoding a compressed stream or displaying multimedia content.
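The "distribution diagram" idea can be sketched in a few lines. This is a minimal illustration assuming the diagram is a histogram of byte values; `byte_distribution` and `print_histogram` are hypothetical names, not part of any existing tool:

```python
from collections import Counter


def byte_distribution(data: bytes) -> list[int]:
    """Return a 256-entry list with the number of occurrences of each byte value."""
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return [counts.get(value, 0) for value in range(256)]


def print_histogram(dist: list[int], width: int = 40) -> None:
    """Print a crude text bar chart for the byte values that occur at least once."""
    peak = max(dist) or 1  # avoid division by zero for empty input
    for value, count in enumerate(dist):
        if count:
            bar = "#" * max(1, count * width // peak)
            print(f"{value:02X} {count:8d} {bar}")


if __name__ == "__main__":
    print_histogram(byte_distribution(b"\x00\x00\x00\x01AAAB"))
```

Even a crude text histogram shows whether a file is dominated by a few byte values (for example zero padding) or spread fairly evenly across all 256.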

December 24, 2011

I'm writing about how to find possible TIMESTAMP values in raw binary data.

Here is the first part of data format analysis by guessing, but you can read this entry independently anyway.

TIMESTAMP is a DWORD value that represents Unix time, and we assume it's stored in little-endian byte order somewhere in the binary.

The prototype program reads the binary data into a byte array. It iterates through the array, reads four bytes at each position, interprets them as a DWORD, and checks whether the value falls into the date range we are after.
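A minimal Python sketch of such a prototype follows. It is not the original program; `find_timestamps` is a hypothetical name, and the code assumes an unsigned 32-bit little-endian value counting seconds since 1970-01-01 UTC:

```python
import struct
from datetime import datetime, timezone


def find_timestamps(data: bytes, start: datetime, end: datetime) -> list[tuple[int, datetime]]:
    """Return (offset, time) for every little-endian DWORD whose value lies in [start, end].

    start and end should be timezone-aware; Unix time counts seconds since 1970-01-01 UTC.
    """
    low = int(start.timestamp())
    high = int(end.timestamp())
    hits = []
    # Slide one byte at a time: the DWORD may be stored at any (unaligned) offset.
    for offset in range(len(data) - 3):
        (value,) = struct.unpack_from("<I", data, offset)
        if low <= value <= high:
            hits.append((offset, datetime.fromtimestamp(value, tz=timezone.utc)))
    return hits
```

For example, reading a file with `open("sample.bin", "rb").read()` (the file name is only a placeholder) and passing `datetime(2011, 11, 1, tzinfo=timezone.utc)` and `datetime(2011, 12, 1, tzinfo=timezone.utc)` as bounds searches a one-month window. The bounds should be timezone-aware, because a naive `datetime` is converted using the local time zone.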

The looser the date range you set, the more likely you are to find false positives in your search results.

I've tested the code with a relatively strict date range, i.e. one month, and it found the TIMESTAMP without any false positives. In some PE files, multiple TIMESTAMPs were found; for example, one in the header and one in the .rdata section.
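For PE files, a hit in the header can be cross-checked against the `TimeDateStamp` field of the COFF file header. In the PE format, the DWORD at offset 0x3C of the DOS header (`e_lfanew`) points to the `PE\0\0` signature, and `TimeDateStamp` follows the 2-byte `Machine` and 2-byte `NumberOfSections` fields, so it sits at `e_lfanew + 8`. A minimal sketch, with a hypothetical function name and only basic bounds checks:

```python
import struct
from datetime import datetime, timezone
from typing import Optional, Tuple


def pe_header_timestamp(data: bytes) -> Optional[Tuple[int, datetime]]:
    """Return (offset, time) of the COFF header TimeDateStamp, or None if this is not a PE image."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew: DWORD at offset 0x3C of the DOS header, pointing to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew + 12 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return None
    # The COFF file header follows the signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    offset = e_lfanew + 8
    (value,) = struct.unpack_from("<I", data, offset)
    return offset, datetime.fromtimestamp(value, tz=timezone.utc)
```

If a scanner hit has the same offset, it is the header timestamp; any other hit is either another stored TIMESTAMP or a false positive.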

If you cannot set a strict date range and are likely to come across false positives, you can filter the results by applying some of the checks below.

Do you know more about the date, e.g. that it cannot be a Saturday or Sunday?

Should it be in a high-entropy region of the dump, or, on the contrary, in a low-entropy one?

There might be other values stored around it; do you know more about those values? Are there scattered zeroes nearby?
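These checks can be combined into a filter over the hits. This is a sketch under assumptions: the function names are hypothetical, the weekday test relies on `datetime.weekday()` (Monday is 0, Sunday is 6), entropy is Shannon entropy in bits per byte over a window around the candidate, and the window sizes and thresholds are placeholders to tune, since whether a timestamp sits in a high- or low-entropy region depends on the format:

```python
import math
from collections import Counter
from datetime import datetime


def is_weekday(moment: datetime) -> bool:
    """True for Monday..Friday."""
    return moment.weekday() < 5


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((n / total) * math.log2(n / total) for n in Counter(chunk).values())


def window_entropy(data: bytes, offset: int, radius: int = 64) -> float:
    """Entropy of the bytes surrounding a candidate offset."""
    return shannon_entropy(data[max(0, offset - radius):offset + radius])


def zero_ratio(data: bytes, offset: int, radius: int = 16) -> float:
    """Fraction of 0x00 bytes around the candidate, excluding the DWORD itself."""
    around = data[max(0, offset - radius):offset] + data[offset + 4:offset + 4 + radius]
    return around.count(0) / len(around) if around else 0.0


def filter_hits(data, hits, weekdays_only=True, entropy_range=None, min_zero_ratio=None):
    """Keep the (offset, datetime) hits that pass every enabled check."""
    kept = []
    for offset, moment in hits:
        if weekdays_only and not is_weekday(moment):
            continue
        if entropy_range is not None:
            low, high = entropy_range
            if not low <= window_entropy(data, offset) <= high:
                continue
        if min_zero_ratio is not None and zero_ratio(data, offset) < min_zero_ratio:
            continue
        kept.append((offset, moment))
    return kept
```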

December 18, 2011

Usually, the most straightforward way to analyze a data format is to observe how the application parses the data. However, there might be cases when the application that parses the data is not available. In that case, the analysis might be done by relying only on the data itself and on general tools.

In this post, I discuss three random ideas that might be helpful for exploring unknown data formats without using the application that parses the data.

To spot the presence of run-length encoding (RLE), you might look for string chunks in a relatively non-redundant (high-entropy) data dump. String chunks might indicate that the original data contains strings; also, the string chunks in an RLE dump might be found in the original data multiple times, likely as parts of longer strings.
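Looking for string chunks is essentially what the Unix `strings` utility does. A minimal Python equivalent, assuming that a run of at least four printable ASCII bytes counts as a chunk (the minimum length is an arbitrary choice and `string_chunks` is a hypothetical name):

```python
import re

MIN_LEN = 4  # arbitrary: shorter runs are mostly noise in random-looking data
# One or more printable ASCII bytes (space through tilde), at least MIN_LEN in a row.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def string_chunks(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, chunk) for each run of printable ASCII bytes."""
    return [(m.start(), m.group()) for m in PRINTABLE_RUN.finditer(data)]
```

Getting many short chunks from an otherwise high-entropy region, some of which look like fragments of longer words, is the kind of hint described above.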

To spot the presence of IA32 code, you might look for instruction patterns such as CALL or JMP in a relatively redundant data dump. These instructions have the opcodes E8 and E9, followed by a four-byte relative address that could also be included in the validation for more precise discovery.
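A sketch of that validation: for both `E8` (CALL rel32) and `E9` (JMP rel32), the target is the address of the next instruction (opcode offset + 5) plus the signed 32-bit displacement, so a candidate can be kept only if its target lands inside the dump. The function name and the inside-the-dump rule are assumptions for illustration; real code may legitimately branch outside the dumped region:

```python
import struct

CALL_REL32 = 0xE8
JMP_REL32 = 0xE9


def find_branches(data: bytes) -> list[tuple[int, str, int]]:
    """Return (offset, mnemonic, target) for E8/E9 candidates whose target stays in the buffer."""
    hits = []
    for offset in range(len(data) - 4):  # need the opcode plus four displacement bytes
        opcode = data[offset]
        if opcode not in (CALL_REL32, JMP_REL32):
            continue
        (disp,) = struct.unpack_from("<i", data, offset + 1)  # signed little-endian rel32
        target = offset + 5 + disp  # relative to the end of the 5-byte instruction
        if 0 <= target < len(data):
            name = "CALL" if opcode == CALL_REL32 else "JMP"
            hits.append((offset, name, target))
    return hits
```

In genuine code, many CALLs tend to share the same targets (function entry points), so counting how often each target occurs could be a further validation step.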

Repetitive bytes could mean padding bytes, for example, to align a structure. Scattered bytes, usually 00s or FFs, could be the high-order bytes of small positive or negative numbers, respectively. Such numbers could be pointers, offsets, tables of numbers, etc.
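Both observations can be quantified with a small sketch. `padding_runs` finds runs of a single repeated byte, and `lane_profile` measures, for each position modulo 4, how often the byte is 00 or FF. For an aligned table of small 32-bit little-endian integers, the upper bytes of each 4-byte group will mostly be 00 (positive values) or FF (negative values). The names, the minimum run length of 8, and the width of 4 are assumptions:

```python
import re
from collections import Counter

# A byte followed by at least 7 copies of itself (8 or more in total).
PADDING_RUN = re.compile(rb"(.)\1{7,}", re.DOTALL)


def padding_runs(data: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, length, byte value) for each run of one repeated byte."""
    return [(m.start(), len(m.group()), m.group()[0]) for m in PADDING_RUN.finditer(data)]


def lane_profile(data: bytes, width: int = 4, values=(0x00, 0xFF)) -> list[float]:
    """For each position modulo `width`, the fraction of bytes that are 00 or FF."""
    totals = Counter()
    marked = Counter()
    for offset, byte in enumerate(data):
        lane = offset % width
        totals[lane] += 1
        if byte in values:
            marked[lane] += 1
    return [marked[lane] / totals[lane] if totals[lane] else 0.0 for lane in range(width)]
```

For example, the four integers 5, -3, 100 and -200 stored as little-endian DWORDs give the profile `[0.0, 1.0, 1.0, 1.0]`: the low bytes vary, while every upper byte is 00 or FF.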

I call this process abductive reverse engineering. We don't have deductive evidence about the meaning of the structures in the data, but we are able to form an explanatory hypothesis about their meaning.

The stream consists of records. Each record consists of a size field followed by a data field.

The problem arises when the algorithm calculates the offset of the next record like below.

Offset of data plus size. The result of this addition could overflow. If these fields are not properly sanitized and the result points to a previous record, the previous records will be processed again, and again...
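The weakness can be reproduced with a toy model. This sketch assumes a hypothetical layout in which each record is a 4-byte little-endian size followed by that many data bytes. Python integers do not overflow, so the 32-bit wrap-around of the C-style addition is emulated with a mask. `parse_naive` follows the flawed calculation (with an iteration cap so the demo terminates), and `parse_checked` shows one way to sanitize the size: compare it against the remaining length instead of adding first, so the check itself cannot overflow:

```python
import struct

MASK32 = 0xFFFFFFFF  # emulate unsigned 32-bit arithmetic as in C


def parse_naive(stream: bytes, max_steps: int = 20) -> list[int]:
    """Follow records the vulnerable way: next = (data offset + size) mod 2**32.

    max_steps only exists so this demo terminates; the real bug has no such limit.
    """
    visited = []
    offset = 0
    while offset + 4 <= len(stream) and len(visited) < max_steps:
        visited.append(offset)
        (size,) = struct.unpack_from("<I", stream, offset)
        data_offset = offset + 4
        offset = (data_offset + size) & MASK32  # may wrap around to an earlier record
    return visited


def parse_checked(stream: bytes) -> list[int]:
    """Same walk, but reject any size that would run past the end of the stream."""
    visited = []
    offset = 0
    while offset + 4 <= len(stream):
        visited.append(offset)
        (size,) = struct.unpack_from("<I", stream, offset)
        data_offset = offset + 4
        if size > len(stream) - data_offset:  # no addition, so no overflow in the check
            raise ValueError(f"bad size {size:#x} at offset {offset}")
        offset = data_offset + size  # strictly forward, never past the end
    return visited


# Record 0: size 4, data "AAAA". Record 1 at offset 8: a size that makes 12 + size wrap to 0.
LOOPING_STREAM = struct.pack("<I", 4) + b"AAAA" + struct.pack("<I", 0x1_0000_0000 - 12)
```

`parse_naive(LOOPING_STREAM)` alternates between offsets 0 and 8 until the cap is reached, while `parse_checked(LOOPING_STREAM)` raises `ValueError` at the second record.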

I observed that multiple high-profile applications have this weakness when processing similar data structures, which could lead to the application hanging.

It is possible to detect this algorithm weakness by code review, but you certainly need to have an eye for what you are reviewing.

Fuzz testing is unlikely to detect this weakness, because the result of the overflow must point exactly to a previous record. If the size field is at least 32 bits wide, you have only a tiny chance of hitting a value that would point to a previous record.
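A rough estimate makes "tiny chance" concrete. Assume a fuzzer picks the 32-bit size s uniformly at random, let o be the offset of the data field, and let r_1, ..., r_k be the start offsets of the k earlier records. The wrapped next offset equals r_i for exactly one value of s, so the chance of a hit is k out of 2^32:

```latex
\[
(o + s) \bmod 2^{32} = r_i \iff s = (r_i - o) \bmod 2^{32},
\qquad
P(\text{hit}) = \frac{k}{2^{32}}
\]
\[
k = 1000 \;\Rightarrow\; P(\text{hit}) = \frac{1000}{4\,294\,967\,296} \approx 2.3 \times 10^{-7}
\]
```

The value k = 1000 is only illustrative.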

December 3, 2011

Some time ago, you saw something that impressed you. At that time, you didn't have a detailed understanding of its inner workings, but you found it interesting anyway.

These days, you need to accomplish something that you think is similar to the thing you saw before. You try to look for it on the Internet, but you can't find any useful material. You ask your friends about it, and they are able to provide something, but it doesn't seem relevant to you.

You look into the technology to research more. You find things that seem close to what you imagined; however, they take a different approach that is not suitable for you. The good news is that, at least, you've realized it is possible to accomplish it in some way, even if that way is not suitable.

Further findings reveal that you can't get the thing in the way you originally imagined. However, you know what you can get, and you know how to get it, even though that way is not suitable for you. At this point, you know the question, so now you have to look for the answer.

You know that if you do reverse engineering, you'll find the answer, but at first glance it looks like a long process. You narrow down the code by trying to focus on the relevant areas, and you realize that you can use symbols that make the code human-readable. The start seems tedious, but you keep looking and exploring the code even if you don't know exactly what you're doing. Finally, you find what you are after. It's a small but impressive finding.

Through this small finding, you realize that you understand much more about the technology. You're more confident now, and you have an idea of how similar technologies are designed.