Monday 7 October 2013

Finding Contained Files

It’s not a file carving problem. I had both files. I just needed to be sure that file A was contained inside file B.

With a hex editor I could find parts of file A inside file B, but it looked like file A was split up and scattered at different locations in file B.

I Googled a bit for a tool, but nothing came up, so I wrote my own Python program.

With my new tool I was able to confirm that file msi49.tmp was inside file c8400.msi:

You can see that file msi49.tmp is one contiguous sequence inside file c8400.msi starting at position 0x3A7200.
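For the contiguous case, the check boils down to a plain substring search on the raw bytes. Here is a minimal sketch of that idea. It is an illustration only, not the actual code of the tool, and the function name find_contiguous is made up for this example:

```python
def find_contiguous(contained: bytes, container: bytes) -> int:
    """Return the offset of `contained` inside `container`, or -1 if absent.

    Hypothetical helper for illustration; both inputs are the complete
    file contents, already read into memory.
    """
    return container.find(contained)


# Small in-memory demonstration instead of real files.
container = b"\x00\x01" + b"embedded-data" + b"\xff\xfe"
offset = find_contiguous(b"embedded-data", container)
if offset != -1:
    print("Found at position 0x%X" % offset)
```

With real files you would first read both into memory, for example with open(path, "rb").read().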

But I was more interested in knowing whether file msi49.tmp was also inside file Cisco_Jabber.msi:

And you can see it is, but not as one contiguous sequence. It’s split into 3 sequences.

This tool can also be used to find a downloaded file inside a pcap/pcapng file. I downloaded AnalyzePESig_V0_0_0_2.zip while taking a Wireshark capture.

Or to find a file opened by an application. Here I look into the process dump:

The only limitation is that both files need to be read into memory. But when I have time, I’ll turn this into a plugin for the Volatility framework.

The program looks for sequences that are at least 10 bytes long (this is an option). If your file is divided into sequences smaller than 10 bytes, my program will not find the embedded file unless you lower the minimum length. But don’t go as low as 1 byte, because then you’re likely to match random data.
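To make the role of the minimum length concrete, here is a sketch of one way to search for a file that is split into several sequences. It is a simple greedy approach that I use for illustration, and it is an assumption, not necessarily the algorithm of the tool. The function names and the returned tuple layout are made up for this example:

```python
def longest_prefix_match(needle: bytes, haystack: bytes, min_length: int):
    """Find the longest prefix of `needle` that occurs in `haystack`.

    Returns (offset_in_haystack, length), or None when not even the first
    `min_length` bytes of `needle` can be found. Assumes min_length >= 1.
    """
    if len(needle) < min_length or haystack.find(needle[:min_length]) == -1:
        return None
    # If a prefix of length k occurs, every shorter prefix occurs too,
    # so the longest matching length can be found with a binary search.
    low, high = min_length, len(needle)  # invariant: prefix of length `low` matches
    while low < high:
        mid = (low + high + 1) // 2
        if haystack.find(needle[:mid]) != -1:
            low = mid
        else:
            high = mid - 1
    return haystack.find(needle[:low]), low


def find_sequences(contained: bytes, container: bytes, min_length: int = 10):
    """Greedily cover `contained` with sequences found in `container`.

    Returns a list of (offset_in_contained, offset_in_container, length)
    tuples, or None if some part cannot be matched.
    """
    sequences = []
    position = 0
    while position < len(contained):
        match = longest_prefix_match(contained[position:], container, min_length)
        if match is None:
            return None
        offset, length = match
        sequences.append((position, offset, length))
        position += length
    return sequences


# Demonstration: the contained data is stored as 2 sequences, in reverse order.
part1 = b"0123456789abcde"
part2 = b"FGHIJKLMNOPQRST"
container = b"xx" + part2 + b"yyy" + part1 + b"zz"
print(find_sequences(part1 + part2, container))
```

Note that this greedy sketch always takes the longest possible sequence first. That can leave a tail shorter than the minimum length and report a miss even though a different split would have worked, so treat it as a starting point rather than a complete solution.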

I’m not 100% sure that my program will find all possible cases of embedded files. There is no problem if it’s one contiguous sequence, or several sequences in logical order. But I have to review my algorithm to be sure it will also find all possible cases of embedded files with sequences in random order. I think it will, but I need to prove it.

@Richard. msi49.tmp is a UPX-packed DLL file that is used each time a new profile is created. So each time a user logs on to a Windows machine for the first time, a profile needs to be created, and this DLL will execute to configure Jabber.

This is something you see with other software too.

This DLL is not part of the Jabber application. That’s why I couldn’t find it in the deployment repository (cab file inside the msi file).

This DLL is used for the configuration, and as such is stored as a binary stream inside the msi file.

Thanks for the info. I’m familiar with msi files: I created them back in my life as a dev. They are essentially a file format using tables.

Files embedded in msi files are usually put inside a cab file.
But this was not the case here with the tmp file. The reason, I found out, is that this tmp file is not part of the installation, but is needed for the configuration. See previous comment.

Really a great tool. It would be useful to make it recursive, I mean add an option to search for a file in a list of files. (Actually I’m using a batch file, something like “for each file found in directories run find-file-in-file.py”.)

Just asking as a beginner: if I understand it correctly, some malicious files, such as PDFs, have embedded malicious files, and normally we can view this in a hexdump. However, I believe your goal is to check more specifically whether the embedded file is mixed up. With your tool, if I only knew a file (test.pdf) and I don’t know the embedded file, can I just run the command “find-file-in-file.py test.pdf” and will the embedded file name be known using that command? I believe you also have a tool to embed a file (I just forgot the name).