Slicing, Dicing and Splicing Hex in the interests of forensics and data recovery.

Sunday, December 31, 2017

Practical Exercise - Image Carving

So who's ready to carve?

Or as Gordon would say " Let's Carve or F#!K OFF "

In the last post we talked about some simple carving of a JPEG image file using a hex editor.

Before we get too carried away, we should practise a couple of simple carves of images from 'unallocated'. What do I mean by 'unallocated', I hear you ask? Well...

There are a couple of approaches to carving and recovering files from file systems.

Firstly, there is the "File System" approach. That is, we use the filesystem's knowledge of where the deleted file was to begin our journey of recovery.

For example, when a file is deleted in a FAT32 filesystem, the first byte of the directory entry is overwritten with E5. The directory entry still contains the filename (minus the first character), the file size and the first cluster number. These can be vital in the recovery process.
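Those fields sit at fixed offsets in each 32-byte directory entry, so they are easy to pull out programmatically. A minimal sketch (the entry below, 'PICTURE.JPG' at cluster 202, is a made-up example, not data from this exercise):

```python
import struct

def parse_fat32_dir_entry(entry: bytes):
    """Decode the interesting fields of a 32-byte FAT32 short directory entry."""
    assert len(entry) == 32
    deleted = entry[0] == 0xE5              # first byte overwritten on deletion
    name = entry[0:8].rstrip(b' ')          # 8.3 base name (first char lost if deleted)
    ext = entry[8:11].rstrip(b' ')
    cluster_hi = struct.unpack_from('<H', entry, 20)[0]
    cluster_lo = struct.unpack_from('<H', entry, 26)[0]
    first_cluster = (cluster_hi << 16) | cluster_lo
    size = struct.unpack_from('<I', entry, 28)[0]
    return deleted, name, ext, first_cluster, size

# A hypothetical deleted entry, originally 'PICTURE.JPG', first cluster 202:
entry = (b'\xE5ICTURE JPG'            # name with its first byte replaced by E5
         + b'\x20'                    # attributes (archive)
         + bytes(8)                   # reserved / creation time fields
         + struct.pack('<H', 0)       # first cluster, high word
         + bytes(4)                   # write time and date
         + struct.pack('<H', 202)     # first cluster, low word
         + struct.pack('<I', 34567))  # file size in bytes
deleted, name, ext, first_cluster, size = parse_fat32_dir_entry(entry)
```

Note how everything survives the delete except the first character of the name and (as we will see) the FAT chain.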

For a valid file we could look up the cluster number in the FAT and follow the chain to find all the fragments, as each FAT entry points to the next cluster number.

However, when a file is deleted, its FAT entries are zeroed, so we cannot trace the file fragments. We will go through a worked example of this later.

The second technique for file recovery is to ignore the filesystem and treat the disk as one big block of data. We can either do this on the whole disk image or we can just export the unallocated portion of the disk. We can then use our knowledge of what type of file we are trying to recover to attempt to find the file/s in question.

Carve 1

Select the beginning of the block at the start of the JPEG at 0x1258. Now we search for the end of the file with the hex FFD9.

The D9 of the FFD9 at the end of the file is at offset 0x2DE4. We select this as the end of the block and copy the block out to a new file. In the filename I like to include three things: the file I am carving from, and the start and end offsets. So let's call it Carve1_1258_2DE4.jpg and voilà...
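In a hex editor this is a simple select-and-copy, but the same inclusive-offset carve is easy to script. A sketch (the source filename `Carve1.bin` is my assumption for the exported unallocated data):

```python
def carve(src_path, start, end, dst_path):
    """Copy the bytes from offset start to end (inclusive) into a new file."""
    with open(src_path, 'rb') as src:
        src.seek(start)
        data = src.read(end - start + 1)   # +1 because the end offset is inclusive
    with open(dst_path, 'wb') as dst:
        dst.write(data)
    return len(data)

# carve('Carve1.bin', 0x1258, 0x2DE4, 'Carve1_1258_2DE4.jpg')
```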

Carve 3

For those who commonly hexinate files, this looks like an OLE Compound File (OLECF), the container used by Word, PowerPoint and Excel from 1997-2003. These files have a distinct 8-byte header, D0CF11E0A1B11AE1. For more info have a look at http://www.forensicswiki.org/wiki/OLE_Compound_File

So in this example it looks like there is another file in the unallocated space, but we will concentrate on the JPEG we are searching for. So we search for FFD8FFE0 as before.

It is interesting to note that the hit lands on a nice byte boundary of 0x2000, i.e. 8192 bytes, or 16 sectors of 512 bytes. This will be important later, but let's move on to carving the JPEG. Search for FFD9. We find it at 0x4424 and save the result as Carve3_2000_4424.jpg.
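Hunting for these markers can also be scripted with plain byte searches. A minimal sketch (note FFD8FFE0 only matches JFIF-style JPEGs; other variants use a different fourth byte):

```python
JPEG_HEADER = bytes.fromhex('FFD8FFE0')   # SOI marker followed by the JFIF APP0 marker
JPEG_FOOTER = bytes.fromhex('FFD9')       # EOI marker

def find_jpeg_bounds(data: bytes, offset: int = 0):
    """Return (start, end) offsets, where end points at the final D9 byte."""
    start = data.find(JPEG_HEADER, offset)
    if start == -1:
        return None
    footer = data.find(JPEG_FOOTER, start)
    if footer == -1:
        return None
    return start, footer + 1   # offset of the D9 itself, as used in the filenames
```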

Carve3_2000_4424.jpg

Huh, this doesn't seem right. The first part looks like the devilishly handsome you-know-who, but what happened to the rest? So let's look back at the file we carved out. If we scroll up from the bottom we see some weird stuff: references to a directory structure, "theme/theme/themeManager.xml"...

That stuff should not be in our JPEG. So here is our Aha moment... no, not a 'Take On Me' Aha, more like a 'that's interesting' Aha.

Aha - Take On Me (1985): https://www.youtube.com/watch?v=djV11Xbc914

We saw that the first part of unallocated was an OLE file, then we found our JPEG, but it looks like some of the OLE file is mixed into our JPEG, causing it not to decode properly.

So what could be happening? Perhaps FRAGMENTATION!!!

What is this fragmentation sorcery you speak of?

Well let's back up a bit first.

So, for a new filesystem out of the box, we have a nice clean storage device, and a new file is stored in sequential blocks on the disk. We store a file in logical blocks called clusters. Each cluster is made up of a number of the smallest traditional hard disk unit, the sector (512 bytes). The cluster is an arbitrary unit and is the smallest unit the operating system can address. For example, in a FAT32 filesystem a cluster may be 4 sectors (2048 bytes), 8 sectors (4096 bytes), etc.

So why isn't this fixed?

Well, mainly as a trade-off. If the cluster size is too big we can waste a lot of space: if our cluster is 32 kBytes and our file is 100 bytes, we are wasting nearly 32 kBytes (slack space). But if we make each cluster really small, say 1 sector, the table needed to address all those clusters (the FAT) becomes a large percentage of our storage. For example, a 2TB disk using 1-sector clusters would need a 16GB FAT to hold all the sector addresses, and there are two FATs on the disk for redundancy.
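That 16GB figure is easy to verify: with 4-byte FAT32 entries and one entry per 512-byte cluster, the arithmetic works out like this:

```python
disk_bytes = 2 * 10**12        # a 2 TB disk
sector = 512                   # bytes per sector
entry_size = 4                 # bytes per FAT32 table entry
clusters = disk_bytes // sector        # 1 sector per cluster -> one entry each
fat_bytes = clusters * entry_size
print(fat_bytes / 10**9)       # -> 15.625, i.e. ~16 GB per FAT (and there are two)
```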

So when we have many files and we delete some, create some new ones and delete some more, our disk becomes fragmented: when we go to save a file there are lots of gaps left by the deleted files, and the operating system would like to reuse them. The FAT filesystem stores the chain of cluster numbers for each file, e.g. 202, 203, 207, 412, 902 could be the non-sequential cluster numbers for a 5-cluster file. This is fine for an allocated file, but what happens when the file is deleted? The directory entry has its first byte overwritten with E5 and still stores the first cluster number, but the FAT entries are overwritten with zeros.

This is OK for a deleted file whose cluster numbers were sequential, but for a typical file with non-sequential cluster numbers we are... well... stuffed!

The things we can use to our advantage are the cluster size and the type of file we are searching for. The cluster size is useful because we only need to look at cluster boundaries for the file we are searching for. The file type is useful because we know what we are looking at: a text file and a ZIP file look very different in hex.

Now back to our file. Looking at the highlighted section in our carved file, we remember that our OLE file occupied the first 0x2000 bytes; 0x2000 is 8192 bytes, or 16 sectors. This is a good clue that our cluster size is 16 sectors, or a fraction of that, maybe 8 or 4.

So looking back through our data, Carve3.bin, if we step forward in multiples of 0x2000 bytes, we see that if our assumption of a cluster size of 0x2000 were true, the second cluster looks strange. Prior to 0x8000 there is a run of all zeros, which is not normal within a sequential run of a JPEG, which usually contains high-entropy data.

So let's try half that cluster size: 0x1000, or 4096 bytes (8 sectors). We find the start of the JPEG by searching for FFD8FFE0, which we found at 0x2000. If we then step forward one 'trial cluster' of 0x1000, we find that the high-entropy data we would normally see in the body of a JPEG does not continue. So our initial assumption of a cluster size of 0x2000 was wrong; let's move forward with a cluster size of 0x1000.

If we move forward from 0x3000 to 0x4000 we see some nice data that has high entropy again.

It looks like our assumption of a cluster size of 0x1000 might be correct, because if we move forward another cluster of 0x1000 we see we are no longer in JPEG-style high-entropy data.
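The "does this cluster still look like JPEG data?" test we are doing by eye can be approximated with a Shannon entropy measure per trial cluster. A sketch, using the 0x1000 cluster size deduced above:

```python
import math
from collections import Counter

def shannon_entropy(block: bytes) -> float:
    """Bits per byte: near 8 for compressed JPEG data, 0 for a run of zeros."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())

def entropy_profile(data: bytes, cluster: int = 0x1000):
    """Entropy of each trial cluster; a sudden drop flags foreign or blank data."""
    return [round(shannon_entropy(data[i:i + cluster]), 2)
            for i in range(0, len(data), cluster)]
```

Running `entropy_profile` over the carved region and watching for clusters that drop well below the rest is the scripted version of scrolling through the hex looking for zero runs and XML.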

So maybe the JPEG finishes in this last cluster, i.e. from 0x4000 to 0x5000. So let's search forward from 0x4000 looking for FFD9, and we find a hit at 0x4424.

So let's try making up a file from:

0x2000 to 0x3000 and

0x4000 to 0x4424

If we combine those parts we have the file Carve3_2000_3000_4000_4424.jpg. In a hex editor we simply copy the first part, 0x2000 to 0x3000, to a new file, then copy 0x4000 to 0x4424 and append that to the file. Now let's check the results.
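The same copy-and-append can be scripted. Here the runs are given as (start, end-exclusive) pairs, so the footer run ends at 0x4425 to include the D9 byte at 0x4424:

```python
def splice(src_path, runs, dst_path):
    """Concatenate a list of (start, end-exclusive) byte runs into one carved file."""
    with open(src_path, 'rb') as f:
        data = f.read()
    with open(dst_path, 'wb') as out:
        for start, end in runs:
            out.write(data[start:end])

# splice('Carve3.bin', [(0x2000, 0x3000), (0x4000, 0x4425)],
#        'Carve3_2000_3000_4000_4424.jpg')
```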

Carve3_2000_3000_4000_4424.jpg

Wow, that looks good if I do say so myself... and my best profile too!

So that was quite a hexinating journey. What did we cover? Carving a sequential JPEG from unallocated space, right up to a fragmented carve. Good work. What you have learnt is the basis of all file recovery.