Undeleted

Modern filesystems make forensic file recovery much more difficult. Tools like Foremost and Scalpel identify data structures and carve files from a hard disk image.

IT experts and investigators have many reasons for reconstructing deleted files. Whether an intruder has deleted a log to conceal an attack or a user has destroyed a digital photo collection with an accidental rm -rf, you might someday face the need to recover deleted data. In the past, recovery experts could easily retrieve a lost file because an earlier generation of filesystems simply deleted the directory entry. The meta information that described the physical location of the data on the disk was preserved, and tools like The Coroner's Toolkit (TCT [1]) and The Sleuth Kit (TSK [2]) could uncover the information necessary for restoring the file.

Today, many filesystems delete the full set of meta information, leaving the data blocks. Putting these pieces together correctly is called file carving – forensic experts carve the raw data off the disk and reconstruct the files from it. The more fragmented the filesystem, the harder this task become.

Many open source tools automate the carving process: The list is headed by Foremost [3] and its derivative Scalpel [4], but other tools include PhotoRec [5] and FTimes [6]. PhotoRec does not support generic carving for any file type, and FTimes is so hard to use it is not worthwhile for most users.

Foremost and Scalpel are not interested in the underlying filesystem. They simply expect the data blocks of the files to reside sequentially in the image under investigation. The tools will find images in dd dumps, RAM dumps, or swap files. Carving will help to identify and reconstruct files on corrupt filesystems, in slack space, or even after installation of a new operating system, as long as the required data blocks still exist.

Of course, none of these tools can perform miracles, and they are not designed to retrieve data from physically damaged hard disks. Also, the carving process cannot access data blocks that have been overwritten.

Because carving tools do not rely on the filesystem, they need other sources of information to discover where a file starts and ends. Fortunately, many file types have known structures. The header and footer are often all that is needed to identify the file type and location. The Linux file command also uses header and footer information to identify file types.

File carvers investigate the whole hard disk, or disk image, to locate known headers and footers. They then carve out the blocks between the header and footer and store the data as a new file. Some file types do not possess unique footers. Carvers will at least guess where the file ends on the knowledge of where the next header starts. Of course, any amount of unidentified data could reside between the end of the file and the next header.

Image formats are an exception. For example, each JPEG file starts with a byte sequence of 0xFFD8, typically followed by 0xFFE00010. File carvers are thus very good at identifying JPEG images. However, if some blocks have been overwritten, or if the file is fragmented, the tools will restore only a part of the file at best (Figure 1).

Figure 1: File carvers ignore the filesystem and carve the images directly from data blocks. In cases of fragmented files, the carver returns an imperfect photo, but this image might be sufficient to identify the subject.

Foremost and Scalpel

Jesse Kornblum and Kris Kendall from the United States Air Force Office of Special Investigations developed Foremost in March 2001 as a tool for analyzing and recovering deleted files. The Foremost carving tool is inspired by an earlier program called CarvThis, which was created back in 1999 by Defense Computer Forensic Lab but never released to the general public. Foremost is now open source, and Nick Mikus maintains the source code after giving the program a major boost in the scope of his Master's degree.

Golden G. Richard III developed a separate program dubbed Scalpel based on Foremost 0.69. For a long time, Scalpel was regarded as an advanced tool. Some sources even claim that the Foremost developers recommend Scalpel themselves [7]. To be more accurate, both projects are under active development. Although Scalpel was far superior to its predecessor in 2005 – with the ability to analyze images around 10 times faster – Foremost has caught up recently thanks to Nick Mikus, and it is actually superior to its derivative for some tasks.

Both Foremost and Scalpel use configuration files to specify which files to search for (Listing 1). The first column designates the file type and also specifies the file extension to add to any files the program finds. Files for which the case is relevant in the header and footer have a y in column two; this is n for all others. The next column defines the maximum file size, followed by the header byte sequence, and the footer byte sequence if it exists. The \x string introduces a byte in hexadecimal notation; the other possibilities are \s for a space and ? as a wildcard for any character. Other options can follow at the end.

Listing 1

Configuration

Fast Finder

Because of its origins, Scalpel uses the same configuration file as Foremost, although the two tools work differently internally. Both tools find more or less the same files, but there are some discrepancies in file identification. Forensic experts are thus well advised to use both programs.

Versions 0.9.1 and later of Foremost use a new approach to identifying ZIP, JPEG, Office, and other formats. The formats are implemented directly in Foremost, meaning that the program does not need header and footer information in the configuration file for the identification process. Foremost enables this new detection function if you set the -t flag at the command line followed by the required file types:

foremost -T -t jpg,gif,pdf -i imagefile

Supported formats are listed in Table 1. To enable all of these built-ins, just set the -t all option. The previous command line also sets the -T option to tell Foremost to write any files it finds to a directory that uses a name with a timestamp. This makes it easier to organize the forensic investigation, in that each new run writes its results to a new directory.

Space Requirements

The possibility of false positives means that the carver identifies a huge amount of data, so make sure you have enough free space on the target filesystem. The carving process doesn't necessarily require large amounts of copying. Virtual filesystems, such as CarvFS [8], are designed to access the data directly from the original image. CarvFS, which is based on FUSE (Filesystem in Userspace), only expects the carving tool to provide a table that describes which files are available at which physical locations. The CarvFS filesystem originated with the Dutch police's Open Computer Forensics Architecture (OCFA) project (see the article on OCFA in this issue), and it is intended for situations in which copying all the files to a separate location would result in huge volumes of data. In other cases, however, copying the data is more efficient than accessing it from the original image.

A typical Foremost run without built-ins is shown in Listing 2. The image for this example comes courtesy of the Digital Forensic Research Workshop (DFRWS [9]) challenge. DFRWS ran this competition in 2006 to test file carvers and promote their development. At the end of the competition, the organizers published a list of the files in the image.