Friday, April 24, 2009

UnDBX v0.13: Breaking the 2GB Barrier

I finally got some email feedback that pointed to a problem in UnDBX (thanks Darren Veach). In a nutshell: UnDBX v0.12 can't open .dbx files that are larger than 2 gigabytes (GB).

I never bothered to test UnDBX with such files, because the maximum size of a .dbx file used by Outlook Express is 2GB (see Microsoft's KB article 903095). Files larger than that are, by definition, corrupted, and UnDBX was never meant to be a recovery tool.

But people do try to use UnDBX for email recovery. And some even complain when it fails. Fair enough.

I decided to look into the 2GB issue, but was not able to get Outlook Express to generate .dbx files larger than 2GB. OE started acting up once a .dbx file reached 2GB, spewing error messages whenever I tried to add messages to the full folder. But the .dbx file itself was not corrupted, and its size never exceeded 2GB.

In the end I created a file larger than 2GB on my Linux box. First I generated a 2GB file full of zeros:

dd if=/dev/zero of=Zero.dbx bs=1024 count=2097152

and then concatenated it to the end of a valid .dbx file:

cat Original.dbx Zero.dbx > Inbox.dbx
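The dd command above writes 1024 × 2097152 = 2,147,483,648 bytes (2^31), one more than the largest value a signed 32-bit integer can hold (2,147,483,647), so the concatenated file is guaranteed to be out of reach of 32-bit offsets. On a POSIX system a similar test file can also be produced without writing 2GB of zeros: copy a valid .dbx file, seek past its end and write a single byte. The gap reads back as zeros and is usually stored as a sparse "hole". The sketch below assumes that approach; pad_file_to is a hypothetical helper, not part of UnDBX.

```c
/* Hypothetical helper, not part of UnDBX: grow an existing file to
   new_size bytes by seeking past its end and writing one zero byte.
   Assumes POSIX semantics, where the gap reads back as zeros. */
#define _FILE_OFFSET_BITS 64      /* must precede any system header */
#define _POSIX_C_SOURCE 200809L   /* makes fseeko/ftello visible */

#include <stdio.h>
#include <sys/types.h>

/* Returns 0 on success, -1 on failure or if the file is already
   new_size bytes or larger (existing data is never overwritten). */
int pad_file_to(const char *path, off_t new_size)
{
    FILE *fp = fopen(path, "r+b");    /* update mode keeps existing data */
    if (fp == NULL)
        return -1;

    if (new_size < 1 || fseeko(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    off_t cur = ftello(fp);           /* current file size */
    if (cur < 0 || cur >= new_size) {
        fclose(fp);
        return -1;
    }

    /* Jump to the last byte of the requested size; with a 64-bit off_t
       this offset may lie well beyond 2GB. */
    if (fseeko(fp, new_size - 1, SEEK_SET) != 0 || fputc(0, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

For example, pad_file_to("Inbox.dbx", (off_t)3 << 30) would grow a copy of a valid Inbox.dbx to 3GB.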

To my surprise, UnDBX was not able to even open the file, let alone extract messages from it.

The reason for this is simple: on 32-bit platforms the standard C-library file handling functions fopen, fseek, ftell, etc. use signed 32-bit offsets (fseek and ftell take a long). This limits the size of files that can be accessed to 2GB. Accessing larger files requires the 64-bit analogues: fopen64, fseeko64, ftello64, etc. If you add the following line to your source code, before including stdio.h or any other system header, then fopen, fseeko and ftello are mapped to these 64-bit functions (fseek and ftell still take a long, so seeking past 2GB requires fseeko and ftello with off_t offsets):

#define _FILE_OFFSET_BITS 64
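A minimal sketch of how this looks in practice, assuming glibc or another POSIX C library with large file support (Windows compilers use a different mechanism). dbx_file_size is a hypothetical helper, not UnDBX's actual code: it opens a file and measures it with fseeko and ftello, which take and return off_t rather than long.

```c
/* Sketch only: the define must appear before the first system header so
   that glibc maps fopen, fseeko and ftello onto fopen64, fseeko64 and
   ftello64. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L   /* makes fseeko/ftello visible */

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Fails to compile if off_t is still 32 bits. */
_Static_assert(sizeof(off_t) == 8, "off_t must be 64 bits for files over 2GB");

/* Hypothetical helper: return the size of a file in bytes, or -1 on error. */
off_t dbx_file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        /* On 32-bit Linux without large file support, opening a file over
           2GB fails here with errno set to EOVERFLOW. */
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return -1;
    }

    off_t size = -1;
    if (fseeko(fp, 0, SEEK_END) == 0)
        size = ftello(fp);    /* off_t result, not a long */
    fclose(fp);
    return size;
}
```

Compiled on 32-bit Linux without the define, the fopen call would fail on the large test file with "Value too large for defined data type" (EOVERFLOW); with it, the size comes back as a 64-bit off_t.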

I made the necessary modifications, and along the way paid some attention to the proper use of long and int types so that UnDBX can be compiled and run on 64-bit platforms.
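The long/int cleanup matters because sizeof(long) differs between platforms: 4 bytes on 32-bit systems and on 64-bit Windows, 8 bytes on 64-bit Linux. Reading a 4-byte on-disk field with fread(&x, sizeof(long), 1, fp) therefore reads 8 bytes on 64-bit Linux. One common way to stay portable, sketched here under the assumption that the file stores 32-bit little-endian fields, is to decode each field into a fixed-width uint32_t. read_le32 is a hypothetical helper; this is not necessarily how UnDBX does it.

```c
/* Hypothetical helper, not taken from UnDBX. */
#include <stdint.h>

/* Assemble a 32-bit little-endian value byte by byte, so the result
   depends neither on the host's byte order nor on sizeof(long).
   Each byte is widened to uint32_t before shifting, so values of
   2^31 and above are handled without overflow. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

A caller would fread exactly 4 bytes into an unsigned char buffer and pass it to read_le32, instead of reading directly into a long or int.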

The bottom line is that UnDBX can now open .dbx files larger than 2GB. But whether it can actually extract any messages from these files depends on how badly corrupted they are. Don't expect miracles.