random ramblings on Delphi, programming, Delphi programming, and all the rest

Tuesday, September 26, 2006

GpStructuredStorage internals

[This is a second article in the 'embedded file system' series. If you missed the first part, you can read it here.]

GpStructuredStorage compound file is organized in 1 KB blocks. First block contains header, then the content alternates between a file allocation table (FAT) fragment block and 256 blocks managed by the preceeding FAT fragment. Each block can be represented by a number - header block is block #0, first fat fragment is block #1 and so on.

block number and size of the internal file containing global storage attributes

block number of the first FAT block (always 1)

block number of the first unused file block

block number and size of the root folder

storage system version (at the moment $01000200, or 1.0.2.0)

Each FAT fragment contains 256 32-bit numbers, one for each of the 256 following file blocks. This number is either 0 if block is last in the FAT chain, or it is a number of the next file block in the chain. Each block is linked into exactly one chain. It can either belong to a file/folder or to a empty block list. Address of the first block in the empty list is stored in the header ([first unused block] entry).

FAT structure defines the limitations of my file format - last block must have number that is less than 4294967294. As block are 1 KB in size, total file size cannot exceed 4294967294 * 1 KB, which is slightly less than 4 TB. Enough for all practical purposes, I think.

Folders are simple - each folder is just a file. It contains file information records and is terminated with two 0 bytes.

File information record is a variable length record containing file name (up to 64 KB), attributes, length, and address of the first file block (additional blocks are reached by following the FAT chain).

At the moment, only two attributes are defined. One specifies that file is actually a subfolder, and another designates a special file containing file attributes (for discussion of attributes see the previous article).

ATTRIBUTES$0001 = attrIsFolder $0002 = attrIsAttributeFile

That's just about everything that is to tell about the compound file format. Armed with this knowledge, one can easily write a compound file browser/repair utility.

When you passed a value larger than 63 as a fourth parameter to the TestFile method, TestFile made some incorrect assumptions and compared value that was larger than $FFFF to another value that was truncated to two bytes. As a result of that, error was raised (incorrectly).

I have updated test suite at http://gp.17slon.com/gp/files/testgpstructuredstorage_src.zip.