Sparse Files: ESE can now give space back to the file system

A long-standing limitation of ESE is that database files never shrink, even after deleting substantial amounts of the data. While the space is freed and is re-usable within the database, the on-disk size does not change. The main limitation is that the first page of each table can never be moved, so even if the file is shrunk from the end, a single unmovable page can prevent shrinking the file. This is similar to an unmovable file preventing the shrinking of a disk partition in Volume Manager. In Windows 8.1, ESE started using Sparse Files to release space back to the file system, even in the middle of database files.

What is a Sparse File?

A sparse file has ‘holes’.

The File System has a special FSCTL called FSCTL_SET_ZERO_DATA. Normally it is used to write a bunch of zeroes to part of a file, but if the file is marked as sparse, and this space gets released to the file system.

But what does the File System do with sparse regions?

Any reads from the file don’t go to disk; instead the FS knows to just return zeroes.

Any writes will need to first allocate a cluster from the file system, and then write to this new block. The new block could be in a completely different part of the disk though -- and the file is now fragmented.

Downsides of Sparse Files

Sparse Files are only supported on NTFS and ReFS; not FAT (and all of the FAT variants).

I do not know of any copy utility (cmd.exe copy, xcopy, robocopy) that preserves the ‘sparseness’ of a file. They all expand the copied file to its full length.

Windows 8.1 and earlier did not do a great job of allocating contiguous space when the sparse regions were re-written. For example, if there was a 10 MB hole in the middle of a file, and the application writes a single 1 MB chunk, the file system often split up that 1 MB in to many 64 kb segments (depending on the fragmented free space of the volume).

Fragmented files hurt sequential access performance, which is very important for fast data access on spinning media (disks). SSDs can also be affected by block size, since it is difficult to know their underlying ‘native’ block size. SSDs may need to do a ‘read-modify-write’ if a program writes to less than the native block size.

Prior to Windows 8.1, memory-mapped files could not be made sparse.

We may get out-of-disk errors in places where we never got them before.

How does ESE use Sparse Files?

ESE manages space hierarchically. We will allocate a large multi-page block of space to be used by a table. We call these blocks of space ‘extents’. The default size of an extent is 16 pages. A table has a main btree, but will also have a btree for each of its secondary indices, and also one for its Long Values (LVs).

An extent starts out being ‘Owned’ by the table, and initially all of the extent is ‘Available’. As the btrees grow and require more space, the pages are allocated from the ‘Available’ part of the extent. Only when there are no more pages in the Available Extent will we ask for another Extent.

When enough records on a page are deleted, the page can be freed, and returned back to the Available Extent.

When an entire extent is available, it can then be freed back to the file system.

Manually Examining Sparse Files

Many utilities don’t know whether a file is Sparse or not. Just doing a ‘dir’ won’t help. But compact.exe can be used. Normally it’s used for examining NTFS-compressed files, but it turns out that the Win32 API GetCompressedFileSize works for sparse files as well.

Let's use an example. Indexed DB is an HTML 5 feature, and Internet Explorer uses ESE for its storage. Here we see that the AppQuota.edb file is 8,454,144 bytes large, and takes up 8,454,144 bytes on disk. It has no space savings whatsoever.

Listing c:\Users\martinc\AppData\Local\Microsoft\Internet Explorer\Indexed DB\ New files added to this directory will not be compressed.

8454144 : 8454144 = 1.0 to 1 AppQuota.edb151060480 : 139788288 = 1.1 to 1 d Internet.edbOf 2 files within 1 directories1 are compressed and 1 are not compressed.159,514,624 total bytes of data are stored in 148,242,432 bytes.The compression ratio is 1.1 to 1.

Now we can run esentutl.exe to manually walk the space trees and free up all of the space that it can. Use the -z option (Zero, originally meant to write zeroes to the deleted parts of a database) along with -t (Trim).

It is recommended that you immediately perform a full backup of this database. If you restore a backup made before the repair, the database will be rolled back to the state it was in at the time of that backup.

Operation completed successfully in 0.313 seconds.

(Ignore the bits about deleting the logfiles – that’s meant for securely zeroing out data, and it’s a reminder that the potentially-sensitive deleted data may still exist in the transaction logfiles.)

Hey look, it even mentions AE -- that’s Available Extents, mentioned above. We just trimmed (freed) 210 pages back to the file system. Compact.exe now says it only takes up 1,572,864 bytes:

JET_bitShrinkDatabaseRealtime affects the runtime behavior of what happens when enough data is deleted in tables to free an extent. ESE will make that extent sparse, but unfortunately there are some other background tasks which may then cause some data to be written in this sparse area. It's not ideal, but it's reality.