Fragmentation of Dynamically-Expanding VHDs Under Hyper-V

Most of us are aware of how prone the NTFS file system is to fragmentation and how severe that fragmentation can easily become. But this week I saw a prime example of how crucial it is to make defragmenting part of your regular maintenance routine on Windows servers. It is common to run defragmentation software on a schedule or as a background process on all our Windows systems, but what about your virtual disks under Hyper-V? Many administrators neglect them. If your VHDs are a fixed size this probably isn’t a major issue, since the entire file is preallocated on creation.

This week I saw a Hyper-V server running four VMs, all with dynamically-expanding VHDs. Two of the VHD files were in around 350 fragments each! The complaint? Very poor file system performance on those virtual machines. For those who aren’t aware of how this happens, or who have bought into the urban myth that file fragmentation isn’t a real problem under modern versions of NTFS, let’s go over a little refresher on File Fragmentation 101.

Let’s say I create two dynamically-expanding VHD files on a freshly formatted NTFS partition. When I initially create the files they are very small. For example, they may be stored on the disk as follows (this is only a rough example and these numbers don’t reflect actual allocation in any way):

VHD1.vhd 5 MB | Free Space 1 MB | VHD2.vhd 5 MB | Remaining Free Space

As both VHD files grow, you will quickly end up with a mess like the following, since NTFS cannot keep growing the files into the small amount of free space it left between them:

VHD1.vhd 6 MB | VHD2.vhd 6 MB | VHD1.vhd 26 MB | VHD2.vhd 61 MB | VHD1.vhd 100 MB | Free Space 1 MB | VHD2.vhd 50 MB | Remaining Free Space
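The interleaving shown above can be reproduced with a toy model. The following is only a minimal sketch: the allocator (extend in place if the next cluster is free, otherwise take the lowest free cluster), the 1 MB cluster granularity, and the names `grow` and `count_fragments` are assumptions made for illustration. Real NTFS allocation is far more sophisticated, so treat the numbers as a demonstration of the mechanism, not of actual behavior.

```python
def count_fragments(disk, name):
    """Count the runs of contiguous clusters owned by `name`."""
    runs = 0
    previous = None
    for owner in disk:
        if owner == name and previous != name:
            runs += 1
        previous = owner
    return runs


def grow(disk, name, clusters):
    """Give `name` more clusters on the toy volume.

    Hypothetical policy: extend in place when the cluster right after
    the file's last cluster is free, otherwise take the lowest free one.
    """
    for _ in range(clusters):
        owned = [i for i, owner in enumerate(disk) if owner == name]
        next_index = owned[-1] + 1 if owned else 0
        if next_index >= len(disk) or disk[next_index] is not None:
            next_index = disk.index(None)  # raises ValueError if the volume is full
        disk[next_index] = name


# Starting layout from the example: VHD1 (5), free (1), VHD2 (5), free space.
disk = ["VHD1"] * 5 + [None] + ["VHD2"] * 5 + [None] * 29

# Let the two files take turns growing by 3 clusters each.
for _ in range(2):
    grow(disk, "VHD1", 3)
    grow(disk, "VHD2", 3)

fragments = {name: count_fragments(disk, name) for name in ("VHD1", "VHD2")}
print(fragments)  # {'VHD1': 3, 'VHD2': 3}
```

Every additional round of alternating growth adds one more fragment to each file in this model, which is how a busy dynamically-expanding VHD can end up in hundreds of pieces.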

Of course, files within those virtual disks may become fragmented as well (especially if the VM is also using NTFS), which compounds the problem. For this reason Microsoft actually recommends using fixed VHD files (not dynamic) unless you perform regular manual defragmentation on the Hyper-V server. The advantage of dynamically-expanding VHDs is that you don’t have to guess how large each VHD will grow. You can even allocate more space than you physically have and let the VHDs grow until the drive is eventually full.

Another misconception I’ve heard is that fragmentation is less of an issue on RAID arrays. RAID has nothing to do with fragmentation. The file system is spanned across the disks in a RAID array and, in the case of mirroring, stored at the same location in each mirror, but the RAID controller itself does nothing to your file system that affects fragmentation at all. Let’s take a simple four-disk RAID 10 array with a stripe size of 128 KB as an example. A single 1 MB file written at the start of the array will be broken into eight parts (A-H below) as follows:

Disk 0  Disk 1  Disk 2  Disk 3
A1      B1      A2      B2
C1      D1      C2      D2
E1      F1      E2      F2
G1      H1      G2      H2
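The arithmetic behind that table can be sketched in a few lines. This is an illustration only, assuming the layout shown above (data striped across Disk 0 and Disk 1, with Disk 2 and Disk 3 holding the mirror copies); the function names `stripe_units` and `locate` are made up for this example, and a real controller may order its disks differently.

```python
STRIPE_UNIT = 128 * 1024  # 128 KB stripe size from the example
DATA_DISKS = 2            # assumed: disks 0-1 hold the stripes, disks 2-3 mirror them


def stripe_units(offset, length):
    """Return the stripe unit numbers touched by a byte range."""
    first = offset // STRIPE_UNIT
    last = (offset + length - 1) // STRIPE_UNIT
    return list(range(first, last + 1))


def locate(unit):
    """Map a stripe unit to (row, primary disk, mirror disk)."""
    primary = unit % DATA_DISKS
    row = unit // DATA_DISKS
    return row, primary, primary + DATA_DISKS


# A 1 MB file written at the very start of the array.
units = stripe_units(0, 1024 * 1024)
for unit in units:
    row, primary, mirror = locate(unit)
    label = chr(ord("A") + unit)
    print(f"{label}: row {row}, disk {primary} (mirror on disk {mirror})")
```

The loop prints eight parts, A through H, spread over four rows, which matches the table: 1 MB divided by 128 KB is exactly eight stripe units.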

Now let’s say we start with two files, each exactly 256 KB in size (assuming the cluster size of the file system is smaller, which it almost certainly will be), stored at the beginning of the array. Then we grow File1, then File2, and so on until both files are 1 MB, leaving each file in several fragments. We could easily end up with something like this (each color represents one of the two files). Note that each file now occupies more than eight parts. That’s because unless the files grow by exactly 128 KB at a time (and NTFS leaves no space between them), each fragment is going to span stripe units, and parts of both files will occupy the same stripe units. The only way this MIGHT not happen is if the stripe size and the cluster size of the file system are exactly the same. I’ve seen some people recommend that, but I have never seen any hard evidence that it improves performance at all. In fact, there are negative consequences for both larger cluster sizes and smaller stripe sizes.

Disk 0  Disk 1  Disk 2  Disk 3
A1  B1  A2  B2
A1  B1  A2  B2
C1  C1  D1  C2  C2  D2
E1  D1  E1  E2  D2  E2
F1  G1  F1  F2  G2  F2
G1  H1  G2  H2
I1  H1  I2  H2
I1  J1  I2  J2
K1  J1  K2  J2
K1  K2

And this is only a simple example where each file is in a mere four fragments! Imagine what a mess it can get when files get split into dozens or hundreds of fragments. So having a RAID array doesn’t minimize the need for file defragmentation at all.
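To put rough numbers on the stripe-unit sharing described above, here is a small sketch. The growth increment of 192 KB, the back-to-back placement of extents, and the helper names `units_touched` and `lay_out` are all assumptions chosen for illustration; they are not taken from a real NTFS volume or from the table above.

```python
STRIPE_UNIT_KB = 128  # stripe size from the example


def units_touched(offset_kb, length_kb):
    """Return the set of stripe units a fragment overlaps."""
    first = offset_kb // STRIPE_UNIT_KB
    last = (offset_kb + length_kb - 1) // STRIPE_UNIT_KB
    return set(range(first, last + 1))


def lay_out(initial_kb, growth_kb, rounds, files=("File1", "File2")):
    """Place extents back to back, letting the files take turns growing."""
    extents = []
    cursor = 0
    for name in files:
        extents.append((name, cursor, initial_kb))
        cursor += initial_kb
    for _ in range(rounds):
        for name in files:
            extents.append((name, cursor, growth_kb))
            cursor += growth_kb
    return extents


# Two 256 KB files that each grow by a hypothetical 192 KB four times (1 MB total).
extents = lay_out(256, 192, 4)

touched = {}
for name, offset, length in extents:
    touched.setdefault(name, set()).update(units_touched(offset, length))

shared = touched["File1"] & touched["File2"]
print({name: len(units) for name, units in touched.items()})  # {'File1': 10, 'File2': 10}
print(sorted(shared))  # [5, 8, 11, 14]
```

Under these assumptions each 1 MB file touches ten stripe units instead of the eight a contiguous file would need, and four stripe units hold data from both files. Changing `growth_kb` to 128 makes the shared set empty, which is the aligned special case mentioned earlier.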