Disk Fragmentation: More Than Just a Performance Killer

One of my consulting client’s main file servers started freezing up about 2 months ago. The problem started gradually: at first the server froze once every few weeks, but eventually it was freezing every few days. Intermittent problems are the most difficult to solve because they can be so elusive. The server is an HP DL740 with four 2.0GHz processors and 32GB of memory, connected to two external disk subsystems. The hard disks are 146GB drives configured in two RAID 10 arrays on a 6404 RAID controller. This is the client’s main file and print server, and it has been in service for roughly 2 years.

When the server freezes, there are no error messages, and users can't access data on any of the server shares. The server has two arrays: the first array hosts the C and D drives, and the second array hosts the E drive. Sometimes you can log on to the server with the attached keyboard, but as soon as you access the D or E drive, the console session hangs. After a reboot, users can access the server shares until it freezes again.

Initially, I suspected a hardware or hard disk problem because one of the disks had recently failed and been replaced. I ran the latest HP online diagnostics tool, but it reported that everything was OK. Over a weekend, I ran the latest HP offline diagnostics with the hardware tests in a continuous loop, but I still didn’t find anything wrong with the server. I also upgraded all the firmware and drivers on the server, but the problem persisted. Neither the HP Event Viewer nor the Windows Server 2003 Event Viewer revealed any clues about why the server kept freezing, although it seemed to freeze more often under heavier load.

I opened a case with HP Technical Support, but they couldn't find anything wrong with the server either and suggested the problem might be OS-related. To make matters worse, the server was running out of disk space, and I had to decide quickly whether to expand an unstable server or evaluate alternatives for additional disk space. While troubleshooting, I noticed that the server drives were almost full: the D drive is 540GB and had 30GB free, and the E drive is 280GB and had 10GB free. This client was eating through disk space very rapidly, at a rate of roughly 3GB per day. This was a serious problem. Servers should have no less than 14 percent (ideally 20 percent) free disk space. NTFS does an OK job of preventing disk fragmentation as long as there is enough contiguous free space to write new files, but as the disk fills up, fragmentation increases sharply. I used the analysis feature of the Windows 2003 Disk Defragmenter on the hard disks and found that both the D and E drives were heavily fragmented. Unfortunately, I was stuck between a rock and a hard place: I needed to move files off the server, but it was becoming unstable and I didn’t want to risk another crash.
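As a quick sanity check on those numbers, the sketch below computes each drive's free-space percentage against the 14 and 20 percent guidelines and estimates how long the remaining space would last. It assumes the 3GB-per-day growth is a flat rate drawn from the combined free space on D and E; the helper names are hypothetical, not part of any tool mentioned here.

```python
# Hypothetical helpers: check free-space headroom and estimate days until full.
# Drive sizes and free space come from the article; the 14%/20% thresholds are
# the article's guideline. ASSUMPTION: 3GB/day is a constant rate drawn from
# the combined free space on D and E.

MIN_FREE_PCT = 14.0      # minimum recommended free space
IDEAL_FREE_PCT = 20.0    # ideal free space
GROWTH_GB_PER_DAY = 3.0  # observed consumption rate (assumed constant)


def free_pct(size_gb: float, free_gb: float) -> float:
    """Percentage of the volume that is still free."""
    return 100.0 * free_gb / size_gb


def days_until_full(free_gb: float, growth_gb_per_day: float) -> float:
    """Days left before free space reaches zero at a constant growth rate."""
    return free_gb / growth_gb_per_day


def status(pct: float) -> str:
    """Classify a free-space percentage against the two guidelines."""
    if pct < MIN_FREE_PCT:
        return "below minimum"
    if pct < IDEAL_FREE_PCT:
        return "below ideal"
    return "ok"


drives = {"D": (540.0, 30.0), "E": (280.0, 10.0)}  # (size GB, free GB)

for name, (size, free) in drives.items():
    pct = free_pct(size, free)
    print(f"{name}: {pct:.1f}% free ({status(pct)})")

total_free = sum(free for _, free in drives.values())
days = days_until_full(total_free, GROWTH_GB_PER_DAY)
print(f"Combined: ~{days:.0f} days until full")
```

With the article's figures, D comes out at about 5.6 percent free and E at about 3.6 percent, both well under the minimum, and the combined 40GB would last only about two weeks at 3GB per day.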

Was disk fragmentation causing the server to freeze? Everyone has heard that disk fragmentation hurts server performance, but could it freeze a server? I decided to download an evaluation copy of Diskeeper Enterprise (http://www.diskeeper.com/profile/submit-select.aspx?a=l&PId=102 ) because I expected it to defragment the disks faster and more thoroughly than the Windows 2003 Disk Defragmenter. Before I ran Diskeeper, the D drive averaged 5.5 fragments per file and the E drive averaged 2.5 fragments per file. After defragmentation, the D drive averaged 2.9 fragments per file and the E drive averaged 1.8. Not perfect, but a significant improvement.
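The "fragments per file" figure is an average: total fragments across all files divided by the number of files, so 1.0 means every file is contiguous. A minimal sketch of that metric, plus the relative improvement implied by the before-and-after numbers above, might look like this; the function names and the sample per-file counts are hypothetical.

```python
# Sketch of the "average fragments per file" metric: total fragments across
# all files divided by the number of files. 1.0 means every file is contiguous.
# The sample per-file counts are made-up data, not measurements from the server.


def avg_fragments_per_file(fragment_counts: list[int]) -> float:
    """Mean number of fragments per file; 1.0 is fully contiguous."""
    if not fragment_counts:
        raise ValueError("no files to analyze")
    return sum(fragment_counts) / len(fragment_counts)


def pct_reduction(before: float, after: float) -> float:
    """Relative improvement, in percent, between two averages."""
    return 100.0 * (before - after) / before


sample = [1, 1, 3, 12, 1, 2]  # hypothetical fragment count for six files
print(f"sample average: {avg_fragments_per_file(sample):.2f}")

# Averages reported in the article, before and after running Diskeeper.
for drive, before, after in [("D", 5.5, 2.9), ("E", 2.5, 1.8)]:
    print(f"{drive}: {before} -> {after} ({pct_reduction(before, after):.0f}% fewer)")
```

By this measure, D improved by about 47 percent and E by about 28 percent. Note how a single badly fragmented file (12 fragments in the sample) pulls the average well above 1.0 even when most files are contiguous.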

After a week with no server hangs, we concluded that disk fragmentation was indeed the cause of the problem. Now that the server was stable, I installed eight additional 146GB drives in each disk subsystem and created a 1.2TB RAID 10 array on the server. With the additional disk space, I was able to move a couple of large folders off the D and E drives. With more than 50 percent free space on each drive, Diskeeper was significantly more effective at reducing fragmentation. On heavily used servers, consider a product like Diskeeper to keep disk fragmentation to a minimum. Not only will it improve the performance of your disk subsystem, it can also help prevent instability on busy servers.
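The 1.2TB figure follows from how RAID 10 works: data is striped across mirrored pairs, so only half the raw capacity is usable. This sketch assumes the new array spans all 16 added drives (eight per subsystem) and ignores controller metadata overhead and the GB/GiB distinction; the helper is hypothetical.

```python
# Sketch: usable capacity of a RAID 10 array (stripe of mirrored pairs).
# ASSUMPTIONS: the new array spans all 16 added drives (eight per external
# subsystem), all disks are the same size, and controller metadata overhead
# and the GB-vs-GiB difference are ignored.


def raid10_usable_gb(disk_count: int, disk_size_gb: float) -> float:
    """Every block is mirrored once, so half the raw capacity is usable."""
    # A conventional RAID 10 needs at least two mirrored pairs.
    if disk_count < 4 or disk_count % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disk_count * disk_size_gb / 2


drives = 2 * 8  # two subsystems, eight 146GB drives added to each
print(f"{raid10_usable_gb(drives, 146):.0f} GB usable")  # 1168 GB, roughly 1.2TB
```

Sixteen 146GB drives give 2,336GB raw, and mirroring halves that to 1,168GB, which matches the roughly 1.2TB array described above.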

Tip

Windows Server 2003 R2 ships on two CD-ROMs. As with Windows 2003, you must run Adprep /forestprep and Adprep /domainprep on your Windows 2003 or Windows 2000 forest before you can introduce the first R2 domain controller (DC). Unfortunately, R2 ships with two different versions of Adprep: one on the first CD-ROM under \i386 and one on the second CD-ROM in the \Cmpnents\r2\adprep folder. Make sure to run Adprep from the second CD-ROM, not the first; otherwise, when you try to introduce the first R2 DC into the Active Directory (AD) forest, you'll receive an error message stating that Adprep didn't complete successfully.

Discuss this Article

It depends on the condition of your disks. In a worst-case scenario, it could probably take 12 to 24 hours. However, if the server isn't crashing, you could let it run in two-hour sessions until the disks are defragmented. If you're only defragmenting on Sundays, it's going to take several weeks to finish. In our experience, Diskeeper does a good job of defragmenting the disk and keeping it defragmented, much better than the built-in defrag tools in Windows Server. With Diskeeper you can defragment while users are still accessing the server, giving the defrag a lower priority. My suggestion is to get Diskeeper, especially given your maintenance window limitations.

I know this is a pretty old post, but it seems to describe my current dilemma precisely. Can you tell me how large the disk arrays you defragmented were and roughly how long it took? I only have a 2-hour maintenance window on Sunday mornings. I have two logical drives that need defragmenting, each about 650GB, and both sit on a single RAID 10 array of 300GB 15K SAS drives. I can request larger maintenance windows, but I have to give 30 days' notice and coordinate with a lot of customers. Do you recall how long yours took?

Good article; heavy fragmentation is an often-overlooked cause of disk-related issues. Recently, in our lab at work (about 10 to 15 people making heavy use of shared network resources for engineering and scientific modeling), we had drastically reduced performance on the server side. Our sysadmin spent some time trying to get a handle on the situation, and it turned out to be nasty fragmentation on the drives of both the server and the workstations. Some time spent with Diskeeper 2007 fixed the problem quickly. Now most of our heavy-use workstations run Diskeeper for continuous automatic background defragmentation. Very nice software.

Unfortunately for this client, there is no quiet time. They run differential backups during the week and full backups on the weekends, so there is no convenient window to run a defrag. That's why we went with Diskeeper.