RAID-5 performance is slow on PowerEdge 2800 with PERC 4e/Di

I'm having a performance issue with reading and writing on a Poweredge 2800.

I have a 7 disk RAID-5 array controlled by a PERC 4e/Di. When I copy a 1 GB file from the server to my workstation, it copies at an average of about 4.68 megabytes per second. I've tried other copy configurations (workstation to workstation, other switches, switch ports, etc) but I think I can rule out all of those because when I copy my 1 GB file from a USB flash drive, it copies at 9.09 MB/s.

Why this disparity between reading from the RAID and a USB drive?

This is a server for a small office, and although it is running SQL, Exchange and serving files, there are only about 25 people in the office. When I'm not transferring files, CPU and network usage reported in Task Manager are nominal (average less than 10% and 2.5%, respectively). The system has 4 GB of memory. Windows sees two disks, C: and D:. Both are on the same Virtual Disk on the RAID controller (Bad setup, I know).

The Dell Openmanage application says the PERC 4e/DI firmware version is 521X, and it has 256MB memory. Cache options are Adaptive Read-ahead, Write-back, and Cache I/O.

How can I increase the speed of reading and writing to the RAID array?

What speed does your flash drive support? Your bottleneck is not your hard drive but your USB flash drive. If it is a low-speed drive, you may be lucky to see it peak at 10 MB/s.
Check this out: http://en.wikipedia.org/wiki/Universal_Serial_Bus

No no, the flash drive is reading at 2x the speed of the RAID. I believe 9.09 MB/s is near the limit for our 100 Mb Ethernet. Why is the 320 MB/s (Ultra320 SCSI) RAID reading at only 5 MB/s? (Writing is approximately as slow in all of these cases.)
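As a sanity check on those numbers, here is a small sketch converting raw link rates to usable throughput; the 0.9 efficiency factor is my assumption for protocol overhead, not a measured value:

```python
# Rough bandwidth sanity check for the numbers quoted in this thread.
# The 0.9 efficiency factor is an assumption to account for protocol
# overhead; real-world results vary.

def megabits_to_mbytes(mbps, efficiency=0.9):
    """Convert a raw megabit/s link rate to approximate usable MB/s."""
    return mbps / 8 * efficiency

ethernet_ceiling = megabits_to_mbytes(100)  # 100 Mb Ethernet
print(f"100 Mb Ethernet usable ceiling: ~{ethernet_ceiling:.2f} MB/s")
print("USB copy observed: 9.09 MB/s (close to the wire limit)")
print("RAID copy observed: 4.68 MB/s (about half the wire limit)")
```

This suggests the 9.09 MB/s USB copy is effectively saturating the 100 Mb network, so the RAID read really is the slow side.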

I know. SCSI is backward compatible with SATA, so I have seen a lot of people build RAID arrays with SATA drives on a SCSI controller. It is not recommended performance-wise, but it saves money. I hope this is not the case here.

The Dell driver you posted is a lot newer than what I am running (by date, anyway). Is there a risk of corrupting the array while upgrading the driver? I will have to wait until Saturday for the driver.

Hopefully, you were thinking about compatibility between SAS and SATA, and not SCSI and SATA when you were talking about people building RAID based on SATA disks? Because, SCSI is not even compatible on the pin level to SATA.

@pixelchef

The performance problem with your RAID lies in your RAID controller. Dell's PERC controller is an "old school" controller which does support RAID 5, but it doesn't have a hardware XOR engine (a specialized chip that does the RAID parity calculations), so performance degrades with each drive added to a RAID 5 array beyond three.

I know that it is not possible in your situation (guessing), but it would be much better to reorganise your drives for your applications to something like:
RAID 1 (2 disks) for OS and server applications
RAID 1 (2 disks) for SQL server databases and log files
RAID 5 (3 disks) for Exchange server
+ global hot spare drive.

It's only getting 8MB/s though. Surely you're not suggesting that PERC as a whole is "old school". Also, I hope you're not really suggesting that his controller was designed for a max local transfer speed of 9MB/s.

Perhaps I didn't fully absorb what you said. Sorry if I seem like a pompous jerk. That is not my intention.

This model of PERC controller is "old school" in the way I described - it does not have a hardware XOR engine; the parity math runs as software on the controller. You have to understand that RAID controllers featuring RAID 5 support are not all the same. RAID 5 is a very demanding algorithm, so if the controller doesn't compute it quickly, you can have 8 Gbps Fibre Channel storage and it will still be very, very slow. Now, as I said before, this PERC is more of a software-based controller, and it works well with RAID 5 arrays consisting of three disk drives. Add another drive to the array and you'll already see about 25% degradation in performance. Unfortunately, the degradation is almost exponential as more drives are added. So, try to imagine how this PERC struggles with seven of them.

I'll give you one other example: build a much simpler RAID 1 array using the on-board (integrated) RAID 0/1 controller on any server, then build it again using a RAID controller that costs something like 1k USD in the same server with the same hard disks... you'll see the difference from the start, when they begin to sync the drives. And RAID 1 is really nothing compared to the RAID 5 math the controller has to do.

And don't worry, you're not anything close to jerk, and I'm not offended by your comment :-)

>The Dell Openmanage application says the PERC 4e/DI firmware version is 521X, and it has 256MB memory. Cache options are Adaptive Read-ahead, Write-back, and Cache I/O
Your configuration is theoretically capable of roughly 70 MB/sec and 840-1700 I/Os per second (IOPS). Write-back is the fastest write policy, so you're OK there.

- Before testing any further, you should defrag the file system - a PE2800 is a few years old now, so presumably so is the file system. Download a trial copy of Diskeeper from http://www.diskeeper.com/ and defrag the file systems.
- Try testing write performance with a smaller file - say 150MB so that it fits in the cache - you should see very good performance.
- Copying a file from C: to D: causes spindle contention - that is; you are reading from the same disks you are trying to write to. Try copying from a known fast network source.
- Once you fill the write cache, you are down to underlying disk performance and the RAID 5 write penalty. For every host write to a RAID 5 set, you generate four disk operations (the parity calculation itself is not a disk I/O):
1. Read original data
2. Read original parity
3. Create new parity (the XOR calculation - not a disk I/O)
4. Write new parity
5. Write new data

So - you can expect underlying RAID 5 write performance to be between about one quarter and one half that of a single disk. Ouch.
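That penalty can be put into rough numbers. The sketch below assumes about 140 random IOPS per 10k RPM SCSI disk, which is a typical figure, not something measured on this box:

```python
# Back-of-the-envelope RAID 5 estimate. The per-disk IOPS figure is an
# assumption typical of 10k RPM SCSI drives, not a measurement.

def raid5_write_iops(disks, iops_per_disk=140, penalty=4):
    """Host random-write IOPS a RAID 5 set can sustain: each host write
    costs `penalty` disk I/Os (read data, read parity, write parity,
    write new data), shared across all spindles."""
    return disks * iops_per_disk / penalty

def raid5_read_iops(disks, iops_per_disk=140):
    """Reads carry no parity penalty, so all spindles contribute."""
    return disks * iops_per_disk

print(raid5_write_iops(7))  # 245.0 host writes/s from 7 spindles
print(raid5_read_iops(7))   # 980 reads/s
```

So even with seven spindles, random-write capacity is only about a quarter of the array's raw I/O capability, which is the "ouch" described above.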

I see. The first comment you made was "Dells PERC". When you clarified what you said, it makes sense now- "This model of PERC". You see, I have PERC6/i in my PowerEdge, and I wouldn't call it "old school". I thought you were bashing all of the PERC universe :P LOL

I still don't think 8MB/s is right regardless of hardware processing or not.

Well, I have seen the tech specs of the PERC 4 we're talking about, and since I have passed more than a few of LSI's technical bootcamps, I can tell you - this PERC 4 is based on a really old LSI chip that was an entry-level model four years ago. Even in those days it wasn't exploding with performance compared to the more powerful (read: more expensive) controllers.

Well I guess I have made the problem worse then. As you can see in the related thread, other experts suggested that the RAID would be faster with more disks, so I maxed out what I could in the chassis. Although I am using a different test file now, performance does seem to be worse now than it was before I expanded the RAID.

Someone said it's a bad design to have C: and D: on the same virtual disk, and I know -- this was set up before I was in charge.

Generally true: the more spindles, the faster a striped array will work. I stand by that comment without hesitation. However, what appears to have happened is that you found a few more disks (of a different make) and added them to the array, only to find that the controller now appears to be the bottleneck. Which is one of the action points we had set out before...
And with the PERC4 controller, it was also highlighted that the increased CPU load was most likely due to managing the I/O (the host has to calculate the parity rather than the controller doing it onboard).
Just as a refresher on those action points - OK, where to from here...
1) find out about a second CPU
2) increase swap file / page file size to 8 GB - will need to tidy up the disk (temp files and so on) first, so run the defrag after "cleaning" - a weekend job.
3) Have you ever configured and installed RAID before ? it can be "tricky"
4) Do you have ALL the install disks to be able to rebuild the machine if needed ?
5) PERC4 is a bit ordinary, but you could add a few more disks without any problem and then reconfigure - remembering that the more spindles the better in a striping based raid.
6) Cache Policy should be Cached I/O, and I would be inclined to change the stripe size to 64 KB or maybe bigger, given that SQL works on 8K page sizes.
7) Check all BIOS and Driver versions - see if there are any upgrades / updates.
8) Find out about changing that C: partition to increase allocated space - we will want to find out more about temp files and ideally put the swap file back on C: drive
9) Would be inclined to look at the PERC 6. Being dual channel, you can have RAID 1 on one channel (needs 2 disks) for system and page file, then RAID 10 on the second channel (needs a minimum of 4 disks, and the more the better) for SQL databases. You still have to share "office" files, so partition the first array for some shared space - in which case you might want to investigate the newer, higher-speed (15K RPM) disks for that task and use all the 10K disks (plus a new one) for the databases - it is going to be better off.

Think you need to digest the above, do some more homework, answer a couple of those SQL questions, maybe talk to a DELL engineer about upgrading to PERC6 and getting them involved to install and configure. There is a bit of work involved in the physical setup, and then there is still the "tuning" of SQL.

Now, you also mentioned that a new hardware project was on the cards. What has happened with that ? Can you get the new SQL Server ? Again from that previous thread :If you are getting a new box in a couple of months, then think that it will be best just to hobble this one together to extract whatever improvement is possible without spending too much.

That 256 MB is cache - and not "standard" - maybe it was an option 4 years ago - I cannot remember the check boxes back then :)

Can certainly add a couple of disks to that controller and it will help - remember the more spindles (physical disks) in a striped based array the better.

The C: and D: difference - well, yes, the physical disk throughput will be determined by the bus speed from the array, but Windows doesn't necessarily know that and manages the I/O queues slightly differently. True, I/O is a kernel-mode operation and interrupt-based; it does make a small difference. It makes a huge difference down the road on the new box when you get a multi-channel controller.

If it shows 2 CPU's then yes it is dual-core - so CPU should not really be a big problem - think we can reasonably assume that CPU is busy because of IO.

New box should be the SQL server - can spec it up later - but will be something like 1xQuad-Core 2.5++ghz 6mb Cache, 8 gig memory, 6 disks x 73 or 146gig 15Krpm disks (2 as raid1 for system, 4 as raid10 for database), perc6e controller with 512mb cache.

Still need to think about that network backbone and those daisy chained switches as well...
I think that all this has highlighted is a need to update hardware :)

No problems, the more you get an opportunity to discuss and share thoughts, the better. Always welcome to e-mail and discuss (see bio). And welcome to EE, you will enjoy it.

But there is always a delicate balancing act (as far as I am concerned) matching Hardware with Applications and then trying to come up with the magic formula. We sometimes (maybe often) find the hardware guys not overly concerned with the Software space and vice-versa - please no e-mails on that last sentence :) .

While there are several broad rules, each site can be quite different and you really need to know the full and complete story. Even in the application world there are differences; for example, configuring SBS is quite different to configuring the individual components installed separately / individually. Then there are things like SQL Server, where log files are essentially sequential in nature while data tends to be highly random, so the choice of RAID for those two different aspects really is a bit different. By the way, this happens to be an SBS site, with a thought (or plan?) to separate out the SQL Server aspect.

There are sometimes certain conditions that will have a bigger impact than being "right or wrong". And then there are arguments like RAID 5 versus RAID 10, and, with the newer controllers, the improved fault tolerance of RAID 6 over RAID 5, or using both channels to improve total throughput, and so on - even the hardware guys debate these differences.

There is one comment that does require a bit more qualification, and that is the performance degradation on the PERC4 controller with additional spindles - which does occur with CPU saturation and the lack of any cache as standard (that was the original PERC4), and the original drivers did have a small (?) problem.

The LSI chips back then, as used in PERC4 controllers, essentially relied on the CPU for parity calculations, whereas now it is pretty common to see dedicated chips for XOR and I/O on the controller. And then there are a couple of performance improvements to be had by making sure all the correct drivers are loaded and kept up to date. The drivers were a point made in the previous thread, which you also picked up on, only to find that they hadn't been updated.

As for your comments, there is absolutely nothing wrong in anything you have said except "pompous jerk" and "tad childish" :) :)

As for your approach, there is nothing wrong with that either, but did you really read the previous / related thread first, or just started to answer ?

Mark, I must say - congratulations to you for taking the time to summarize all the general aspects of the RAID and some everyday facts considering hardware and software guys. :-)

Still, I have to add a few things to your post.

First, about SQL Server. I can confirm your thoughts, but have to explain one thing. If we're talking about different aspects of internal SQL I/O activity, like logs and data, then it all depends on the volume of data SQL will handle. If the load on the server requires serious optimization, then the first thing to be done is to put the logs and data onto different arrays, with different block sizes and, depending on database size, different types of RAID.

Second, I can see that you were referring to my comment on the LSI chip and performance degradation with additional drives in the array. As I said, and you repeated, the LSI chip at the core of the PERC 4 RAID controller doesn't have its own XOR engine, so it relies on the server's CPU. When a RAID controller uses the CPU to calculate parity, "the more spindles the faster a striped array will work" no longer holds. In those circumstances, adding more drives will actually decrease the performance of the RAID. That can be measured easily using IOmeter and CPU monitoring tools.
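To make that XOR workload concrete, here is a minimal sketch of the parity math a RAID 5 controller (or, on this PERC 4, the host CPU) performs per stripe; the block size and count are illustrative:

```python
# RAID 5 parity in miniature: parity is the byte-wise XOR of the data
# blocks in a stripe, and any single lost block is rebuilt by XORing
# the surviving blocks with the parity. This is the work the PERC 4
# offloads to the host CPU instead of a dedicated XOR chip.
import os
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, tup) for tup in zip(*blocks))

stripe = [os.urandom(4096) for _ in range(6)]  # 6 data blocks (7-disk array)
parity = xor_blocks(stripe)

# Simulate losing disk 2 and rebuilding its block from the survivors.
survivors = stripe[:2] + stripe[3:] + [parity]
assert xor_blocks(survivors) == stripe[2]
print("lost block recovered via XOR parity")
```

Every write to the array forces some of this arithmetic, so on a software-XOR controller each extra spindle adds CPU work instead of only adding bandwidth.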

Third, I absolutely agree with you that mixing drives from different manufacturers doesn't help in this matter. Actually, sometimes even drives with the same part number from the server manufacturer can have slightly different specifications. I had a few scenarios where drives had a difference of exactly 15 rpm between them (10,000 and 10,015 rpm), which triggered regular internal sync operations that would slow down the whole RAID considerably.

And that's it from me for now. Again, I really enjoyed reading your previous post :-)

Much relieved - thought I was going to get a serve about the hardware / software comment :) Thanks for your comments...

SQL can also depend on recovery modes. Minimally logged databases or essentially read only databases (like a data warehouse) have a different requirement again...

I was a hardware guy, now a software guy and get confused as to which hat should be worn from time to time :)

MrMintanet as the "newbie" this is a great thread to be part of (ideally this is what they should all be like - not always the case). We are all here to converse, discuss and learn from each other at the same time solve problems for the asker...

And now we need to hear from the Asker as to how we may be able to help further...

If the author had the same config before and it worked well, we should focus on what changed after adding more disks. My question is about the new disks you just added: do they run at the same speed as the old disks? In a mixed configuration, the RAID will run at the speed of the slowest drive. You should update the firmware across the entire system so your controller and disks are running matching firmware.

When I added the Fujitsu drives, I expanded both C: and D:. Now C: has 17.5GB free. But why move swap back to C:? How could that affect performance?

I don't have a contact person at Dell. Should I just call DTS and explain my situation with upgrading the PERC card, or would one of you send me the name of a good Dell engineer?

I found that the CPU is 64-bit, and another consultant recommended to get SBS 08, which includes a SQL license for separate hardware. I haven't done more research on the 'new hardware' question since the last thread, but this seems like a good path to take. I'm going to try to limp along till May when I graduate (Iowa State University).

@lnkevin:
The new drives are 10,000 rpm, but the Fujitsus are actually 10,025 while the Seagates are 10,008. You seem to have hit that nail on the head. So, I'll hit eBay for a new batch. Should I go for all Seagate or all Fujitsu?
Are there any firmware/driver updates besides the PERC card? Are you saying I can upgrade the firmware on the disks themselves?


>>@lnkevin:
The new drives are 10,000 rpm but the Fujitsus are actually 10,025 while Seagates are 10,008. You seem to have hit that nail on the head. So, Ill hit eBay for a new batch. Should I for all Seagate or Fujitsu?

Not to sound like an attention-starved fool, but I suggested that yesterday. See:
04.02.2009 at 03:02PM PDT, ID: 24055063

Well, as I already said in my previous post - even a small difference in the exact rpm of each spindle (drive) can lead to degraded overall performance. Also, let me restate that even drives with the same specs and part numbers (including Seagate and Fujitsu) can differ.

But still - I can assure you that this is not the only thing hurting the overall performance.

If we're talking about a few GB of SQL database, I wouldn't say you really need to move it to another partition or another RAID array. For example, even server software using SQL Express can expand its database to 4 gigs or so without hitting a performance barrier. Good examples are Backup Exec or WSUS (Mintanet, don't you dare say anything ;), whose SQL Express databases can grow considerably.

Still, if we are talking about performance, in the hope that you really do have a plan to install another server - I'd ask you to consult guys like Mark, or my humble self here, and try to find the best configuration of arrays.

And again, I cannot stress enough that your 7-drive RAID 5 array on the PERC 4 is something you cannot do much about. Actually, I think the plan of installing another server later this year will be a great opportunity to move the data to the new server and completely reinstall the good old Dell with a different, much better (much, much, much better) array config.

Well, I was in the software business for about 5 years, then moved to the system administration side, and today I have both software and hardware guys around me, trying to stay normal and calm when they start to cry on my shoulder like babies :-)

So, yes, I can understand your thoughts... we're in the same kind of story. :)

Ummm, Knoppix = Linux? But yes, there are various testing kits that could be used.

But I think we established in the previous thread that it was going to be "easy" to add a few more disks, and having done that, I think we have established that the PERC4 controller is probably not up to coping with the load, or there is some other underlying issue at hand.

Can try to use the same disks in each array - maybe configure two arrays - could use the Seagates in one array and the Fujitsus in another (is it the dual-channel version?)

First thing is to call Dell Support and talk to them about the various options. might even be a config option or three on the PERC card.

As for SBS 2008 - need the premium edition to be licensed for a second box...

The PERC 6, being a SAS controller, would not work with the backplane or SCSI 320 drives in the server. If we had to go that route, we might be better off just replacing the server.

With that said, this Gold Tech Support rep said that upgrading the firmware, drivers, etc may provide an acceptable performance increase. He also suggested to try disabling "patrol read" as this constant disk activity will decrease performance.

That is a pretty good plan; the drivers can make a world of difference - and then again, if they are not so old, then maybe not. Regardless, they should be the most current version.

Did you discuss having two arrays? Might be worth discussing that, and maybe changing stripes to 32k.

And sorry about the PERC 6 faux pas - that should have been, and could have been, raised earlier. Replacing the server!?
That is a lot more work than a new controller and half a dozen disks, so do not get too discouraged by that.

Patrol Read - I thought that was the PERC 4/SC or DC; ah well, memory is not so good after all... From a different manual than the one for the PERC 4/Di:

Patrol Read:
The Patrol Read function is designed as a preventive measure to detect hard drive errors before drive failure can threaten data integrity. Patrol Read can find and possibly resolve any potential problem with physical drives prior to host access. This can enhance overall system performance because error recovery during a normal I/O operation may not be necessary.

Patrol Read Behavior
The following is an overview of Patrol Read behavior:

Patrol Read runs on all disks on the adapter that are configured as part of an array including hot spares. Patrol Read will not run on unconfigured drives, which are drives that are not part of an array or that are in a ready state.

Patrol Read adjusts the amount of RAID controller resources dedicated to Patrol Read operations based on outstanding disk I/O. For example, if the server is busy processing I/O operation, then Patrol Read will use less resources to allow the I/O to take a higher priority.

Patrol Read operates on all configured physical drives on the controller and there is no method to deselect drives from the Patrol Read operations.

If the server reboots during a Patrol Read iteration, Patrol Read will restart from zero percent if in Auto Mode. In Manual Mode, Patrol Read does not restart upon a reboot. Manual Mode assumes you have selected a window of time dedicated to running Patrol Read and the server will be available during that time.

Configuration
You can use the BIOS Configuration Utility to configure Patrol Read. Dell OpenManage Array Manager and OpenManage System Storage Management cannot configure Patrol Read. Patrol Read can be started and stopped using MegaPR from within Windows and Linux.

Hoping for a performance improvement from a driver update isn't a bad idea; at least there's always hope. But while downloading new drivers, check the change history provided with them. There you'll find whether any of the changes between the drivers you have now and the new ones resolve performance issues.

At the same time, I don't have any practical experience with the Patrol Read function, since I'm much more into the IBM storage world. However, my own thinking is that data integrity always has the highest priority; still, that's why you're using RAID and backup, right? You can try disabling the function and see if there's a real performance increase. But before you do that, I strongly suggest trying the driver update alone and running a few tests (like the initial copy test that triggered this thread with your disappointment about copy speed) to see how the RAID does its job.

Also, maybe I missed the point, but - why increase the swap file to 8 GB? If you're not using a 64-bit version of the OS and server apps, 8 GB won't give any real advantage, at least not for overall performance. Can someone give some thoughts on this one? Mark? Anyone?

Sounds to me like he needs to run diagnostics before he gets too deep into this mess. I would run diagnostics over all possible things. You could spend the whole day only to find out that one of the drives in your array is not properly seated and is throwing an error.

It's a long shot, but it is far more practical than the road you're about to go down. There is a great deal of risk in your plan.

I discussed two arrays with another consultant. He thinks I may be able to remove a drive from the RAID array in the RAID BIOS. We have enough free space to do either procedure below. Here are his notes - two different ideas he provided for how to handle this.

That's interesting about Patrol Read. The Dell tech was suggesting that, I think, as a temporary measure to see if it improves performance.

1.
In the RAID array BIOS, remove disks 0, 1, 2 and rebuild the remaining array, if this is possible
Change disk 0 to dynamic
Mirror the C drive as a 2nd partition on disk 1
Make a new array 0123 (DISK2) and copy data to it as the new D drive

2.
Using Windows, copy from the old D drive to a temp drive, reformat the 0123 D partition, then copy the data back

Dangerous, but plausible (option 2). Make very sure you have very good backups.

The first step is to load all the new drivers, increase the swap file (to twice the size of physical memory), change the stripe size, check out those config settings, and see what happens from that point.

Then the second step would be to separate out / split the arrays: maybe 2 Seagates = RAID 1 = system, 4 Fujitsus = RAID 10 = data; maybe the remaining 2 could even be a third array for temp files and office work space = RAID 1.

Hello all, I've updated drivers, firmware, and OpenManage, and I'm not far from where I started.

I did find out something I thought was interesting. I made a virtual disk out of the hot spare, and I can read a 1 GB file from that drive over the network, during the day, at a respectable 8.48 MB/s on our 100 Mb network. That's much better than the 6.37 MB/s I got immediately before that, reading from the 7-drive RAID-5 array. Write performance was still an abysmal 3.73 MB/s. Maybe that doesn't confirm the performance issues with the PERC 4, I don't know. I would have expected better write performance.
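For anyone wanting to reproduce these MB/s figures more repeatably than timing Explorer copies by hand, a small timed-read sketch can help; the path in the usage comment is a placeholder, and OS caching will inflate results on a recently read file:

```python
# Sequential-read throughput measurement, roughly what the copy tests in
# this thread do. Run it against a large file that is not already cached.
import time

def measure_read_mb_s(path, chunk_size=1 << 20):
    """Read `path` sequentially in 1 MB chunks and return MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_size)
            if not buf:
                break
            total += len(buf)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / elapsed

# Example usage (path is a placeholder):
# print(f"{measure_read_mb_s('D:/testfile.bin'):.2f} MB/s")
```

Reading locally on the server also takes the network out of the equation, which helps separate controller problems from Ethernet or protocol problems.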

On the solution side of things, I was wondering about replacing all of the disks in the RAID 5 array with just two or three very large drives in RAID 1. I should be able to do a bare-metal restore from my backup to the new RAID 1. If there is any problem, I could swap the RAID 5 disks back in and go back to the drawing board. Is that a good idea? It seems much less risky to me than the last couple of comments.

RAID 1 is only two drives - performance is like direct reads and writes to a single drive. The mirror gives you a level of redundancy. But it is still desirable to split things into a few different arrays for the different usages - like RAID 1 for the system and, say, another array for the database. You really, really should be looking at a PERC 6 RAID controller.

If you need better write performance, RAID 10 is recommended. RAID 10 is a mirrored set striped over multiple disks (minimum 4 disks), so the load is spread out and performance is improved. The more disks you have, the better RAID 10 performs.
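The usual way to compare the levels is by write penalty - disk I/Os generated per host write: 1 for RAID 0, 2 for RAID 1/10, 4 for RAID 5. A sketch, assuming ~140 IOPS per 10k RPM disk (an assumption, not a measurement):

```python
# Standard write-penalty figures: each host write costs this many disk
# I/Os. The 140 IOPS/disk figure is an assumed value for 10k RPM drives.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4}

def host_write_iops(level, disks, iops_per_disk=140):
    """Approximate sustained host random-write IOPS for an array."""
    return disks * iops_per_disk / WRITE_PENALTY[level]

print(host_write_iops("raid10", 4))  # 280.0
print(host_write_iops("raid5", 4))   # 140.0 - half of RAID 10's rate
```

On the same four disks, RAID 10 sustains roughly twice the random-write rate of RAID 5, which is why it keeps being recommended here for the database volumes.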

In which case, go multi-RAID and try to get that pipe chock-a-block full of I/O.

RAID 1 is basic mirroring, not striped at all. What spreads the read and write operations is getting concurrent access to multiple disks at the same time. The way to achieve that is a multi-RAID system. The easiest and pretty fast option is three arrays - maybe even just RAID 1 (2 disks per RAID 1) - and moving the individual high-I/O files to different arrays. The downside is that the capacity of any one array is limited to the size of a single disk. That is one reason why you might move to RAID 5, RAID 10 and the like - to increase overall capacity. But they are also stripe-based, so in theory they can improve performance. The downside of R5 is that it is parity-striped, so extra effort is needed to calculate the parity every time (that is where write performance is mostly affected). R10, however, is mirrored and then striped, with no parity penalty.

With 6 or 7 disks, you have some choices. You could do 1 x R5 for system and office space + 1 x R10 for the database, or 3 x R1 (one each for system, database, logs), or 1 x R1 for system and 1 x R10 for the database et al. So there are quite a few possibilities. It really depends on how you want to configure the currently available disks. But having everything on one disk system all the time is not going to help. You have kind of already seen that with the virtual disk - two things: it acts like a second disk system, and it is not RAID 5.

OK, Dell suggests upgrading to a PERC 4/DC. That model has its own processor and should be faster than the PERC 4e/Di.

They also recommend getting the backplane daughtercard to split the backplane into two channels of four drives each. Once this is in place, I could do whatever I want on the RAID modes - maybe R5 for system and files, R10 for SQL, Exchange.

Whatever the case, I certainly hope to be getting to the bottom of this! :)

These guys were great. I finally found the solution on my own, but their patient advice and guidance was invaluable. Thank you to all. I'm splitting the points evenly, as I'd never be able to objectively determine the value of everyone's contributions in relation to all others.

I disabled SMB signing on my workstation, and can now copy to the server at 8,595 KB/s. Copying from the server is unchanged, and I think that is because the server is still signing its messages. Task Manager shows network utilization around 90% for the copy to the server, and somewhat lower, jumpier utilization copying from the server. During my FTP test, the graph was pegged in the 90% range for nearly the whole time, and even at 99% several times.
