I had 4 drives in a RAID10 set and 2 spare disks. We've been monitoring the perfmon Avg. Disk sec/Transfer counter (reads and writes too). We bought 2 more disks and rebuilt the RAID10 set using 6 disks (2 span), but performance stayed the same. Is there some limitation whereby RAID10 only improves in 4-disk multiples (4, 8, 12, etc.)?

I'm using a Dell R510. This is a SQL Server box. We kept the same files on the volume.

I'm using a Dell PERC H700 with 1 GB non-volatile cache.

ADs/T (Avg. Disk sec/Transfer) is around 200 ms.

These are 600 GB 15K SAS drives.

I don't know how the Dell PERC controller does the spans. It just asks how many disks I want in each span (2 or 3 in this case).

The reason we added spindles was to reduce the overall time per transaction. We went from RAID1 to RAID10 (4 disks) and the performance doubled. We were hoping for a 33% increase. The recommended maximum for ADs/T is 20 ms; we are at 100. We realize we weren't going to get to 20 by adding 2 disks, but since they were at 400 ms before I was here, landing in the 50s or 60s would have been acceptable for the time being.

Transfers per second are around 850.
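For a rough sense of how loaded the spindles are, Little's law (outstanding I/Os = throughput × latency) can be applied to these two counters. This is back-of-the-envelope arithmetic using the numbers above, not a measurement:

```python
# Back-of-the-envelope estimate of concurrent I/Os using Little's law:
# outstanding I/Os = throughput (transfers/s) * average latency (s)
transfers_per_sec = 850    # perfmon: Disk Transfers/sec
avg_latency_s = 0.100      # perfmon: Avg. Disk sec/Transfer (~100 ms)
disks = 6                  # spindles in the RAID10 set

outstanding = transfers_per_sec * avg_latency_s  # ~85 I/Os in flight
per_disk = outstanding / disks                   # ~14 queued per spindle
print(f"{outstanding:.0f} outstanding I/Os, ~{per_disk:.1f} per disk")
```

With roughly 14 I/Os in flight per spindle, the set looks saturated; in that regime extra disks add throughput capacity but can't pull latency below each disk's seek-plus-service time.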

I understand the logic behind adding disks not speeding up an individual transfer against a given disk. But for all intents and purposes, ADs/T (or the read or write variant) is a measure of the amount of time it takes to do something against a given disk/volume, according to Windows. So if a whole transaction takes 40 milliseconds but in reality it's writing to 4 disks, it's theoretically spending 10 ms per disk if done in parallel. There's always the chance it's writing as fast as it can to one disk and then moving on to the next, but how can someone tell which it's doing?
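The parallel-vs-serial distinction above can be made concrete with a toy Python sketch (the 10 ms `fake_disk_write` is a hypothetical stand-in for one disk I/O, not a real device call): four writes issued in parallel finish in roughly one write's latency, while the same four issued serially take about four times as long.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_disk_write(ms=10):
    time.sleep(ms / 1000)   # stand-in for one 10 ms disk write

# Serial: 4 writes back to back -> ~40 ms total
t0 = time.perf_counter()
for _ in range(4):
    fake_disk_write()
sequential = time.perf_counter() - t0

# Parallel: the same 4 writes issued at once -> ~10 ms total
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda _: fake_disk_write(), range(4)))
parallel = time.perf_counter() - t0

print(f"serial: {sequential*1000:.0f} ms, parallel: {parallel*1000:.0f} ms")
```

A striped controller behaves like the parallel case for a single large transfer, which is why the per-transfer counter should drop if the writes really overlap.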

So for that matter, the time for 6 disks should be proportionally lower than for 4 disks. Windows should see the volume as faster, even though each individual disk won't be faster once it has reached its maximum potential.

SSD isn't an option for us due to the large size of our indexes, the total space we need for all of our SQL files, and the cost. SSD sizes just aren't there yet, though for smaller files it might make sense.

If adding disks won't improve speed, how else does one explain the 50% performance increase when going from 2 to 4 disks, and then nothing when going from 4 to 6?


Please help us help you by answering the following:
1. Which controller are you using?
2. When you say "2 span", do you mean that you are using RAID 0+1 (a mirror of two stripes) instead of RAID 1+0 (a stripe across three mirrors)?
3. How many reads/writes per second are you dealing with?
4. You mentioned that the "average disk seconds per transfer" counter remained constant; what was the value?
5. Was there a specific performance issue or symptom that led you down the path of adding spindles? If so, what was the symptom?
– Skyhawk, Apr 18 '11 at 19:22

@MrVault: what do the disk queue lengths look like now, and what were they before the change?
– Bob Jarvis, Apr 18 '11 at 19:24

I also don't have the disk queue stats anymore, unfortunately. I wasn't sure how applicable they were given the caching going on; I've read articles saying that average disk queue length is no longer as reliable an indicator as people thought. We've maxed out the disks in the server, so if they are doing transactions as fast as possible and the disk queue length is high, there's not much we can do with the given hardware. I was mostly trying to figure out how the stats could stay roughly the same.
– user78793, Apr 18 '11 at 19:43


Short answer: your disks are still queuing because of too many requests. Add more RAM. SQL Server is pretty good about intelligently caching disk access in RAM to reduce the overall reads/writes necessary.
– Hyppy, Apr 18 '11 at 20:26

You should dig deeper into SQL Server performance stats as well as drive stats. If you have lots of connections, it may help to create multiple smaller TempDB files. (Read up on that; it was a major gain for us: http://msdn.microsoft.com/en-us/library/ms175527.aspx) If your buffer cache hit ratio is below 93%, you may want more memory. One statistic isn't enough to diagnose the issue. There is more than one way to thrash storage.

If the box is running Windows Server 2008, use Resource Monitor to determine which database files are the busiest. Getting them onto different spindles is the best performance fix. Specifically, if you have lots of both reads and writes rather than mostly reads, you should look into getting a second drive controller and a couple of smaller external 15K drives, and moving your log and TempDB files away from the drive set holding the data.

You may be hitting a wall with the bus, or even with the processor's ability to process the data. If that is truly the bottleneck, you will want to look into getting another machine, pulling the 2 disks that you added, and adding them to a RAID10 set on the other machine.

At that point you could set up a parallel filesystem across the two RAID sets on the two machines to improve performance further (you should see a large increase in read and write rates).

Adding more disks only increases your performance if your bottleneck was queuing caused by having too many tasks and not enough disks. It doesn't make the individual disks any faster, and in fact adds a certain amount of overhead. In the end, reading data from a disk works exactly the same way as it did before, and it still takes just as long. The difference is that you can do it on multiple disks at the same time — which only helps if you're keeping all of the disks busy.
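The queuing argument above can be sketched with a toy simulation (the numbers are hypothetical; `service_time` stands in for one random-I/O seek plus transfer): adding disks shrinks latency dramatically when the set is near saturation, but barely moves it when queuing isn't the bottleneck — and it never drops below the per-disk service time.

```python
import random

def simulate(num_disks, arrival_rate, service_time, n_requests=50_000, seed=1):
    """Mean request latency for a shared FIFO queue served by identical disks."""
    random.seed(seed)
    free_at = [0.0] * num_disks        # time each disk next becomes idle
    t = total = 0.0
    for _ in range(n_requests):
        t += random.expovariate(arrival_rate)     # Poisson arrivals
        d = min(range(num_disks), key=free_at.__getitem__)
        start = max(t, free_at[d])                # wait if every disk is busy
        free_at[d] = start + service_time
        total += free_at[d] - t                   # queue wait + service
    return total / n_requests

# 20 ms per I/O. Light load: 6 disks barely beat 4 (queuing isn't the bottleneck)
print(simulate(4, 100, 0.020), simulate(6, 100, 0.020))
# Heavy load: the 4-disk set queues badly, so two extra spindles help a lot
print(simulate(4, 180, 0.020), simulate(6, 180, 0.020))
```

Either way the floor is the 20 ms service time: more spindles add parallel capacity, not faster individual disks.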

Switching to higher-speed storage (like SSDs) will likely help quite a bit. A single SSD can outperform a whole box full of winchester drives if latency and seek time are the issue.

An interesting solution that is still somewhat in development, but already shows a lot of promise, is Facebook's flashcache module. It's Linux only, but then again if performance is an issue, then you shouldn't be using Windows anyway. Some benchmarks done by Vadim Tkachenko at Percona show that flashcache can get you some 99% of the performance of running on SSD at only a fraction of the cost — assuming, obviously, that your hit distribution isn't evenly spread across the entire dataset.

I don't know if that's a "dirty little secret"; it should be common sense for any sysadmin worth his salt.
– Chris S♦, Jun 24 '11 at 18:42


-1: "but then again if performance is an issue, then you shouldn't be using windows anyway. " What a silly thing to say.
– samsmith, Feb 13 '12 at 18:45


Windows bashing aside, proposing a linux-based "solution" to a clearly-marked Windows question is also unhelpful. Should have stopped after your second paragraph.
– rmalayter, Feb 13 '12 at 19:20

Linux-vs-Windows server performance has been beaten to death, so I won't rehash all the arguments. But the gist is that unless the benchmark was funded by Microsoft, optimally-configured Linux (and other Unix variants) tends to be approximately 2x faster on TPC and SPEC benchmarks than optimally-configured Windows Server on equivalent hardware. There are technical reasons for this that I won't go into here. But there is a reason why Google, Facebook, NASDAQ, and most other very data-heavy corporations run on Linux. (And it isn't because of marketing.)
– tylerl, Feb 14 '12 at 0:43


@rmalayter: the question was not tagged Windows when this answer was submitted.
– tylerl, Feb 14 '12 at 0:50