A few years ago, Erika and I went to the VW dealership to trade in her old Jetta and get a new one. The choice of a Jetta was already a foregone conclusion – she loved VWs at the time – and it was just a matter of picking out colors and options.

Bringing the Jetta Home

We took a test drive of the base model, and then the sales guy asked, “Do you wanna also test drive the 1.8L Turbo version?”

Me (immediately): “No.”

Erika: “Sure, why not?”

Me: “You were just saying how this one is so much peppier than yours. You – and by you I mean we – don’t need a turbo.”

Erika: “Come on, let’s try it.”

As soon as she accelerated onto a highway on-ramp, felt the power surge, and heard the turbo whistle, that was the end of that. Suddenly, I felt like a SAN administrator and Erika was the DBA. “No no no,” I was saying, “You don’t need that. You’re never going to go that fast, and I know because you yell at me when I take highway on-ramps that fast. Let’s not spend the extra money if we’re not getting extra capacity. Besides, let’s bring it back to numbers – let’s measure how fast the base version is to 60mph, and then measure the turbo version.”

She couldn’t hear me because I was in the back seat and she was negotiating price with the sales guy. The seat-of-the-pants feeling of speed was enough for her, and often it’s good enough for us DBAs too.

Measuring Your SAN the Easy Way

I’ve written about how to test your SAN’s performance with SQLIO, but I’ll be honest with you: that’s the hard way. It takes knowledge and time, and you only have one of those. (I’ll be charitable and not tell you which one.)

Instead, let’s get seat-of-the-pants numbers for your storage. Go download the portable edition of CrystalDiskMark (NOT CrystalDiskInfo) and put it on a network share. Run it on an idle server (not your live SQL Server, because it’ll slow things down while it runs). It’ll look like this:

CrystalDiskMark

Across the top, there are three dropdowns:

5 – the number of test passes you want to run. If you want a fast seat-of-the-pants guess, do 1, but keep in mind that results can vary wildly between passes if something else happens to be going on in the SAN.

4000MB – the test file size. I like using 4000MB to reduce the chances that I’m just hitting cache and getting artificially fast numbers. Smaller test file sizes may look fast but don’t really reflect how a large database will work.

E: – the drive letter to test. Keep an eye on the free space there – you don’t want to create a test file that can run your server out of drive space.

After making your choices, click the All button. While it runs, here’s an explanation of each row’s results:

512K – random large operations one at a time. This doesn’t really match up to how SQL Server works.

4K – random tiny operations one at a time. This is somewhat akin to a lightly loaded OLTP server.

4K QD32 – random tiny operations, but many done at a time. This is somewhat akin to an active OLTP server.

The more astute readers (and by that I mean you, you good-looking charmer) will notice that 4K operations don’t really measure SQL Server’s IO. SQL Server stores data on disk in 8KB pages and, zooming out a little, groups them into 64KB extents of 8 pages each. We’re not looking to get an exact representation of SQL Server’s IO patterns here – we’re just trying to get a fast, one-button-click-easy measurement of how storage performs. Usually I find that during the first round of storage tests, the storage isn’t performing well, period – and it doesn’t make sense to bring SQL Server into the game just yet.
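If you want to translate between the MB/sec numbers CrystalDiskMark reports and IOPS at SQL Server’s block sizes, the math is simple. Here’s a quick sketch – pure arithmetic, nothing assumed beyond treating 1MB as 1024KB:

```python
# Quick conversions between throughput (MB/s) and IOPS at the block
# sizes discussed above: CrystalDiskMark's 4K test, SQL Server's 8KB
# pages, and its 64KB extents.

def iops_from_throughput(mb_per_sec, block_kb):
    """IOPS implied by a throughput figure at a fixed block size."""
    return mb_per_sec * 1024 / block_kb

def throughput_from_iops(iops, block_kb):
    """MB/s implied by an IOPS figure at a fixed block size."""
    return iops * block_kb / 1024

# Example: 20 MB/s of random reads means very different IOPS counts
# depending on the operation size.
for block_kb in (4, 8, 64):
    print(f"{block_kb}K blocks: {iops_from_throughput(20, block_kb):,.0f} IOPS")
```

The same 20MB/sec is 5,120 IOPS at 4K but only 320 IOPS at 64K – which is why a single MB/sec number without a block size doesn’t tell you much.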

Interpreting CrystalDiskMark Results

For magnetic hard drives (individually or in RAID arrays), sequential operations (the top row) are often 10x-100x the rest of the results. This metric is often limited by how the computer is connected to the storage, and you can get those numbers from the bandwidth rates in Kendra Little’s “How Big Is Your Pipe?” bandwidth reference poster. Keep in mind that the MB/sec numbers on the poster are theoretical limits, and in practice, we’ve got 5%-20% overhead involved.
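To turn a nominal link rate into a practical ceiling, divide the gigabits by 8 and knock off that 5%-20% overhead. A minimal sketch – the link names and rates are common examples, not an exhaustive list:

```python
# Ballpark practical throughput for common interconnects, applying the
# 5%-20% overhead range mentioned above to the nominal link rate.
# Treat the output as rough expectations, not guarantees.

def practical_mb_per_sec(nominal_gbps):
    """Return a (low, high) MB/s range after 20% and 5% overhead."""
    raw_mb_s = nominal_gbps * 1000 / 8  # 1 Gb/s is 125 MB/s before overhead
    return raw_mb_s * 0.80, raw_mb_s * 0.95

for name, gbps in [("1Gb iSCSI", 1), ("3Gb SATA", 3), ("4Gb FC", 4), ("8Gb FC", 8)]:
    low, high = practical_mb_per_sec(gbps)
    print(f"{name}: roughly {low:.0f}-{high:.0f} MB/s in practice")
```

If your sequential numbers are pinned right at one of these ceilings, you’re probably measuring the cable, not the drives.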

For solid state drives, the difference between sequential and random operations isn’t always as dramatic, but it can still be 2-3x. If there’s no difference, then I’d look even closer at the connectivity method – the SSDs are probably outperforming the connection method (like 3Gb SATA, 1Gb iSCSI, or 2/4Gb FC.)

So what’s a good or bad number? If your server boots from a mirrored pair of local drives, and stores its SQL Server data somewhere else (like on a larger array or on a SAN), then test the local mirrored pair too. Compare the numbers for where you’re storing the valuable, high-performance data to where you’re storing the OS, and you might be surprised. Often I find that the OS’s drives perform even better because we just haven’t configured and tuned our storage.

Keep these original CrystalDiskMark screenshots in a shared folder for the group to access, and then challenge everyone involved to do better. Simple tuning techniques like tweaking the read/write bias on the RAID controller’s cache, right-sizing the NTFS allocation units, and working with different stripe sizes can usually yield double the storage performance without spending a dime.

Brent, thanks for the helpful articles – I really appreciate them. I just ran this on the data drive of a new Windows 2008 R2, SQL Server 2008 R2 OLTP test cluster. Here are the numbers. I think they look OK, but I’m not 100% sure. Thoughts?

Hi, Bryan. Something about that seems off – you shouldn’t usually see higher speeds for 4K QD32 reads than you see on sequential reads. Make sure you follow the instructions in the post carefully with the number of test passes and the size of the test file. If the results repeat, I’d be interested to hear more about the server and the storage involved.

I ran it a few more times with 5 passes at 4000MB and hit a range of 350 to 400MB/s. Sequential reads were 50-100MB/s faster.

We are using HP ProLiant servers and an IBM SAN. Looking in Device Manager, I can see that we have QLogic fiber HBAs with a driver date of 11/17/2010, which seems kind of old. I have a meeting later today with the storage admins and will ask more questions.

Actually, that isn’t too bad – I use QLE2460s too, and the driver date is 12/2/2009. No issues to report – they are performing as expected. Do you have the SANsurfer HBA Manager installed? If so, you can glean some pretty good performance-related information from it. Since I’m tied into an EMC, I use PowerPath as well.

Hi Allen, SANsurfer is not installed. We have an IBM DS5300 with 300GB 15k disks. Here is what our storage admin has to say about the configuration. This is from an e-mail that was sent to me.

Test has 13 raid-5 raid groups of 7 drives each.

Prod has 22 raid-5 raid groups of 7 drives each.

Each raid-5 raid group is presented to the SVC as a LUN. Then the SVC combines all of these LUNs into a managed disk group (or a pool in version 6), from which vdisks are then “carved” out for each host. Each vdisk is striped across multiple LUNs in a managed disk group.

Hard to picture, but the disk that the host “sees” is actually striped twice (once by the SVC and once by the raid-5 raid group on the actual disk array).

Steven – I don’t usually compare/contrast the results from the two tools. I use CrystalDiskMark as a fast “easy button” for people to do quick throughput testing, and then if time allows, I use SQLIO to do much more in-depth testing.

I actually took your SQLIO post and automated it a bit more, putting a nice SSRS front-end on it so I can easily identify saturation and things of that nature, but I’m all for a tool that can get me some quick numbers as well. Thanks, Brent.

IOPS are IOPS regardless of whether the storage is local or a SAN. If you drop a file on to a drive and read/write to it, the speed at which you can read/write is what you are testing regardless. The one kicker with a SAN – and Brent alluded to it – is the cache. You want the file to be big enough to know you aren’t just hitting the cache, but rather are hitting the disks.

Very! It gives you insight into what the disk performance is like for that guest with its given storage configuration. Now, whether or not your virtual disk configuration is what’s slowing you down is a different story – you’d need to investigate to confirm how much of a difference that was making.

I really thought you were going to post some numbers here but it turns out you’re just a tease :]

I’ve been considering starting a “san users anonymous” since these vendors hate it when people talk actual numbers. Maybe they have a point but the response shouldn’t be to shut people up, it should be to help them get the best performance.

Dustin – yep, ideally there’d be some kind of Geekbench-style repository for storage data. When I was a SAN admin, I found that to be extremely difficult, though, because it depends on so many variables. There’s the number of drives, the make/model of drives, the RAID controller or storage processor, how the storage is connected, the stripe size, the NTFS allocation unit size, multipathing software, and of course version numbers for everything involved. It’s hard to compare apples to apples with complete strangers in the storage world.

I would agree that while there are a lot of variables, most all of them have metrics and theoretical expectations – from the drives, controllers, ports, cables, switches, HBA’s, etc. A baseline has to established by collecting and analyzing the theoretical capabilities taking into account the configuration. I spent nearly three days reading up on every piece of literature I could find on my hardware before I even started to run tests because the tests mean nothing without some sort of expectations to match the results up against.

Allen – yep, but to share benchmarks publicly, you’d have to include all of the variables. Otherwise, it just dumbs down to, “I have storage and it can read 400MB/sec.” While that *is* useful to you, it’s not terribly useful to the public.

While I share what I’ve gathered, I mostly do that so I can be corrected in the event I interpreted something wrong. This was my first ever attempt at using SQLIO, using your post as a guideline (I believe you saw it, but so others know what I’m talking about):

I personally do that because I’m a numbers/math person and feel that’s the only way to make sense of the output using these sorts of tools. Knowing I’m getting so many IOPS or throughput does me absolutely no good personally unless I can confirm or deny that it’s meeting the theoretical expectations. If I’m getting 400MB/sec, but the theoretical throughput is 600MB/sec, I know something isn’t right and will dig until I find out why I have a gap in expectations and results.
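That sanity check is easy to put in code. A minimal sketch, using the 400 and 600 MB/sec figures from the comment above:

```python
# What fraction of the theoretical throughput did a test actually
# achieve? The 400 and 600 MB/s figures come from the comment above.

def pct_of_theoretical(measured_mb_s, theoretical_mb_s):
    """Measured throughput as a percentage of the theoretical ceiling."""
    return 100.0 * measured_mb_s / theoretical_mb_s

achieved = pct_of_theoretical(400, 600)
print(f"{achieved:.1f}% of theoretical")  # a gap this big is worth digging into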

Something worth mentioning is that using CrystalDiskMark in this manner only tests the maximum throughput of your SAN. If you’re using iSCSI, you may only see a maximum of 115MB/s or so due to the limits of 1 gigabit Ethernet.
That doesn’t necessarily mean your SAN is slow, though. If you’re using MPIO and have lots of fast disks, you can still serve a very large number of IOPs.
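The single-path ceiling is easy to sketch: 1 Gb/s is 125 MB/s before protocol overhead, and MPIO scales that roughly linearly with the number of active paths. The 10% overhead figure below is an assumption, not a measurement:

```python
# Approximate best-case throughput for an iSCSI setup. The 10% protocol
# overhead is an assumed round number; real overhead varies with the
# stack (jumbo frames, TCP offload, etc.).

def iscsi_ceiling_mb_s(link_gbps, paths=1, overhead=0.10):
    """Rough MB/s ceiling for a given link speed and MPIO path count."""
    return link_gbps * 1000 / 8 * (1 - overhead) * paths

single = iscsi_ceiling_mb_s(1)         # one 1GbE path: ~112 MB/s
dual = iscsi_ceiling_mb_s(1, paths=2)  # two MPIO paths: ~225 MB/s
```

Note that MPIO helps aggregate throughput and IOPS across paths; a single sequential stream may still be limited to one path depending on the load-balancing policy.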

I use CrystalDiskMark quite a bit. In my experience, the main numbers you want to look at / optimize for are sequential reads for table/index scans, and something in between the 512K and 4K random reads for the rest of an OLTP workload.

Although there are exceptions of course, most apps running in SQL Server don’t seem to generate a queue depth of anywhere near 32 on a single LUN, so I prefer the QD=1 numbers as a first-cut guide.

BTW, there’s a text box at the bottom of the form that you can use to describe the test environment/date, etc — that way you can capture that info in your screenshots, rather than relying on file names.

So Linchi Shea blogged about the importance of fewer and defragmented VLFs for transaction log files (especially for databases that support a lot of large bulk operations). Is that because we want the write behavior of the t-log file to be as close as possible to the number CrystalDiskMark reports for sequential writes (MB/s)?

As a follow up question. Can we use slow average write stalls (calculated from info in sys.dm_io_virtual_file_stats) as a symptom of too-many-VLFs?

Maybe the questions don’t make sense. Any help or links are appreciated.

Michael – well, no, not really. The VLF thing has less to do with disk fragmentation and more to do with the way SQL Server manages virtual log files. We do want the write behavior of log files (and data files) to be as high as possible, but realistically, it’s pretty hard to get SQL Server to crank out the same kind of pure sequential throughput that we can get with synthetic tests. If we have exactly one database, and we’re not doing t-log backups as we’re loading, then it’s possible to get close. Problem is, multiple databases’ log files on a shared log file drive, plus transaction log backups going on at the same time, mean that we end up doing a lot of random access anyway.

I wouldn’t directly draw a line between too many VLFs and slow write stalls. That’s totally possible – I just haven’t done that measurement. The two things are easy enough to measure independently – you can easily check VLF count with Dave Levy’s scripts at http://adventuresinsql.com/2009/12/a-busyaccidental-dbas-guide-to-managing-vlfs/ and if it’s high, then we shrink & regrow the log files to correct it. I measure storage stalls with sys.dm_io_virtual_file_stats, but if it’s slow (even for log files), VLFs aren’t the first place I look. I usually check for things like 4k NTFS allocation units, RAID 5, writing too much to storage, etc.

Hello Brent, I am having trouble reading the results from CrystalDiskMark. I am using a NetApp for the SQL Server data files and a RAID 10 for the OS. I am running SQL Server 2008 R2 Enterprise x64 on Server 2008 R2 Enterprise (this is a cluster). I went through the bandwidth reference poster and I still don’t understand the results.

Below are my Crystal Disk Mark results, are these good or bad?
How do I compare the results with the bandwidth reference poster?

Rick – howdy sir! The MB/s numbers are megabytes per second, and that compares to the megabytes per second number on the poster. When you compare those, what’s the fastest number that you’re getting in CrystalDiskMark, and does that correspond to a cable type that Kendra shows on the poster? Does that happen to match up to the cable type that you’re using?

Hello Brent, thank you for responding so quickly! The fastest number I see in CrystalDiskMark is 4K QD32 – 118.1 Write MB/s. That number does not correspond to the cable we are using, we use 8 Gb Fiber Channel cable.

Hi, I was wondering in what way the Perfmon counters are different than the CrystalDiskMark figures, as Rick was asking, because I was noticing the same thing here, but I can’t explain it. Is there any correlation between them? Thanks for any insights!

Couldn’t help noticing that you did a CrystalDiskMark test on the Samsung 840 Pro (512GB), testing various RAID configs, some time ago.
We are getting a paltry 87MB/s for sequential writes for the same Samsung disk in a RAID 1 config vs 377MB/s without RAID.
Would this be normal, or would it have your alarm bells going off?!

Hello… can you please take a look at my results?
Seq –> Read: 162.2 MB/s, Write: 153.3 MB/s
512K –> Read: 43.01 MB/s, Write: 117 MB/s
4K –> Read: 0.718 MB/s, Write: 4.458 MB/s
4K QD32 –> Read: 13.42 MB/s, Write: 16.58 MB/s
These are on a brand new EQ 4100x… 2 x 1Gb iSCSI… nothing is connected except the testing volumes.
Dell told me that the SAN is running just fine, but I just can’t accept that this is the best 22 HDDs in RAID 50 can deliver.
I tried to point them to these poor results from CDM or HDTune, but I hit a brick wall: they said those tools aren’t validated and only Iometer is good.
Take note that when Iometer runs 64 threads, I saturated the network bandwidth, but when I reduced the threads to 1, I got around 30 MB/s… to me it seems bizarre.

9 runs of 4GB in CrystalDiskMark; each SAN controller has 16GB of cache, so that may be affecting results.
The test server is Hyper-V 2012 R2 (running on a 3-node Hyper-V 2012 R2 cluster). The test drive is a dynamic .vhdx of 100GB dedicated for the test. No other SAN activity is occurring at the time of these tests as it is pre-prod.

Mike – that’s interesting, but I think the number of variables (VM, CSV cache, tiering) makes things a little unpredictable. I’m also rather disturbed by how low those random writes are – there’s something wrong there. I’ve seen much higher with Compellent gear, so I’d say it’s probably time to start troubleshooting. Thanks though!