Edit: I blogged a script to capture a snapshot of I/O latencies in this blog post.

Over the last few months I’ve been lecturing at classes and conferences about getting SQL Server’s view of the I/O subsystem and what latencies it is experiencing, so time for a blog post to help everyone else.

Most SQL Servers today are I/O bound – that’s a generally agreed-on statement among DBAs and consultants in the field. This means that the major factor in server performance is its ability to perform I/O operations quickly. If the I/O subsystem cannot keep up with the demand being placed on it, then the SQL Server workload will suffer performance problems.

Now, saying that, one trap that many people fall into is equating increased I/O subsystem latencies with poor I/O subsystem performance. This is often not the case at all. It’s usually the case that the I/O subsystem performs fine when the designed-for I/O workload is happening, but becomes the performance bottleneck when the I/O workload increases past the I/O subsystem’s design point. The I/O workload increase is what’s causing the problem, not the I/O subsystem – if you design an I/O subsystem to support 1000 IOPS (I/O operations per second – making sure you’re using the right I/O size, and that the workload characteristics make sense to define in terms of the number of random IOPS) and SQL Server is trying to push 2000 IOPS, performance is going to suffer.

If you find that I/O latency has increased, look to a change in SQL Server behavior before blaming the I/O subsystem. For instance:

Decreased server memory leading to a smaller buffer pool and increased lazy writer and read activity

And a whole host of other reasons can lead to an increased I/O workload where it’s not the I/O subsystem’s fault.

On the other hand, however, it may very well be the I/O subsystem that has an issue if the SQL Server workload is the same. A SAN administrator may have decided to give some of the space on one of the SQL Server LUNs to another server, which can lead to an overload.

Using performance counters, you can see what the latencies are at the physical disk level (looking at the Avg. Disk sec/Read and Avg. Disk sec/Write counters), but if you have your databases spread across a few LUNs, that doesn’t help you pinpoint which database files are experiencing latency issues and driving the most I/Os to the I/O subsystem.

sys.dm_io_virtual_file_stats

This is where sys.dm_io_virtual_file_stats comes in. It was introduced in SQL Server 2005 as a beefed-up replacement for fn_virtualfilestats and shows you how many I/Os have occurred, with latencies, for all files. You can give it a database ID and a file ID, but I find it most useful to look at all the files on the server and order by one of the statistics.
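In its raw form, a minimal way to query it looks like this (passing NULL for both parameters means all files in all databases; this is a sketch, not the exact query from the original post):

```sql
-- All cumulative I/O statistics for every file in every database,
-- ordered by total read stall time
SELECT *
FROM sys.dm_io_virtual_file_stats (NULL, NULL)
ORDER BY [io_stall_read_ms] DESC;
```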

This isn’t that useful because a) I don’t have database IDs and file paths memorized, and b) it gives aggregate stall times (io_stall_read_ms and io_stall_write_ms) rather than average latencies.

Viewing Aggregate Information

What I usually do is use the script below – part of my standard set of scripts I use when doing a server health check for a client. It’s based in part on code from my good friend Jimmy May (blog|twitter), with a bunch of tweaks. It allows me to filter on read or write latencies, and it joins with sys.master_files to get database names and file paths.
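The script itself didn’t survive into this excerpt, but its shape is roughly the following (a simplified sketch of that kind of query, not the exact health-check script – note the CASE guards against dividing by zero, matching the WriteLatency snippet quoted in the comments below):

```sql
-- Average per-file latencies since the database came online,
-- with database name and file path from sys.master_files
SELECT
    DB_NAME ([vfs].[database_id]) AS [DB],
    [mf].[physical_name],
    [ReadLatency] =
        CASE WHEN [num_of_reads] = 0
            THEN 0 ELSE ([io_stall_read_ms] / [num_of_reads]) END,
    [WriteLatency] =
        CASE WHEN [num_of_writes] = 0
            THEN 0 ELSE ([io_stall_write_ms] / [num_of_writes]) END,
    [num_of_reads],
    [num_of_writes]
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS [vfs]
JOIN sys.master_files AS [mf]
    ON [vfs].[database_id] = [mf].[database_id]
    AND [vfs].[file_id] = [mf].[file_id]
-- Order by [ReadLatency] DESC instead to find read hot spots
ORDER BY [WriteLatency] DESC;
```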

This is much more useful as it allows me to quickly see where the read and write hot spots are and then drill into a database to see what’s going on, and, if nothing is out of the ordinary, ask the SAN admin to move those hot spot files to dedicated and/or faster storage.

Making Use Of This Information

Beware: This information is the aggregate of all I/Os performed since the database was brought online, so the I/O times reported by the script are averages. The longer the time the data set covers, the more likely it is that transient latency spikes will be masked. The best way to use this data for analysis is to periodically capture snapshots of the sys.dm_io_virtual_file_stats data and then examine the averages over each small time period. That way you’ll be able to see latency spikes. [Edit 2014: I blogged a script to capture a snapshot of I/O latencies here.]
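The snapshot-and-diff approach can be sketched like this (an illustrative sketch, not the blogged script; the temp table name and 30-minute delay are arbitrary):

```sql
-- Capture a snapshot of the cumulative counters
SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms],
       [num_of_writes], [io_stall_write_ms]
INTO #Snapshot
FROM sys.dm_io_virtual_file_stats (NULL, NULL);

WAITFOR DELAY '00:30:00';  -- length of the sample period

-- Average latencies over just that interval, per file
SELECT
    [now].[database_id], [now].[file_id],
    [ReadLatency] =
        CASE WHEN [now].[num_of_reads] = [old].[num_of_reads]
            THEN 0
            ELSE ([now].[io_stall_read_ms] - [old].[io_stall_read_ms]) /
                 ([now].[num_of_reads] - [old].[num_of_reads]) END,
    [WriteLatency] =
        CASE WHEN [now].[num_of_writes] = [old].[num_of_writes]
            THEN 0
            ELSE ([now].[io_stall_write_ms] - [old].[io_stall_write_ms]) /
                 ([now].[num_of_writes] - [old].[num_of_writes]) END
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS [now]
JOIN #Snapshot AS [old]
    ON [now].[database_id] = [old].[database_id]
    AND [now].[file_id] = [old].[file_id];

DROP TABLE #Snapshot;
```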

One question I’m sure you’re going to ask is “what is an acceptable read or write latency?” and the answer is a big “it depends!”. It depends on the I/O subsystem and how it’s been configured. The key is producing a performance baseline for when things are running acceptably well and then seeing where the results from the DMV deviate from your baseline. On well-configured storage that isn’t being overloaded I’d expect to see single-digit millisecond latencies for either reads or writes, but this will vary based on the rotational speed and technology of the drives (SCSI vs. SATA vs. SSD, etc.).

Well, they won’t spoil the measurement, but they will add a lot of extra I/O workload, and that needs to be taken into account when designing your I/O subsystem. An alternative is to offload the CHECKDB workload to another system after restoring a full backup there.

[…] Wait statistics are the bread-and-butter of performance tuning. SQL Server is keeping track of what resources threads need to wait for, and how long they need to wait. By analyzing which resources (and combinations of resources) are being waited for the most, you can get an idea of where to start digging in further. An example might be that if most of the waits are PAGEIOLATCH_SH waits, and this wasn't the case in your wait stats baseline, you might look at the I/O subsystem performance using the sys.dm_io_virtual_file_stats DMV (which I blogged about here). […]

I am trying to estimate the total bytes/sec of throughput that I am getting from my SQL Server, and am wondering if it is accurate to say that my total bytes/sec of throughput would be = num_of_bytes_written/io_stall_write_ms?
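[For reference: dividing by io_stall_write_ms gives bytes per millisecond of *waiting*, not per second of elapsed time. Throughput needs a wall-clock denominator – for example, the difference in num_of_bytes_written between two snapshots divided by the elapsed time (the DMV’s sample_ms column, milliseconds since the machine started, can supply it). A sketch with made-up snapshot numbers:]

```python
# Throughput = delta bytes / elapsed wall-clock time, not stall time.
# Snapshot values below are made up for illustration only.

def write_throughput_bytes_per_sec(snap1, snap2):
    """Each snapshot is (num_of_bytes_written, sample_ms)."""
    bytes1, ms1 = snap1
    bytes2, ms2 = snap2
    elapsed_sec = (ms2 - ms1) / 1000.0   # sample_ms is in milliseconds
    return (bytes2 - bytes1) / elapsed_sec

# Two snapshots taken 60 seconds apart:
snap1 = (1_000_000_000, 5_000_000)   # bytes written, sample_ms
snap2 = (1_600_000_000, 5_060_000)
print(write_throughput_bytes_per_sec(snap1, snap2))  # 10000000.0 bytes/sec
```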

Hi
I often use the sys.dm_io_virtual_file_stats DMV to calculate the average service time for a file and also for a drive (by aggregation).
Recently, I compared the DMV latency results (ReadLatency and WriteLatency for each drive) with the perfmon counters (Avg. Disk sec/Read and Avg. Disk sec/Write).
It showed [Avg Time ms/Read] was more important in the DMV results than in the perfmon counters.
What is the reason for that?

Not sure what you mean by ‘more important’? If you mean that the avg time was higher in the DMV than in perfmon, could be that there are other uses of that drive doing I/Os that will be taken into account by perfmon, but not by the DMV, which is only measuring I/O latency for those files managed by SQL on that drive. Remember also that the DMV is aggregated data since the database came online whereas perfmon is a rolling average – that could also make perfmon lower if there have been some spikes of high I/O latency in the past that the DMV is still taking into account.

Hello and thank you for your reply
In fact, I have a process that collects data from the DMV dm_io_virtual_file_stats every minute and computes the difference between each collection.
I therefore compare the results of the DMV (select Drive, sum(io_stall_read_ms) / sum(num_of_reads) from dm_io_virtual_file_stats_diff group by Drive) with the perfmon counters (Avg. Disk sec/Read).
Generally we see similar times, but when disk latency is high, the times seen through my “DMV” collection are higher than the perfmon data (by 20–40%).
I am sure there is no other activity on the monitored drive (just SQL Server activity).
Is it possible to explain this phenomenon?
Are these two types of counters (perfmon and DMV) based on the same metric?
I hope they are not stupid questions

They’re not actually the same metric. The DMV is measuring everything from SQL Server to the disk and back. The perfmon counters are just measuring the time Windows sees for the I/O. There could be things like filter drivers etc. in the way. Any significant difference between them can be explained by something taking time between SQL Server issuing the I/O and Windows issuing the I/O.

Dear Paul,
I am seeing a lot of negative values in the Latency column of the given query. Is that an indication of an I/O issue? Also I am seeing a lot of events like this in the event log and SQL error log: SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file. Please advise.

I don’t see how you can get negative values for latencies, unless the DMV is returning negative values for total latencies, which would be a bug. If you’re seeing I/O stall warnings in the error log, that’s usually a sign of problems with your I/O subsystem.

Since the DMV gives cumulative numbers, I persist the results each hour and compare sequential results. This way we found out our latencies are “acceptable” during production hours, while they are terrible at night due to index rebuilds, integrity checks, database backups, system backups, ETL processes, etc.

While creating the view (e.g. ([io_stall_read_ms] – previous.[io_stall_read_ms]) / ([num_of_reads] – previous.[num_of_reads])) I also saw negative values, caused by the fact that the services had been restarted (counters are reset) between two measurements. After testing for a service restart using (SELECT crdate FROM sys.sysdatabases WHERE lower(NAME) = ‘tempdb’) I still saw some negative numbers. It seems to me this is caused by database restores, so now I just test whether bytes read and bytes written are higher in the previous measurement. In that case I assume the values represent the first measurement I can compare with.
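[The reset test described above can be expressed as a guard in the diff query – a sketch, where the snapshot table names are illustrative:]

```sql
-- Sketch: only compute a delta when the cumulative counters have not
-- gone backwards (a restart or restore resets them to zero).
-- [Current] and [Previous] are illustrative snapshot tables holding
-- persisted copies of sys.dm_io_virtual_file_stats output.
SELECT
    [c].[database_id], [c].[file_id],
    [ReadLatency] =
        CASE WHEN [c].[num_of_reads] = [p].[num_of_reads]
            THEN 0
            ELSE ([c].[io_stall_read_ms] - [p].[io_stall_read_ms]) /
                 ([c].[num_of_reads] - [p].[num_of_reads]) END
FROM [Current] AS [c]
JOIN [Previous] AS [p]
    ON [c].[database_id] = [p].[database_id]
    AND [c].[file_id] = [p].[file_id]
WHERE [c].[num_of_bytes_read] >= [p].[num_of_bytes_read]
  AND [c].[num_of_bytes_written] >= [p].[num_of_bytes_written];
```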

I just started doing the same, Peter. Relying on the raw cumulative DMV results is very, very misleading. Where the DMV was telling me I had I/O wait times for writes of >200ms for each of my tempdb data files, my new hourly/differential report indicates the actual latency to be <5ms throughout the day with very few exceptions so far. Since I just set this up three days ago I need to gather more data, but I see nothing that would explain why the DMV is returning such a high value. I don’t do any crazy maintenance on the weekends – time will tell, I suppose.

I’m seeing high Avg. Disk sec/Read and sec/Write, but no high page waits in perfmon. I’m trying to figure out how to compare those values vs. sys.dm_io_pending_io_requests and sys.dm_io_virtual_file_stats. My issue is I have four data files, two on each of two LUNs, totaling 660 GB. The plan is to move to four LUNs, one data file per drive, on an upgraded OS with more memory, CPU, etc. My question: is there a rule of thumb as to when to add more data files to your databases? And during a rebuild operation, does it split the load across the number of data files defined in your database?

What do you mean by ‘page waits in perfmon’? Are you looking at wait statistics? You should be seeing PAGEIOLATCH_XX waits. Too complex a topic for a comment. Read through my benchmarking and IO subsystem series. Yes, during a rebuild the allocations will come from all files in the filegroup where the index is stored.

Thank you for the query in the blog. I ran the above query and found that the write latency is quite high (532 ms) for tempdb, and 19 ms for another user database which stores images. We have a clustered environment running SQL Server 2008 R2. Unfortunately all the mdf, ldf, and tempdb files had been placed on one LUN (drive E:). How can we improve the I/O performance, although the users have not complained so far?

Paul, thanks for your ever-useful information. I was wondering if you could suggest where to look in my situation. We have our 64-bit Enterprise, 96 GB RAM DWH server connected to an array of SAN disks. The disk I/O hasn’t been great, so I ran your query – the top 13 files:

There’s a mismatch between the actual performance of the SAN (which is excellent) and SQL Server’s experience of it and I cannot figure out what steps to take next. What would impact SQL Server’s use of the disks?

Is your SQLIO test properly simulating your production workload? That’s all I can think of. You might also try taking two snapshots of the DMV data and then diff’ing them to see what I/O characteristics are over a period of time, rather than for all time. Could be that performance sucked earlier and now it’s fine.

Why are we dividing by num_of_reads? Shouldn’t we be dividing by sample_ms instead? Don’t we want to see the average read stall over a span of time? I am sure there is an explanation, and it is possible that sleep deprivation has caused brain latency in my head :)

You mentioned that “if you design an I/O subsystem to support 1000 IOPS (I/O operations per second) and SQL Server is trying to push 2000 IOPS, performance is going to suffer.” Is there a DMV I can use to find the IOPS my current DB is pushing? How do I find out if I push more IOPS than the system can support?

You can use the I/O count fields in the DMV from this article to see how many I/Os occur per second, but to get a really detailed view you may need to move to using Extended Events and aggregating the I/O data from there – but that’s likely overkill.
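[A minimal sketch of that approach, assuming a one-minute sample is representative; the temp table name is illustrative:]

```sql
-- Snapshot the cumulative I/O counts for every file
SELECT [database_id], [file_id],
       [num_of_reads] + [num_of_writes] AS [io_count]
INTO #IOPSSnapshot
FROM sys.dm_io_virtual_file_stats (NULL, NULL);

WAITFOR DELAY '00:01:00';  -- 60-second sample window

-- I/Os during the window divided by elapsed seconds = approximate IOPS
SELECT
    [now].[database_id], [now].[file_id],
    ([now].[num_of_reads] + [now].[num_of_writes]
        - [old].[io_count]) / 60 AS [IOPS]
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS [now]
JOIN #IOPSSnapshot AS [old]
    ON [now].[database_id] = [old].[database_id]
    AND [now].[file_id] = [old].[file_id]
ORDER BY [IOPS] DESC;

DROP TABLE #IOPSSnapshot;
```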

When I ran the script, I got two entries as follows:
1-
17 ,0 ,17, 515665, 8192, 515665 ,C: local C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Research2.mdf

2-
8 ,0, 8, 5347, 3379, 5335, C: local C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Research2_log.ldf
The query is a read-intensive one, as you can see from entry 1, but why is there average write latency when there is no write activity?

For entry 2, as you can see, the latency time is shorter than entry 1 – should I include its data in my analysis?

Hello Paul, thanks for the very informative blog post. We are trying to set up our new server. We configured RAID 10 SSDs (8 × 800 GB) for tempdb. Even with SSDs, we are seeing slow write latency for the log file. We did a single-threaded test (no other user query was running) and did a

We are consistently seeing write I/O latency for the tempdb log file between 6ms and 7ms.

Can you please advise if this is the best we can get out of SSDs for log file write latency? What are we doing wrong? Would putting the log file on a separate SSD LUN help (I thought it wouldn’t, as we don’t have spinning disks)? Any other suggestions to help improve this write log latency? Thanks.

Are you sure you’ve configured the array correctly? I can get log write latency of 1ms or less on my laptop’s internal SSD. I would get an IO subsystem expert involved to help you configure things correctly.

Hi Paul,
I ran the script on my prod server and got:
All six .mdf/.ndf tempdb files have ReadLatency 6, WriteLatency 106, Latency 77… and the .ldf file has ReadLatency 3, WriteLatency 6, and Latency 6…
Looks like a problem, but no complaints from customers.
If I want to find the cause, where should I look?
Thanks

In my case, should snapshot isolation be turned ON or OFF on tempdb (it’s off now)? (Just want to know your suggestion – I am not changing it on production.)
What’s the best way (script) to find memory spills?
Thanks in advance.

I don’t know anything about your workload, but usually people don’t use either flavor of snapshot isolation in tempdb. Lots of different ways to look at spills – Google ‘sql server memory spills’ and read some of the posts out there.

Hi Paul,
I used the query in this blog post to look at I/O latency on my prod server and found that my write latency is high.
Then I used your script that waits for 30 minutes, and it shows write latency as zero.
What am I doing wrong? I was expecting fairly similar numbers from both queries.

Isn’t this the part of the query that tells me write latencies?
[WriteLatency] =
CASE WHEN [num_of_writes] = 0
THEN 0 ELSE ([io_stall_write_ms] / [num_of_writes]) END,

I used this.
The second query outputs the delta between two time periods (30 minutes apart), and I shouldn’t be seeing too much delta – am I correct?
I first ran the query to give me all latencies, and then I ran the 30-minute delay query.

Yes. But I asked ‘How are you measuring write latency with the wait stats script?’. And you didn’t run the queries at the same time – so you can’t correlate the I/O latencies with the waits, as they were measured at different times…

Paul,
I’m confused. I thought your script was doing just that – using the DMV to measure write latencies. Looks like I’m really missing something big here. Could you please point me to an article or help me understand this better?

Oh – I thought you said you measured latencies and then measured wait stats – my bad. Ok – so the script that just looks at the DMV output is showing average latencies since the server was started/database came online. The 30 minute script shows the I/Os that occurred just in those 30 minutes. If there were no I/Os, or no I/O contention, it’s entirely possible that the second script shows low latencies but the overall aggregate shows higher averages. Or vice-versa too.

Thank you Paul, that explains everything. The server latencies were high since it had been up for a long time, and I have Ola’s maintenance plan with index rebuilds that may have spiked them. Over a period of 30 minutes at various times of the business day the delta of latencies is minimal (single digit), but my writes and sorts in tempdb are way high for a few of our stored procs in an application, and now I’m trying to test, using your queries, whether putting tempdb on a separate LUN will reduce them. Thank you very much once again.

I have a separate tempdb data drive (T:) and a separate tempdb log drive, all SSD. sys.dm_io_virtual_file_stats (using your above query) shows the T: tempdb data drive write latency is 83 ms. Perfmon and Resource Monitor are not showing any disk latency at the same time. How can I prove that the DMV results are correct? I’m arguing with the system administrators. There is nothing else on the tempdb data drive (T:).