SQL Server tracks disk latency in the form of 'disk stall' values, which it accumulates continuously in the sys.dm_io_virtual_file_stats dynamic management view. The challenge is that these stall metrics are cumulative from the last time the SQL Server service started, rather than averages over a sample window. An ongoing collector can therefore store the cumulative values at a fixed interval and calculate latency averages over each small, granular window of time. These metrics provide a deeper look at latency and can be used to analyze the ongoing latency state of the server's disks. Overlay this data with the drive-level data collected through Windows Perfmon counters to see whether you have hot spots on one or more disks.

First, either create a new database or select an existing DBA utility database. This example uses a new database called CollectorTemp.
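The collector needs a table to hold the snapshots. The definition below is a sketch inferred from the columns referenced later in this section (ServerName, DatabaseName, PhysicalFileName, AvgReadStallms, and so on); the exact column data types and the identity key are assumptions, not the author's exact schema.

```sql
-- Sketch: holding database and table for the disk stall snapshots.
-- Column types are assumptions inferred from the queries that follow.
CREATE DATABASE CollectorTemp;
GO
USE CollectorTemp;
GO
CREATE TABLE dbo.DiskStall
(
    DiskStallID        int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ServerName         sysname        NOT NULL,
    DatabaseID         int            NOT NULL,
    DatabaseName       sysname        NOT NULL,
    PhysicalFileName   nvarchar(260)  NOT NULL,
    DatabaseFileID     int            NOT NULL,
    -- Cumulative counters copied from sys.dm_io_virtual_file_stats
    NumOfReads         bigint         NOT NULL,
    IoStallReadms      bigint         NOT NULL,
    NumOfWrites        bigint         NOT NULL,
    IoStallWritems     bigint         NOT NULL,
    -- Averages calculated over the interval since the prior snapshot
    AvgReadStallms     decimal(18, 2) NOT NULL,
    AvgWriteStallms    decimal(18, 2) NOT NULL,
    CollectionDateTime datetime2(0)   NOT NULL DEFAULT SYSDATETIME()
);
```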

Create a SQL Server Agent job that executes the collection stored procedure on a schedule. A recommended interval is every 30 seconds if a latency problem is suspected, or every minute if no problem is currently under review.
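The collection stored procedure itself is not shown above; the sketch below is one possible implementation, assuming the dbo.DiskStall table described earlier. It reads the cumulative counters from sys.dm_io_virtual_file_stats and derives each per-sample average as the stall delta divided by the I/O count delta since the prior stored snapshot. The procedure name usp_DiskStall_Collect and the delta logic are assumptions, not the author's exact code.

```sql
CREATE PROCEDURE dbo.usp_DiskStall_Collect
AS
SET NOCOUNT ON;

INSERT INTO dbo.DiskStall
        (ServerName, DatabaseID, DatabaseName, PhysicalFileName, DatabaseFileID,
         NumOfReads, IoStallReadms, NumOfWrites, IoStallWritems,
         AvgReadStallms, AvgWriteStallms, CollectionDateTime)
SELECT  @@SERVERNAME,
        vfs.database_id,
        DB_NAME(vfs.database_id),
        mf.physical_name,
        vfs.file_id,
        vfs.num_of_reads,
        vfs.io_stall_read_ms,
        vfs.num_of_writes,
        vfs.io_stall_write_ms,
        -- Per-sample average = stall delta / I/O count delta since the
        -- previous snapshot; 0 on the first sample or when no I/O occurred.
        CASE WHEN vfs.num_of_reads - prev.NumOfReads > 0
             THEN (vfs.io_stall_read_ms - prev.IoStallReadms) * 1.0
                  / (vfs.num_of_reads - prev.NumOfReads)
             ELSE 0 END,
        CASE WHEN vfs.num_of_writes - prev.NumOfWrites > 0
             THEN (vfs.io_stall_write_ms - prev.IoStallWritems) * 1.0
                  / (vfs.num_of_writes - prev.NumOfWrites)
             ELSE 0 END,
        SYSDATETIME()
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
OUTER APPLY (SELECT TOP (1) ds.NumOfReads, ds.IoStallReadms,
                            ds.NumOfWrites, ds.IoStallWritems
             FROM dbo.DiskStall AS ds
             WHERE ds.DatabaseID     = vfs.database_id
               AND ds.DatabaseFileID = vfs.file_id
             ORDER BY ds.CollectionDateTime DESC) AS prev;
```

Storing the cumulative counters alongside the calculated averages allows the averages to be recomputed later if the sampling interval drifts.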

To view the raw values for each database's data and log files, execute the following query.

SELECT * FROM dbo.DiskStall ORDER BY CollectionDateTime, ServerName, DatabaseName

This query extracts the raw data, along with the calculated disk read and write latency for each sample period. However, analyzing these values over a longer window of time is necessary to determine the running health of the environment, and a percentile analysis provides a much more granular and meaningful view of its performance. The following query, which requires SQL Server 2012 or later because it uses the PERCENTILE_CONT window function, determines the running percentile metrics for the sample window. Adjust the time window being queried to match your specific requirements.

SELECT DISTINCT
       DatabaseID,
       ServerName,
       DatabaseName,
       PhysicalFileName,
       DatabaseFileID,
       ROUND(PERCENTILE_CONT(0)     WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms00,
       ROUND(PERCENTILE_CONT(0.25)  WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms25,
       ROUND(PERCENTILE_CONT(0.5)   WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms50,
       ROUND(PERCENTILE_CONT(0.75)  WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms75,
       ROUND(PERCENTILE_CONT(0.9)   WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms90,
       ROUND(PERCENTILE_CONT(0.95)  WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms95,
       ROUND(PERCENTILE_CONT(0.96)  WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms96,
       ROUND(PERCENTILE_CONT(0.97)  WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms97,
       ROUND(PERCENTILE_CONT(0.98)  WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms98,
       ROUND(PERCENTILE_CONT(0.99)  WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms99,
       ROUND(PERCENTILE_CONT(0.999) WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms99d9,
       ROUND(PERCENTILE_CONT(1)     WITHIN GROUP (ORDER BY AvgReadStallms)  OVER (PARTITION BY PhysicalFileName), 2) AS ReadStallms100,
       ROUND(PERCENTILE_CONT(0)     WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms00,
       ROUND(PERCENTILE_CONT(0.25)  WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms25,
       ROUND(PERCENTILE_CONT(0.5)   WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms50,
       ROUND(PERCENTILE_CONT(0.75)  WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms75,
       ROUND(PERCENTILE_CONT(0.9)   WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms90,
       ROUND(PERCENTILE_CONT(0.95)  WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms95,
       ROUND(PERCENTILE_CONT(0.96)  WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms96,
       ROUND(PERCENTILE_CONT(0.97)  WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms97,
       ROUND(PERCENTILE_CONT(0.98)  WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms98,
       ROUND(PERCENTILE_CONT(0.99)  WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms99,
       ROUND(PERCENTILE_CONT(0.999) WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms99d9,
       ROUND(PERCENTILE_CONT(1)     WITHIN GROUP (ORDER BY AvgWriteStallms) OVER (PARTITION BY PhysicalFileName), 2) AS WriteStallms100
FROM dbo.DiskStall
ORDER BY DatabaseName,
         DatabaseFileID;

You can also create a SQL Server Agent job that purges rows older than a retention window so that the database does not grow unchecked. The cleanup code can look like the following block for a 60-day retention window.

-- Clean up rows older than 60 days
CREATE PROCEDURE dbo.usp_DiskStall_Cleanup
AS
SET NOCOUNT ON;
DELETE FROM dbo.DiskStall
WHERE CollectionDateTime < DATEADD(DAY, -60, GETDATE());
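As a sketch, the cleanup procedure can be scheduled through the standard msdb job procedures. The job name, daily 03:00 schedule, and target database below are illustrative assumptions; adjust them to your environment's conventions.

```sql
-- Sketch: nightly Agent job for the cleanup procedure (names/times assumed)
USE msdb;
GO
EXEC dbo.sp_add_job        @job_name  = N'DiskStall Cleanup';
EXEC dbo.sp_add_jobstep    @job_name  = N'DiskStall Cleanup',
                           @step_name = N'Purge old rows',
                           @subsystem = N'TSQL',
                           @database_name = N'CollectorTemp',
                           @command   = N'EXEC dbo.usp_DiskStall_Cleanup;';
EXEC dbo.sp_add_jobschedule @job_name = N'DiskStall Cleanup',
                           @name      = N'Daily at 03:00',
                           @freq_type = 4,          -- daily
                           @freq_interval = 1,      -- every 1 day
                           @active_start_time = 030000;
EXEC dbo.sp_add_jobserver  @job_name  = N'DiskStall Cleanup',
                           @server_name = N'(local)';
```

A similar job, scheduled at the 30- or 60-second interval discussed earlier, runs the collection procedure itself.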