Benchmark SQL Server Wait Statistics

One of the secret weapons in performance tuning with SQL Server is understanding wait statistics. Every time a process (spid) changes its status from running to suspended it is typically due to resource wait or a signal wait. For example, you are typically waiting on PAGEIOLATCH_SH when you run a query that needs to pull data from disk into memory for a select statement. Understanding wait statistics can be a very helpful tool in your tool-belt if you are brought in to troubleshoot a critical issue due to slow performance. Wait statistics can quickly eliminate several resources that aren’t causing your performance problem and allow you to focus on the root cause.

How Do We Get Wait Statistics?

Starting in SQL Server 2005 we were given access to dynamic management views and functions. One that is very helpful with collecting waits is sys.dm_os_wait_stats. You can access your top waits by using the following query.

SELECT TOP 10 *
FROM sys.dm_os_wait_stats
where wait_time_ms > 0
order by wait_time_ms desc

This DMV collects waits from the time the instance starts unless its cleared with the following DBBC command below. I strongly recommend that you grab snapshots of this view and compare instead of purging the history. Once you run the DBCC SQLPERF command you lose all history of your waits. We will talk a little bit more about this in the next section.

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR)

Mistakes with Wait Statistics

Typically in the field I have seen two different mistakes with Wait Statistics. Both mistakes are fixable. The first is an easy fix, the second one takes some knowledge to prevent knee jerk reactions that can cause more problems.

The first mistake I see is people constantly running DBCC SQLPERF to clear out the sys.dm_os_wait_stats to get a starting point. Typically, one would clear out all the waits wait a little bit and use a great query like Paul Randal’s – tell me why it hurts script or Glen Berry’s DMV scripts to see what waits occurred in the last few seconds. Logically, this seems okay but you just dumped your history of wait statistics. You cannot go back and see what were your top resource waits over time. I strongly recommend capturing a snapshot of this view, waiting a few seconds and then capturing it again and then compare the two captures. This way you still have your history.

The second mistake is not understanding the wait types or making bad decisions based on the top wait types. Are there good waits types? What reactions should I take due to the wait types? These are both very good questions and we cover them at a basic view in this post. First, thing I recommend doing before running any wait stats queries is get a good reference list of waits types. I personally recommend MS Whitepaper on Waits and Queues and The SQL Server Wait Type Repository by CSS SQL Server Engineers. These can be very helpful when you need to understand your top waits or build an action plan behind your waits.

Let’s take a look at the TOP 10 query result set from the query above.If you look at the top four waits in CSS SQL Server Engineers blog you will notice that for various reasons it is safe to skip these wait types. Therefore, when were collecting waits we would want to bypass waits that are not indication of resource pressure actionable.

Finally, you need to be careful about the advice on the internet. It can be like taking candy from strangers. There still quite a bit of misleading advice that can cause more harm than good. An easy example is CXPACKET waits. There are articles that recommend setting Max Degree of Parallelism to 1 for the instance. This would actually be the last thing I would want to do by just seeing CXPACKET as a top wait type. I would want to find the queries causing CXPACKET waits see if they are effecting the business and tune them if needed. If that isn’t possible than I would consider this recommendation.

Capture Wait Statisitcs

So, now if you are still reading you have read about why understanding wait statistics is a good thing, basics of collecting wait statistics, and common mistakes with wait statistics. Lets jump to how I go about collecting wait statistics.

The following script will utilize three temp tables. One for a snapshot of wait statistics and another for comparing waits, and the last table for wait types we would want to skip due to the waits being background processes or waits that could safely be ignored. It will get a delta of your wait statistics for five minutes.

Benchmark Wait Statistics

Finally, I will include the code used for creating a stored procedure that will give you control on how long the wait statistics capture runs and how often we should collect during the capture process.