"Waits, Waits, Do Tell Me" – how to analyze bottlenecks in SQL Server

“Waits” in SQL Server are often a major source of performance issues. "Waits" as they are known in the SQL Server world essentially indicate that a worker thread inside SQL Server is waiting for a resource to become available before it can proceed with executing. See, a thread is tasked with executing code, it blossoms and blooms, it is “alive” when it is executing code on the CPU. “Waits” on the other hand are stagnation, obstruction, delay – preventing a thread from “realizing its full potential” executing the task it is given. Therefore, waits are the "opposite" of execution, i.e. when a SQL Server thread is waiting (stuck behind a resource), it is necessarily not executing (running code on the CPU). When you think of “bottlenecks” in terms of performance, “waits” is what you are really thinking about: “where are things getting stuck?”, “where is the road becoming narrower?”, “where is the neck of the bottle?”

From my comparative experience with other software products (outside the database world), I think we are very fortunate that SQL Server is transparent about the roadblock it encounters. SQL Server uses Wait Types (scroll down in that article to view entire table) to show us the particular bottleneck a thread is waiting on. Wait types themselves have been discussed extensively in the blogosphere, so I won’t delve into it.

One key element that is sometimes omitted in bottleneck analysis is the “wait time”. Knowing what type of bottleneck a SQL thread is behind is indeed important. But how long the wait lasted is equally important. For example, a wait for 15 ms. is not something I lose sleep over, but a wait that lasts 5000 ms certainly gets my attention, especially if the same session is reporting it multiple times in the lifetime of a query execution.

Armed with this knowledge, you are ready to collect some data, aren't you? There are 4 data points, so to speak, that may be of interest:

1. Overall waits at the SQL Server – helps with understanding trends and what are most common waits

3. All waits occurring in a particular session – helps with exhaustive analysis of all the waits for a session

4. All waits occurring across all sessions – helps with exhaustive analysis of all waits occurring on all sessions. This is a fairly expensive data collection in terms of server impact and it terms of huge amounts of data that need to be analyzed. I generally discourage people from doing this.

This method essentially captures periodic snapshots from DMVs such as sys.dm_exec_requests and shows what specific wait types each session is experiencing at the time of the capture. Thus if you capture multiple snapshots over time, you can observe somewhat of a trend and make inferences as to what is the predominant bottleneck for each session (or all sessions). This does allow you to focus on a specific session and therefore narrow your troubleshooting in case you know which query is problematic.

I don’t need to reinvent the wheel on this as far as scripts are concerned. The PSSDIAG tool does an excellent job collecting data, if you enable “SQL 2005 Perf Stats” or “SQL 2008 Perf Stats” (depending on SQL version) custom Diagnostic (on by default)

In essence, though, these scripts perform a task similar to the following and you can use the following script to do the work:

set nocount on

while (1=1)

begin

select session_id,wait_type,wait_time,text from

sys.dm_exec_requestsr

cross apply sys.dm_exec_sql_text(sql_handle)

waitfor delay'00:00:05'

end

3. All waits occurring in a particular session

(most precise but most expensive)

If you have narrowed the problem down to a specific query and you have indications that the query is slow due to waits then you can capture the entire set of wait types and corresponding wait durations for the lifetime of a query using Extended Events. This method works best if you can reproduce the issue at will.

Capture an Extended Event trace

drop eventsession AllWaitsForSingleSession on server

go

create event session AllWaitsForSingleSession on server

add event sqlos.wait_info

(action (sqlserver.sql_text)

where sqlserver.session_id=52 --NEED TO change SESSION_ID that matches your repro

--and duration >=5000 -- you may choose to only capture waits above a certain duration threshold

and opcode= 1 ),

add event sqlos.wait_info_external

(action (sqlserver.sql_text)

where sqlserver.session_id=52 --NEED TO change SESSION_ID that matches your repro