How To Get High Quality Information About Query Performance

There are a zillion reasons why Profiler and its friends by far are the most potent tuning tools available for SQL Server DBAs. This post focuses on one of them but I will mention a few others at the end as well (download details for the tools mentioned are also listed at the end).

With SQL Server 2005 a new world opened to SQL Server DBAs with the introduction of DMVs and there are even more available in SQL Server 2008. There are 2 groups of DMVs that has, right-fully, gained enormous popularity - the sys.dm_db_index_* DMV's and the sys.dm_exec_* DMVs.

With the introduction of the DMVs, I have noticed that a lot of people are not inviting Profiler to their tuning parties anymore – I will show you why he (she?) will always be on my guest list.

In this post, I am going to talk about the limitations of sys.dm_exec_query_stats and sys.dm_exec_cached_plans. I will then show you how you can get aggregated performance data with much higher quality using Profiler trace data and some of the easy-to-use lightweight tools that Microsoft and the community provide for free.

Needless to say, there are situations when running a trace is not feasible and the sys.dm_exec_* DMV is a very welcome option. It’s never going to hurt anyone to tune whatever TOP queries you identify with these DMVs. For real-time troubleshooting, they are probably also your only option but then you need to be aware that sys.dm_exec_query_stats only contains aggregated data about query executions that has actually finished executing. Sys.dm_exec_request – or some combination - is a better option for real-time troubleshooting.

However and whenever you use sys.dm_exec_query_stats, you need to be aware of its limitations.

There are numerous scripts out there that queries sys.dm_exec_query_stats from different angles and they will typically look something like this:

The first time you ever used this DMV you probably thought that the gates of heaven just opened up in front of you. No more getting approval for running traces, no more copying and wading around in MBs or GBs of trace data. For a few chosen ones, this might have been the case but for most of us that was not really the case. Especially if you, like me, tend to look at many different systems and have limited amount of time to assess the workload.

Depending on the access methods used on the particular server you are looking at, you might only see half of the truth, if you are lucky the output shows you something very close to the whole truth. But it will never show you the whole truth.

Why Not The Whole Truth?

The inherent problem with with sys.dm_exec_query_stats is that it shows you aggregations from sys.dm_exec_cached_plans. And contrary to what you might believe, not all query plans will go into the procedure cache. Even if you did know this, you may not realize the full implication.

Some of the things that will NEVER be cached are:

1. DBCC and BACKUP statements

2. Any query that was issued with the RECOMPILE statement

3. So called "zero-cost" plan queries - i.e. queries that are very cheap to compile but NOT necessarily cheap to execute.

In addition to this,

You cannot see the accumulated numbers for a query whose plan is never re-used (one that uses inline literals and is not parameterized implicitly or explicitly).

Execution time for things like EXEC extended_stored_procedure or CREATE INDEX will not be included in sys.dm_exec_query_stats.total_worker_time.

On top of this, a plan can get thrown out of cache at any point in time and once it is out, it will not show up in sys.dm_exec_query_stats.

Unless you are certain that your ENTIRE WORKLOAD consists of queries that are parameterized and that none of your procedures contain “funky stuff” such as EXEC ext_stored_procedure; you absolutely have to use Profiler (or the corresponding sp_trace_* procedures) to get an accurate view of your workload.

An example of an Ad-hoc query that would “escape” sys.dm_exec_query_stats is the following:

USE Adventureworks

GO

SELECT

sp.salespersonid

, so.salesorderid

FROM sales.salesorderheader so JOIN sales.salesperson sp

ON sp.salespersonid = so.salespersonid

WHERE so.salesorderid = 43860

The above query will generate a new plan for each execution provided that the salesorderid is different and you will find a different entry in sys.dm_exec_query_stats and sys.dm_exec_cached_plan for each execution:

SELECT

total_worker_time as CPU

, total_elapsed_time AS Duration

, total_logical_writes AS TotWrites

, total_logical_reads AS TotReads

, qs.execution_count AS NoOfExecs

, text

FROM sys.dm_exec_query_stats qs

CROSS APPLY sys.dm_exec_sql_text(sql_handle)

Fig 1. The two executions show up as separate queries in sys.dm_exec_query_stats.

For my second scenario, I created two stored procedures with the only difference that I added a WAITFOR (to simulate a call to e.g. an extended stored procedure) in one of them.

CREATE PROCEDURE uspGetSalesPerson2 (@p1 int)

AS

BEGIN

WAITFOR DELAY'00:00:10' -- uspGetSalesPerson WITHOUT the WAITFOR

SELECT

sp.salespersonid

, so.salesorderid

FROM sales.salesorderheader so JOIN sales.salesperson sp

ON sp.salespersonid = so.salespersonid

WHERE so.salesorderid = @p1

END

And then ran them:

uspGetSalesperson2 44501

go

uspGetSalesperson 44501

Fig 2. As you can see, the 10 WAITFOR seconds in supGetSalesperson2 are not accounted for in the total_elapsed_time (Duration) at all.

What About SQL Server 2008?

In SQL Server 2008, there is something called query fingerprints/query hash which will let you see aggregates for non-parameterized queries such as the above. It can also be used to identify which queries in your workload that would benefit from parameterization.

To address the problem with duration for statements other than SELECT/INSERT/UPDATE/DELETE, there is a new DMV in SQL Server 2008 called sys.dm_exec_procedure_stats which aggregates consumption for ALL statements executed within a stored procedure.

Enter Profiler And Friends

Anyone who has used Profiler knows that it in itself can do nothing about the above – in fact it does anything BUT aggregate data. This is when the friends should join the party. For the above query (first executed with so.salesorderid = 44501 and then with so.salesorderid = 43860), the equivalent output from my personal favorite friends, SQLNexus and Reporter, look like this:

Fig 3. Notice the number of executes and how Readtrace has parsed the inline literal in the Query Template in order to be able aggregate the performance data.

And for my second scenario with the missing WAITFOR duration:

Fig 4. Notice the difference in Duration for uspGetSalesperson2 compared to Fig 2.

So what did I do? I first ran a Profiler trace with the default template adding just the StmtStarting and StmtCompleted events (although in a real world scenario you would want to add some more events and you might also save the trace definition as a script and use the sp_trace_* procedures to avoid the overhead of the Profiler GUI).

I then ran SQLNexus, connected to one of the SQL Server instances on my machine and imported the trace file I just created. SQLNexus will at this point create a database called SQLNexus (default which can be changed) where the crunched trace data goes and then generate a number of reports. The number of reports available depends on what events you included in the trace.

Fig 5

If you switch to “ReadTrace_Main” you will find the output found in Fig3 and Fig4 at the bottom of the screen. As you can see in Fig 3 & 4, the information is somewhat limited and there is certainly some vital information missing. For instance, I like to know whether there are variations in Execution times to pinpoint intermittent bad execution plans, resource or blocking issues.

Fig 6.

By clicking the statement in the grid, you will get to the “Unique Batch Details” report which at the end has the following data:

Fig 7.

In order to get that information, you can query the data in the SQLNexus database as well. In addition to all the data necessary to produce the reports in the SQLNexus GUI, it also contains a number of views you can query (or you can query the base tables directly). You can also make your own custom reports.

To get the data in one single report, I ran the following SELECT statement:

USE sqlnexus

GO

SELECT

vwU.NormText

, vwB.CompletedEvents

, vwB.Duration

, vwB.AvgDuration

, vwB.Reads

, vwB.AvgReads

, vwB.Writes

, vwB.AvgWrites

, vwB.CPU

, vwB.AvgCPU

, vwB.HashID

FROM readtrace.vwBatchUtilization vwB

INNER JOIN readtrace.vwUniqueBatches vwU

ON vwB.hashid = vwU.hashid

Fig 8. Notice the HashID column to the far right – that is basically a “query fingerprint” for this particular query.

Using Reporter, Baseline and Comparison Data

One really cool feature of Reporter, is that you can load two different trace data databases and look at comparative data. So you could have one baseline database and then at any point in time, capture a new trace and compare the results in Reporter:

Fig 9. Comparative data from Reporter.

It will also identify which queries are found in which trace data set (i.e. if there are new TOPn queries on the scene or if any of your old TOPn queries are gone). Be aware that you can change the TOP parameter to show anything between TOP 3 to TOP 100.

Fig 10. Identifying new queries with Reporter.

More Information About the Tools

As mentioned, SQLNexus has a number of drill-downable reports available and it can also import Performance Monitor output and WAITSTATS (sys.dm_os_wait_stats output) data. Reporter on the other hand, has the same drill-down reports but works only with trace data (and some extra features such as the comparison featured mentioned you will not find in SQLNexus).

Both utilities require ReadTrace, one of the RML Utilities released by Microsoft and the data format for both Reporter and SQLNexus is the same. By default, ReadTrace creates a database name PerfAnalysis but if you run ReadTrace via SQLNexus, the name will, as mentioned earlier be SQLNexus by default.

In SQLNexus, you can point to the trace files and it will import them for you whereas Reporter requires you to first run ReadTrace to create the database. If you are a little bit lazy like me (or “productivity focused” as I like to call it) you can use the GUI in SQLNexus to create the trace data database and then point Reporter to that database.

Another perk is that once you have the trace data, you can easily feed it to the Database Tuning Advisor (which is a good idea considering the limitations of the sys.dm_db_index_* DMVs… but that’s another story… and an upcoming blog post).

My strong recommendation is that you at least every now and then try to find a window where you can capture trace data to make a full assessment of the workload of your SQL Server(s).

If the above is not necessary to convince you to bring along Profiler and his cool friends, I will mention a few other reasons:

The ability to Replay trace data. For instance, before applying a service pack, capture a trace of your critical workload and Replay it against an upgraded test server.

(With OStress, another RML Utility, or the built-in “Replay events using multiple threads” feature of Profiler, you can simulate multiple users.)

Profiler will also catch the number of rows returned to the client so you can identify queries that are returning over-the-top amounts of rows. These queries are not necessarily expensive to execute.

You can easily detect if cursors are used and in particular cursors with a low rowfetch size

It will give you information about runtime errors and performance related warnings (such as HASH and SORT warnings)

If you have a performance monitor log from the same time window, you can correlate the data from within the Profiler GUI.

In essence, there is nothing else available that will give you a complete picture about the workload on the server. The best method to get to know a system is to gather and analyze trace data (with whatever tool you are comfortable with) and maybe even spend a bit of time just browsing through a trace of some of your more critical workload(s).

More Information and Download Areas

You can find (a lot) more information about the tools in the following article. At the end of the article you will find links to download areas:

The answer to this is that we have already fixed it, but the code has been checked into the next version of SQL Server (i.e. the next major release after SQL 2008). When it was fixed we also fixed another problem where the same DMV can report NULL in the dbid and objectid columns under certain circumstances.

Currently there are no plans to backport this fix into SQL 2005 or 2008.

> As you can see, the 10 WAITFOR seconds in supGetSalesperson2 are not accounted for in the total_worker_time (Duration) at all.

Per BOL, total_worker-time in sys.dm_exec_query_stats means "Total amount of CPU time, in microseconds, that was consumed by executions of this plan since it was compiled." If it's truly a measure of CPU time, then time spent in WAITFOR should not be counted because it does not consume any CPU time while it sleeps. In other words, there is a difference between duration and CPU time, and total_worker_time may not be measuring the duration.

Hi Elizabeth, Found your blog whilst looking for something that returned the actual T-SQL and some useful stats so many thanks for that. I have had to make a little change to your script to get the full query_text and I thought you may want to test it and maybe update your script? In the last line of the CASE I have had the TSQL truncated until I changed your script to be "end - statement_start_offset )+1 / 2)".

I believe the scenario is no more valid in 2008R2 onwards. The plan_handle is reused by the query optimizer inspite of the query containing a hard coded filter value. A parameterized version is automatically created and the query hash signature is reused across the query plan.