In both examples above the SQL executed apparently should have been safe from SQL injection, but it isn't. Neither REPLACE nor QUOTENAME was able to protect it, and the injected division by zero was executed. The problem is the Unicode MODIFIER LETTER APOSTROPHE (NCHAR(0x02bc)) character that I used in constructing the NVARCHAR value, which is then implicitly cast to VARCHAR. This cast converts the special 'modifier letter apostrophe' character into a plain single quote. This introduces new single quotes into the value after the supposedly safe escaping has occurred. The result is plain old SQL injection.

In order for this problem to occur there has to be an implicit or explicit conversion (a cast) from Unicode to ASCII (ie. NVARCHAR to VARCHAR), and this conversion must happen after the single quote escaping.
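Here is a minimal sketch of the failure mode, assuming a simple REPLACE-based escaping routine (the input value and the dynamic statement are illustrative, not the exact examples from above):

DECLARE @input NVARCHAR(100) = N'abc' + NCHAR(0x02bc) + N'; SELECT 1/0 --';
DECLARE @escaped NVARCHAR(100) = REPLACE(@input, N'''', N'''''');  -- finds no real single quote, changes nothing
DECLARE @sql VARCHAR(200);
SET @sql = 'SELECT ''' + @escaped + '''';  -- the implicit NVARCHAR-to-VARCHAR cast happens on this assignment
PRINT @sql;   -- SELECT 'abc'; SELECT 1/0 --'
EXEC (@sql);  -- the injected division by zero executes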

Identifying SqlConnection objects in a dump
https://rusanu.com/2017/09/22/identifying-sqlconnection-objects-in-a-dump/
Sat, 23 Sep 2017 06:49:51 +0000

I recently had to troubleshoot an ADO.Net connection pool exhaust issue. This problem may indicate a connection leak, but it can also be caused by an undersized connection pool. For analysis I used Windbg. For a quick introduction on how to use Windbg with managed code, see here. Once the problem was reproduced, I took a process dump, copied the dump to my laptop and started digging.

I started by finding a SqlConnection object in the dump. !DumpHeap can be used to dump all objects in all heaps, which is rather verbose. By using the -type filter we can restrict the output to only the objects whose type name contains a given string, and I also restricted it to display only the aggregated statistics (-stat) rather than each individual entry:
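The command looks something like this (assuming the SOS debugging extension is already loaded):

0:000> !DumpHeap -type SqlConnection -stat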

From this command I am interested in only one thing: the method table address for the actual System.Data.SqlClient.SqlConnection type. With this I can restrict the !DumpHeap output to only objects of this type, using the -mt argument. Notice that there are 414 SqlConnection instances in the dump, although we have a default connection pool size of 100. Let's see them:
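The command takes the method table address found in the statistics output; the address below is just a placeholder from my dump:

0:000> !DumpHeap -mt 00007ffd8a2b1234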

If you enabled the DML preference then the output contains clickable links for dumping each object with a simple click. This is just a shortcut, and you can see the actual Windbg command run by the shortcut click. Let's dump the first object:
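The click simply runs a !DumpObj (alias !do) against the instance address, something like this (the address is again a placeholder):

0:000> !DumpObj 000002314f8a1b70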

Those used to native debugging and the eternal chase after symbol files will appreciate the fact that managed data types are self-describing, even in retail builds. I removed some clutter; the information I'm after is the _innerConnection field. The SqlConnection is just a wrapper. Let's look at it:

OK, so this inner connection is of the type DbConnectionClosedPreviouslyOpened, and I can make an educated guess that this is not relevant for my investigation. Let's look instead at the last SqlConnection object from my list of 414 instances, and its _innerConnection:

This time the inner connection is of type System.Data.SqlClient.SqlInternalConnectionTds, which is more interesting. Before you object that finding the SqlConnection objects whose inner connection is of type SqlInternalConnectionTds would require searching through all instances, let me say that you don't have to. All I wanted so far was to introduce the SqlInternalConnectionTds class, which is the one we're really interested in. I could have started directly with it and asked you to take my word for it, but I wanted you to see how I came to find this class. So let's look at all the instances of this class, using !DumpHeap -mt ... with this class method table address:

There are exactly 100 instances in memory, so this really is the class we're after. It represents an open SqlConnection, so we have a way of finding all active, open SQL connections in memory. What can we do with this information? The SqlInternalConnectionTds type has the _parser field, which is a TdsParser instance; this type has the _physicalStateObj field, which is a TdsParserStateObject instance; and this type has the _outBuff field of type Byte[], and I will make an educated guess that this is the last packet sent to the SQL Server on this connection:

With all the knowledge we have so far we can use Windbg script to dump all the live connections and information about the last packet they sent to the server:
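Here is a simplified sketch of such a script. It prints plain text rather than the DML links described below; the method table address and the field offsets (0x98, 0x8, 0x28) are the ones from this particular dump and will differ between framework versions, and the final +0x10 skips the Byte[] array header on x64 to reach the first byte of the buffer, which is the TDS packet type:

.foreach (conn {!DumpHeap -mt 00007ffd8a2b5678 -short})
{
    $$ conn is the address of a SqlInternalConnectionTds instance
    .printf "conn %p  last packet type %02x\n", conn, by( poi( poi( poi( conn + 0x98 ) + 0x8 ) + 0x28 ) + 0x10 )
}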

I reckon Windbg scripts are a bit arcane in syntax. poi is used to read a pointer value from an address. by reads a single low-order byte from an address. And the magic values like 98, 8 and 28 are the offsets of the relevant fields in the object instance:

The script uses .foreach(obj {!DumpHeap ... -short}) to iterate through each instance of type SqlInternalConnectionTds in memory, and then for each instance it outputs Debug Markup Language (DML) links to output, in a tabular form, the SqlInternalConnectionTds instance, the _parser, the _physicalStateObj and the _outBuff. It then prints the last sent packet type:

BTC is a SQL Batch

RPC is a SQL procedure call (this includes any SqlCommand batch that has parameters)

BLK is a BULK INSERT operation

ATN is an attention (a request to cancel an executing command)

XML is a request to enlist in a DTC

For RPC and Batch requests the script also dumps out the actual SQL command in the request. Note that this is the last packet sent; if the SQL text was bigger than one packet then the last packet will contain only the ending of the actual command sent. Here is the output from the actual dump I was analyzing:

You can see all 100 connections in the connection pool, the last packet type each connection sent to the server, and, where applicable, the last SQL command executed on that connection.

As an alternative to the cumbersome Windbg script syntax you can try DbgScript, which allows you to use Python, Ruby, Lua or other scripting languages to control Windbg.

Understanding SQL Server Query Store
https://rusanu.com/2016/04/01/understanding-sql-server-query-store/
Fri, 01 Apr 2016 08:30:01 +0000

Query Store was introduced in SQL Server 2016 and on Azure SQL DB v12 as a way to track query execution statistics. When Query Store is enabled it will save the query SQL text, the execution plan and execution stats like number of executions, elapsed time, logical reads etc. The information is persisted in the database and allows for later analysis. Query Store should be the first stop for anyone doing performance analysis on SQL Server. Developers need to know about Query Store in order to identify problem spots early and focus on improving application performance by improving the queries that will yield the biggest impact. DBAs need to know about Query Store for similar reasons: to understand the performance bottlenecks, but also to understand the actual workload executed on a server.

the query stats are lost when entries are evicted from the query plan cache

the stats are not available over long periods of time

Query Store resolves these drawbacks by persisting the execution stats in the database, and it improves over the information in these DMVs with more features, efficient storage and better information structure. Query Store has no relation whatsoever to the query plan cache; it captures all queries, even those that are never cached.

How to use Query Store

Query Store needs to be enabled on each database for which you want to capture execution stats. This is achieved by running the following:

ALTER DATABASE [&lt;database name&gt;] SET QUERY_STORE = ON;

Once Query Store is enabled all statements that execute in the context of that database are going to be captured.

The information captured by Query Store is exposed in a series of new catalog views:

sys.query_store_query_text

This view exposes the individual statement texts (queries) captured by Query Store. The text is captured exactly as sent for execution, including comments inside the statement. Each query text captured by Query Store will appear once in this view.

sys.query_store_query

This view exposes the individual queries captured by Query Store. A "query" in this context is a combination of one query text and one set of context settings; in other words, it is the query text executed under a specific context setting.
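As a quick illustration of how these catalog views fit together, here is a sketch (not from the original article) of a query that lists the top statements by total CPU, joining the text, query, plan and runtime stats views:

SELECT TOP (10)
    q.query_id,
    qt.query_sql_text,
    SUM(rs.count_executions) AS executions,
    SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_time
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY total_cpu_time DESC;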

Query Store is notified of compilation duration after each statement compilation, and execution stats after each statement execution. Query Store caches the stats in memory and aggregates them for each query and plan. Periodically it will flush the accumulated statistics into the database. The frequency of flushing the stats depends on the INTERVAL_LENGTH_MINUTES Query Store option:

Determines the time interval at which runtime execution statistics data is aggregated into the query store. To optimize for space usage, the runtime execution statistics in the Runtime Stats Store are aggregated over a fixed time window. This fixed time window is configured via INTERVAL_LENGTH_MINUTES.
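Setting this option, together with the other Query Store options mentioned below, is done with ALTER DATABASE; a sketch, with purely illustrative values:

ALTER DATABASE [&lt;database name&gt;]
SET QUERY_STORE (
    OPERATION_MODE = READ_WRITE,
    INTERVAL_LENGTH_MINUTES = 15,
    MAX_STORAGE_SIZE_MB = 1024);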

Query Store catalog views will always show the latest data

Querying any of the Query Store catalog views is guaranteed to show all captured statistics, including the latest ones not yet flushed to disk.

Performance impact of enabling Query Store

Probably the most frequent question I hear when I'm talking about Query Store is "OK, but how much does it impact my performance?". The short answer is "you won't notice it", but let's detail it a bit. Having Query Store enabled will impact several aspects:

Disk space used by Query Store is configurable via the MAX_STORAGE_SIZE_MB setting. This is space used in the database, on the primary filegroup. The on-disk size of the Query Store data does not directly affect performance, except when Query Store has to load this data. After a database startup Query Store has to reload into memory the query text hashes and the plan hashes from the sys.query_store_query_text and sys.query_store_plan catalog tables. This load takes longer the bigger the Query Store size is; you should allow for something like 1 second per 1 GB of Query Store data. Note that, as SQL Server does not have a database quota system, it is possible for Query Store to actually exceed the configured maximum space, but it should only exceed it by very small amounts.

Disk IO required by Query Store will have a direct impact on your system performance. Disk IO is the most precious resource on any database system; when disk IO is the bottleneck, anything that diverts IO will impact performance, and Query Store is no exception. Query Store issues IO to write the query text and the query plan, and when flushing the accumulated aggregated execution stats. Query Store always saves the data asynchronously in background tasks, so the user workload is not impacted directly.

Query Store avoids using high CPU on the query execution code path, and Query Store structures are partitioned to avoid CPU interlocked hotspots. However when a new plan is captured by Query Store the query plan has to be generated, which implies walking the statement execution tree and generating the showplan XML. This is relatively expensive, but as it usually occurs in the compilation phase it should not impact the overall performance of a well tuned application.

Memory consumed by Query Store depends on the number of queries and plans present in your workload. Query Store memory can become a significant factor if you execute hundreds of thousands of distinct queries. Query Store takes measures to prevent excessive memory use, like limiting the number of distinct plans captured per query.

Overall you should consider that Query Store typically adds an impact on performance in the range of 0.5% to 2.5%, meaning that if your system has a throughput of X transactions per second, enabling Query Store will reduce that throughput by 0.5-2.5%.

The better your application is written, the lower the impact of enabling Query Store

Typical OLTP applications contain a small number of well tuned queries, executed with high frequency. In such environments Query Store quickly captures all the query texts and plans and then only has to capture the execution stats and periodically flush the aggregated execution stats to disk. This is the optimal Query Store resource usage: low disk space used, low CPU and memory consumed, low disk IO required. The impact on the workload is minimal. Typical OLAP and DW applications run frequent ad-hoc queries; each query takes significant resources to compile and execute and is seldom executed again. In such an environment Query Store requires more resources, as it has to save query texts and query plans more often (and bigger ones!).

Applications that use embedded literals in the query text (eg. SELECT ... FROM ... WHERE id=1) instead of parameters (eg. SELECT ... FROM ... WHERE id=@id) will be more impacted by Query Store. Embedding literals is an anti-pattern with other negative effects as well, but with Query Store it turns almost every execution into a distinct query text, which triggers a new query, and a new plan. As each of these distinct queries and plans has to be saved to disk, the disk space usage and the disk IO consumed by Query Store skyrocket. In such situations, when you analyze Query Store you will see lots of different query texts, different queries, different plans, but very few plan shapes (different plan entries in sys.query_store_plan that represent the same actual execution plan can be identified by the query_plan_hash column value). For such workloads it is recommended to enable forced parameterization. I know some consider the topic of parameterization anathema, but I strongly stand by this recommendation, even unrelated to Query Store.
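Forced parameterization is a database-level setting; a sketch (the database name is a placeholder):

ALTER DATABASE [&lt;database name&gt;] SET PARAMETERIZATION FORCED;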

Introducing DBHistory.com
https://rusanu.com/2016/03/18/introducing-dbhistory-com/
Fri, 18 Mar 2016 14:25:50 +0000

After 145 months of employment with Microsoft, at the beginning of March 2016 I quit the SQL Server team to pursue my dream of creating a better solution for SQL Server performance monitoring and configuration management. I am now dedicating all my time to building DBHistory.com.

What is DBHistory.com?

At first DBHistory.com will allow you to track the history of all configuration and schema changes on SQL Server. You will see when the change occurred, which user initiated it, on which server, in which database, what objects were affected, and the exact T-SQL text of the statement that triggered the change. Using DBHistory.com you will be able to quickly assess the history of an object, what changes occurred in a certain time interval, view all changes done by a particular user and so on.

There is no tool to purchase, no code to deploy, no storage to reserve, no contracts to sign; only a configuration wizard to run. DBHistory.com uses SQL Server's own Event Notifications mechanism to push all change notifications from the monitored servers to the DBHistory.com service. The change notifications leave the monitored server immediately and are then safely stored with DBHistory.com.

You can use DBHistory.com if you are a software vendor that deploys a solution on-premises at your clients and needs to know if any configuration or schema change occurred in your applications. You can use DBHistory.com if you are a remote DBA company that needs to know the history of changes on any of your clients' SQL Server databases, no matter who made the change or when. You can use DBHistory.com if you are a consultant who needs to know what changed since your last visit. You can use DBHistory.com if you have many employees with database access and want to know what changes were done, by whom and when. You can use DBHistory.com if you are a database development company that needs a history of changes done directly to the database in the development environment. You can use DBHistory.com if you distribute an application with the free SQL Server Express Edition and want to understand how many times your application was deployed.

You will be able to set alarms to get notifications when certain changes occur, or when important objects are modified.

The road ahead will add more functionality to DBHistory.com, including performance monitoring.

When can I try DBHistory.com?

If you are interested, please sign up now at DBHistory.com. Soon I will extend beta invites to those interested in trying out the service.

The cost of a transaction that has only applocks
https://rusanu.com/2015/03/06/the-cost-of-a-transactions-that-has-only-applocks/
Fri, 06 Mar 2015 13:23:23 +0000

I recently saw a discussion on the merits of using SQL Server as an application lock manager. The application would start a transaction and acquire app locks using sp_getapplock. No actual data would be queried or modified in this transaction. The question is whether using SQL Server this way, basically as an application lock manager, can lead to problems in SQL Server.

sp_getapplock uses the same internals for acquiring and granting locks as SQL Server data locks

Locks are relatively cheap. The engine can acquire, grant and release millions of them per second. A lock costs a small amount of memory, but overall locking is such an important and critical part of SQL Server operations that everything about locks is highly optimized. sp_getapplock is basically just a shell around the lock manager features and as such it gets a 'free ride': highly optimized, efficient, fast and low overhead locks. However, this is not the only cost associated with app locks. Transaction-scoped app locks must also play by the rules of transaction rollback and recovery, and one such rule is that exclusive locks acquired by transactions must be logged, since they must be re-acquired during recovery. App locks are no different, and this is easy to test:
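A sketch of such a test (the resource name is illustrative): acquire a transaction-scoped exclusive app lock, then look at the log records in the current database; a lock-related log record for the app lock should show up.

BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = 'my_app_resource', @LockMode = 'Exclusive', @LockOwner = 'Transaction';
-- inspect the log records generated in the current database
SELECT [Current LSN], Operation, [Transaction Name], [Lock Information]
FROM fn_dblog(NULL, NULL);
COMMIT;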

Sure enough, our transaction that acquired an exclusive app lock has generated log records. This means that an application that uses SQL Server as a lock manager and holds long-lived locks (for example while making an HTTP request) is also pinning the log, preventing truncation. Another side effect is log traffic: if the application requests lots of app locks it may result in noticeable log write throughput.

tempdb has a simplified logging model

I mentioned that the requirement to log the exclusive locks acquired in a transaction is needed for recovery. Since tempdb is never recovered, tempdb has a more lax logging model. Specifically, it is not required to log exclusive locks acquired by transactions. So if you repeat the same experiment as above, but in tempdb, you'll see that no log record is generated:

If you make sure the current context is set to tempdb when calling sp_getapplock then you get a high performance lock manager for your application, with little overhead for your SQL Server engine. This is actually trivial to achieve in practice:
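A sketch of the trick (resource name illustrative): invoke the procedure through its tempdb three-part name, so it executes in the tempdb context and the lock acquisition is not logged.

BEGIN TRANSACTION;
EXEC tempdb..sp_getapplock @Resource = 'my_app_resource', @LockMode = 'Exclusive', @LockOwner = 'Transaction';
-- ... the application critical section protected by the lock ...
COMMIT;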

By simply using the three-part name tempdb..sp_getapplock I have effectively suppressed the logging of the acquired app lock! Of course, a word of caution: please make sure that you are OK with app locks not being logged. The important thing to consider is how your application will behave during an online recovery. If app locks were not logged, your app may acquire an app lock while the data that was protected by that app lock is actually going through recovery. In practice this is seldom something to worry about, but you have to consider and acknowledge the trade-offs.

No. If a trickle INSERT requires an OPEN row group and it cannot place a lock on it, it will create a new one. Any number of INSERT statements can proceed in parallel, each locking its own row group. While this results in lower quality row groups (possibly multiple small OPEN row groups), the decision was explicitly to favor concurrency. This applies as well to BULK INSERTs that fail to reach the minimum row group size (~100k rows).

Does Tuple Mover delete the rows in deltastore?

No. Surprised? Each OPEN/CLOSED row group has its own individual deltastore. When the Tuple Mover finishes compressing a row group the entire deltastore for that row group will be deallocated. This is done to avoid the explicit logging of each individual row delete from deltastores. This is the same difference as between TRUNCATE and DELETE.

Does Tuple Mover block reads?

No. No way. Absolutely not.

Does Tuple Mover block INSERTS?

No. Both trickle inserts and bulk inserts into clustered columnstores can proceed while the Tuple Mover is compressing a rowgroup. You only need to think about the fact that the Tuple Mover only has business with CLOSED row groups and INSERTs are only concerned with OPEN row groups. There is no overlap so there is no reason for blocking.

Does Tuple Mover block UPDATE, DELETE, MERGE?

Yes. Spooled scans for update (or delete) cannot re-acquire the row if the row storage changes between the scan and the update. So the Tuple Mover is mutually exclusive with any UPDATE, DELETE or MERGE statement on the columnstore. This exclusion is achieved by a special object level lock that is acquired by UPDATE/DELETE/MERGE in shared mode and by the Tuple Mover in exclusive mode.

Can Tuple Mover, or REORGANIZE, shift rows between row groups?

No. The Tuple Mover (and REORGANIZE) can only compress a row group; it cannot shift rows between row groups. In particular it cannot 'stitch' several small deltastores into one compressed row group. REBUILD may appear to shift or move rows, but REBUILD does exactly what the name implies: a full rebuild. It reads the existing columnstore and builds a new one. The organization (row group numbers and sizes) of the new (rebuilt) columnstore has basically no relation to the original organization of the old columnstore.

WindowsXRay is made public as Media eXperience Analyzer
https://rusanu.com/2014/05/30/windowsxray-is-made-public-as-media-experience-analyzer/
Fri, 30 May 2014 11:34:13 +0000

One of the most useful performance tools I used internally at Microsoft was WindowsXRay. I'm pleased to find that it was released under the new name of Media eXperience Analyzer. The tool was internally developed by the Windows media performance team and the release info is targeted toward media developers. But rest assured, the tool can be used for all sorts of performance analysis and I have successfully used it in the analysis of SQL Server performance problems. Using the Media eXperience Analyzer (XA) you typically start by collecting one or more ETL traces using platform tools like the Windows Performance Recorder (WPR). XA is a visualization tool used to inspect these ETL traces, much like the Windows Performance Analyzer.
How to prevent conversation endpoint leaks
https://rusanu.com/2014/03/31/how-to-prevent-conversation-endpoint-leaks/
Mon, 31 Mar 2014 10:39:53 +0000

One of the most common complaints about using Service Broker in production is when administrators discover, usually after some months of usage, that sys.conversation_endpoints grows out of control with CLOSED conversations that are never cleaned up. I will show how this happens and what to do to fix it.

A message exchange pattern that leaks endpoints

I have a SQL Server instance (this happens to be SQL Server 2012) on which I have enabled the Service Broker endpoint. I’m going to set up a loop back conversation, one that is forced to go on the network even though both initiator and target service are local:

This is pretty much as simple as it gets: a pair of services called source_service and target_service, each with activation set up on its queue to simply end conversations when the EndDialog or Error messages are received (the very minimum any service should handle). I'm using the DEFAULT contract. Next I'm going to send a message and immediately end the dialog:
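A sketch of this fire-and-forget exchange (the message body is illustrative):

DECLARE @handle UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [source_service]
    TO SERVICE 'target_service'
    ON CONTRACT [DEFAULT]
    WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @handle (N'&lt;hello/&gt;');
END CONVERSATION @handle;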

The target conversation endpoint is in the CLOSED state, but notice that the security_timestamp field is uninitialized. The security_timestamp exists to prevent dialog replays (either as a malicious attack or as a configuration mistake), which would cause the dialog to 'resurrect' if the first message is retried (or 'replayed'). The target conversation endpoint cannot be deleted before the datetime in the security timestamp field, which makes it safe in case of retry/replay. This field is initialized when the message from the initiator contains a special flag that instructs the target that the initiator has seen the acknowledgement of the first message and has deleted message 0 from its transmission queue, and thus will not re-send it. When the target receives a message with this flag set, it initializes the security timestamp with the current time plus 30 minutes. You may have noticed that the message exchange pattern in my example is the dreaded fire-and-forget pattern. In this pattern the initiator never has a chance to send a message with the above mentioned flag set, and thus the target never has a chance to initialize the conversation security timestamp. This endpoint will be reclaimed on April 18th 2082, because that is the conversation lifetime. In case you wonder, that date comes from adding MAX_INT32 (ie. 2147483647) seconds to the current date.

From an operational point of view this target endpoint is 'leaked'. It will consume DB space and the system will refrain from deleting it for quite some time. Repeat this vicious exchange pattern several thousand times per hour and in a few days your target database will simply run out of space. After frantic investigation you discover the culprit is sys.conversation_endpoints and you send me an email asking for solutions. Happens about once every week...

Solution 1: specify a lifetime

Using the very same setup, I'll add a trivial change: I will specify a 60 second lifetime for the conversation:
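The only difference from the previous sketch is the LIFETIME clause:

DECLARE @handle UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [source_service]
    TO SERVICE 'target_service'
    ON CONTRACT [DEFAULT]
    WITH ENCRYPTION = OFF, LIFETIME = 60;
SEND ON CONVERSATION @handle (N'&lt;hello/&gt;');
END CONVERSATION @handle;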

Notice how the second conversation endpoint has a lifetime in the near future (one minute). Even though the security timestamp is uninitialized, the conversation endpoint will be reclaimed 30 minutes after the lifetime expires. I just had lunch (chicken noodle soup, grilled salmon and wild rice, apple tart), and when I check again, the endpoint is gone:

If you don't want to change the existing message exchange pattern then adding an explicit lifetime to BEGIN CONVERSATION is a viable solution. You have to be careful when choosing the lifetime; the "correct" value is very much application dependent. A lifetime declares that your application is no longer interested in delivering the messages sent after the lifetime has expired. The lifetime is per conversation, not per message. If the conversation is not complete (the two endpoints did not exchange EndDialog messages before the lifetime expired), then when the lifetime expires the conversation ends with an error and an error message is enqueued for your service. Choosing a lifetime like 60 seconds, as in my example, is quite aggressive, especially in a distributed environment; it does not allow much room for retries if, for example, the target is going through some maintenance downtime. You may want to specify a lifetime of several hours, or maybe one day. Again, it is always entirely application specific.

Solution 2: change the message pattern

The next solution I'll propose is to change the actual message exchange pattern. Instead of closing the conversation from the initiator side immediately after the first SEND, let the target close the conversation when it receives the DEFAULT message. The initiator will in turn close its own endpoint in the activated procedure, as a result of receiving the EndDialog message. The setup is almost identical, with one minor difference in the target's activated procedure, shown below:
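A sketch of such a target activated procedure (the procedure and queue names are illustrative): it receives the message and ends the conversation from the target side.

CREATE PROCEDURE target_activated_procedure
AS
BEGIN
    DECLARE @handle UNIQUEIDENTIFIER, @message_type SYSNAME;
    RECEIVE TOP (1)
        @handle = conversation_handle,
        @message_type = message_type_name
    FROM target_queue;
    IF @handle IS NOT NULL
        END CONVERSATION @handle;
END;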

Notice how with this message exchange pattern the target endpoint is in the CLOSED state and the security timestamp is initialized. The conversation endpoint will be removed by the system (deleted) when the security timestamp expires. Aside from fixing the endpoint leak problem, this pattern is better because it does not have the error handling problems of the fire-and-forget pattern.

The bad idea: WITH CLEANUP

All too often the team that notices the conversation leak problem comes up with a horrible 'fix': they add the WITH CLEANUP clause to the END CONVERSATION statement in the target activated procedure. A short test seems to show the problem solved, but in a few days the same team ends up in Tier 3 Microsoft customer support calls reporting all sorts of weird and strange problems, including multiple application logic processing of the same message (replays) and warnings logged in the ERRORLOG complaining about database corruption (DBCC will not report any). This happens because the WITH CLEANUP clause is a very rude way to end a conversation. The distributed state embodied in the conversation is abruptly discarded, without any attempt to inform the peer endpoint. The complications that can arise are endless and I don't even know all of them. Basically, all bets are off.

Conclusion

The fire-and-forget message exchange pattern is all too compelling, but it is riddled with problems. Ending the conversation from the target side is a much better approach. Additionally, consider always specifying a conversation lifetime, one that makes sense for whatever the message represents from your business point of view. A lifetime declares "if the messages were not delivered after this time, I'm no longer interested in trying to deliver them", so consider carefully what a good value is for your case. And never, ever deploy WITH CLEANUP in production code.

If you are already facing a situation in which the conversation endpoints have leaked, then you first have to fix the application using one of the proposed solutions above (or both!). Then you can go ahead and clean up the leaked conversations, and for that you can use END CONVERSATION ... WITH CLEANUP. Here is an example cleanup loop that looks for leaked endpoints and cleans them up:
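A sketch of such a loop, assuming the leaked endpoints are the CLOSED ones whose security_timestamp was never initialized (an uninitialized datetime shows up as 1900-01-01):

DECLARE @handle UNIQUEIDENTIFIER;
SELECT TOP (1) @handle = conversation_handle
FROM sys.conversation_endpoints
WHERE state_desc = N'CLOSED' AND security_timestamp = '1900-01-01';
WHILE @handle IS NOT NULL
BEGIN
    END CONVERSATION @handle WITH CLEANUP;
    SET @handle = NULL;
    SELECT TOP (1) @handle = conversation_handle
    FROM sys.conversation_endpoints
    WHERE state_desc = N'CLOSED' AND security_timestamp = '1900-01-01';
END;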

How to read and interpret the SQL Server log
https://rusanu.com/2014/03/10/how-to-read-and-interpret-the-sql-server-log/
Mon, 10 Mar 2014 16:42:24 +0000

The SQL Server transaction log contains the history of every action that modified anything in the database. Reading the log is often the last resort when investigating how certain changes occurred. It is one of the main forensic tools at your disposal when trying to identify the author of an unwanted change. Understanding the log and digging through it for information is pretty hard core and definitely not for the faint of heart. And the fact that the output of ::fn_dblog can easily go into millions of rows does not help either. But I'll try to give some simple practical examples that can go a long way into helping sort through all the information and dig out what you're interested in.

Write-ahead logging (WAL), also referred to as journaling, is best described in the ARIES papers, but we can get away with a less academic description: SQL Server will first describe in the log any change it is about to make, before modifying any data. There is a brief description of this protocol at SQL Server 2000 I/O Basics and the CSS team has put forward a much referenced presentation at How It Works: Bob Dorr's SQL Server I/O Presentation. What is important for us in the context of this article is that a consequence of the WAL protocol is that any change that occurred to any data stored in SQL Server must be described somewhere in the log. Remember now that all your database objects (tables, views, stored procedures, users, permissions etc) are stored as data in the database (yes, metadata is still data), so it follows that any change that occurred to any object in the database is also described somewhere in the log. And when I say every operation I really do mean every operation, including the often misunderstood minimally logged and bulk logged operations, and the so called "non-logged" operations, like TRUNCATE. In truth there is no such thing as a non-logged operation. For all practical purposes everything is logged and everything leaves a trace in the log that can be discovered.

The log contains interleaved operations from multiple transactions

The image above shows how three concurrent transactions were laid out in the log: the first transaction contains two inserts and a delete operation, the second one contains an insert but was rolled back, hence it also contains a compensating delete, and the last one committed two deletes and an insert operation. Although all three transactions ran 'simultaneously', the true order of operations is the one in the log, given by the LSN. Usually when analyzing the log we need to look at operations from specific transactions. The key field is the [Transaction ID] column, which contains the system generated transaction ID for each operation logged. All operations logged by a transaction will have the same [Transaction ID] value.

Let's create a small example with some activity and look at the generated log:
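The original demo script is not reproduced here; the following sketch produces similar activity (the table and column names are my reconstruction, based on the record layout discussed later):

CREATE TABLE test (
    id INT NOT NULL PRIMARY KEY,
    data CHAR(20) NOT NULL,
    created_at DATETIME NOT NULL DEFAULT GETDATE());
GO
INSERT INTO test (id, data) VALUES (1, 'standalone xact');
INSERT INTO test (id, data) VALUES (2, 'standalone xact');
INSERT INTO test (id, data) VALUES (3, 'standalone xact');
GO
BEGIN TRANSACTION;
INSERT INTO test (id, data) VALUES (4, 'explicit xact');
INSERT INTO test (id, data) VALUES (5, 'explicit xact');
INSERT INTO test (id, data) VALUES (6, 'explicit xact');
INSERT INTO test (id, data) VALUES (7, 'explicit xact');
INSERT INTO test (id, data) VALUES (8, 'explicit xact');
COMMIT;
GO
DELETE FROM test WHERE id = 3;
GO
SELECT * FROM fn_dblog(NULL, NULL);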

The first thing I will do is look at the LOP_BEGIN_XACT operations. This log record marks the beginning of a transaction and is very important in log analysis, because it is the only log record that contains the date and time when the transaction started, and it also contains the SID of the user that issued the statement. Here is how I query for these operations:
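A sketch of the query:

SELECT [Current LSN], [Transaction ID], [Transaction Name], [Begin Time], [Transaction SID]
FROM fn_dblog(NULL, NULL)
WHERE Operation = N'LOP_BEGIN_XACT';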

We can recognize some of the transactions from our script. Remember that SQL Server will always operate under a transaction, and if one is not explicitly started then SQL Server itself will start one for the statement being executed. At LSN 20:5d:2 we started transaction 2d0 named CREATE TABLE, so this must be the implicit transaction started for the CREATE TABLE statement. At LSN 20:9b:2 the transaction id 2d6 named INSERT is for our first standalone INSERT statement, and similarly LSN 20:a0:2 starts transaction 2d9, 20:a1:1 starts 2da, LSN 20:a2:1 starts xact 2db and LSN 20:a3:1 starts xact 2dc, respectively. For those of you who detect a certain pattern in the LSN numbering, the explanation is given in my What is an LSN: Log Sequence Number article. And if hex numbers are familiar to you then you'll also recognize that the transaction ID is basically an incrementing number.

The LSN 20:a4:1 that starts xact 2dd is a little bit different, because it is named user_transaction. This is how explicit transactions started with BEGIN TRANSACTION show up. There are no implicit transactions named INSERT for the 5 INSERT statements inside the explicit transaction, because those INSERT statements did not have to start an implicit transaction: they used the explicit transaction started by the BEGIN TRANSACTION statement. Finally, the LSN 20:aa:1 starts the implicit xact 2df for the DELETE statement.

In addition we also see a number of transactions that we cannot correlate directly with a statement from their name. Transactions 2d2, 2d3, 2d4 and 2d5 are SplitPage transactions. SplitPage transactions are how SQL Server maintains the structural key order constraints of B-Trees and there is ample literature on this subject, see Tracking page splits using the transaction log or How expensive are page splits in terms of transaction log?. Notice that all these SplitPage transactions have the same Parent Transaction ID of 2d0, which is the CREATE TABLE xact. These are page splits that occur in the metadata catalog tables that are being inserted into by the CREATE TABLE operation.

Parent Transaction ID shows when sub-transactions are started internally by SQL Server

We can also see that right after the first INSERT a transaction named Allocate Root was started. Tables are normally created w/o any page and the very first INSERT will trigger the allocation of the first page for the table. The allocation occurs in a separate transaction that is committed immediately, so it allows other inserts into the table to proceed even if the insert that actually triggered the allocation stalls (does not commit, or even rolls back). This capability of starting and committing transactions in a session independently of the session's main transaction exists in SQL Server, but it is not exposed in any way to the T-SQL programming surface.

The Transaction SID column shows the SID of the login that started the transaction. You can use the SUSER_SNAME function to retrieve the actual login username.
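For example, a sketch of resolving the login name for each transaction start record:

SELECT [Current LSN], [Transaction Name], [Begin Time],
       SUSER_SNAME([Transaction SID]) AS login_name
FROM fn_dblog(NULL, NULL)
WHERE Operation = N'LOP_BEGIN_XACT';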

I chose to look at the xact ID 2d6, which is the first INSERT operation. The result may seem arcane, but there is a lot of useful info if one knows where to look. First the trivial: the xact 2d6 has 4 operations in the log. Every transaction must have an LOP_BEGIN_XACT and a record to close the xact, usually LOP_COMMIT_XACT. The other two operations are a recording of locks acquired (LOP_LOCK_XACT) and an actual data modification (LOP_INSERT_ROWS). The meat and potatoes lie with the data modification operation, but we'll see later how LOP_LOCK_XACT can also be extremely useful in analysis.

Operations that modify data, like LOP_INSERT_ROWS, will always log the physical details of the operation (page id, slot id) and the object which they modified: the allocation unit id (see sys.allocation_units) and the partition id (see sys.partitions). Remember that even non-partitioned tables are still represented in sys.partitions; you can think of them as being partitioned with a single partition for the whole table. The AllocUnitName column in the fn_dblog output is usually the simplest way to identify which operations affected a specific table.

The Page ID and Slot ID columns tell exactly what record, on what page, was modified by this transaction. 4f hex is 79 decimal, and my database id is 5:
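The page can be inspected with DBCC PAGE, something along these lines:

DBCC TRACEON(3604);
DBCC PAGE (5, 1, 79, 3) WITH TABLERESULTS;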

Using the Page ID and Slot ID is one way to search for log records that affected a row of interest.

Indeed the page 5:1:79 at slot 0 contains the first INSERT from our script. The fn_dblog output matches what we find in the database. Besides the academic interest, this info has practical use: if we want to know what transaction modified a certain record and we know the record position (page, slot) we can search the log for records that updated that particular position. Alas, this approach has its drawbacks. The biggest problem is that in B-Trees the position of a record is very mobile. Page split operations move records and change their physical location, thus making it harder to find the transactions that modified the record.

The Lock Information column is actually my favorite forensic analysis column. Here is the full content for the LOP_INSERT_ROWS operation in my example: HoBt 72057594039042048:ACQUIRE_LOCK_IX OBJECT: 5:245575913:0 ;ACQUIRE_LOCK_IX PAGE: 5:1:79 ;ACQUIRE_LOCK_X KEY: 5:72057594039042048 (8194443284a0). We have here the actual object_id of the table, again the page id, and then the key lock. For B-Trees the key lock is a hash of the key value and thus can be used to locate the row, even if it moved after page splits:
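The row can be located from that key lock hash using the undocumented %%lockres%% virtual column; a sketch (the table name follows the demo script):

SELECT %%lockres%% AS lock_resource, *
FROM test
WHERE %%lockres%% = N'(8194443284a0)';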

We again located the same row, the first INSERT. Remember that lock resource hashes, by definition, can produce false positives due to hash collision. But the probability is low enough and one must simply exercise common sense if a collision is encountered (eg. the SELECT above returns more than one row for the desired lock resource hash).

The last column(s) I'm going to talk about are the log content columns. These contain the record 'payload' for data operations. The payload varies from operation type to operation type, but it must contain enough information so that recovery can both redo and undo the operation. For an INSERT that means that the full content of the inserted row must be present in the log record. Let's look at the Log Content 0:

The content is identical. That means that the LOP_INSERT_ROWS operation Log Content 0 will contain the actual image of the row as it was inserted in the page. And this gives us the last, and often most difficult, way to discover the log records that affected a row of interest: analyse the actual log record content for patterns that identify the row. If you are familiar with the Anatomy of a Record you can identify the record parts:

0x10002400 is the record header

0x01000000 is the first fixed column (id, int)

0x7374616E64616C6F6E6520786163742020202020 is the second fixed char(20) column

0x912ba300b6a10000 is the datetime created_at column

0x030000 is the NULL bitmap

Let's now consider an example of how to use this newly gained know-how. I made some modifications to the table:
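The exact statements are not reproduced here; a sketch of the kind of modifications implied by the analysis that follows (it assumes rows with the values 'important' and 'do not change!' were inserted earlier):

INSERT INTO test (id, data) VALUES (100, 'junk');
UPDATE test SET data = 'changed' WHERE data = 'do not change!';
DELETE FROM test WHERE data = 'important';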

Now the table contains a row that should not be there ('junk'), a row that was 'important' is missing, and a row that should not have been changed was actually changed. Can we locate the corresponding log operations? Let's start with the easy case: the 'junk' record is still in the table so we can simply search for the log record that inserted it. We'll use the lock resource hash to locate the log record:
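A sketch of the search: compute the lock resource hash of the 'junk' row, find the first transaction that acquired a lock on it, then list all the operations of that transaction.

DECLARE @lockres NVARCHAR(200);
SELECT @lockres = %%lockres%% FROM test WHERE data = 'junk';

DECLARE @xactid NVARCHAR(14);
SELECT TOP (1) @xactid = [Transaction ID]
FROM fn_dblog(NULL, NULL)
WHERE [Lock Information] LIKE N'%' + @lockres + N'%'
ORDER BY [Current LSN];

SELECT [Current LSN], Operation, Context, AllocUnitName, [Lock Information]
FROM fn_dblog(NULL, NULL)
WHERE [Transaction ID] = @xactid;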

The script finds the lock resource hash value for the row with the 'junk' data value, then finds the transaction ID that locked such a row, and finally shows all the operations this transaction did. There are a couple of leaps of faith here: I assume that the lock resource hash is unique and I also assume that the first transaction that locks this row is the transaction we're interested in. In a real life scenario things can get significantly more complex, because of hash collisions and because there could be multiple transactions that locked the row with the lock resource we're interested in.

The next challenge is to locate the transaction that modified the data that was not supposed to change. We could run the very same script as above, but it would find the transaction that did the original INSERT of the row, not the one that did the UPDATE. So instead let's find all the transactions that locked the row's lock resource hash:
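A sketch, this time without stopping at the first transaction and also including the log record payloads:

DECLARE @lockres NVARCHAR(200);
SELECT @lockres = %%lockres%% FROM test WHERE data = 'changed';

SELECT [Current LSN], [Transaction ID], Operation, [Num Elements],
       [RowLog Contents 0], [RowLog Contents 1], [RowLog Contents 2],
       [RowLog Contents 3], [RowLog Contents 4], [RowLog Contents 5]
FROM fn_dblog(NULL, NULL)
WHERE [Lock Information] LIKE N'%' + @lockres + N'%';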

This shows three transactions (2e0, 2e2 and 2e4), the first one being our original INSERT, followed by two UPDATE transactions: LOP_MODIFY_ROW is the log record for updating a row. How can we know which UPDATE is the one that modified the data which was not supposed to change? This time I included in the result [RowLog Contents 0] through [RowLog Contents 5], because [Num Elements] is 6, which indicates the log record contains 6 payloads. Remember that the log record must contain sufficient information for recovery to be able to both redo and undo the operation, so these payloads must contain the value being changed, both before and after the change. The LOP_MODIFY_ROWS at LSN 20:b4:2 has payloads like 0x646F206E6F74206368616E676521 and 0x6368616E67656420202020202020, which are clearly ASCII strings:
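Decoding the two payloads to readable text:

SELECT CAST(0x646F206E6F74206368616E676521 AS VARCHAR(20)) AS before_value,  -- 'do not change!'
       CAST(0x6368616E67656420202020202020 AS VARCHAR(20)) AS after_value;   -- 'changed       '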

Even w/o fully understanding the log record internals, we can use an educated guess to figure out which UPDATE did the modification by looking at the log record payloads.

The UPDATE we're interested in is the xact 2e2. From the LOP_BEGIN_XACT log operation for this xact we can, again, figure out when the change occurred and who did it.

The final challenge is to figure out when the 'important' record was deleted. Since the record is now missing we cannot use the %%lockres%% lock resource hash trick again. If we knew the ID of the deleted record we could do some tricks, because if we recreate a record in the same table with the same key as the deleted row then it will have the same lock resource hash, and we could search for this hash. But we don't know the ID of the deleted row, only the value in the data column. If the value is distinct enough we can brute force it by searching the log record payloads:
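A sketch of the brute force search, looking for the binary image of the known value inside the raw [Log Record] payload:

SELECT [Current LSN], [Transaction ID], Operation, AllocUnitName
FROM fn_dblog(NULL, NULL)
WHERE CHARINDEX(CAST('important' AS VARBINARY(4000)), [Log Record]) > 0;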

We found 3 transactions that had the string 'important' in the log record payload. We know xact 2e0 is the original INSERT, and we found that xact 2e3 contains an LOP_DELETE_ROWS, which is the log record for DELETE operations. Using this brute force approach we did find the transaction that deleted the record containing the 'important' string. A note of caution is warranted here: my example uses a simple ASCII string. One must understand how the values are represented in binary to search the log payload for known values. This is quite tricky for more complex values, as you need to know the internal representation format (eg. for the numeric/decimal types) and be able to write the correct Intel platform least-significant-byte-first value for types like int or datetime.

So what's the deal with the LOP_MODIFY_ROWS at xact 2e1? After all, we did not modify the row with the value 'important' in our test script. Again, the log contains all the details:

The transaction is a stats auto-update. The actual string 'important' was copied into the stats when the automatically generated stats were created, and thus the log record for the internal stats blob maintenance update operation will, obviously, contain the string 'important'.

Looking at the log after truncation

backup log demolog to disk = 'c:\temp\demolog.trn'

Use fn_dump_dblog to look at the log records in log backup files

I just backed up the database log of my little demo database and this triggered log truncation: since the log records are backed up there is no need to keep them in the active database log, so the space in the LDF file can be reused for new log records. See How to shrink the SQL Server log for more details on how this works. But if I attempt any of my queries above now, they will fail to locate anything in the log. To investigate now I have to look at the log backups:
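The log backup can be read with the undocumented fn_dump_dblog function. A sketch, reading the backup taken above; the first five arguments are the LSN range, the device type, the backup number within the file and the file name, and the remaining arguments (for additional backup stripes) are left at their defaults:

SELECT [Current LSN], [Transaction ID], [Transaction Name], Operation, [Begin Time]
FROM fn_dump_dblog(
    NULL, NULL, N'DISK', 1, N'c:\temp\demolog.trn',
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT);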

When Lock Information does not help

You've seen how the [Lock Information] column is of great help in locating the log record of interest, but there are cases when this information is missing from the log. One case is when the record was never locked individually, eg. when lock escalation occurred or a table lock hint was used and a coarser granularity X lock was acquired:

Because the TABLOCKX hint was used the [Lock Information] does not contain the lock resource hash for the individual row deleted, it contains the record of the table X lock. However the log record payload still contains the deleted row image and we could still search using the [Log Record] or the [Log Contents 0..5] fields. The [Page ID] and [Slot ID] are also present and we could use them to try to identify the log records that affected the row we're interested in.

Another case that makes investigating the log difficult is minimally logged operations. With minimal logging the log contains only a record describing the fact that a certain page contains minimally logged changes, and the SQL Server engine must guarantee that the page is flushed to disk before the transaction commits, see Operations That Can Be Minimally Logged. In such a case the only record you'll see will be the LOP_FORMAT_PAGE with the [Page ID] of the page that contains minimally logged operations. The good news, for such investigations, is that only INSERTs and blob .WRITE updates can be minimally logged (plus the legacy, deprecated WRITETEXT/UPDATETEXT), so the problem space is reduced: most data modifications, and especially accidental modifications, cannot be minimally logged and thus will leave a full blown trace in the log. There is also an opposite case, when the log contains more information about updates: when the table is an article in a transactional replication publication. But I will not go into details on this topic.

Practical Considerations

My short demo operated on a tiny log. In real cases it is not unusual to have to dig out one log record out of millions, and that is just very, very slow. In my experience it is usually better to first copy the log into a temp table, at least the fields of interest and a relevant portion. This table can then be indexed as needed (eg. by [Transaction ID]) and analysis can be done much faster. It also helps if one can restrict the start and end LSN of interest, because fn_dblog can return results much faster if it is restricted to known LSNs. The shorter the interval (the more precise the LSNs), the faster the investigation. Be careful though, as fn_dblog will cause a server core dump if the LSNs passed in are incorrect... on a live production server this can equate to several minutes of a complete SQL Server freeze!
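A sketch of the copy-then-index approach (which columns to keep is up to you):

SELECT [Current LSN], [Transaction ID], [Transaction Name], Operation, Context,
       AllocUnitName, [Page ID], [Slot ID], [Lock Information], [Begin Time], [Transaction SID]
INTO #log
FROM fn_dblog(NULL, NULL);

CREATE CLUSTERED INDEX cdx_log ON #log ([Transaction ID], [Current LSN]);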

DDL Changes

Log analysis can be used to investigate modifications to the database schema (DDL changes) using the same methods one uses to analyze data changes. Just remember that databases store their schema inside the database, and every single object is described in one or more catalog (system) tables. All DDL changes end up as data changes done to these catalog tables. Interpreting the catalog changes (insert/update/delete of rows in catalog tables) though is not trivial. The fact that all DDL changes are catalog changes also means that all DDL changes look the same. An LOP_INSERT_ROWS into sys.sysschobjs could mean a CREATE TABLE, but it could just as well mean a CREATE VIEW or CREATE PROCEDURE. Furthermore, CREATE TABLE foo will look very similar to CREATE TABLE bar when analyzing the log. One of the main differences from data change log analysis is in using locks to identify rows: with DDL changes you should not try to follow the actual data locks (locks on catalog table rows acquired as the DDL changes are being applied as basic INSERT/UPDATE/DELETE operations). Instead you should try to locate metadata locks. In general DDL operations will start with a SCH_M lock record, which contains the object ID of the object being modified.

Another problem with doing schema modification analysis using the log is that the way SQL Server uses the system catalog to store its metadata changes between releases. So even if you figured out one way to interpret the log to understand what kind of DDL change occurred, you can discover that another SQL Server version uses a very different system catalog structure and changes look completely different on that version. You can rely on the system catalog (and therefore the way DDL operations appear in the log) being consistent between SQL Server editions and between service packs and cumulative updates of the same version (this latter point actually has a couple of exceptions, but they are not worth talking about).

Using object ID

The best way to locate DDL operations that affected an object is to locate transactions that required a SCH_M lock on that object. Use OBJECT_ID to get the object ID, then look for the pattern N'%SCH%&lt;object id&gt;%' in the Lock Information column. For instance, say I am looking for changes to object id 245575913:
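A sketch of that search:

SELECT [Current LSN], [Transaction ID], Operation, [Transaction Name],
       [Begin Time], [Lock Information]
FROM fn_dblog(NULL, NULL)
WHERE [Lock Information] LIKE N'%SCH%245575913%';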

The first thing to notice is that the Transaction Name of the LOP_BEGIN_XACT contains the DDL operation name: CREATE TABLE. Add the Begin Time and the Transaction SID and we have found in the log when the table was created and by whom. Using the information about System Base Tables the rest of the operations for this transaction can be interpreted:

First LOP_INSERT_ROWS into sys.sysschobjs.clst

This is the insert of the object record into sys.objects. There are four insert records because there is one for the clustered index and one for each non-clustered index of the sys.sysschobjs catalog table.

LOP_INSERT_ROWS into sys.syscolpars.clst

These are two inserts of column records into sys.columns. Each insert has two records: one for the clustered index and one for the non-clustered index.

This is an insert into sys.objects of an object related to the table, but not the table itself. In this case it is the primary key constraint, but identifying this sort of information is usually tricky. The best approach is to simply use cast([RowLog Contents 0] as nchar(4000)) and look for some human readable string; that would be the name of the object inserted. In my case the log record content revealed the string PK__test__3213E83F1364FB13, so I could easily deduce that it is the default-named primary key constraint.

As you can see, I did not retrieve exactly the DDL that ran, but I have a very good idea of what happened: the table was created at 2014/03/10 15:00:08:690, and it initially contained two columns and a primary key constraint. Using the same trick of casting the log record content to NCHAR(4000) I could easily retrieve the original column names. My example is for a very simple CREATE TABLE statement; more complex statements can generate more complex log signatures. Partitioning in particular adds a lot of LOP_INSERT_ROWS records, because every individual partition must be described in sys.sysrowsets, and every column for every partition and for every index must be described in sys.sysrscols. And that is before the low level HoBT metadata is described, see SQL Server table columns under the hood.

Again we can see the transaction name that reveals the type of DDL operation (it is an ALTER TABLE), the time it occurred and the SID of the user that issued the change. The LOP_INSERT_ROWS into sys.syscolpars.clst tells me this is a DDL change that added a column to the table.

Retrieving a deleted object ID

What about the case when we're investigating an object that was dropped? We cannot use the handy N'%SCH%&lt;object id&gt;%' pattern trick because we don't know the object ID. In this case we can look for the LOP_DELETE_ROWS record that deleted the object from sys.objects. We know it is going to affect the sys.sysschobjs.clst allocation unit and we know the name of the deleted object must be in the log record. In my case I'm looking for a dropped table that was named test:
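A sketch of that search (object names are stored as Unicode, hence the N'' cast):

SELECT [Current LSN], [Transaction ID], Operation, AllocUnitName
FROM fn_dblog(NULL, NULL)
WHERE Operation = N'LOP_DELETE_ROWS'
  AND AllocUnitName = N'sys.sysschobjs.clst'
  AND CHARINDEX(CAST(N'test' AS VARBINARY(4000)), [Log Record]) > 0;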

We found two records, from transaction 0000:000002ee. It shouldn't be a surprise that there are two delete operations; after all, the CREATE did two INSERTs because of the primary key constraint. So let's look at transaction 0000:000002ee:
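A sketch of dumping that transaction's operations:

SELECT [Current LSN], Operation, Context, AllocUnitName, [Lock Information],
       [Transaction Name], [Begin Time], [Transaction SID]
FROM fn_dblog(NULL, NULL)
WHERE [Transaction ID] = N'0000:000002ee';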

Again we have the transaction time, the user that issued the DROP, and the transaction name DROPOBJ. Notice the LOP_LOCK_XACT operation that is logged right after the transaction start: HoBt 0:ACQUIRE_LOCK_SCH_M OBJECT: 5:245575913:0. This is important, because it gives us the object ID of the object being dropped, at the moment it was dropped. Using this object ID we can dig further in the log and locate other DDL changes, just as I showed above.

Object IDs can be recycled

It is important to realize that object IDs can be recycled: after an object is dropped, another object can be created that reuses the same object ID (trust me, it can happen). So do pay attention and make sure that when you follow an object ID you are looking at the relevant log records, not at the log records of a completely unrelated object that happens to reuse the same object ID. Remember that the log record LSN establishes a clear time ordering of events, so it should not be too hard to figure out such a case.

It is also important to realize that the CHARINDEX(cast(N'test' as varbinary(4000)), [Log Record]) > 0 condition could retrieve log records for drops of other objects that simply contain 'test' in their name. This is not the only place where some of these techniques can misfire. These techniques are quite advanced, not meant for everyday users, and I expect you to exercise quite advanced judgement. Try to corroborate the information found through multiple methods. Use common sense.

How to analyse SQL Server performance
https://rusanu.com/2014/02/24/how-to-analyse-sql-server-performance/
Mon, 24 Feb 2014 15:55:50 +0000

So you have this SQL Server database that your application uses and it somehow seems to be slow. How do you troubleshoot this problem? Where do you look? What do you measure? I hope this article will answer enough questions to get you started so that you can identify the bottlenecks yourself, or know what to search for to further extend your arsenal and knowledge.

How does SQL Server work

To be able to troubleshoot performance problems you need to have an understanding of how SQL Server works. The grossly simplified explanation is that SQL Server executes your queries as follows:

The application sends a request to the server, containing a stored procedure name or some T-SQL statements.

The request is placed in a queue inside SQL Server memory.

A free thread from SQL Server’s own thread pool picks up the request, compiles it and executes it.

The request is executed statement by statement, sequentially. A statement in a request must finish before the next starts, always. Stored procedures execute the same way, statement by statement.

Statements can read or modify data. All data is read from in-memory cache of the database (the buffer pool). If data is not in this cache, it must be read from disk into the cache. All updates are written into the database log and into the in-memory cache (into the buffer pool), according to the Write-Ahead Logging protocol.

Data locking ensures correctness under concurrency.

When all statements in the request have executed, the thread is free to pick up another request and execute it.

This may sound trivial, but understanding that a request is either executing or waiting for something (is suspended) is the key to troubleshooting SQL Server performance. If requests sent to SQL Server take a long time to return results then they either take a long time executing or they take a long time waiting. Knowing which case it is is crucial to figuring out the performance bottlenecks. Additionally, if the requests spend a long time waiting, we can dig further and find out what they are waiting for and for how long.

Understanding request Waits

Waiting requests have Wait Info data

Whenever a request is suspended, for any reason, SQL Server will collect information about why it was suspended and for how long. In the internal SQL Server code there is simply no way to call the function that suspends a task without providing the required wait info data. This data is then collected and made available to you in many ways. The wait info data is crucial in determining performance problems:

For a request currently executing that is suspended right now, you can see what it is waiting on and how long it has been waiting.

For a request currently executing that is running right now, you can see what is the last thing it waited on.

You can understand when requests are waiting on other requests.

You can get aggregates for what are the most waited for (busy) resources in the entire server.

You can understand which physical (hardware) resources are saturated and are causing performance problems.

Wait Info for currently executing requests

For every request executing on SQL Server there is a row in sys.dm_exec_requests. Querying this DMV, at any moment, gives you a quick view of everything executing right then. The wait_type, wait_time and last_wait_type columns will give you an immediate feel of what is 'running' vs. what is 'waiting' and what is being waited for:
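
A quick way to look at this (a sketch; I'm filtering out system sessions and my own monitoring session):

SELECT r.session_id, r.status, r.command, r.wait_type, r.wait_time,
       r.last_wait_type, r.blocking_session_id
FROM sys.dm_exec_requests r
WHERE r.session_id >= 50        -- skip background/system tasks
  AND r.session_id <> @@SPID;   -- skip the session running this query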

Request on session 53 is a SELECT and is currently waiting. It is waiting on a lock.

Session 54 is an INSERT and is currently waiting. It is waiting on a page latch.

Session 55 is an INSERT and is currently running. Previous wait was a page latch.

Session 56 is an INSERT and is currently waiting. It is waiting for the database log to flush (is committing a transaction).

Session 57 is an INSERT and is currently waiting. It is waiting for the database log to flush.

Note that the request for session 54 has a status of running but it is actually waiting. This is because latch waits are expected to be short. The blocking_session_id also tells us, for some of the waiting tasks, what other request is holding the resource currently being waited on. Session 53 is waiting on session 56 for a KEY lock resource, which means the SELECT is trying to read a row locked by the INSERT. The SELECT will not resume until the INSERT commits. Session 54 is waiting on session 55 for a page latch, which means that the session 55 INSERT is modifying that data page right now and the data on the page is unstable, not allowed to be read. Sessions 56 and 57 are waiting but there is no other session blocking them. They are waiting for the log to 'flush', meaning they must ensure the commit log record for their transaction has been durably written to disk. They will not resume until the disk controller acknowledges that the log was written.

Here is another example:

Request on session 53 is a COMMIT and is currently waiting. It is waiting for the database log to flush.

Session 54 is a SELECT and is currently waiting. It is waiting on lock.

Session 55 is a SELECT and is currently waiting. It is waiting on lock.

Session 56 is a SELECT and is currently waiting. It is waiting on lock.

Session 57 is a SELECT and is currently waiting. It is waiting on lock.

Note that in this example every request is actually suspended. At this moment the server is basically doing nothing. We have 4 sessions (54, 55, 56 and 57) waiting on the same lock (the row with the key lock resource KEY: 5:72057594039369728 (ac11a2bc89a9)). Session 54 is waiting on session 55 and session 55 is waiting on session 53, which means really session 54 is waiting on session 53. So everybody waits on session 53, which is waiting for the disk controller to write a log record to disk. We can also see, from the wait_time column, how long each session has waited: about 160 milliseconds. Notice how in the previous example we had only one request actually running out of 5. I'm running these queries on a workstation with 8 cores, plenty of RAM and a decent disk, so there is plenty of hardware to carry these requests, yet most of the time they are waiting instead of executing.

To make your requests faster you need to have them be executing rather than waiting

Is having most (or even all) requests waiting, rather than executing, something unusual? Not at all, this is the norm! Any time you look at a SQL Server running even a moderate load, you'll see that most requests are waiting and only a few are executing. What you need to watch out for are long waits or short but repeated waits that add up. Long waits indicate some resource that is held for a long time, and typically this occurs with locks. Repeated short waits indicate a resource that is being saturated, possibly a hot spot for performance.

Before I go further I just want to show sys.dm_os_waiting_tasks, which is a SQL Server DMV specifically designed to show currently waiting tasks:
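
A straightforward query over it might look like this (filtering out system sessions, as before):

SELECT wt.session_id, wt.wait_duration_ms, wt.wait_type,
       wt.blocking_session_id, wt.resource_description
FROM sys.dm_os_waiting_tasks wt
WHERE wt.session_id >= 50;   -- only user sessions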

Session 57 is waiting for 40 ms for a lock on a row and is blocked by session 53 (which therefore must own the lock)

Session 54 is waiting for 37 ms for a lock on a row and is blocked by session 57 (but session 57 is in turn blocked by session 53)

The situation we have here is very similar to the previous one, where we had 4 SELECT sessions blocked by an INSERT. We can see here two requests blocked attempting to read a row (so they're probably SELECTs) and the blocking session is waiting for its transaction to durably commit. The information in this DMV is very similar to the one in sys.dm_exec_requests, and the latter has more information, but there is one important case where the information in sys.dm_exec_requests is misleading: parallel queries.

To understand CXPACKET wait types you need to look into child parallel tasks

When a statement can benefit from parallel execution the engine will create multiple tasks for the request, each processing a subset of the data. Each one of these tasks can execute on a separate CPU/core. The request communicates with these tasks using basically a producer-consumer queue. The query operator implementing this queue is called an Exchange operator (I'm really simplifying here, read The Parallelism Operator (aka Exchange) for a more accurate description). If this producer-consumer queue is empty (meaning the producers did not push any data into it) the consumer must suspend and wait, and the corresponding wait type is the CXPACKET wait type. Requests that show this wait type are really showing that the tasks which should have produced data to consume are not producing any (or enough) data. These producer tasks in turn may be suspended, waiting on some other wait type, and that is what is blocking your request, not the exchange operator. Here is an example:
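
The per-task waits can be listed with something like this (a sketch; 54 is the hypothetical session id of the parallel request under investigation, and the join is on the task address):

SELECT t.session_id, t.exec_context_id, t.task_state,
       wt.wait_type, wt.wait_duration_ms, wt.blocking_session_id
FROM sys.dm_os_tasks t
LEFT JOIN sys.dm_os_waiting_tasks wt
    ON wt.waiting_task_address = t.task_address
WHERE t.session_id = 54;   -- session id of the parallel request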

Here we can see that request on session 54 has 3 tasks waiting, and while the request wait_type shows CXPACKET, one of the parallel child tasks is actually waiting on a page lock.

Aggregated wait stats: sys.dm_os_wait_stats

SQL Server aggregates statistics about all wait types and exposes them in sys.dm_os_wait_stats. Whereas looking at the currently executing requests and the waiting tasks showed us what was being waited on at any given moment, the aggregate stats give a cumulative status since server start up. Querying this DMV is straightforward, but interpreting the results is a bit more tricky:

select *
from sys.dm_os_wait_stats
order by wait_time_ms desc;

First, what's the deal with all those wait types that take the lion's share at the top of the result? DIRTY_PAGE_POLL, REQUEST_FOR_DEADLOCK_SEARCH, LAZYWRITER_SLEEP? We did not see any request ever waiting for these! Here is why: in the earlier queries the WHERE session_id >= 50 filter excluded background tasks, threads internal to SQL Server that are tasked with various maintenance jobs, whereas sys.dm_os_wait_stats aggregates waits from everything, background tasks included. Many of these background tasks follow a pattern like "wait for an event, when the event is signaled do some work, then wait again", while others follow a sleep cycle pattern ("sleep for 5 seconds, wake up and do something, go back to sleep for 5 seconds"). Since no task can suspend itself without providing a wait type, there are wait types to capture when these background tasks suspend themselves. As these patterns of execution result in the task being suspended almost all the time, the aggregated wait time for these wait types often trumps every other wait type. The SQL Server community came to name these wait types 'benign wait types' and most experts have a list of wait types they simply filter out, for example see Filtering out benign waits:
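
A minimal sketch of such a filter (the exclusion list below is illustrative and far from exhaustive):

SELECT wait_type, waiting_tasks_count, wait_time_ms,
       max_wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
    N'LAZYWRITER_SLEEP', N'REQUEST_FOR_DEADLOCK_SEARCH', N'DIRTY_PAGE_POLL',
    N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'XE_TIMER_EVENT', N'XE_DISPATCHER_WAIT',
    N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH', N'CLR_AUTO_EVENT',
    N'CHECKPOINT_QUEUE', N'LOGMGR_QUEUE', N'SLEEP_TASK')
ORDER BY wait_time_ms DESC;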

Now we have a more coherent picture of the waits. The picture tells us what wait types are most prevalent, on aggregate, on this SQL Server instance. This can be an important step toward identifying a bottleneck cause. That said, interpreting the data is subjective. Is the aggregate WRITELOG value of 4636805 milliseconds good or bad? I don't know! It is an aggregate, accumulated for as long as this process has been running, so it will obviously keep increasing. Perhaps it accumulated values from periods when the server was running smoothly and from periods when it was running poorly. However, I now know that, of all the 'non-benign' wait types, WRITELOG is the one with the highest total wait time. I also know the max_wait_time_ms for these wait types, so I know that there was at least one time when a task had to wait 1.4 seconds for the log to flush. The waiting_tasks_count tells me how many times a task waited on a particular wait type, and dividing wait_time_ms/waiting_tasks_count gives me the average time a particular wait type has been waited on. And I see several lock related wait types near the top: LCK_M_S, LCK_M_IS and LCK_M_IX. I can already tell some things about the overall behavior of the server, from a performance point of view: flushing the log is the most waited-for resource on the server, and lock contention seems to be the next major issue. CXPACKET wait types are right there at the top, but we know that most times CXPACKET waits are just surrogates for other wait types, as discussed above. How about those other wait types in the list, like SOS_SCHEDULER_YIELD, PAGELATCH_EX/SH or PAGEIOLATCH_EX/SH/UP?

Common wait types

In order to investigate wait types and wait times, either aggregate or for an individual query, one needs to understand what the wait type names mean. Many of the names are self-describing, some are cryptic and some are downright deceitfully named. I think the best resource for a detailed description of the names is the Waits and Queues white paper. It has an alphabetical list of wait types, a description, and the physical resource each correlates with. The white paper is a bit old (it covers SQL Server 2005) but still highly relevant. The wait type names for important resources have not changed since SQL Server 2005; what you'll be missing will be newer wait type names related to post SQL Server 2005 features. The wait types are also briefly documented in the sys.dm_os_wait_stats topic on MSDN.

Disk and IO related wait types

PAGEIOLATCH_*

This is the quintessential IO wait type: data read from disk and written to disk. A task blocked on these wait types is waiting for data to be transferred between the disk and the in-memory data cache (the buffer pool). A system that has high aggregate PAGEIOLATCH_* wait times is very likely memory starved and is spending much time reading data from the disk into the buffer pool. Be careful not to confuse this wait type with the PAGELATCH_* wait type (note the missing IO in the name).

WRITELOG

This wait type occurs when a task issues a COMMIT and waits for the log to write the transaction commit log record to disk. High average wait times on this wait type indicate that the disk is writing the log slowly and this slows down every transaction. Very frequent waits on this wait type are indicative of doing many small transactions and having to block frequently waiting for COMMIT (remember that all data writes require a transaction and one is implicitly created for each statement if no BEGIN TRANSACTION is explicitly used).

IO_COMPLETION

This wait type occurs for tasks that are waiting for something other than ordinary data IO, e.g. loading an assembly DLL, reading and writing tempdb sort files, some special DBCC data reading operations.

If your wait time analysis finds that IO and disk are important wait types then you should focus on analyzing disk activity.

Memory related wait types

RESOURCE_SEMAPHORE

This wait type indicates queries that are waiting for a memory grant. See Understanding SQL Server memory grant. OLTP type workload queries should not require memory grants; if you see this wait type on an OLTP system then you need to revisit your application design. OLAP workloads often need (sometimes large) memory grants, and large wait times on this wait type usually point toward RAM upgrades.

SOS_VIRTUALMEMORY_LOW

Are you still on 32 bit systems? Move on!

Network related wait types

ASYNC_NETWORK_IO

This wait type indicates that SQL Server has result sets to send to the application but the application does not process them. This may indicate a slow network connection, but more often the problem is with the application code: it is either blocking while processing the result set or it is requesting a huge result set.

CPU, Contention and Concurrency related wait types

LCK_*

Locks. All wait types that start with LCK indicate a task suspended waiting for a lock. LCK_M_S* wait types indicate a task that is waiting to read data (shared locks) and is blocked by another task that has modified the data (has acquired an exclusive LCK_M_X* lock). LCK_M_SCH* wait types indicate object schema modification locks: access to an object (eg. a table) is blocked by another task that did a DDL modification to that object (ALTER).

PAGELATCH_*

Do not confuse these wait types with the PAGEIOLATCH_* wait types. High wait times on these wait types indicate a hot spot in the database, a region of the data that is very frequently updated (eg. it could be a single record in a table that is constantly modified). To further analyze, I recommend the Diagnosing and Resolving Latch Contention on SQL Server white paper.

LATCH_*

These wait types indicate contention on internal SQL Server resources, not on data (unlike PAGELATCH_*, they do not indicate a hot spot in the data). To investigate these waits you need to dig further using the sys.dm_os_latch_stats DMV, which details the wait times by latch type. Again, the best resource is the Diagnosing and Resolving Latch Contention on SQL Server white paper.

CMEMTHREAD

This wait type occurs when tasks block waiting to access a shared memory allocator. I placed this here, in the concurrency section, and not in the 'memory' section because the problem is related to internal SQL Server concurrency. If you see high CMEMTHREAD wait times you should make sure you are on the latest Service Pack and Cumulative Update available for your SQL Server version, because some of these problems denote internal SQL Server issues and are often addressed in newer releases.

SOS_SCHEDULER_YIELD

This wait type can indicate spinlock contention. Spinlocks are extremely lightweight locking primitives in SQL Server used for protecting access to resources that can be modified within a few CPU instructions. SQL Server tasks acquire spinlocks by doing interlocked CPU operations in a tight loop, so contention on spinlocks burns a lot of CPU time (CPU usage counters show 90-100% usage, but progress is slow). Further analysis needs to be done using sys.dm_os_spinlock_stats:
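
For example (ordering by spins is just one reasonable way to surface the hottest spinlocks):

SELECT name, collisions, spins, spins_per_collision, backoffs
FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC;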

RESOURCE_SEMAPHORE_QUERY_COMPILE

This wait type indicates that a task is waiting to compile its request. High wait times of this type indicate that query compilation is a bottleneck. For more details I recommend reading Troubleshooting Plan Cache Issues.

Special wait types

TRACEWRITE

This wait type indicates that tasks are being blocked by SQL Profiler. It occurs only if you have SQL Profiler attached to the server. This wait type occurs often during an investigation if you have set up a SQL Profiler trace that is too aggressive (captures too many events).

PREEMPTIVE_OS_WRITEFILEGATHER

This wait type occurs, among other things, when auto-growth of files is triggered. Auto-growth occurs when an insufficiently sized file is grown by SQL Server and it is a hugely expensive event. During a growth event all activity on the database is frozen. Data file growth can be made fast by enabling instant file initialization, see Database File Initialization. But log growth cannot benefit from instant file initialization, so log growth is always slow, sometimes very slow. Log auto-growth events can be diagnosed by simply looking at the Log Growths performance counter in the SQL Server:Databases object; if it is anything but 0 it means log auto-growth has occurred at least once. Real-time monitoring can be done by watching for Data File Auto Grow Event Class and Log File Auto Grow Event Class events in SQL Profiler.

I did not cover many wait types here (eg. CLR_*, SQLCLR_*, SOSHOST_*, HADR_*, DTC_* and many more). If you encounter a wait type you do not understand, usually just searching online for its name will reveal information about what that wait type is and what potential bottleneck it indicates, if any.

Analyze disk activity: IO stats

SQL Server needs to read and write to the disk. All writes (inserts, updates, deletes) must be written to disk. Queries always return data from the in-memory cache (the buffer pool), but the cache may not contain the desired data, in which case it has to be read from disk. Understanding whether IO is a bottleneck for your performance is a necessary step in any performance investigation. SQL Server collects, aggregates and exposes information about every data and log IO request. The first thing I like to look at is sys.dm_io_virtual_file_stats. This DMV exposes the number of writes and reads on each file for every database in the system, along with the aggregated read and write IO 'stall' times. Stall times are the total time tasks had to block waiting for transfer of data to and from disk.
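
A typical way to query it, joining to sys.master_files to get the file names (a sketch; ordering by total stall time surfaces the busiest files):

SELECT DB_NAME(vfs.database_id) AS database_name, mf.name AS file_name, mf.type_desc,
       vfs.num_of_reads, vfs.num_of_bytes_read, vfs.io_stall_read_ms,
       vfs.num_of_writes, vfs.num_of_bytes_written, vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf
    ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;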

The per file IO stats are aggregated since server start up, but be aware that they reset for a file if it ever goes offline. The total number of bytes transferred (reads and writes) is a good indicator of how busy a database is from the IO point of view. The stalls indicate which IO subsystem (which disk) is busy and may even be saturated.

Analyzing individual query execution

Analyzing an individual statement or query allows us to understand what the query is doing and how much of its time is spent executing vs. waiting. The simplest analysis is to use the execution statistics. These work by first running the appropriate SET STATISTICS ... ON in your session and then executing the statement or query you want to analyze. In addition to the usual result, the query will also report execution statistics, as follows:
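
For example, enabling the statistics in a session might look like this (the SELECT is just a placeholder for the statement under investigation):

SET STATISTICS TIME ON;
SET STATISTICS IO ON;
GO
-- the statement you want to analyze goes here
SELECT COUNT(*) FROM Sales.SalesOrderDetail;
GO
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;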

The query parse time, compile time and execution time are reported. If you execute a batch or a stored procedure with multiple statements then you get back the time for each statement, which helps determine which statement in a procedure is the expensive one. But we'll see that analyzing the execution plan reveals the same information, and more.

The IO statistics show, for each statement, the number of IO operations. The result shows several metrics for each table touched by the statement:

scan count

Number of times the scans or seeks were started on the table. Ideally each table should be scanned at most once.

logical reads

Number of data pages from which rows were read from the in-memory cache (the buffer pool).

physical reads

Number of data pages for which data had to be transferred from disk into the in-memory cache (the buffer pool), and for which the task had to block and wait for the transfer to finish.

read-ahead reads

Number of data pages that were asynchronously transferred from disk into the buffer pool opportunistically, so that the task did not have to wait for the transfer.

LOB logical/physical/read-ahead reads

Same as their non-lob counterparts, but referring to reading of large columns (LOBs).

There is also SET STATISTICS PROFILE ON, which provides detailed execution information for a statement, but much of the information overlaps with the execution plan info and I think analyzing the execution plan is easier and yields better information.

Logical Reads are the best measure of a statement cost.

A large logical reads value for a statement indicates an expensive statement. For OLTP environments the number of logical reads should be in the single digits. A large logical reads count translates to long execution times and high CPU consumption even under ideal conditions. Under less ideal conditions some or all of these logical reads turn into physical reads, which mean long wait times waiting for disk transfer of data into the buffer pool. Large logical reads are indicative of large data scans, usually an entire table scanned end-to-end. They are most times caused by missing indexes. In OLAP, decision support and DW environments, scans and large logical reads may be unavoidable as indexing for an OLAP workload is tricky, and indexing for an ad-hoc analytic workload is basically impossible. For these latter cases the answer usually lies in using DW specific technologies like columnstores.

Information about query execution statistics can also be obtained using SQL Profiler. The SQL:StmtCompleted, SQL:BatchCompleted, SP:StmtCompleted and RPC:Completed events capture the number of page reads, the number of page writes and the execution time of an individual invocation of a statement, a stored procedure or a SQL batch. The number of reads is logical reads. Note that monitoring individual statement completed events with SQL Profiler has an effect on performance and on a big busy server it can cause significant performance degradation.

You can also see the number of logical reads, reads and writes as it happens by looking at sys.dm_exec_requests, but this information is visible only while a request is running. Furthermore the numbers aggregate all statements in a request, so you cannot determine the cost of an individual statement using this DMV.

Analyzing individual query execution wait times

If you need to drill down into more detail than the execution time and IO statistics for a query or for a stored procedure, the best information is the wait types and wait times that occurred during execution. Obtaining this information is no longer straightforward though; it involves using Extended Events monitoring. The method I'm presenting here is based on the Debugging slow response times in SQL Server 2008 article. It involves creating an extended events session that captures the sqlos.wait_info event and filtering the session for a specific execution session (SPID):
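
A sketch of such a session, using the SQL Server 2008 era file target syntax (53 is a placeholder for the SPID under investigation; the file paths match the ones read back below):

CREATE EVENT SESSION wait_stats ON SERVER
ADD EVENT sqlos.wait_info (
    WHERE (sqlserver.session_id = 53))            -- the session being investigated
ADD TARGET package0.asynchronous_file_target (
    SET filename = N'c:\temp\wait_stats.xel',
        metadatafile = N'c:\temp\wait_stats.xem');
GO
ALTER EVENT SESSION wait_stats ON SERVER STATE = START;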

Finally, we can aggregate all the data in the captured Extended Events session:

with x as (
    -- read the raw events captured by the Extended Events file target
    select cast(event_data as xml) as xevent
    from sys.fn_xe_file_target_read_file
        ('c:\temp\wait_stats*.xel', 'c:\temp\wait_stats*.xem', null, null)),
s as (
    -- shred the XML into one row per wait_info event
    select xevent.value(N'(/event/data[@name="wait_type"]/text)[1]', 'sysname') as wait_type,
        xevent.value(N'(/event/data[@name="duration"]/value)[1]', 'int') as duration,
        xevent.value(N'(/event/data[@name="signal_duration"]/value)[1]', 'int') as signal_duration
    from x)
select wait_type,
    count(*) as count_waits,
    sum(duration) as total_duration,
    sum(signal_duration) as total_signal_duration,
    max(duration) as max_duration,
    max(signal_duration) as max_signal_duration
from s
group by wait_type
order by sum(duration) desc;

This is an exceptional wealth of information about what happened during the execution of my request: 742 times it waited on the log commit to be written to disk for a total of ~14 seconds wait time, 12 times it waited for a lock, 2 times for a page latch etc. If you're curious, the 'workload' I captured was a single row INSERT in a loop.

Analyzing Execution Plans

SQL Server can expose the actual execution plan for queries. The actual query tree gets serialized as XML and there are several ways to obtain it:

Use sys.dm_exec_query_plan function to obtain the plan XML from a plan handle. The plan handle can be obtained from DMVs like sys.dm_exec_requests (currently executing queries) or sys.dm_exec_query_stats (past executed queries).
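
For example, a hypothetical way to pull the plan of everything currently executing (the filter and column list are illustrative):

SELECT r.session_id, r.status, r.command, qp.query_plan
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) qp
WHERE r.session_id >= 50;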

If you don't know anything about execution plans then focus on thick arrows

I will not go into much detail about how to analyze a query plan because the subject is way too vast for this article. Execution plans capture what the query did, namely what operators it executed, how many times each operator was invoked, how much data each operator processed and how much data it produced in turn, and how much actual time each operation took. Unlike the wait times analysis, analyzing the execution plan focuses on runtime and where the actual CPU time was consumed. The graphical display of SSMS simplifies the information in the execution plan XML to make it more easily consumable, and the graphical representation makes it easy to spot the bottlenecks. Thick arrows represent large data transfers between operators, meaning that operators had to look at a lot of rows to produce their output. Needless to say, the more data an operator needs to process, the more time it takes. Hovering over operators displays a popup info note with details about the operator. A large actual row count shows a large amount of data being processed. The popup also includes the estimated row count, which is how much data SQL Server had predicted the operator would process, based on the table statistics. A wild discrepancy between actual and estimated is often responsible for bad performance and can be tracked down to outdated statistics on the table.

Another area of interest in the execution plan are the operators themselves. Certain operators are inherently expensive, for example sort operators. An expensive operator coupled with a large data input is usually the cause of a huge execution time. Even if you do not know which operators are expensive and which are not, the execution plan contains the cost associated with each operator and you can use this information to get an idea of where problems may be.

Identifying problem queries

SQL Server exposes the runtime execution statistics for most queries it has run in sys.dm_exec_query_stats. This DMV shows the number of times a query has run, total/max/min/last active CPU time ('worker time'), total/max/min/last elapsed wall-clock time, total/max/min/last logical reads and logical writes, and total/max/min/last number of rows returned. This kind of information is a real treasure trove for identifying problems in an application:

Large execution count

Queries that are run frequently are the most sensitive for performance. Even if they perform acceptably, if they run often then small improvements in each individual execution can yield significant overall speedups. But even better is to look, with a critical eye, into why the query is executed so often. Application changes can reduce the execution count.

Large logical reads

A query that has to scan large amounts of data is going to be a slow query, there is no question about this. The focus should be on reducing the amount of data scanned; typically the problem is a missing index. A 'bad' query plan can also be a problem. A high number of logical reads is usually accompanied by a high worker time, but the real issue is not the CPU usage itself: the focus should be on the large logical reads.

Large physical reads

This is basically the same problem as large logical reads, but it also indicates that your server does not have sufficient RAM. Luckily this is, by far, the easiest problem to solve: just buy more RAM, max out the server RAM slots. You're still going to have large logical reads, and you should address that problem too, but addressing an application design issue is seldom as easy as ordering 1TB of RAM...

High worker time with low logical reads

This case is not very common and indicates an operation that is very CPU intensive even with little data to process. String processing and XML processing are typical culprits and you should look at moving the CPU burn toward the periphery, in other words perform the CPU intensive operations on the client, not on the server.

High elapsed time with low worker time

A query that takes a long wall-clock time but utilizes few CPU cycles is indicative of blocking. This is a query that was suspended most of the time, waiting on something. Wait analysis should indicate the contention cause, the resource being waited for.

High total rows count

Large result sets requested by the application can be justified in an extremely low number of cases. This problem must be addressed in the application design.

Finding a problem query using sys.dm_exec_query_stats becomes a simple exercise of ordering by the desired criteria (by execution count to find frequently run queries, by logical reads, by elapsed time etc). To get the text of the query in question use sys.dm_exec_sql_text, and to get the query execution plan use sys.dm_exec_query_plan:
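
A sketch of such a query, ordering by total logical reads and attaching the statement text and plan (the TOP (20) and the ordering criterion are arbitrary choices):

SELECT TOP (20)
       qs.execution_count, qs.total_logical_reads, qs.total_worker_time, qs.total_elapsed_time,
       SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
           ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
             ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) AS statement_text,
       qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_logical_reads DESC;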

If you do not know what kind of problem query to look for to start with, my advice is to focus in this order:

High execution count. Identifying what queries are run most often is in my opinion more important than identifying which queries are particularly slow. More often than not the queries found to be run most often are a surprise, and simply limiting the execution count yields significant performance improvements.

High logical reads. Large data scans are the usual culprit for most performance problems. They may be caused by missing indices, by a poorly designed data model, by 'bad' plans caused by outdated statistics or parameter sniffing and other causes.

High elapsed time with low worker time. Blocking queries do not tax the server much, but your application users do not care if the time they waited in front of the screen for a refresh was spent active or blocked.

Missing Indexes

Repeatedly I said that performance problems are caused by 'missing indexes', and SQL Server has a feature aptly named Missing Indexes:
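
The feature is exposed through a set of DMVs; a sketch of how they are typically joined (the 'impact' ordering is just a rough heuristic, not an exact cost model):

SELECT mid.statement AS table_name,
       mid.equality_columns, mid.inequality_columns, mid.included_columns,
       migs.user_seeks, migs.avg_total_user_cost, migs.avg_user_impact
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks * migs.avg_total_user_cost * migs.avg_user_impact DESC;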

Personally I seldom focus my analysis on the built-in missing index feature. To start with, the feature has known limitations, but more importantly the feature starts with the solution, without first searching for a cause. I always start by identifying problem queries first, searching for the ones which would yield the highest bang for the buck. Once a problem query is identified as causing large scans, the missing index DMVs can be consulted to identify a candidate. But you should always take the recommendations with a grain of salt. The missing index feature does not know what the query is doing from an application business logic point of view. Reviewing the query can potentially find a better solution than what the automated missing index feature may recommend.

Experienced administrators have in place automation that periodically inspects the missing index DMVs and finds potential hot spots and problems. This is a very good proactive approach. But in most cases the performance problems are hot and immediate, and finding them using wait analysis or query execution stats is a more direct process. For a more in depth discussion of the missing indexes DMVs I recommend sp_BlitzIndex.

TEMPDB Performance

tempdb is the special SQL Server database that is used to store any transient data, including things like temporary tables, static cursors, table variables, sort runs, internal worktables, the version store and piles more. Even though the problems that can occur with tempdb are ultimately CPU, IO or contention problems, tempdb deserves a special topic because some of these problems can be understood at a higher level as tempdb problems, as opposed to, say, low level latch contention problems. tempdb also tends to manifest a set of problems entirely of its own, especially due to the very high rate of creation and destruction of objects in tempdb. MSDN has an Optimizing tempdb Performance topic which contains some practical advice, and Cindy Gross has compiled a collection of SQL Server TempDB IO Best Practices.

Make each tempdb data file the same size

In general I tried to emphasize the importance of measuring and understanding SQL Server performance characteristics instead of blindly following advice. But here is one piece of advice that I always give, w/o any secondary consideration: make sure tempdb is stored on files of identical size, with identical growth characteristics. I've spent just too many hours chasing difficult to diagnose issues to discover that they're ultimately caused by unaligned tempdb files.

SQL Server Performance Counters

Performance counters offer a different perspective into SQL Server performance than the wait analysis and execution DMVs do. The biggest advantage of performance counters is that they are extremely cheap to collect and store, so they lend themselves as the natural choice for long term monitoring. Looking at current performance counter numbers can reveal potential problems, but interpreting the values is subject to experience and there is a lot of lore and myth around certain magical values. For analyzing performance problems on a server that you have access to and that is misbehaving right now, a more direct approach using wait analysis and execution stats is my recommendation. Where performance counters shine is in comparing current behavior with past behavior, for which, obviously, you need to have performance counter values collected and saved from said past. These should be part of an established baseline, which has captured the server behavior under workload when the server was behaving correctly and was not experiencing problems. If you do not have a baseline, consider establishing one.

The performance counters infrastructure workhorse is the logman.exe utility. For a quick intro, read Two Minute Drill: LOGMAN.EXE. I recommend you familiarize yourself with this tool. I favor it over the graphic interface (perfmon.exe) because, as a command line (CLI) tool, it allows for easy scripting. Telling someone "run the attached script" is always easier than telling "click the button, select A from the list, then choose menu B, click twice in the middle of the red square etc etc". The graphic perfmon.exe tool is the tool of choice for displaying performance counters.

Performance counters, if collected at sufficient resolution, are also an excellent tool for determining correlation. The graphic display makes it easy to spot counters that tend to spike together, or in an anti-correlated pattern (one spikes, the other dips). Other patterns also become evident in the graphic visualization, like lag (eg. a counter spike is always preceded by another one increasing). Short of having a data scientist doing factorial analysis on your counters, your own eyes are the best analysis tool.

SQL Server usage performance counters

If you want one counter to best represent the overall performance throughput of your server, this is it. It captures the workload pressure (how many requests the application is making), and how the server is coping with the workload. Trends in this counter need to be further analysed: an increase in the value could mean the server is performing better, because it can accept and handle more requests, but it can also mean the application is sending more requests (a bigger, heavier workload), for example because your application is serving more client requests.

The number of active transactions (any type). Be aware that there are at least two counters named Transactions, one in the SQL Server:General Statistics object and another one in the SQL Server:Transactions object. I'm referring to the latter one. Knowing the number of active transactions is important to set a background for your understanding of the workload.

This is especially useful when troubleshooting spikes or jagged performance. For example, if you can correlate moments where performance drops with moments when blocking increases, you have found an area to focus your investigation on: blocking.

Related to the number of Errors/sec, monitoring the rate of deadlocks can quickly point toward a performance problem. It is hard to believe that you would not be aware that deadlocks are occurring, given the impact they have on the application. But I've seen cases where the deadlocks were quietly swallowed in the application, or there was simply no communication between the organization monitoring the application and the DBA organization monitoring the server.

I can hardly think of any event in SQL Server execution that has a more dramatic impact on performance than log growth. During log growth every write request effectively freezes as it waits for available log to generate the write-ahead log records necessary to record any update. Log growth cannot use instant file initialization, so it must wait until the log file is grown and the newly allocated disk space is filled with 0s. Your application users might as well watch paint dry... Often this is aggravated by a log file gone out of control with 10% increments, which is now growing in hundreds of megabytes or even gigabytes at a time. The impact on performance is usually devastating, and it is also difficult to track down after the fact. It shows as a sudden performance dip (actually often a complete freeze) that lasts a few seconds or even minutes, and then everything is again fast until the next log growth event. The article How to shrink the SQL Server log offers an explanation of why log growth can occur. Data and log growth can also be tracked by non-zero values for the PREEMPTIVE_OS_WRITEFILEGATHER wait info statistics.

Monitoring the data size is interesting on its own, but there is another reason why this counter should be collected: it can show when database growth events occur. Unlike Log Growths, data growths do not have a counter of their own (shame!), but the events can be deduced from sudden jumps in the data file size. Note that data growth can leverage instant file initialization so the impact may not be as dramatic. You should always ensure instant file initialization is enabled for SQL Server; if it is not, the impact may be serious, especially if you encounter tempdb growth events (which in my experience occur on most deployments).

IO related performance counters

The rate at which data pages are read from disk. This will be the main driving factor for disk read IO. Normally your workload should read only a very small amount of data from disk; the data should be cached in memory. Constant reading is a sign of data scans (missing indexes) and/or insufficient memory (RAM).

Together with the Log Flush Waits/sec, Log Flush Wait Time and Log Flush Write Time (ms) you can quickly get a feel of how much blocking occurs due to transactions waiting to commit (log flush events). For well written OLTP intensive applications the rate at which transactions can be committed is often the main performance bottleneck.

I intentionally put data writes after log writes, because most times it is the log that is the problem. Only a few workloads are data-write heavy. However data writes can interfere with disk IO, especially with log writes, and cause delays in log flush waits that are very visible as application performance drops. The typical pattern is a 'longer' time of good performance with periodic 'short' times of performance dips. 'Long' and 'short' are relative: sometimes you see 1-2 minutes 'long' and 10-15 seconds 'short', but other times it is 10-15 minutes 'long' and 1-2 minutes 'short'. The jagged pattern is the key, with a strong correlation between performance dips and data-write spikes. The pattern is caused by checkpoints and the timing is driven by the recovery interval option. Looking at Checkpoint pages/sec will quickly corroborate this hypothesis.

This performance counter exposes the wait info object for log writes. The advantage of using performance counters vs. the DMVs is to leverage the performance counters infrastructure, namely collecting the values in a continuous fashion using the operating system collectors.

Same as above, this is a wait info exposed through performance counters. Strictly speaking this does not measure IO, it measures only when blocking caused by IO occurs. You cannot distinguish between read IO and write IO with this counter.

Often the IO requirements of maintenance backup operations are forgotten when doing performance analysis. Understanding the additional IO pressure a backup adds and tracking it through this counter is important on its own, but it is even more important to understand the correlation between spikes in this counter and general performance degradation. This is a scenario I've seen all too often: the maintenance plan kicks in and the overall throughput dips significantly, but nobody understands why.

The Process counter category has a different instance for each executing OS process and is very important to analyze when the overall IO on the system does not match the IO as reported by SQL Server. Often you'll discover that another process is causing IO, making the disks busy and increasing the response time in SQL Server. Unfortunately monitoring this counter long term is difficult because instances come and go as OS processes start and terminate.

Despite being in the Memory counter category, understanding if and when paging occurs is critical in understanding IO performance, as paging can introduce a large spike in IO requests. See Monitoring Paging.

The value of 300 seconds is a Microsoft recommendation in a white paper that was written five years ago. It’s not a relevant guideline nowadays, and even back in 2006 I would have argued that aiming for a set value isn’t a valid recommendation.

Analyzing PLE is useful as it gives a single value to get a feel for the server memory health, but do not panic if the value is 297 and relax if it is 302... More important is to understand the trends and the dynamics of this value. A jagged pattern (periodic dips) indicates something that comes in and creates a great disturbance in the server memory, perhaps a periodic query that causes a scan, evicting much of the cached data. An overall downward trend in PLE indicates your workload is increasing while the server hardware stays the same, and is a great indicator of potential future problems, better alleviated now.
On machines with NUMA nodes, which in this day and age means pretty much every server, you should pay attention to the PLE per node using the SQL Server:Buffer Node\Page life expectancy counter. Read Page Life Expectancy isn’t what you think….

This category tracks how memory is used inside the SQL Server process. The absolute values themselves are somewhat esoteric, but having a baseline to compare against and looking at long term trends is a great way to determine what changed and which components are starting to put SQL Server under memory pressure.

I'm calling out this counter in the Memory Manager category because it is particularly important in understanding blocking caused by memory grant waits. Memory grants are only needed for certain query operators, and queries requesting large memory grants tend to wreak havoc with your SQL Server performance. Read Understanding SQL Server memory grant for more details.

The Process category OS performance counter has an instance for each process in the system and allows you to easily track other processes that may consume memory, causing external memory pressure for SQL Server and ultimately performance degradation.

CPU performance counters

I'm mentioning this counter here, in the CPU category, because it is a critical counter to analyze when considering CPU usage in SQL Server. When this counter correlates strongly with CPU usage it is a clear indicator of why the CPU usage increased: more pages are scanned, the queries simply look at more data and therefore consume more CPU. The usual culprit is, again, table scans (ie. missing indexes, poor data modeling or a bad query plan).

The Process category OS performance counter has an instance for each process in the system and allows you to easily track other processes that may consume CPU, causing SQL Server performance degradation.

SQL Server blocking related performance counters

Monitoring and analyzing the locking performance counters can reveal whether lock related blocking is a significant cause of performance problems. Look especially at the Lock Wait Time (ms) value; it will give you an insight into how much of the time queries spend blocked waiting for locks. You won't be able to determine exactly what query is blocking what, but knowing that lock waits are significant allows you to focus on analyzing the blocking chains. sp_whoIsActive is a great tool for that.

The most important wait info types are also reported as performance counters. Analyzing these is similar to analyzing the wait statistics using the DMVs, but the performance counters have the advantage of being available in the whole performance counters toolset ecosystem, making them easy to collect, store and analyze. Not all wait info types are exposed as performance counters.

Network performance counters

ProTip: Network is not your bottleneck. Network performance comes into the picture usually only for deployments using Database Mirroring or Availability Groups. These features are indeed network hungry and require a fast, high bandwidth, low latency connection between the nodes. Furthermore, both DBM and AG tend to push their problems up the stack, resulting in performance degradation when the network traffic cannot keep up with the workload.

This counter shows the wait info stats related to the server being blocked because the client is not consuming the result sets. Most often this does not indicate a network problem, but an application design problem: the client is requesting large result sets.

These categories are important for monitoring the health of the traffic, not the amount. Specifically they can indicate network errors, dropped packets and retries.

Defining your own counters

You can actually report custom performance counters from T-SQL. The SQL Server:User Settable object is specifically designed for this and allows you to report up to 10 user-defined values to be monitored and collected by the system performance counter tools. You publish counter values by using sp_user_counter1 through sp_user_counter10:

DECLARE @value int = 42;   -- the value you want to publish
EXECUTE sp_user_counter1 @value;

For application developers it is also possible to add performance counters to the application itself. Publishing performance counters from managed (.Net) application code is trivially easy, and you can automate the somewhat boring process of declaring them, see Using XSLT to generate Performance Counters code.

Advanced Performance Analysis

Developers are familiar with performance analysis using sample profiling, with tools like VTune, F1 or WPA. The sample profiling technique consists of periodically taking a stack sample of each thread in the profiled process, up to several hundred times per second. The process is non-invasive; it does not require a specially instrumented process, any process can be profiled. It is also fairly lightweight: taking several hundred samples per second has no measurable impact on the performance of the process being sampled. The trick is interpreting the observations. Using the Microsoft public symbols server you can resolve the collected samples to actual function names in the source code (note that in my experience F1 and WPA do a much better job at handling the LTCG code of SQL Server than VTune does). With some common sense, one can make an educated guess about what exactly SQL Server was doing during the period being sampled. This technique makes sense only if you are seeing significant CPU usage. If the performance is bad and you see slow queries but low CPU usage, it means the queries are waiting and you should focus instead on wait info analysis. Sample profiling of a blocked process will reveal nothing useful.

WPA is one of my favorite performance tools and it can do more than just sample profiling. It can visualize XPerf ETW traces and this makes it extremely powerful. Existing documentation and whitepapers for WPA tend to focus on multimedia and game development, but if you are an adventurous spirit and want to learn about how your favorite database works, using WPA to analyse SQL Server can reveal a great deal of information and help you spot performance improvements that can be leveraged in your application. One of my favorite use cases for WPA is to collect IO activity (this will collect every single IO request) and then visualize the Disk Offset as a graph. This is an incredibly simple, yet powerful, way to visualize random IO vs. sequential IO patterns. If you have a feeling WPA and XPerf/ETW are somehow related to SQL Server Extended Events, that is because they are, very much so. Finally, some readings to get you started on using WPA:

The USE method

USE stands for Utilization, Saturation and Errors, and the USE Method is a formalized approach to performance troubleshooting. Its focus is on identifying resources in a system and then, for each one, measuring the utilization (percent in use compared to maximum bandwidth), the saturation of the resource (extra work queued because the resource cannot service it due to 100% utilization) and errors.

When used in SQL Server context, the tricky part is to identify the resources and how to measure the utilization/saturation/errors. Some resources are obvious: disk, CPU, memory, network. Sometimes even simply starting with the USE method and measuring these hardware parameters is a good way to start a performance investigation, at the very least you'll know what hardware component is saturated, if any. But knowing how SQL Server works can reveal more detailed, internal resources that can be analyzed. In Understanding how SQL Server executes a query I have shown this image:

Many of the components in the image above can be investigated as Resources in the USE method. Applying this methodology requires some thinking outside the box since you will not find many references online about how to apply the USE method to SQL Server, what resources to look for and how to measure the utilization, saturation and errors for each. I myself do not have a clear picture yet for each resource involved and I'm still collecting facts and know-how on the subject. But I like the USE approach because it gives a method, a strategy to follow when troubleshooting performance.