If you're really interested, take three minutes: run the steps to reproduce the issue, and then check the box that says that you were able to reproduce the issue.

Why?

Imagine that ten hours ago you started a big transaction. You're sitting there waiting for it to finish, and it's running, and running, and running. At some point, you notice that the drive with the transaction log has filled up, so you create a second log file. And the transaction is sitting there running, and running, and running. Is it still doing work? Or did it catch the low disk space issue and start rolling back? Wouldn't it be great if you could actually answer that question? Are you surprised to learn that you can't answer that question?

SQL Server currently has a few DMVs that are supposed to tell us the status of a given request, transaction, and so on. These DMVs are unreliable in the vast majority of situations. That means that we are unable to answer important questions like, "is my transaction still doing any work, or did it die three hours ago?"

The SQL Server team needs to fix this. End of story. Please vote and help me convince the powers that be to do the right thing.

Comment Notification

Comments

Just curious - the explanation they gave to my untutored eye appears to contradict the whole concept of ACID. If it doesn't roll back the whole transaction, what does it do - retry? In a nested transaction the whole lot gets rolled back (barring save points), so their analogy seems to be highly misleading. Tbh I'm rather confused now, and would rather have not read it... ;)

"Each time a statement is run, it starts a virtual scope within the transaction (akin to a nested transaction), which is used to log the changes made by that stement. If an error occurs (such as the attention event), depending on the severity of the exception either the scope will be rolled back or the whole transaction.

The rollback is limited to the scope for low-severity exceptions or if XACT_ABORT is OFF. For high-severity exceptions, the whole transaction will be rolled back.

Because of the above, during the time the scope is rolled back, the transaction itself is not necessarily being rolling back, so that's the reason the state remains as ACTIVE (2) in the DMV."

SQL Server internally uses parent and child transaction scopes. These are all bound to the same transaction via the parent, so there is no consequence as far as ACID goes -- it's more for flexibility and granularity.

For example, if you have a bunch of threads and one of them needs to spill some data to tempdb, it will do so in a child transaction. That data doesn't need to be replayed or rolled back if the parent transaction is interrupted, so there is no worry there.

Another scenario is with the query I posted in the item. In this case the parent transaction can support numerous child transactions over the course of its lifetime (the big insert being one of them), and if one of them dies all of the work that's already been done can still be committed. That's a feature!

As far as the DMVs go, unfortunately, it's difficult at best to tie child transactions to parent transactions. You can do it, but only via the sys.dm_tran_locks view, which tends to be very slow and is not reliable. Even worse, the rows in the transaction views for child transactions don't seem to indicate the rollback either. There is a column called "status2" that seems to have something to do with it, but it's totally undocumented and I don't know what to look for. Of course since I can't reliably tie the parent to the child it's a moot point. I simply can't get the right data to figure out what's happening.

Mr. Tea: Sure, you can do that, but it may not tell you much. IO_COMPLETION is used for other purposes, so there may not be a rollback happening at all, and even if you are in the midst of a rollback there's no guarantee you'll see it unless your disks happen to be somewhat overloaded. It is unfortunately not a very accurate way of telling what's going on.

I Agree, IO_COMPLETION is not well documented (I'd love to know the exhaustive list of what causes it) but it may be the best indication we have until someone takes the above request seriously!

After a few quick tests with extended events to monitor the pattern of waitstats on a SPID, during rollback it shifts from what I expect e.g. PAGEIO/LOCK based on the type of transaction to a bunch of IO_COMPLETION, Ill have to test this on a large transaction that causes churn in the buffer pool to see what kind of wait pattern that gives during rollback.

From observation, IO_COMPLETION in this case looks like its the log file disk waits when reading the Tlog for rollback/forward.

September 5, 2012 5:11 AM

Leave a Comment

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.