Another recap of a problem we recently tackled where we had to find out why a doomed transaction was still trying to do work. A doomed transaction is one that is about to roll back. In our case, it was caused by a rare set of circumstances:

Getting Error 3930

Msg 3930, Level 16, State 1, Line 14
The current transaction cannot be committed and cannot support operations that
write to the log file. Roll back the transaction.

Google pointed me to this Stackoverflow question where Remus Rusanu talks about doomed transactions. You’re not allowed to do work in a catch block if your transaction is doomed. He then points to more information about uncommittable transactions in the TRY/CATCH docs. According to the TRY/CATCH docs, you can get this kind of error when XACT_ABORT is turned ON. This is something we do and I wrote about why in Don’t Abandon Your Transactions.

So there’s a CATCH block catching some error in a TRY block. Now I wonder what that error is. The SqlException that got raised to the client only had the 3930 error above. It had nothing about the original error. I did have the procedure name though. I can look there.

Complicated Procedure

So I took a look at the procedure getting called. I saw that it was calling another procedure. And that procedure called other procedures. It was a big complicated mess. The calling tree looked like this:

Something Complicated

So the procedure was complicated and it used explicit transactions, but I couldn’t find any TRY/CATCH blocks anywhere! What I needed was a stack trace, but for T-SQL. People don’t talk about T-SQL stack traces very often. Probably because they don’t program like this in T-SQL. We can’t get a T-SQL stack trace from the SQLException (the error given to the client), so we have to get it from the server.

More Info Needed From the Server

So luckily, extended events do have T-SQL stack traces. I wanted to look at those stack traces whenever there was an error on the server. My session looked something like this:

Eventually It Failed Again

Woohoo!
First I looked for the error. The one where error_code = 3930. I did it using Management Studio to view the session’s data.
Then I looked at the error immediately before it. The one where is_intercepted = true. That looks something like this

The Original Error

Here’s what I found. The original error was a PK violation in a procedure using the Just Do It (JDI) pattern. It’s a technique which tackles upsert concurrency problems by inserting a row and catching and suppressing any PK errors that might be thrown.
So that’s where the TRY/CATCH block was.

Also, the stack trace told me that the procedure was called from a trigger. So my complicated procedure was actually more complicated than I thought. It actually looked like this

Something More Complicated

That’s why i didn’t see it before. I hate the way triggers can hide complexity. They’re sneaky.

The CATCH block catches a real error and the XACT_ABORT setting dooms the transaction. Because I didn’t do anything with the error, the transaction was allowed to continue. It was actually some query in s_ProcM that attempted to do work inside the doomed transaction.