
I have a stored procedure building a fact table from a data model. I'm generally using the technique described here as I found it gives a significant performance boost.

However, I'm running into situations where SQL Server 2016 (SP1, CU1, Developer edition) just hangs. In one such permutation it hung for 5 days and 23 hours (I went on vacation and let it run to see what would happen). Normal performance for this SP is about a minute, and this particular section of the SP is the slowest part at about 40 seconds.

By reducing the joins one at a time, I was able to find a culprit and engineer around this particular problem. However, now that I'm expanding my testing, the problem is back (I kill it after 10 minutes, but nothing indicates there is any reason to expect it to be done in 11). sp_whoisactive shows me no locks and no waits; CPU time simply ticks up and up.
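
For reference, this is roughly how I'm watching the session while it spins (sp_WhoIsActive is Adam Machanic's procedure; the parameter values shown are just the ones I happen to use):

-- Check on the hung request: plans, detailed task/wait info, extra session info.
EXEC sp_WhoIsActive
    @get_plans = 1,            -- include the query plan for each active request
    @get_task_info = 2,        -- detailed per-task wait information
    @get_additional_info = 1;  -- extra session-level settings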

In this particular case, the last join in the list that is not commented out is what causes it. I also left in the original problematic join. Yes, I am joining to the same table over and over again with different parameters on purpose.
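
For illustration only (these are made-up column names and key values, not my real schema), the join list is shaped roughly like this, with the same dimension joined repeatedly under different hard-coded criteria:

-- Simplified sketch of the pattern; the real list is much longer.
SELECT g.GLTransactionId,
       d1.InvestmentKey AS EquityKey,
       d2.InvestmentKey AS BondKey,
       d3.InvestmentKey AS CashKey
FROM #GLT AS g
LEFT JOIN InvestmentDimension AS d1
       ON d1.ClientId = g.ClientId AND d1.InvestmentTypeId = 1
LEFT JOIN InvestmentDimension AS d2
       ON d2.ClientId = g.ClientId AND d2.InvestmentTypeId = 2
LEFT JOIN InvestmentDimension AS d3
       ON d3.ClientId = g.ClientId AND d3.InvestmentTypeId = 3;
-- ...and so on, with many more joins of the same form.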

When this was the first issue, I figured I could work around it, but this second appearance of the problem suggests to me that the whole procedure isn't reliable. In the case I am currently having a problem with, there are no records in #GLT. Zero. And there are two rows in the InvestmentDimension table. The original case under test has about 200,000 rows in #GLT.

Running

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

before running the test changes nothing (not that this would be a good solution, but perhaps it eliminates some possible causes).

Where can I look further at what the problem is and how to work around it?

Further insight:

I rewrote this to chunk it into one insert and three updates. This had the advantage of reducing some code complexity in the select field list, so although it impacts performance, it isn't a terrible compromise for now (I'm still investigating that). However, after I did that and continued my experimentation, the population of #GLT suddenly started hanging forever.
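
The rough shape of the rework, with made-up names, is:

-- Illustrative only: one insert from the base data, then separate updates
-- to fill in each group of dimension keys, instead of the original single
-- statement with the long join list.
INSERT INTO FactGL (GLTransactionId, ClientId, Amount)
SELECT g.GLTransactionId, g.ClientId, g.Amount
FROM #GLT AS g;

UPDATE f
   SET f.InvestmentKey = d.InvestmentKey
  FROM FactGL AS f
  JOIN InvestmentDimension AS d
    ON d.ClientId = f.ClientId
   AND d.InvestmentTypeId = 1;

-- ...two more UPDATEs of the same form for the other key groups.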

It doesn't have anything like that join list, but it did have one type of join in common, and removing that immediately solved the problem. It was a join along these lines (shown as a cut-down query; object names are approximated):
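
-- Cut-down version of the statement containing the offending join
-- (names are approximations; the real query is much larger).
SELECT t.ClientId, inv.InvestmentKey
  FROM #SourceRows AS t               -- temp table feeding the #GLT build
  LEFT JOIN InvestmentDimension AS inv
         ON inv.InvestorId = 0        -- hard-coded, non-unique secondary id
        AND inv.ClientId = t.ClientId;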

Basically, it is linking to the record where a secondary (non-unique) id is a hard-coded number and the other criterion comes from a temporary table. Although this line generally works, once the join list gets large enough, having it there seems to send the compiler into a death spiral. The reason it started failing for #GLT is that this table now had two records with an InvestorId of 0, one for each ClientId; previous experiments hadn't gotten that far.

Insights into how to understand this more directly (rather than the hunt and peck of pulling out joins one by one) are still appreciated.

Adding OPTION (RECOMPILE) did not help. Trace flags and database scoped configurations are basically at their defaults, and there are no trace flags active. I did try turning on 'optimize for ad hoc workloads', but that didn't change anything.
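
For completeness, the ad hoc workloads setting was toggled with plain sp_configure, like this (it made no difference):

-- 'optimize for ad hoc workloads' is an advanced option,
-- so advanced options have to be visible first.
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;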