Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community.

I have messed things up at work (oh no!). Everything was running fine until about two days ago; queries executed instantly, and the database tended to sit around 20% load. Then I got told that some of our historical data in the database was entered incorrectly, and needed to be removed and re-entered from the following tables:

My first attempt to delete the data simply took far too long; it's historical data, so there are a few million rows. After that failed, I asked around, and someone suggested disabling the check constraints, deleting, and then re-enabling the constraints for those specific tables (which I will NEVER do again, terrible idea). The disable and delete ran fairly quickly; I had to leave the re-enable running all night, but it did succeed.
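For reference, the disable/delete/re-enable pattern described above typically looks like this in T-SQL (the table name comes from the question; `ALL` is illustrative, since the question doesn't list which constraints were involved). The re-enable step is the slow one, and the `WITH CHECK` part matters:

```sql
-- Disable all foreign key and check constraints on the table
ALTER TABLE dbo.TrackerStates NOCHECK CONSTRAINT ALL;

-- ... run the large DELETE here ...

-- Re-enable. WITH CHECK revalidates every existing row (the step that
-- ran all night) and marks the constraints as trusted again. A plain
-- "CHECK CONSTRAINT ALL" would skip revalidation, but the constraints
-- would be left untrusted and the optimizer could no longer rely on them.
ALTER TABLE dbo.TrackerStates WITH CHECK CHECK CONSTRAINT ALL;
```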

From then on, the database has been at 100% CPU, and one of the queries occasionally never completes. It's the same query every time:

SELECT TOP 500 * FROM TrackerStates WHERE TrackerId = @TrackerId AND Processed = 0 ORDER BY DateRecorded

This query uses the index IX_TrackerId_DateRecorded_Processed, and usually takes no time at all to execute. Occasionally though, a query will just sit there executing until it times out (after 30 seconds).

So far, I have tried:

Scaling the database up to 5 times the size (fewer timeouts, but still 100% CPU usage)

Rebuilding the indexes, as they were fragmented to about 80% (no difference)
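A sketch of how the fragmentation check and rebuild above are usually done (database and schema names assumed; `LIMITED` mode keeps the scan cheap):

```sql
-- Check fragmentation for all indexes on the table
SELECT i.name, s.avg_fragmentation_in_percent, s.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.TrackerStates'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id;

-- Rebuild (not just reorganize) every index on the table; a rebuild
-- also refreshes the statistics on those indexes with a full scan
ALTER INDEX ALL ON dbo.TrackerStates REBUILD;
```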

What query plan does the db report using for the slow query? And to clarify: you rebuilt and not just reorganized the indexes, right? What does the database say is using the CPU?
– indiri, Sep 7 '17 at 22:26

The query plan is the same between the normal executions and the slow executions. The database says this is the query using the CPU. Here's a link to the plan; I'm having trouble getting it into BrentOzar, I think because I'm using SSMS 2017.
– Andrew Williamson, Sep 7 '17 at 22:36

1 Answer

It looks like the statistics are off. From the plan you posted, the optimizer estimates it's going to read 64,000 rows, but it's actually reading zero. That's a very wide disparity. I'd suggest a few things:

1. Update the statistics with a full scan. Any index rebuilds ought to have taken care of that, but with this disparity I'm wondering if something is up there.

2. Make sure that the constraints are all in place (although this plan isn't referencing constraints, since it's a straight index seek with a TOP operation).

3. Capture the wait statistics for the system to see what's actually causing things to run slowly. Even better, you can use extended events to capture the wait metrics for just this query.

Also, a 64,000-row range scan from a seek is a little excessive unless you have millions of rows. However, this could still be a part of the statistics being off.
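One simple way to start on the wait-statistics suggestion, without setting up an extended events session, is to query the wait-stats DMVs (the session id below is hypothetical; since the question mentions Azure, the database-scoped DMV is shown):

```sql
-- Top waits for this database since last reset
-- (on Azure SQL Database; on a regular instance use sys.dm_os_wait_stats)
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_db_wait_stats
ORDER BY wait_time_ms DESC;

-- Waits for one specific session (SQL Server 2016+ / Azure SQL Database):
-- run the slow query in a session, note its session_id, then
SELECT wait_type, wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = 57   -- hypothetical session id
ORDER BY wait_time_ms DESC;
```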

I stopped the server, made a copy of the database (with Azure's GUI, not sure how it does it in the back end), and the copy database is running fine. I will try your suggestions on the original when I get back to work, because I would like to know what caused the problem and how to fix it. For future reference, how can I make DELETE on the TrackerStates table faster? Assume my application has already handled dependent rows, so there is no need for cascade deletes.
– Andrew Williamson, Sep 9 '17 at 3:35

Mass deletes are frankly tough. The best way to get them done is to do them in batches so that you keep the transactions smaller. It can take longer in batches, but it's generally going to be a safer, less intrusive operation. You'll still have to deal with the constraints and the statistics. That part doesn't go away. Cascading deletes you can eliminate, but not the constraints (never, ever trust the app to get that perfect, constraints are a safety check here).
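The batching approach above can be sketched like this (the batch size and the predicate identifying the bad historical rows are illustrative; the real predicate depends on which data was entered incorrectly):

```sql
-- Delete in small batches to keep each transaction, and its log
-- footprint, small; pause points between batches let other work through
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.TrackerStates
    WHERE DateRecorded < '2017-01-01';  -- hypothetical "bad data" predicate

    SET @rows = @@ROWCOUNT;
END;
```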
– Grant Fritchey, Sep 10 '17 at 15:31

UPDATE STATISTICS TrackerStates WITH FULLSCAN doesn't seem to have helped. All constraints are enabled. How do I capture the wait statistics? This is a historic table, so I do have millions of rows, but I agree, it shouldn't take a 64,000 row range scan with the indexing I have set up.
– Andrew Williamson, Sep 11 '17 at 0:36

The CPU usage has switched so that a different query is now the bottleneck, and operations are no longer timing out, although I am still at 100% CPU. I guess speeding that up should be a separate question.
– Andrew Williamson, Sep 11 '17 at 4:05