A mysterious problem plagued one of our web based applications: Usually searches are very fast and provide
results within a couple seconds at most.
Very rarely though searches in the system would time out and keep the UI
hanging on a loading screen. Once this happened, the next
searches would most of the time be very fast again and the problem could not be
reproduced – I shrugged it off many times as the “warm-up effect”. Surely
enough, a few weeks later the same thing would happen – a search times out.
Eventually we had one system where the “warm-up effect” would not “solve” the
problem – the system was stuck and did not return any search results.

What happened here?

Our searches are executed by a back-end powered by SQL Server full text search
which keeps statistics internally to make an “informed” decision on what query
plan to generate for a given query. Exactly
when these statistics are updated depends on when SQL Server decides the
existing statistics are stale. As a rule of thumb for a decently large table
whenever more than 20% of the rows in a table have been changed (additions,
deletions or modifications), the corresponding statistics are probably
considered stale. This happens to be around 20 days for most of our databases since they are producing new data and trimming old data on a 24/7 basis.

The default setting in a SQL Server database to deal with
statistics updating is to automatically create statistics and to automatically
update statistics. This sounds like a
sensible default.

The problem lies in the fact that the statistics are updated
at the time the query comes in – If there is an incoming query but statistics
are stale, SQL Server will update the statistics first before creating the query plan for the query and executing the
query.

Our database just happens to be fairly large and content
changes very often (real-time), so statistics updating takes time and is
frequently needed. Unfortunately above approach of first updating the
statistics will make the query time out on a large database, and in some cases
even timeout on the statistics updating itself, so the statistics are never
updated. Rinse and repeat for the next search, and no search will ever
complete.

The solution was (of course! in hindsight) to set the statistics updating to “Auto
Update Statistics Asynchronously” which allows the query to complete using the
stale statistics for query plan generation but triggers a statistics update in a
separate thread. After this change all
searches again are very fast, and timeouts are a thing of the past.