The Importance of SQL Server Statistics

It’s impossible to overstate just how important statistics are to SQL Server. Without them, SQL Server’s impressive cost-based optimizer simply wouldn’t even be an option.

The Need for Updated Statistics

As a rough and tumble analogy for gaining a sense of how important statistics are, image a team of individual treasure hunters combing a beach with metal-detectors.Then assume they’re working an area where they’re actually finding LOTS of treasure. Some of this treasure is tiny in the form of jewelry, trinkets, or single coins. Some is larger and takes the form of mast-heads, cannons, or even entire ‘treasure chests’. And, in this analogy, there’s also the possibility of running into entire shipwrecks – complete with literal mounds of treasure.

Then, imagine that for every found treasure, this team needs to decide HOW they’ll get each ‘find’ back up to their base of operations. At their disposal they’ve got additional team members – back at the base – waiting to be called down and fetch/retrieve treasure with the following tools:

A Shovel

A Bucket

A Wheel-Barrow

A Pickup Truck

An Earth Mover

Now, obviously, each of these ‘containers’ is ideally suited for different sizes and ‘types’ of treasure. And, in many ways, if you think about queries against your system/databases as being similar to these kinds of ‘treasure seeking’ activities, you’ll realize that queries come in all sorts of different sizes and shapes. Some (hopefully most) are bucket to wheel-barrow sized affairs where a treasure-hunter can quickly find ‘treasure’ (data), call in a bucket or a wheel-barrow to quickly zip in and grab the data, while other operations might require a few trips from a pickup or earth-mover. Only, each and every time you have to fire up an earth-mover (or pickup) and operate it on sand, you’re going to expect things to be a bit slower and more cumbersome to deal with.

And, in this regard, updated statistics are NOT quite the treasure-hunter NOR are they the retrieval mechanisms. Instead, statistics are MORE like the ‘assessment’ and ‘reporting’ process that a treasure-hunter would go through when they find a treasure and radio back to base for something to come in and retrieve the treasure. Because while it would be overkill to spin-up a pickup truck to come and grab a single earring, an even worse scenario would be finding a sunken galley full of treasure and assessing that a bucker or wheel-barrow would suffice – because, once committed, what you’d end up with would be a bucket/wheel-barrow making gobs and gobs and gobs and gobs of trips back and forth to fetch your data (or gold) and, obviously, you’d watch performance suffer.

Consequently, without accurate and up-to-date statistics, SQL Server can and WILL make the wrong assessment about what kinds of specific retrieval (or modification) mechanisms it should use when processing queries.

Additional Learning Resources

To learn more about the core importance of statistics, I highly recommend the following resources:

Kimberly L. Tripp’s Excellent MCM Video on Statistics. This video provides fantastic insight into exactly how SQL Server uses statistics when interacting with the storage engine. I can’t recommend it enough – especially since it does such a great job of making such a technical and complex subject so easily approachable and easy to understand.

Whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008. This is a great white paper that covers statistics in detail – including providing detailed information on when statistics are created and updated. (And, just as a note, this blog post provides information about how trace flag 2371 can be used to make those refresh rates a bit more aggressive. (And, to be VERY clear: I’m NOT recommending that trace flag – just pointing out that this MIGHT be an option in SOME environments IF you’re willing to incur the negatives that implementing such a trace flag would incur HAD you fully determined that you needed such a fix.)