Intelligent Stats Updater

Ensuring that statistics are up-to-date is a critical maintenance task because the query optimizer uses statistics to choose an optimal query plan. Poor statistics can result in poor query plans, which in turn can hurt the performance of both the individual query and the overall server. In my environment, a federated farm with both large and small clients on multidatabase servers, a poor query plan can cause a spike in CPU usage, often pegging the CPU at 100 percent.

Like many of you, I always ensure my indexes are properly maintained. I rebuild and reorganize indexes whenever I have the opportunity and have the auto update statistics asynchronously option enabled for any statistics that might not have been updated during an index rebuild. In theory, based on the rebuilds and frequency of data changes triggering automatic statistics updates, the statistics in my environment should be up-to-date. However, even with all of these measures in place, there’s still the possibility that my statistics could be out-of-date because of how the auto update statistics option functions. Thus, I set out to find a better way of ensuring that statistics are fresh.

The initial requirements for my process were pretty typical: I wanted to ensure statistics were updated efficiently and with as little impact on my server as possible. I created the IntelligentStatsUpdater stored procedure to accomplish this task. Let’s take a look at this stored procedure and its parameters. (You can download IntelligentStatsUpdater.sql by clicking the 103405.zip file at the top of the page.)

The Graduated Update Scale

Simply put, executing IntelligentStatsUpdater will locate out-of-date statistics based on input parameters and update them using a Graduated Update Scale to determine an appropriate statistics sample percentage for the table size. Line 590 of IntelligentStatsUpdater.sql, which Figure 1 shows, is the case statement that constitutes the Graduated Update Scale.

I chose to use a case statement because it’s easy to read and modify.
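The general shape of such a scale is easy to sketch. Note that the row-count breakpoints and sample percentages below are illustrative examples, not the actual values from line 590 of IntelligentStatsUpdater.sql:

```sql
-- Illustrative shape of a Graduated Update Scale; the breakpoints and
-- sample percentages are examples, not the script's actual values.
SELECT @SamplePercent =
    CASE
        WHEN @RowCnt < 500000 THEN 100      -- small tables: full scan is cheap
        WHEN @RowCnt < 5000000 THEN 50
        WHEN @RowCnt < 50000000 THEN 20
        ELSE 5                              -- very large tables: small sample
    END;
```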

You must use caution when updating statistics with the FULLSCAN option on large tables because it’s performance-intensive and carries an inherent potential for blocking while the statistics are gathered. A large table with millions of rows might need only a small sample percentage to produce accurate statistics. That’s why I based the update statistics sample percentage on the number of rows in each table.
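For reference, the sample percentage translates into standard UPDATE STATISTICS syntax; the table and statistic names here are hypothetical:

```sql
-- A small table can afford a full scan:
UPDATE STATISTICS dbo.SmallTable WITH FULLSCAN;

-- A large table gets a proportionally smaller sample:
UPDATE STATISTICS dbo.BigTable IX_BigTable_Date WITH SAMPLE 5 PERCENT;
```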

The scale I use is based on what I know works given the data and high transactional volume in my environment. Although you can use the scale right out of the box, the frequency and the degree to which you update your statistics is dependent on your environment. I encourage you to explore and tweak the conditions to suit your needs.

IntelligentStatsUpdater’s Input Parameters

The IntelligentStatsUpdater stored procedure includes several features, controlled by input parameters, that target specific statistics and databases and limit the impact on the server. The first input parameter is @DaysOlderOutDated, the age in days at which you consider statistics to be out-of-date. For instance, if you assign @DaysOlderOutDated a value of 7, any statistic last updated seven or more days ago will be updated. If you want to target or exclude specific databases on every run, @DBInclude and @DBExclude, comma-delimited lists of database names to include or exclude, can be very helpful.
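A call targeting specific databases might look like the following. The database names are examples, and you should adjust the Common database reference to wherever you created the procedure:

```sql
-- Update any statistics that are 7 or more days old, but only in
-- the Sales and Inventory databases (names are examples).
EXEC Common.dbo.IntelligentStatsUpdater
    @DaysOlderOutDated = 7,
    @DBInclude = 'Sales,Inventory';
```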

The next few parameters control the impact of the overall update process. The @MaxStatsToUpdate value specifies the maximum number of statistics you want to update in the current execution. For example, I have 4,246 statistics that could be updated on any given run. To minimize the procedure’s effect on my server and to have the procedure complete in a reasonable amount of time, I limit @MaxStatsToUpdate to 1,000. In addition, you can use the @WaitForMinutes and @WaitForSeconds parameters to provide a delay between update executions, giving your server a little time to breathe.

It’s important to note that you have to be careful when assigning @MaxStatsToUpdate a value. I run IntelligentStatsUpdater via a scheduled job that I included in CreateJob_IntelligentStatsUpdater.sql, which you’ll find in the 103405.zip file. By default, I set the scheduled job to run every two days. If your environment has several thousand statistics like mine and you were to, for instance, run the job only once a week with @MaxStatsToUpdate set to 1,000 and @DaysOlderOutDated set to 7, you would only ever be updating the first 1,000 statistics every week. If you use @MaxStatsToUpdate, make sure you have appropriately set the job schedule and the @DaysOlderOutDated value.
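A throttled run along the lines described above might look like this (the parameter values are examples, and the Common database reference should match your environment):

```sql
-- Process at most 1,000 stale statistics per run, pausing 30 seconds
-- between update executions to give the server room to breathe.
EXEC Common.dbo.IntelligentStatsUpdater
    @DaysOlderOutDated = 7,
    @MaxStatsToUpdate = 1000,
    @WaitForMinutes = 0,
    @WaitForSeconds = 30;
```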

If you want to run IntelligentStatsUpdater via a scheduled job, simply run the CreateJob_IntelligentStatsUpdater.sql script that’s in the 103405.zip file after running IntelligentStatsUpdater.sql to create the IntelligentStatsUpdater stored procedure. Be sure to change the references to the database named [Common] in both scripts to the name of the database in which you’ll create the IntelligentStatsUpdater stored procedure.

Messaging and Debugging Parameters

I always prefer to see what a procedure is going to do before it actually does the work. If you set @debug to 1, you’ll see a verbose output of data from both SELECT and PRINT statements. If you set @PrintOutput to 1, you’ll see just the printed UPDATE statements as they’re being executed. You can’t have both @debug and @PrintOutput set to 1 at the same time.
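A preview-style call might look like this (again assuming the procedure lives in the Common database):

```sql
-- Verbose preview: emits output from the procedure's SELECT and
-- PRINT statements so you can review what it intends to do.
EXEC Common.dbo.IntelligentStatsUpdater
    @DaysOlderOutDated = 7,
    @debug = 1;
```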

The procedure uses a table named UpdateStatsProcessList to keep track of anything it processes. (Note that I use a database called [Common] for all of my maintenance scripts, so you’ll want to modify IntelligentStatsUpdater.sql and CreateJob_IntelligentStatsUpdater.sql to replace all references to Common.dbo.UpdateStatsProcessList with the database name in your environment.) This table gives you a list of each database, table, index, and statistic targeted for update, as Figure 2 shows.

It also gives you the row count of each table and the last updated date of each statistic, and the ProcessStatus column tells you whether a statistic was chosen for update. The ProcessStatus values are P for processed, R for ready, and I for ignore. Any statistic last updated more recently than the date calculated from @DaysOlderOutDated, as well as any table with zero rows, will have a ProcessStatus value of I.
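You can query the process list directly to review a run’s decisions. The column names below are inferred from the description above, so verify them against the actual table definition:

```sql
-- Review what the run targeted; column names are inferred from the
-- article's description, so check them against the real table.
SELECT *
FROM Common.dbo.UpdateStatsProcessList
WHERE ProcessStatus IN ('P', 'R', 'I')
ORDER BY ProcessStatus;
```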

The final two parameters are @BuildStatsTableDataOnly and @KeepStatsTableData. These parameters are used for debugging and can’t both be set to 1 at the same time. @BuildStatsTableDataOnly executes the procedure but only builds the UpdateStatsProcessList output table; it’s useful if you want to see the last update date of every statistic. You use @KeepStatsTableData when you’ve executed the procedure with a @MaxStatsToUpdate value less than the total number of statistics and simply want to process the next batch of @MaxStatsToUpdate statistics without recalculating the table. Keep in mind that all of the data collected, such as row counts and statistics dates, comes from DMVs and system tables, so querying it shouldn’t have a noticeable impact on your system.

Update Statistics with Little Impact

With IntelligentStatsUpdater, all statistics are kept up-to-date with as little impact on your SQL Server system as possible. I’ve been running IntelligentStatsUpdater for several months now in my transactionally active environment, which demands near-perfect uptime, to confirm that it performs well even when the server is very busy. Almost immediately I noticed an improvement in performance. I hope it benefits your environment as it has mine. Happy updating!

Discuss this Article

Good info! Thanks for sharing this process! I'm definitely going to look into using this.

Quick question for Brian: is this process geared towards a specific version? I was assuming 2K5 and 2K8, but I have some 2K databases that could benefit from this as well. I guess the easy way to find out is to give the sample code a trial run against each version.