Statistics

I was recently asked is there a single script which can provide all the necessary details about statistics for any database. This question made me write following script. I was initially planning to use sp_helpstats command but I remembered that this is marked to be deprecated in future. Again, using DMV is the right thing to do moving forward. I quickly wrote following script which gives a lot more information than sp_helpstats.

When I created non clustered index on the column city, it also created statistics on the same column with same name as index. When we populate the data in the column the index is update – resulting execution plan to be invalided – this leads to the statistics to be updated in next execution of SELECT. This behavior does not happen on Heap or column where index is auto created. If you explicitly update the index, often you can see the statistics are updated as well. You can see this is for sure happening if you follow the tell of John Sansom.

I agree with John. It is reusing the Execution Plan. Aside from OPTION(RECOMPILE), clearing the Execution Plan Cache before the subsequent tests will also work.

i.e., run this before round 2:
————————————————————–
– Clear execution plan cache before next test
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
————————————————————–

Nice puzzle!

Kevin

As this was puzzle John and Kevin both got the correct answer, there was no condition for answer to be part of best practices. I know John and he is finest DBA around – his tremendous knowledge has always impressed me. John and Kevin both will agree that clearing cache either using DBCC FREEPROCCACHE and recompiling each query every time is for sure not good advice on production server. It is correct answer but not best practice.

By the way, if you have better solution or have better suggestion please advise. I am open to change my answer and publish further improvement to this solution. On very separate note, I like to have clustered index on my Primary Key, which I have not mentioned here as it is out of the scope of this puzzle.

My article on “Important of Statistics” has been published in Solid Quality Journal.

Statistics are a key part of getting solid performance. In this article we will go over the basics of the statistics and various best practices related to Statistics. We will go over various frequently asked questions like when to update statistics and difference between sync and async update of statistics. We will also discuss the pros and cons of the statistics update.

Quite often when I am staring at my SSMS I wonder what is going on under the hood in my SQL Server. I often want to know which database is very busy and which database is bit slow because of IO issue. Sometime, I think at the file level as well. I want to know which MDF or NDF is busiest and doing most of the work. Following query gets the same results very quickly.

When you run above query you will get many valuable information like what is the size of the file as well how many times the reads and writes are done for each file. It also displays the read/write data in bytes. Due to IO if there has been any stall (delay) in read or write, you can know that as well.

I was recently working on a performance tuning project in Dubai (yeah I was able to see the tallest tower from the window of my work place). I had a very interesting learning experience there. There was a situation where we wanted to receive the schema of original database from a certain client. However, the client was not able to provide us any data due to privacy issues. The schema was very important because without having an access to underlying data, it was a bit difficult to judge the queries etc. For example, without any primary data, all the queries are running in 0 (zero) milliseconds and all were using nested loop as there were no data to be returned. Even though we had CPU offending queries, they were not doing anything without the data in the tables. This was really a challenge as I did not have access to production server data and I could not recreate the scenarios as production without data.

Well, I was confused but Ruben from Solid Quality Mentors, Spain taught me new tricks. He suggested that when table schema is generated, we can create the statistics consequently. Here is how we had done that:

Once statistics is created along with the schema, without data in the table, all the queries will work as how they will work on production server. This way, without access to the data, we were able to recreate the same scenario as production server on development server.

When observed at the script, you will find that the statistics were also generated along with the query.

Statistics and Best Practices
Timing: 11:45am-12:45pm
Statistics are a key part of getting solid performance. In this session we will go over the basics of the statistics and various best practices related to Statistics. We will go over various frequently asked questions like a) when to update statistics, b) different between sync and async update of statistics c) best method to update statistics d) optimal interval of updating statistics. We will also discuss the pros and cons of the statistics update. This session is for all of you – whether you’re a DBA or developer!

If you have never attended my session on this subject I strongly suggest that you attend the event as this is going to be very interesting conversation between us. If you have attended this session earlier, this will contain few new information which will for sure interesting to share with all.

I recently received an email that contains a question from one of my readers. I have already replied the answer to his email, but I would still like to bring it to your attention and ask if you think I could have done any better with the example I gave.

A: Rows can logically disappear from an Indexed view based on OUTER JOIN when you insert data into a base table. This makes the OUTER JOIN view to be increasingly updated, which is relatively difficult to implement. In addition, the performance of the implementation would be slower than for views based on standard (INNER) JOIN.

After running this script, you will notice that as the base table gains one row, the result loses one row. Going back to the white paper mentioned earlier, I believe this is expensive to manage for the same reason why it is not allowed in Indexed View.

Let me know if you have a better example to demonstrate this behavior in the Outer Join.

Statistics are one of the most important factors of a database as it contains information about how data is distributed in the database objects (tables, indexes etc). It is quite common to listen people talking about not optimal plan and expired statistics. Quite often I have heard the suggestion to update the statistics if query is not optimal. Please note that there are many other factors for query to not perform well; expired statistics are one of them for sure.

If you want to know when your statistics was last updated, you can run the following query.

USE AdventureWorks
GOSELECT name AS index_name,STATS_DATE(OBJECT_ID, index_id) AS StatsUpdatedFROM sys.indexesWHERE OBJECT_ID = OBJECT_ID('HumanResources.Department')GO

If due to any reason you think that your statistics outdated and you want to update them, you can run following statement. In following statement, I have specified an additional option of FULLSCAN, which implies that the complete table is scanned to update the statistics.

Please note that you should only run update statistics if you think they will benefit your query and when your server is not very busy. If you have auto update “usually” on, the SQL Server takes care of updating stats when necessary. Here, is a quick example where you can see the updating of older statistics using fullscan.

Statistics is very deep subject, and a lot of things can be discussed over it. We will discuss about many other in-depth topics in some other article in future.

In one of the recent projects, I found out that despite putting good indexes and optimizing the query, I could not achieve an optimized performance and I still received an unoptimized response from the SQL Server. On examination, I figured out that the culprit was statistics. The database that I was trying to optimize had auto update of the statistics was disabled.

Once I enabled the auto update of statistics, the database started to respond as expected. If you ever face situation like this, please do the following:

Like this:

Post navigation

Community Initiatives

About Pinal Dave

Pinal Dave is a Pluralsight Developer Evangelist. He has authored 11 SQL Server database books, 17 Pluralsight courses and have written over 3200 articles on the database technology on his blog at a http://blog.sqlauthority.com. Along with 11+ years of hands on experience he holds a Masters of Science degree and a number of certifications, including MCTS, MCDBA and MCAD (.NET). His past work experiences include Technology Evangelist at Microsoft and Sr. Consultant at SolidQ. Follow @pinaldave
Send Author Pinal Dave
an email at pinal@sqlauthority.com

Email Subscription

Enter your email address to subscribe to this blog and receive notifications of new posts by email.