SQL Index

Sometimes something very small or a common error which we observe in daily life teaches us new things. SQL Server Expert Sandip Pani (winner of Joes 2 Pros Contests) has come across similar experience. Sandip has written a guest post on an error he faced in his daily work. Sandip is working for QSI Healthcare as an Associate Technical Specialist and have more than 5 years of total experience. He blogs at SQLcommitted.com and contribute in various forums. His social media hands are LinkedIn, Facebook and Twitter.

Once I faced following error when I was working on performance tuning project and attempt to create an Index.

Mug 1913, Level 16, State 1, Line 1 The operation failed because an index or statistics with name ‘Ix_Table1_1′already exists on table ‘Table1′.

The immediate reaction to the error was that I might have created that index earlier and when I researched it further I found the same as the index was indeed created two times. This totally makes sense. This can happen due to many reasons for example if the user is careless and executes the same code two times as well, when he attempts to create index without checking if there was index already on the object. However when I paid attention to the details of the error, I realize that error message also talks about statistics along with the index. I got curious if the same would happen if I attempt to create indexes with the same name as statistics already created. There are a few other questions also prompted in my mind. I decided to do a small demonstration of the subject and build following demonstration script.

The goal of my experiment is to find out the relation between statistics and the index. Statistics is one of the important input parameter for the optimizer during query optimization process. If the query is nontrivial then only optimizer uses statistics to perform a cost based optimization to select a plan. For accuracy and further learning I suggest to read MSDN.

Now let’s find out the relationship between index and statistics. We will do the experiment in two parts. i) Creating Index ii) Creating Statistics

We will be using the following T-SQL script for our example.

IF (OBJECT_ID('Table1') IS NOT NULL)DROP TABLE Table1
GOCREATE TABLE Table1(Col1 INT NOT NULL,Col2 VARCHAR(20) NOT NULL)GO

We will be using following two queries to check if there are any index or statistics on our sample table Table1.

When I ran above two scripts on the table right after it was created it did not give us any result which was expected.

Now let us begin our test.

1) Create an index on the table

Create following index on the table.

CREATE NONCLUSTERED INDEX Ix_Table1_1 ON Table1(Col1)GO

Now let us use above two scripts and see their results.

We can see that when we created index at the same time it created statistics also with the same name.

Before continuing to next set of demo – drop the table using following script and re-create the table using a script provided at the beginning of the table.

DROP TABLE table1
GO

2) Create a statistic on the table

Create following statistics on the table.

CREATE STATISTICS Ix_table1_1 ON Table1 (Col1)GO

Now let us use above two scripts and see their results.

We can see that when we created statistics Index is not created. The behavior of this experiment is different from the earlier experiment.

Clean up the table setup using the following script:

DROP TABLE table1
GO

Above two experiments teach us very valuable lesson that when we create indexes, SQL Server generates the index and statistics (with the same name as the index name) together. Now due to the reason if we have already had statistics with the same name but not the index, it is quite possible that we will face the error to create the index even though there is no index with the same name.

A Quick Check

To validate that if we create statistics first and then index after that with the same name, it will throw an error let us run following script in SSMS. Make sure to drop the table and clean up our sample table at the end of the experiment.

SQL Server has lots of things to learn and share. It is amazing to see how people evaluate and understand different techniques and styles differently when implementing. There are three different instances where I have come across a situation where I felt that proper understanding is important and something which looks evil may not be evil as suggested. The real reason may be absolutely different but we may blame something totally different for the incorrect results.

Scenario 1: Database Tuning Advisor and Incorrect Results

One of my friends called me he was stressed out. He just ran a few of Database Tuning Advisor on his system and followed up a few of the suggestions by DTA and suddenly his applications and reporting server started to show incorrect results. He got stressed out and reverted every single change he had done so far. Though he was getting better performance his results were not as per expectation. Finally after quick research we figured out that it was not DTA’s advice but something else which was creating this scenario.

Scenario 2: No Index for Correct Results

One of the place where I went for consultation for performance tuning. I realize that there was absolutely no indexes created. When I discussed this with them, they told me creating indexes is giving incorrect results to them. This was very funny to me. I have heard everything before but this was kind of first time for me. However, with carefully investigating I got the idea what they were suggesting and later on solution was reached.

Scenario 3: Top and Changing Results

This one is my own experience when I was beginning my career. Just like every beginner I was not sure about every single command of SQL Server as well I had no idea how the engine behaves internally. I was under the misconception that TOP changes how the results are displayed. I was very much scared of to use TOP in my queries but after a while I understood that it was not TOP which was misbehaving but my knowledge is incomplete. I eventually figured out how to use TOP clause and get expected results.

The Real Issue

The real issue in the above three cases is not using ORDER BY clause for the query. While writing query I have faced situations where one result can be achieved by multiple ways. It is our responsibility that we select the result which is deterministic and consistent. We should opt for the way which maintains database integrity as well, gives optimal performance.

In all the above three scenario the issue was developer was not writing script which was consistent with his understanding. The SQL Server Engine was behaving how it should behave but lack of understanding was building different expectation from the query.

Reproduction of Scenarios

Following T-SQL script is doing the same. First let us create a table with several records.

Now let us create index over one of the columns and retrieve records. You will find the result is inconsistent than before or different than what we have received. (Note that I did not use word INCORRECT as it is not incorrect result, but different for sure)

Now let us use ORDER BY clause and you will see that result is as per expected.

SELECT TOP 3 NameFROM TestTableORDER BY ID
GO

Let us Clean up.

DROP TABLE TestTable
GO

In our sample reproduction I have selected one single column and single column index. In above scenario we had much difficult scenarios and complex cases. The major point of this blog post is to make sure that developers understand the importance of the ORDER BY clause.

Summary

When we are beginning our career it is quite easy get confused and not understanding the impact of code which we are writing. Quite often we build prejudice towards part of technology which should not be the case as we progress in our career. In this example we are trying to drive home the point that when you use TOP and if you are expecting your results in certain order all the time uses ORDER BY along with it.

What do you do when you are in a hurry and hear someone say things which you do not agree or is wrong? Well, let me tell you what I do or what I recently did.

I was walking by and heard someone mentioning “Columnstore Index are really great as they are using Batch Mode which makes them seriously fast.” While I was passing by and I heard this statement my first reaction was I thought Columnstore Index can use both – Batch Mode and Row Mode. I stopped by even though I was in a hurry and asked the person if he meant that Columnstore indexes are seriously fast because they use Batch Mode all the time or Batch Mode is one of the reasons for Columnstore Index to be faster. He responded that Columnstore Indexes can run only in Batch Mode.

However, I do not like to confront anybody without hearing their complete story. Honestly, I like to do information sharing and avoid confronting as much as possible. There are always ways to communicate the same positively. Well, this is what I did, I quickly pull up my earlier article on Columnstore Index and copied the script to SQL Server Management Studio. I created two versions of the script. 1) Very Large Table 2) Reasonably Small Table. I a query which uses columnstore index on both of the versions. I found very interesting result of the my tests.

I saved my tests and sent it to the person who mentioned about that Columnstore Indexes are using Batch Mode only. He immediately acknowledged that indeed he was incorrect in saying that Columnstore Index uses only Batch Mode. What really caught my attention is that he also thanked me for sending him detail email instead of just having argument where he and I both were standing in the corridor and neither have no way to prove any theory.

Here is the screenshots of the both the scenarios.

1) Columnstore Index using Batch Mode

2) Columnstore Index using Row Mode

Here is the logic behind when Columnstore Index uses Batch Mode and when it uses Row Mode. A batch typically represents about 1000 rows of data. Batch mode processing also uses algorithms that are optimized for the multicore CPUs and increased memory throughput. Batch mode processing spreads metadata access costs and overhead over all the rows in a batch. Batch mode processing operates on compressed data when possible leading superior performance.

Here is one last point – Columnstore Index can use Batch Mode or Row Mode but Batch Mode processing is only available in Columnstore Index. I hope this statement truly sums up the whole concept.

Columnstore Index is one of my favorite enhancement in SQL Server 2012. A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. In case of the row store indexes multiple pages will contain multiple rows of the columns spanning across multiple pages. Whereas in case of column store indexes multiple pages will contain (multiple) single columns. Columnstore Indexes are compressed by default and occupies much lesser space than regular row store index by default.

One of the very common question I often see is need of the list of columnstore index along with their size and corresponding table name. I quickly re-wrote a script using DMVs sys.indexes and sys.dm_db_partition_stats. This script gives the size of the columnstore index on disk only. I am sure there will be advanced script to retrieve details related to components associated with the columnstore index. However, I believe following script is sufficient to start getting an idea of columnstore index size.

If you are regular reader of this blog, you must be aware of that there are two kinds of blog posts 1) I share what I learn recently 2) I share what I learn and request your participation. Today’s blog post is where I need your opinion to make this blog post a good reference for future.

Background Story

Recently I came across system where users have changed the name of the few of the table to match their new standard naming convention. The name of the table should be self explanatory and they should have explain their purpose without either opening it or reading documentations. Well, not every time this is possible but again this should be the goal of any database modeler. Well, I no way encourage the name of the tables to be too long like ‘ContainsDetailsofNewInvoices’. May be the name of the table should be ‘Invoices’ and table should contain a column with New/Processed bit filed to indicate if the invoice is processed or not (if necessary). Coming back to original story, the database had several tables of which the name were changed.

Story Continues…

To continue the story let me take simple example. There was a table with the name ‘ReceivedInvoices’, it was changed to new name as ‘TblInvoices’. As per their new naming standard they had to prefix every talbe with the words ‘Tbl’ and prefix every view with the letters ‘Vw’. Personally I do not see any need of the prefix but again, that issue is not here to discuss. Now after changing the name of the table they faced very interesting situation. They had few indexes on the table which had name of the table. Let us take an example.

Old Name of Table: ReceivedInvoice
Old Name of Index: Index_ReceivedInvoice1

Here is the new names

New Name of Table: TblInvoices
New Name of Index: ???

Well, their dilemma was what should be the new naming convention of the Indexes. Here is a quick proposal of the Index naming convention. Do let me know your opinion.

Here ColumnName is the column on which index is created. As there can be only one Primary Key Index and Columnstore Index per table, they do not require ColumnName in the name of the index. The purpose of this new naming convention is to increase readability. When any user come across this index, without opening their properties or definition, user can will know the details of the index.

T-SQL script to Rename Indexes

Your Contribute Please

Well, the organization has already defined above four guidelines, personally I follow very similar guidelines too. I have seen many variations like adding prefixes CL for Clustered Index and NCL for Non-clustered Index. I have often seen many not using UX prefix for Unique Index but rather use generic IX prefix only.

Now do you think if they have missed anything in the coding standard. Is NCI and CI prefixed required to additionally describe the index names. I have once received suggestion to even add fill factor in the index name – which I do not recommend at all.

What do you think should be ideal name of the index, so it explains all the most important properties? Additionally, you are welcome to vote if you believe changing the name of index is just waste of time and energy.

Note: The purpose of the blog post is to encourage all to participate with their ideas. I will write follow up blog posts in future compiling all the suggestions.

As you know I have been writing on Columnstore Index for quite a while. Recently my friend Vinod Kumar wrote about SQL Server 2012: ColumnStore Characteristics. A fantastic read on the subject if you have yet not caught up on that subject. After the blog post I called him and asked what should I write next on this subject. He suggested that I should write on DMV script which I have prepared related to Columnstore when I was writing our SQL Server Questions and Answers book. When we were writing this book SQL Server 2012 CTP versions were available. I had written few scripts related to SQL Server columnstore Index. I like Vinod’s idea and I decided to write about DMV, which we did not cover in the book as SQL Server 2012 was not released yet. We did not want to talk about the product which was not yet released.

The first script which I had written was with DMV – sys.column_store_index_stats. This DMV was displaying the statistics of the columnstore indexes. When I attempted to run it on SQL Server 2012 RTM it gave me error suggesting that this DMV does not exists.

I was pretty confident that this DMV was available when I had written the scripts. The next reaction was to type ‘sys.’ only in SSMS and wait for intelisense to popup DMV list. I scrolled down and noticed that above said DMV did not exists there as well.

Now this is not bug or missing feature. This was indeed something can happen because the version which I was practicing was early CTP version. If you go to the page of the DMV here, it clearly stats notice on the top of the page.

This documentation is for preview only, and is subject to change in later releases.

Now this was not alarming but my next thought was if this DMV is not there where can I find the information which this DMV was providing. Well, while I was thinking about this, I noticed that my another friend Balmukund Lakhani was online on personal messenger. Well, Balmukund is “Know All” kid. I have yet to find situation where I have not got my answers from him. I immediately pinged him and asked the question regarding where can I find information of ‘column_store_index_stats’. His answer was very abrupt but enlightening for sure. Here is our conversation:

Pinal: Where can I find information of column_store_index_stats?Balmukund: Assume you have never worked with CTP before and now try to find the information which you are trying to find.

Honestly it was fantastic response from him. I was confused as I have played extensively with CTP versions of SQL Server 2012. Now his response give me big hint. I should have not looked for DMV but rather should have focused on what I wanted to do. I wanted to retrieve the statistics related to the index. In SQL Server 2008/R2, I was able to retrieve the statistics of the index from the DMV – sys.dm_db_index_usage_stats. I used the same DMV on SQL Server 2012 and it did retrieved the necessary information for me.

Here is the updated script which gave me all the necessary information I was looking for. Matter of the fact, if I have used my earlier SQL Server 2008 R2 script this would have just worked fine.

I think there was a time when lots of questions were coming via either email or blog comments. Nowadays, the trend seems to change. Most of the question I receive is through social media. Here is the latest question I received through Twitter. The best or worst part of Twitter is that it allows only 140 characters, so I’ve noticed that a question is easy to ask on Twitter, but an answer is difficult to provide using this social network.

“How do I know if columnstore index is used by query through execution plan?“

Well, my reaction was simple – “From Operators“. See, I managed to answer this in two words only. However, I was not satisfied with my own answer. The questioner did not follow up after this question but I personally did not like my answer. I felt that if he was talking about execution plan, he is indeed aware of operators. If he was aware of operators, my answer was no use to him. Suddenly, I thought I was arrogant (or at least not clear). I sent him a direct message that I would write a blog post tomorrow to explain how to figure out if columnstore index is used or not via execution plan.

If you are interested in Columnstore Index, read following related posts on my blog for additional details:

During yesterday’s evening, I asked a very simple question on my Facebook Page. The question was written in a jiffy and in a very light mood. While writing the question, I left a few things out, and the question did miss a few details about setup. However, as the question was not complete, it created an extremely interesting conversation in the following thread.

Here is the question: Write a select statement using a single table, using single table single time only without using join keywords, which generate execution plan with 2 join operators. Use AdventureWorks as a sample database.

I got many interesting answers to the question I posted. I must say that I learned something new from all these answers. Before I discuss my own answer, let me show the answer provided by Alphonso Jones.

When I saw this answer – I was very happy because I did not visualize it as a solution when I was asking the question. Here is the execution plan of the T-SQL code above. It’s easy to see that there are multiple joins because of the Partition Function used in the query. What an excellent participation by Alphonso Jones.

Click to Enlarge

Here is the answer which I had visualized when I asked the question. I was running the query on AdventureWorks database and executed the following query, which in turn, generated an execution plan with multiple joins:

Look at the execution plan of the above query. You can see the joins even though I am using single table and there is no join syntax in the query.

Click to Enlarge

Personally, I liked the solution of Alphonso Jones as his solution will always generate multiple joins due to Partition Function. On the other hand, my solution is a bit tricky for it requires Indexes on the table [Purchasing].[PurchaseOrderHeader], which generates index intersection. Index Intersection is a technique which utilizes more than one index on a table to satisfy a given query.

Storing data non-contiguously on disk is known as fragmentation. Before learning to eliminate fragmentation, you should have a clear understanding of the types of fragmentation. When records are stored non-contiguously inside the page, then it is called internal fragmentation. When on disk, the physical storage of pages and extents is not contiguous. We can get both types of fragmentation using the DMV: sys.dm_db_index_physical_stats.

Here is the generic advice for reducing the fragmentation. If avg_fragmentation_in_percent > 5% and < 30%, then use ALTER INDEX REORGANIZE: This statement is replacement for DBCC INDEXDEFRAG to reorder the leaf level pages of the index in a logical order. As this is an online operation, the index is available while the statement is running. If avg_fragmentation_in_percent > 30%, then use ALTER INDEX REBUILD: This is replacement for DBCC DBREINDEX to rebuild the index online or offline. In such case, we can also use the drop and re-create index method.(Ref: MSDN)

Here is quick video which covers many of the above mentioned topics.

While Vinod and I were planning about Indexing course, we had plenty of fun and learning. We often recording few of our statement and just left it aside. Afterwords we thought it will be really funny Here is funny video shot by Vinod and Myself on the same subject:

Have you ever faced a situation where you see something working but you feel it should not be working? Well, I had similar moments a few days ago. I knew that SQL Server 2008 supports online indexing. However, I also knew that I could not rebuild index ONLINE if I used VARCHAR(MAX), NVARCHAR(MAX) or a few other data types. While I was strongly holding on to my belief, I came across with that situation where I had to go online and do a little bit of reading at Book Online.

Here is an example showing the situation I’ve gone through:

First of all, run the following code in SQL Server 2008 or SQL Server 2008 R2.

Then, run the same code in SQL Server 2012 version. Observe the difference between both of the executions. You will be getting the following resultset.

In SQL Server 2008/R2, it will throw the following error:

Msg 2725, Level 16, State 2, Line 1An online operation cannot be performed for index ‘IX_TestTable_Cols’ because the index contains column ‘SecondCol’ of data type text, ntext, image, varchar(max), nvarchar(max), varbinary(max), xml, or large CLR type. For a non-clustered index, the column could be an include column of the index. For a clustered index, the column could be any column of the table. If DROP_EXISTING is used, the column could be part of a new or old index. The operation must be performed offline.

In SQL Server 2012, it will run successfully and will not throw any error.Command(s) completed successfully.

I always thought it will throw an error if there is VARCHAR(MAX) or NVARCHAR(MAX) used in the table schema definition. When I saw this result, it was made clear to me that it would surely not be a bug enhancement in SQL Server 2012. As a matter of fact, I always wanted this feature to be added in SQL Server Engine as this would enable ONLINE Index Rebuilding for mission critical tables that need to be online always. I quickly searched online and landed on Jacob Sebastian’s blog where he blogged about it this subject.

Well, is there any other new feature in SQL Server 2012 which gave you a good surprise?

Like this:

Post navigation

Community Initiatives

About Pinal Dave

Pinal Dave is a Pluralsight Developer Evangelist. He has authored 11 SQL Server database books, 17 Pluralsight courses and have written over 3200 articles on the database technology on his blog at a http://blog.sqlauthority.com. Along with 11+ years of hands on experience he holds a Masters of Science degree and a number of certifications, including MCTS, MCDBA and MCAD (.NET). His past work experiences include Technology Evangelist at Microsoft and Sr. Consultant at SolidQ. Follow @pinaldave
Send Author Pinal Dave
an email at pinal@sqlauthority.com

Email Subscription

Enter your email address to subscribe to this blog and receive notifications of new posts by email.