As discussed previously, an index entry within a Secondary Index on an Index Organized Table (IOT) basically consists of the indexed column(s) and the Logical Rowid, which in turn comprises the PK column(s) and a “guess” to the physical block in the IOT containing the corresponding row.

Let’s discuss this “guess” component in a bit more detail.

When the Secondary Index is created, this guess is spot on and will indeed point to the correct block within the IOT structure that contains the row being referenced by the indexed entry.
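As a quick sketch of how this can be checked (the index name here is hypothetical, in line with the demo names used throughout this series), the guess accuracy is reported via the PCT_DIRECT_ACCESS column of USER_INDEXES once index statistics have been gathered:

exec dbms_stats.gather_index_stats(user, 'ALBUM_SALES_IOT_TOTAL_SALES_I');

select index_name, pct_direct_access
from user_indexes
where index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';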

As we can see, the PCT_DIRECT_ACCESS value is 100, which means that 100% of all the guess components are correct. Therefore, the index behaves in a manner very similar to an ordinary Secondary Index with a rowid, in that all the initial accesses to the IOT are valid and there’s no need to subsequently re-access the IOT via the PK component. From the perspective of finding the required row entries within the IOT structure, the Secondary Index is as efficient as possible when all the guesses are valid.

If, after inserting additional rows that cause a number of 50-50 block splits within the IOT and re-gathering statistics, we run a little query to access a number of rows via this Secondary Index:
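The following is a hedged sketch of such a demo (names assumed as before; the specific figures quoted below come from the original demo). Inserting new rows within the existing PK range forces a series of 50-50 block splits, after which we re-gather statistics, re-check the guesses and re-run the query:

insert into album_sales_iot
select album_id, country_id + 100, total_sales, description
from album_sales_iot;
commit;

exec dbms_stats.gather_index_stats(user, 'ALBUM_SALES_IOT_TOTAL_SALES_I');

select index_name, pct_direct_access
from user_indexes
where index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

set autotrace on
select * from album_sales_iot where total_sales between 1 and 42;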

We notice that the PCT_DIRECT_ACCESS value has dropped significantly to just 58%. This means that only 58% of the guesses are now accurate and that in the other 42% of cases, Oracle is forced to re-access the IOT via the PK component stored in the Secondary Index. Additional consistent gets are therefore likely required to access the IOT via the index, resulting in a less efficient index.

We can see that approximately double the number of rows are now returned (184 vs. 99 rows). However, the number of consistent gets has increased approximately 4-fold (from 110 to 463). The index is now not as efficient in retrieving rows as it was previously, now requiring some 2.5 consistent gets per row accessed.

To fix these guesses and make the index more efficient again, one can either ALTER the index with the REBUILD or the UPDATE BLOCK REFERENCES clause:
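For example (index name again hypothetical; note that UPDATE BLOCK REFERENCES fixes the stale guesses in place and is typically far cheaper than a full rebuild, though statistics must be re-gathered for PCT_DIRECT_ACCESS to reflect the change):

alter index album_sales_iot_total_sales_i update block references;

-- or, more expensively:
alter index album_sales_iot_total_sales_i rebuild;

exec dbms_stats.gather_index_stats(user, 'ALBUM_SALES_IOT_TOTAL_SALES_I');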

The consistent gets have now reduced substantially from 463 to just 202, back at a touch over 1 consistent get per row retrieved …

So, IOT Secondary Indexes can be as efficient as normal secondary indexes, but if the IOT is subject to 50-50 block splits, they’ll likely need to be maintained more regularly to ensure they stay nice and efficient. Another significant disadvantage associated with IOTs that have Secondary Indexes …

There’s a bit more I’d like to say on the subject but I’ll leave it for a Part II :)

As discussed previously, one of the nice features of an IOT Secondary Index is that it contains the mandatory Primary Key of the IOT, which is always maintained and can be used to access the necessary rows of the IOT regardless of row movement within the IOT itself.

This can also be beneficial if only the PK columns of the table are required by the query (perhaps as part of a join) as a visit to the actual IOT table would be unnecessary.

However, one of the main disadvantages of an IOT Secondary Index is that it contains the PK of the IOT :)

Remember, one of the benefits of an IOT is that we don’t have to store columns twice as we would with a conventional Heap table, that being once within the table structure and again within the (often overloaded) PK index. However, with an IOT Secondary Index, we must store the PK columns again. In fact, we have to re-store the PK columns for as many IOT Secondary Indexes as we have defined on the IOT.

So the overall additional overhead we’re talking about here becomes a product of two important factors: the actual overall size of the PK column(s) and the number of Secondary Indexes defined on the IOT. If the average size of the PK is large and/or we have a number of Secondary Indexes, then the overall overheads can be significant, reducing the benefits of the IOT.

If we look at the size of the IOT Secondary Index I created in my previous introductory post:
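One hedged way of comparing the respective index entry lengths (index name assumed) is via INDEX_STATS:

analyze index album_sales_iot_total_sales_i validate structure;

select name, lf_rows, lf_rows_len, round(lf_rows_len/lf_rows) avg_entry_len
from index_stats;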

By way of comparison, the corresponding index entry in an equivalent Heap Table Secondary Index is only 12 bytes in length, versus the 20 bytes required by the IOT Secondary Index entry.

The 8 bytes required for the table header and physical “guess” overheads within the IOT Secondary Index are almost cancelled out by the 7 bytes of overhead required for the ROWID column within the Heap Table Secondary index. However, most of the difference in length (20 bytes vs. 12 bytes) can be attributed to the 7 bytes required to store the PK columns and their associated length bytes in this example.

Obviously, the larger the PK, the greater the associated overheads. Likewise, the greater the number of IOT Secondary Indexes, the greater the overall associated overheads.

If we create a secondary index on a column that forms part of the PK, Oracle can be a little bit cleverer. Below, we create an index on the COUNTRY_ID column, which is the second column of our PK (album_id, country_id):
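Something like the following (index name assumed):

create index album_sales_iot_country_id_i on album_sales_iot(country_id);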

We notice that for this new index, Oracle has eliminated “redundant” PK columns from the secondary index, as there’s no need to store the entire PK again as the indexed column itself already forms part of the PK:

A quick look at a block dump of this secondary index will confirm that the PK portion of the index entry only contains the PK columns that are not included in the indexed column list (i.e. just the ALBUM_ID column):

col 0 represents the indexed column (COUNTRY_ID) which happens to be part of the PK

col 1 is the remaining PK column yet to be defined in the index entry (ALBUM_ID)

The second col 0 (with a length of 4), within the table portion of the index entry, represents the physical “guess”.

So Oracle still has defined within the index entry the full PK to access the IOT as necessary if the “guess” proves to be wrong.

The key points to take from this post are to fully consider the consequences of a large PK on any secondary index defined on an IOT and to fully consider the suitability of having the table defined as an IOT if you require many secondary indexes on the table.

Man, it’s been ages since I had free time to update the blog, what with birthday parties to organise, Roger Waters concerts to attend and Radiohead concerts in the planning !! OK, time to take an initial look at Secondary Indexes for Index Organized Tables (IOTs).

If the IOT needs to be accessed via the Primary Key (PK) column(s), then no problem; an IOT must have a PK defined and the logical structure of the IOT ensures that data within the IOT is ordered based on the PK. Therefore, the IOT can be navigated like any conventional PK index and the necessary data can be efficiently accessed.

But what if we want to access the data efficiently via Non-PK columns or without specifying the leading column of the PK? Can we create secondary indexes on an IOT?

When IOTs were first introduced way back in Oracle8, secondary indexes weren’t supported (they came later in 8i). That’s likely due to the fact Oracle had to resolve a tricky issue in relation to indexing an IOT structure: what to do when indexing rows that potentially move around all the time?

With a conventional Heap table, once a row is inserted into the table, it doesn’t generally subsequently move. There are relatively few examples of when this occurs, for example updating the partition key column of a row such that it needs to be stored in another partition. This is recognised as a rather expensive thing to do as not only do at least two blocks need to be accessed and modified but it also requires associated indexes to be updated as well. As such, it generally requires explicitly allowing such activities to occur (by enabling row movement and such). Note, when rows migrate to another block due to an increase in row size, indexes are not impacted and still reference the original block and the remaining stub of the row which points to the new block/location of the row.

But with IOTs, the story can be very different. When a 50-50 index block split occurs, roughly half the rows in the leaf block move to a new block. A relatively expensive operation would be even more expensive if Oracle had to also update the index entries of all secondary indexes that referenced all these moved rows. Although rare with Heap tables, rows moving to new locations could be relatively common in an IOT due to associated 50-50 block split operations.

To deal with the difficulties of frequently moving rows within an IOT, Oracle created the IOT Secondary Index structure. It has three main components:

The indexed column values

The PK columns of the associated IOT

A “guess” that points to the physical location of the rows within the IOT, initially at the time the index is created

So the IOT Secondary Index is used in the following fashion. During an index scan, Oracle attempts to use the “guess” to access the block that was the last known physical location of the row within the IOT. If it finds the required row in the IOT, great; the index performs in a similar manner to using a rowid with a conventional secondary index. However, if the required row is nowhere to be seen within the referenced block, Oracle tries again, this time using the PK value contained within the IOT Secondary Index to perform a Unique Scan of the IOT. This is a little more expensive to perform as it requires navigating down the branch structures of the IOT, but it is at least guaranteed to find the row this time in its current location.

So in the best case scenario, the index performs similarly to a normal secondary index. In the worst case scenario where the row has moved, the index is forced to perform an additional Unique Scan of the IOT using the PK, but at least this has the potential to be much more efficient than a Fast Full Scan of the IOT in order to find the necessary row.

The key point to note here is that the secondary index is not updated when a block split on the parent IOT occurs. The “guess” via the physical pointer reference simply becomes stale and the PK which is also stored within the secondary index is used as a backup method of accessing the required row.

If we start with a traditionally simple little demo, let’s first create and populate an IOT:
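The following is a minimal sketch of such a demo, with table and column names assumed in line with the rest of this series (the actual demo details may have differed). We create the IOT, populate it, create a Secondary Index on TOTAL_SALES and run a query via that index:

create table album_sales_iot (
  album_id    number,
  country_id  number,
  total_sales number,
  description varchar2(100),
  constraint album_sales_iot_pk primary key (album_id, country_id))
organization index;

insert into album_sales_iot
select ceil(rownum/100), mod(rownum,100)+1,
       ceil(dbms_random.value(1,500000)), 'A description field'
from dual connect by level <= 500000;
commit;

create index album_sales_iot_total_sales_i on album_sales_iot(total_sales);

exec dbms_stats.gather_table_stats(user, 'ALBUM_SALES_IOT', cascade=>true);

set autotrace on
select * from album_sales_iot where total_sales = 42;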

We notice that the index is used as expected and the number of consistent gets has dropped significantly. Notice also that the IOT is subsequently accessed not via Index ROWIDs but by an INDEX UNIQUE SCAN via the IOT PK. More on this later …

If we look at a partial block dump of an index entry within the IOT Secondary index:
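For those wanting to play along, a hedged sketch of how such a dump can be generated (the datafile and block numbers below are placeholders; pick a leaf block of the secondary index, for example one just past the segment header):

select header_file, header_block
from dba_segments
where segment_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

alter system dump datafile 7 block 131;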

I’ve recently returned from a great two-week holiday, firstly at the Australian Open Tennis (what a final !!) and then up at the Gold Coast in not quite so sunny Queensland. Time now to get back to my blog :)

In my previous IOT examples, we had a very large column called Description which we didn’t really want to store within the Index Organized Table as it would cause the resultant index structure to get very inflated and inefficient. All the rows contained a very large Description value so it never made sense to include the Description column within the IOT.

In the following example, the Description column has values of varying lengths. Some of the values remain very large, however many of the Description values are quite moderate in size and wouldn’t be problematic to store within the IOT. Indeed, it would be quite beneficial as it wouldn’t be necessary to perform additional I/Os to the Overflow segment in cases where the Description was quite small in size and required by the application.

PCTTHRESHOLD gives us more flexibility in what is actually stored within the IOT index structure by storing the non-PK columns up to the INCLUDING clause within the IOT but only if the row length to be stored inside the IOT is below a specified percentage threshold of the block size. So with a PCTTHRESHOLD of (say) 5, the non-PK columns up to the INCLUDING clause will be included within the IOT but only if the resultant row size is less than 5% of the blocksize. If a row size were to be greater than the specified percentage threshold of the block size, then any non-PK columns that would violate this length threshold would not be included within the IOT and stored instead within the Overflow segment.

In the following example, every other row is actually quite small and we would want these rows to have the Description value stored within the IOT. Therefore, we have modified the IOT table definition to include the Description column if the resultant row is less than 5% of the (8K in this case) blocksize:
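A sketch of what this definition might look like (names and sizes assumed; note that 'A small description' is itself 19 bytes, consistent with the leaf block contents described below):

create table album_sales_iot (
  album_id    number,
  country_id  number,
  total_sales number,
  description varchar2(1000),
  constraint album_sales_iot_pk primary key (album_id, country_id))
organization index
  pctthreshold 5
  including description
  overflow tablespace bowie2;

insert into album_sales_iot
select ceil(rownum/100), mod(rownum,100)+1, ceil(dbms_random.value(1,500000)),
       case when mod(rownum,2) = 0
            then rpad('A very large description', 1000, 'x')
            else 'A small description' end
from dual connect by level <= 500000;
commit;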

We notice the leaf block contains 251 row entries. Half the rows, with a Description of 19 bytes, have the Description value stored within the IOT leaf block, while the other half of the rows, with the larger Description values, contain an nrid that refers to the corresponding Description within the Overflow segment.

In my previous post on Index Organized Tables (IOT), I introduced the concept of the IOT Overflow Segment, where we can store columns that we may not want to include within the actual IOT index structure. Before we move on, I just wanted to cover off a few additional points that could be a trap for the unwary …

In my experience, the Primary Key (PK) columns of a table are typically the first columns defined in the table. This has certainly been standard practice in most environments I’ve seen. This makes sense in that the PK columns are in many ways the “key” columns in the table and are identified as such by having the prestigious honour of being the first columns defined within the table. Most people look at and intuitively expect the first columns in the table to be the PK columns and for that reason alone, it’s probably good practice to consistently define the PK columns in this manner.

However, there’s also a good argument why having the PK columns as the leading columns in the table is precisely the wrong location for them. As many tables are “primarily” accessed via the PK columns and so accessed directly through the associated PK index, the application already knows the PK values of the row in question. Therefore, it’s somewhat inefficient to then have the PK columns as the first columns defined in the table, as these generally have to be read through and ignored before we get to the non-PK columns that are of direct interest and the reason for visiting the table block in the first place. By placing the PK columns after the most accessed non-PK columns, we avoid having to unnecessarily read through these PK columns again when accessing the table via the PK index.

I personally prefer to define the PK columns first in a standardised manner, with the advantages of avoiding possible confusion and misunderstandings outweighing any possible performance improvements. However, I can at least see the logic and merit of not following this standard with Heap tables.

The same however cannot really be said for IOTs and I would strongly recommend defining the PK columns first in an IOT …

I’m going to run the same demo as I did in my last post on the Overflow Segment, but with one subtle change. I’m not going to define the two PK columns first but rather have them defined after my heavily accessed non-PK column:
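So the table creation DDL looks something like this (a sketch, reusing the assumed names and Overflow tablespace from the previous post):

create table album_sales_iot (
  total_sales number,
  album_id    number,
  country_id  number,
  description varchar2(1000),
  constraint album_sales_iot_pk primary key (album_id, country_id))
organization index
  including country_id
  overflow tablespace bowie2;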

So in this example, my leading column is the non-PK total_sales column, followed then by the two PK columns. I still only want these 3 columns to be included in the actual IOT structure, so I have my INCLUDING clause only including columns up to the country_id column. I want the remaining large description column to be stored separately in an Overflow segment.
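If we then look at how the columns are actually stored within the segment:

select column_name, column_id, segment_column_id
from user_tab_cols
where table_name = 'ALBUM_SALES_IOT'
order by segment_column_id;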

The SEGMENT_COLUMN_ID column determines the order of the columns as they’re actually stored within the segment and we notice the column order is different. The two PK columns are listed first, with the total_sales column only listed in the 3rd position.

As discussed in the IOT Introduction post, the structure of an index entry in an IOT has the PK columns as the leading columns, followed by the non-PK columns in the table portion. This is critical because the PK columns determine the location within the IOT where new rows need to be inserted and the subsequent ordering of the rows in the table. As such, the PK columns must always be the leading columns of an IOT, despite how the table is actually defined at creation time. If the PK columns are not listed first in the table creation DDL statement, Oracle will automatically re-order the columns and place the PK columns first regardless.

This has consequences for the INCLUDING clause if specified. In the above table creation statement, the INCLUDING clause specified the country_id column. Although defined as the third column, as it’s a PK column, Oracle has automatically re-ordered the columns such that it’s physically listed as the second column within the IOT segment. Unfortunately, the INCLUDING clause is only applied after the re-ordering of the columns and as such, the total_sales column, which is now listed third after the country_id column, is not actually included in the IOT index structure as (perhaps) intended.

As we can see, the IOT row entries only consist of the PK columns and the row reference to the corresponding Overflow segment. None of the non-PK columns (such as total_sales) are co-located within the IOT segment as the table column count is 0 (cc: 0).

As a result, additional consistent gets are now required to fetch the total_sales column from the Overflow segment to satisfy the query. This explains why the query is now less efficient than it was previously.

My recommendation with regard to defining IOTs is to simply list the PK columns first. This will ensure the INCLUDING clause is applied as intended and will generally reduce confusion and misunderstandings. Otherwise, the INCLUDING clause needs to specify a Non-PK column to ensure more than just the PK columns are actually included in the IOT segment, the consequences of which may not be obvious to the casual observer of the DDL or describer of the table.

Jonathan Lewis, a great source of information on indexes and Oracle in general has previously discussed this same IOT Trap on his blog.

As expected, the IOT is quite large as it has to accommodate the very large Description field within the IOT index structure. At 71,429 leaf blocks for the 500,000 rows in the table, that’s just 7 rows on average per leaf block.

The application doesn’t generally access the Description column, with the following query being typical (Note: to make fetching data as efficient as possible, I’ve set the arraysize to 100):

We can see the leaf block only has 7 rows, with the vast majority of space taken up by the very large Description column.

Considering the Description column is so large and/or that it’s rarely accessed, wouldn’t it be nice if we didn’t have to store this column directly within the IOT index structure itself?

Enter the IOT Overflow segment. The IOT Overflow segment enables us to store in another physical location those columns that we don’t necessarily want to store directly within the IOT index structure. So those columns that might be particularly large (or just the occurrences of those columns when the specific values might be too large to store within the IOT index structure) or those columns that are rarely accessed can be stored elsewhere. Effectively, we’re back to having a separate “table” like structure, but the Overflow segment will only hold those columns that we don’t necessarily want to store within the index structure, unlike a normal Heap table in which all columns are stored within the table segment.

There are a number of different methods we could use (to be explored further in future posts), for now I’ll use the INCLUDING clause:
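A sketch of the revised table definition (names assumed, consistent with the discussion that follows):

create table album_sales_iot (
  album_id    number,
  country_id  number,
  total_sales number,
  description varchar2(1000),
  constraint album_sales_iot_pk primary key (album_id, country_id))
organization index
  including total_sales
  overflow tablespace bowie2;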

So in the above example, all columns up to and “including” the total_sales column will be included in the IOT index structure. All the following columns listed in the table definition (in this case the Description column) will be stored in the Overflow segment, which in the above example will be created within the BOWIE2 tablespace.

We can see the number of index entries in the leaf block has increased from 7 to 322, with the size of the index entry decreasing from 1004 to just 23 bytes. Instead of the Description column being stored within the leaf block, we now have a nrid entry consisting of a 6 byte relative block address and row directory number (0x01800081.0), which effectively points to the actual location of the remaining portion of the row within the Overflow segment. We only therefore have a table column count of 1 (cc:1).

To find out more about the corresponding Overflow segment, we first must determine the OBJECT_ID of the IOT:
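For example (the Overflow segment is a system-named table of the form SYS_IOT_OVER_<object_id>):

select object_id from user_objects
where object_name = 'ALBUM_SALES_IOT' and object_type = 'TABLE';

select table_name, iot_type from user_tables
where iot_name = 'ALBUM_SALES_IOT' and iot_type = 'IOT_OVERFLOW';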

We notice the Overflow block contains 7 rows as we would expect, as this was all the IOT segment could previously manage when it had to store the large Description column values.

The table row directory contains 7 rows, with the first row (#0) having an offset at address 0x1bda, which is the actual location of the first row within the Overflow block.

Therefore, in order to find a specific Description column value of interest from the IOT, Oracle references the (say) nrid: 0x01800081.0 within the IOT index entry for the row. This in turn points to the relative block address (0x01800081) of the Overflow block containing the description and the corresponding row directory number (0), which in turn specifies the offset (say) 0x1bda to the actual location of the Description value within the Overflow block. Easy !!

We notice all the rows are listed as “Chained Rows”. This is because all the rows have a corresponding Description value stored in the Overflow segment and so the rows are not stored within the one block. As the previous query illustrated, this is no bad thing if we don’t need to reference these additional columns stored in the Overflow segment. It makes the resultant IOT table more compact and efficient to access.

However, on those (hopefully) rarer occasions when we do need to access the columns in the Overflow segment, this will clearly require additional block accesses:

The above query, which returns the Description column, sees the consistent gets increase to 32, up from the 5 consistent gets when the Description wasn’t accessed and the 19 consistent gets when the Description column was co-located within the IOT segment. But this is a price we might be willing to pay if this query isn’t frequently executed, while the frequently executed queries which don’t access the Description column are more efficient.

The Overflow segment gives us, in a manner of speaking, “the best of both worlds”: the ability to store just those columns of interest within the IOT segment (although these must always include all the Primary Key columns), while those that are less often accessed or too large to be efficiently stored within the IOT can be stored elsewhere. Effectively, it’s an index and table relationship, except the table doesn’t have to store again the columns that are already stored within the index.

Thought it was high time that I covered in a little detail the subject of Index Organized Tables (IOTs). When used appropriately, they can be an extremely useful method of storing and accessing data. Hopefully by the end of this series, you’ll have a better understanding of IOTs, their respective strengths and weaknesses and so perhaps be in a better position to take advantage of them when appropriate.

As I mentioned in a previous post, Martin Widlake has recently written an excellent series on IOTs, which I highly recommend. I’ll try to cover differing aspects of IOTs that will hopefully be of interest.

To start, let’s cover a very basic little example.

Let’s begin by creating and populating a simple Heap Table that holds information about musical albums (Note using an 8K blocksize in a MSSM tablespace):
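The following is a minimal sketch of the setup (a hypothetical ALBUM_SALES table broadly consistent with the figures quoted below; the actual demo may have differed):

create table album_sales (
  album_id    number,
  country_id  number,
  total_sales number,
  text        varchar2(10),
  constraint album_sales_pk primary key (album_id, country_id));

insert into album_sales
select ceil(rownum/100), mod(rownum,100)+1,
       ceil(dbms_random.value(1,500000)), 'TEXT'
from dual connect by level <= 500000;
commit;

exec dbms_stats.gather_table_stats(user, 'ALBUM_SALES', cascade=>true);

-- the very frequently executed query (returns the 100 rows for a given album):
set autotrace on
select * from album_sales where album_id = 42;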

As we can see, things are pretty good. 18 consistent gets in order to return 100 rows isn’t bad at all. Clearly, the index has a good Clustering Factor and can retrieve the 100 required rows in a relatively efficient manner.

However, this is a very frequently executed query and we want to do even better. One thing we notice is that we only have a couple of columns in the table which are not part of the index. Perhaps if we included these columns in the index as well, we can then use the index to extract all the required data and thus eliminate the need to visit the table segment at all. Overloading an index in this manner is a common tuning technique and will hopefully reduce the number of required logical I/Os to run the query.

We can do this by dropping and recreating the index with all the columns, making sure the PK columns remain the leading columns. This will ensure the index can still be used to police the PK constraint:
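Something along these lines (a sketch; note a non-unique index with the PK columns leading can indeed be used to police a PK constraint):

alter table album_sales drop primary key;

create index album_sales_pk_i
on album_sales(album_id, country_id, total_sales, text);

alter table album_sales add constraint album_sales_pk
primary key (album_id, country_id) using index album_sales_pk_i;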

OK, as expected the index is now somewhat larger as it needs to accommodate the extra columns. The number of overall blocks allocated to the index is 2048, with leaf blocks increasing from 1062 to 2006.

We notice that each leaf entry is 26 bytes in length. The length of the four columns adds up to 13 bytes. The remaining 13 bytes is basically overhead required for each index entry:

2 bytes for flag and lock information in the index entry header

5 x 1 byte for each of the length bytes for each column

6 bytes for the 5th index column which is the index rowid

So that’s 13 bytes of overhead per index entry in this example index.

Well, everything is currently pretty good. We have the application now performing approximately 40% less work than it was previously. But we have one little issue. With the index now consisting of all the columns in the table and with the application using the index exclusively, what’s the point of now having the table? It’s wasting storage and wasting resources in having to be maintained for no purpose other than having to exist so that the index can in turn exist.

Wouldn’t it be nice if we could somehow just have the index, but without the underlying table. Enter the Index Organized Table (IOT), first introduced way back in Oracle 8.0. It’s basically an index structure that can exist without the need for an underlying table. The index structure itself is the table by which we can store and retrieve the necessary data.

OK, let’s now create a new version of this table with the same data, but this time as an IOT:
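For example (again a sketch using the assumed names from above):

create table album_sales_iot (
  album_id    number,
  country_id  number,
  total_sales number,
  text        varchar2(10),
  constraint album_sales_iot_pk primary key (album_id, country_id))
organization index;

insert into album_sales_iot select * from album_sales;
commit;

exec dbms_stats.gather_table_stats(user, 'ALBUM_SALES_IOT');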

We notice it’s smaller than the corresponding overloaded index for the Heap Table. The previous index consisted of 2048 blocks and 2006 leaf blocks but this index is somewhat smaller at just 1664 blocks and 1550 leaf blocks.

If we take a look at a partial block dump of a leaf block from the IOT:

Firstly, we notice it’s definitely an IOT block dump as the IOT flag is set.

The structure of the index entry is somewhat different here. It basically consists of:

2 bytes for lock and flag info in the index header as previously

Next come the two Primary Key columns with their corresponding length bytes. Note an IOT must have a PK defined.

Following are 3 bytes for the table header consisting of a lock byte, flag byte and a byte to denote the number of table (non PK) columns (in this case 2).

Followed finally by the 2 Non-PK columns and their corresponding length bytes.

Note the big missing component here from the previous block dump is that there is no rowid defined with its corresponding length byte. No need for a rowid if there’s no corresponding table to point down to …

So the overall overhead has been reduced to:

2 bytes for the index header

3 bytes for the table header

4 bytes for the 4 column lengths

for a total of 9 bytes, 4 less than the 13 bytes overhead required in the previous example. So the total length of an index entry has reduced down from 26 bytes to just 22 bytes. Hence, the overall reduction in the size of the corresponding IOT index.

Not only have we saved ourselves some storage and having to maintain two physical segments, but things are a tad more efficient as well, reducing the number of consistent gets down from 11 to 10 as the corresponding index segment we need to access is smaller …

Enough to start with for now and yes the pun in the title is fully intended :)

Based on the excellent comments in the Quiz post, we have some clever cookies out there :)

I guess the first thing to point out is that based on the basic scenario provided, the index shouldn’t ordinarily be continually growing in this fashion. Although the index values are monotonically increasing, the deletions are leaving behind fully emptied leaf blocks which can generally be recycled in future block splits.

OK, so why is this index behaving in this fashion, continuing to increase in size while the number of rows in the table remains relatively constant?

Well, there are a number of contributing factors.

As stated, the index values are indeed monotonically increasing so all inserts into the index are hitting the right-most index leaf block within the index structure and the deletions which all occur on the “left-side” of the index are leaving behind leaf blocks that contain nothing but deleted index entries. As Marcus and David mentioned in the comments, the use of a Reverse Key index would therefore alleviate this problem as subsequent inserts will be performed within the same leaf blocks in the index structure, automatically cleaning out and reusing space used by previously deleted index entries. This solution though may create as many problems as it solves (if say range predicate statements relied on a Non-Reverse index).

Additionally, the processing is being performed with a PL/SQL block. Oracle has a whole lot of smarts to make PL/SQL as efficient as possible and so the manner in which code runs within PL/SQL compared to other native languages can vary significantly. Unfortunately at times, these smarts might not be quite so smart after all.

The tablespace used to store the index is a Locally Managed Tablespace (LMT) with Automatic Segment Space Management (ASSM). Instead of freelists (or freelist groups), Oracle uses a set of bitmap blocks within the index segment to determine the amount of free space available within its blocks and whether a block is available for inserts. As Vyacheslav and Alberto highlighted in the comments, there are a number of “issues” in the manner in which these bitmap blocks are maintained within PL/SQL processing. This is effectively locking out the vast number of these now effectively empty leaf blocks from being recycled and reconsidered for subsequent inserts. Rebuilding the index once in a Manual Segment Space Management (MSSM) tablespace would also alleviate this issue.

The actual processing involved within the PL/SQL block can also have an impact. The procedure contained the following important statement:

select min(id),max(id) into n,m from bowie;

Removing this statement from the PL/SQL block and either manually defining the values to be processed or passing them through to the procedure can impact how PL/SQL manages the freespace bitmaps within the segment. For example, if one used something similar to the following:
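A sketch of such a variation (the table is the BOWIE table from the quiz; its second column value is assumed here), with the MIN and MAX values passed in as parameters rather than fetched within the block:

create or replace procedure delete_insert_rows(n in number, m in number) as
begin
  for i in n..m loop
    delete bowie where id = i;
    -- insert a brand new, ever-increasing value to replace the deleted one
    insert into bowie values (i + (m - n + 1), 'DAVID BOWIE');
    if mod(i,1000) = 0 then
      commit;
    end if;
  end loop;
  commit;
end;
/

exec delete_insert_rows(1, 200000);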

Finally, the PL/SQL procedure only performed a Commit after 1000 iterations. This means that there were 1000 deletions performed during a logical transaction. As Greg mentioned in the comments, Unique Key index values can be reused within a single transaction BUT only if the same actual values are reused. The procedure introduces new values for those that have been deleted and so the deleted entries can’t be reused during the same transaction. This means there will be at least 1000 deleted index entries that can’t be reused during the running of the procedure and sufficient additional leaf blocks to accommodate these 1000 index entries will need to be allocated, even if we use some of the solutions mentioned, such as Reverse Key indexes or MSSM tablespaces. By performing either all the necessary deletions within one transaction followed by the inserts or having a Commit for each delete/insert pairing, these additional blocks won’t be required. For example:
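For example, a variation that commits each delete/insert pairing (again just a sketch, with the second column value assumed):

create or replace procedure delete_insert_rows as
  n number;
  m number;
begin
  select min(id),max(id) into n,m from bowie;
  for i in n..m loop
    delete bowie where id = i;
    insert into bowie values (i + (m - n + 1), 'DAVID BOWIE');
    commit; -- commit each delete/insert pairing
  end loop;
end;
/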

Although of course, the inefficiencies in the processing or the potential breaking of business rules may not make the index savings worthwhile.

So in summary, there are a number of things we could do to fix this scenario, rather than simply periodically rebuilding the index all the time. Depending on applicability, we could convert the index to a Reverse Key index (or say Hash Partition it), move the index to a MSSM tablespace, modify our procedure logic to remove the reference to fetching MIN/MAX values, not use PL/SQL, or, to make the index as efficient as possible, change the transactional logic to not perform large numbers of inserts and deletes within the same transaction.

So there’s quite a lot happening within what on the surface looks like a simple piece of PL/SQL :)

So far, everything is as expected. We have an index with 200,000 rows that currently has 374 leaf blocks.

OK, what we want to do is basically gradually delete the current set of rows and replace them with 200,000 new rows, with ever-increasing Primary Key values. To this end, we create the following procedure:
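The procedure looks something like this (a sketch consistent with the description below; the actual demo procedure may have differed in detail):

create or replace procedure delete_insert_rows as
  n number;
  m number;
begin
  select min(id),max(id) into n,m from bowie;
  for i in n..m loop
    delete bowie where id = i;
    -- replace each deleted row with a brand new, ever-increasing PK value
    insert into bowie values (i + (m - n + 1), 'DAVID BOWIE');
    if mod(i,1000) = 0 then
      commit;
    end if;
  end loop;
  commit;
end;
/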

So the procedure basically determines the current MIN and MAX values of our PK column and gradually deletes the current rows and then inserts new ones. Every 1000 iterations, we commit the changes. Nothing too complex here.

Things have become even worse. We still only have 200,000 rows in the table but the index now has an additional 147,841 deleted entries and the number of leaf blocks has again increased substantially, to 994 leaf blocks.

We notice that the first portion of the index contains leaf blocks with nothing but deleted index entries. The number of rrows is 0 for a vast number of leaf blocks. We also notice that the root block has an rba of 0x100378b (16791435), which is only a few values below some of the rba values of the left-most leaf blocks in the index structure, (say) 0x100378e (16791438). This highlights that even though these left-most blocks in the index structure contain nothing but deleted index entries, Oracle is not recycling them as it should. Oracle is simply adding new blocks to the index structure rather than recycling empty leaf blocks, resulting in the index growing bigger and bigger.

The leaf blocks at the right-most end of the index structure (the second portion of the partial treedump), however, show us a nice compact set of leaf blocks with lots of index entries per block (most with 533 per leaf block) and with no deleted index entries (rrows matches the nrows value).

If we run the procedure 10 times in total, we get an index that looks like the following:

We now have 1,125,132 deleted index entries and the index is now over 10 times the original size, up from 374 to a massive 4,136 leaf blocks, even though the table only contains 200,000 rows.

There are a number of contributing factors here :)

The question is why, why is the index behaving in this fashion and what can we do to ensure the index doesn’t grow in this manner and can remain basically the same size as we delete and insert new rows into the table ?

As has been mentioned to me on quite a number of occasions, I haven’t really covered the area of Index Organized Tables (IOTs) on this blog, despite covering the subject of IOTs in quite some detail in my Index Internals Seminar. Which I readily admit is a bit of a shame, as IOTs are often misunderstood little things and are generally way under-utilised in many environments, I would suggest in part because they are so often misunderstood. The only reason I can really give for not discussing them more is that I simply haven’t gotten around to it :)

Well, the good news, as hinted at late last year, is that it’s about time I addressed this and started writing a few blog pieces on IOTs.

However, the even better news is that the subject of IOTs has already been covered by Martin Widlake in a truly excellent series of articles on his blog. I would strongly recommend giving them a read as they cover many aspects of IOTs in an interesting and informative manner.

I’ll recap some of the basic principles of IOTs here and then discuss various different aspects (such as the joys of Secondary Indexes on IOTs) in my own unique way :)

So, all that’s to come in the coming weeks.

However, I recently received an index question via email during the Christmas holidays which I thought was worthy of a blog piece or two. Although I often respond to emails, I much prefer to discuss index related “stuff” on the blog so everyone can share in the knowledge exchange. This is an interesting question because it relates to why an index keeps growing, even though the number of rows in the table remains constant. It’s an even more interesting question because there are a number of contributing factors for this impact on the index which I thought were worth sharing.

No more clues, I’ll post the question and the specific scenario in the next few days …

For now, the quick answer to the last quiz is that the second table was indeed an Index Organized Table (IOT).

One of the nice benefits of an IOT is that when re-organised, unlike a Heap Table, all indexes remain valid, even the Secondary Indexes. I’ll explain why in my next post in the next few days. I’ll also explain why secondary indexes are one of the main disadvantages with IOTs as well.

As many have identified, the first thing to point out is that the two queries are not exactly equivalent.

The BETWEEN clause is equivalent to a ‘>= and <=’ predicate, whereas the original query only had a ‘> and <‘ predicate. The additional equality conditions at each end are significant.

The selectivity of the original query is basically costed as (max condition – min condition)/(max column value – min column value). As this equates to 1 day/2000 days, the selectivity of the first query is therefore 0.0005.

The calculation for the BETWEEN clause is different. It’s the above formula PLUS the selectivity of 1 day for each of the two equality conditions (as the CBO assumes you not only want the values within the range but also the specific values on each side of the range).

So that’s a selectivity of 0.0005 for the date range as above plus 2 x 0.0005 for each of the equal conditions = 0.0005 + 0.001 = 0.0015.

1M multiplied by 0.0015 = 1500, the CBO cardinality estimate of the second query.

As many have mentioned in the comments, the Clustering Factor of this index is awful, as the 500 occurrences of each distinct day are spread evenly throughout the whole table. It’s therefore very expensive to use this index to retrieve a “larger” number of rows.

The cost of the first query that used the index is 505, about as bad as it gets when retrieving 500 rows. If we were to retrieve not 500 but 1500 rows, then the cost would effectively triple and be approximately 1500.

However, the cost of the FTS as highlighted in the second query is 933. This is less than 1500 and so the FTS is preferred by the CBO in the second query.

It all comes down to the relative costs and these all come down to the estimated selectivity of the query and the associated segment (and system) statistics, of which the Clustering Factor of the index is one of the most significant factors of all (no pun intended of course). If we effectively triple the estimated costs of a query as we do with the second query, then this can obviously have an impact on the CBO calculations and the resultant execution plan.

If we were to rewrite the first query to be equivalent to using the BETWEEN:
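For example (a sketch, assuming a date column along the lines of the original demo):

select * from bowie
where hist_date >= to_date('01-JAN-2008', 'DD-MON-YYYY')
and   hist_date <= to_date('02-JAN-2008', 'DD-MON-YYYY');

then the CBO would likewise calculate a selectivity of 0.0015, estimate 1500 rows and again favour the FTS.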

I have a table that has 1M rows with dates that span 2000 days, all evenly distributed (so there are 500 rows per day for the mathematically challenged). All stats are 100% accurate and I have an index on the date column.
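A sketch of a setup matching this description (all names and details assumed; each day gets exactly 500 rows, deliberately scattered evenly throughout the table):

create table bowie (id number, hist_date date, text varchar2(30));

insert into bowie
select rownum, to_date('01-JAN-2007','DD-MON-YYYY') + mod(rownum,2000) + 1/2, 'DAVID BOWIE'
from dual connect by level <= 1000000;
commit;

create index bowie_hist_date_i on bowie(hist_date);

exec dbms_stats.gather_table_stats(user, 'BOWIE', estimate_percent=>null, cascade=>true);

-- query 1: > and < predicates, estimated (and actual) cardinality 500, index used
select * from bowie
where hist_date > to_date('01-JAN-2008','DD-MON-YYYY')
and   hist_date < to_date('02-JAN-2008','DD-MON-YYYY');

-- query 2: the "equivalent" BETWEEN, estimated cardinality 1500
select * from bowie
where hist_date between to_date('01-JAN-2008','DD-MON-YYYY')
                    and to_date('02-JAN-2008','DD-MON-YYYY');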

And now everything has gone wrong. I’m still getting the same 500 rows but the CBO is choosing an inefficient FTS. The estimates are now way way off, with the CBO expecting 1500, not 500 rows to be returned.

Indeed, the subtle but significant difference between the two demos was that demo one created the table in a tablespace called USER_DATA with manual segment space management (with freelists/freelist groups set to 1), while demo two created the table in a tablespace called USER_DATA1 with automatic segment space management.

In the first demo, the 3 separate sessions all followed the same freelist and inserted their rows concurrently into the same table blocks, resulting in the table being effectively sorted in ID order.

We notice the Clustering Factor of 2171 is relatively low for an index with 300000 rows, as indeed the order of the rows in the table almost exactly matches the order of the index entries.

In the second demo, ASSM ensures the 3 separate transactions don’t cause contention and insert their rows in a different set of blocks from each other. This is good in that contention is reduced, but it has the nasty side-effect of now having the resultant rows scattered randomly between 3 different sets of varying blocks. The actual clustering of the data isn’t particularly bad in that Oracle now has to visit 3 different blocks for a range of values that previously might have been co-located within the 1 block, but because of the manner in which the Clustering Factor is calculated, and the fact it will increase even if Oracle is forced to visit a block it had just visited a couple of I/O calls beforehand, the calculated Clustering Factor can be appalling.

If we look at the Clustering Factor of the index from the second demo:
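For example (index name assumed):

select index_name, num_rows, clustering_factor
from user_indexes
where index_name = 'BOWIE_ID_I';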

We notice the Clustering Factor is now terrible at 271936. It’s a classic example of a table with data that is relatively well clustered but has an appalling Clustering Factor. If Oracle didn’t increment the Clustering Factor for a block it had visited only a couple of index entries previously, then it would likely have a similar Clustering Factor to the first demo.

But statistics collection doesn’t take this into consideration; it will increment the Clustering Factor even if the block has only just recently been visited (only if it’s the same table block as the previous index entry will the Clustering Factor not be incremented during stats collection), hence the terrible Clustering Factor and hence the dramatic difference in how the index is now considered, costed and used by the CBO.

The moral of this story is that if you use ASSM or multiple Freelists/Freelist Groups to avoid contention, seriously consider the impact on the Clustering Factor of indexed columns that would ordinarily have a good Clustering Factor, and the impact this in turn may have on your resultant execution plans …