So When Does An Oracle B-Tree Index Increase In Height ? (Almost Grown) April 3, 2008

I’ve basically been asked this same question a number of times over the past few days with regard to the discussions on indexes and different block sized tablespaces, so I thought it might be worth quickly sharing the answer to a wider audience.

Imagine a new, empty table and a corresponding new, empty index. At this stage, the index structure basically consists of one, empty block. The index has a BLEVEL of 0 (from DBA_INDEXES) and a HEIGHT of 1 (from INDEX_STATS), yes it can be confusing ;) This block is basically the Root block of the index as it’s the first (and currently only) block to be accessed during an index scan, but at this stage is used to also store the actual index entries as well (and so can kinda be viewed as being a Leaf block as well).

We now start to insert rows into the table and thus row entries into the index. These index entries basically consist of the indexed column(s) and its corresponding ROWID, and are sorted based on the indexed column values.

Eventually, this single index block will fill; Oracle simply can’t add any more index entries into it. Now comes the fun bit.

When Oracle wants to insert a new index entry but it can’t as this Root index block is full, Oracle will allocate two new index blocks. If the new index entry is the maximum value currently to be indexed, Oracle will move all the index entries from the full block and put it into one of the new index blocks and place the new index entry into the other block. This is known as a 90-10 index block split.

If the new index entry isn’t the maximum value, Oracle will place the lower 1/2 valued index entries into one new block and the other 1/2 into the other new block. This is known as a 50-50 index block split.

These two new blocks are now the new leaf blocks in the index structure.

The contents of the previously single filled block is now totally replaced with pointers to the two new blocks. This block therefore remains the Root block in the index structure. These pointers basically consist of the Relative Block Address (RBA) to the new index blocks and a value which represents the lowest indexed value found in the specific referenced leaf block. These indexed values in the Root block are now used by Oracle as the method by which it can navigate the index structure to find the specific index leaf block containing a required indexed entry.

The index has just increased in height and now has a BLEVEL of 1 and a HEIGHT of 2.

As we continue to add more rows into the table, we add more index entries into our 2 leaf blocks. Eventually they will fill again and will again perform either a 90-10 or 50-50 block split depending on the new index value to be inserted. With a non Root block split, only one additional index block is allocated and the index entries are distributed between the full and new index block. Each time a leaf block splits in a BLEVEL 1 index, a new entry is also added into the Root block to point to the new Leaf block.

Once we have enough Leaf blocks, the Root block will again eventually fill. At this point, Oracle will again allocate two new blocks and distribute the contents of the Root block into these two new blocks, again 90-10 or 50-50 depending on the new indexed value to be inserted. The contents of the Root block is now totally replaced with pointers to these 2 new “Branch” blocks which of course in turn now contain the pointers to the Leaf blocks.

The index has again increased in height and we now have an index with a BLEVEL of 2 and a HEIGHT of 3.

As the leaf blocks continue fill and split, a new entry is added to the corresponding Branch block each time. When these Branch blocks fill and split, a new entry is added to the Root block. When the Root block eventually fills, it will again allocate 2 new blocks and so the index grows in height again.

So basically, an index increases in height whenever the index Root block splits and the two new allocated blocks result in a new level within the index structure. Note the index Root block remains the same throughout the entire life of the index, no matter the index height.

Note also a Root block split is the only time an index increases in height. Therefore, the number of levels between the Root block and any/all of the Leaf blocks is always and must always be the same. Hence, an Oracle B-Tree index is always structurally height balanced, always.

Share this:

Like this:

Related

Just exploring the information in this post, to ensure that I understand the physical implementation here correctly, ignoring any practicalities for the sake of a better understanding;

So, if the number of index entries that can be fitted on a block increases, then the number of potential leaf blocks will increase.

Assuming that the DB block size is increased, this would then increase the number of leaf blocks at any level in the index, meaning two things;

1) An index rebuilt with a larger block size might reduce in level.
2) An index range scan of this rebuilt index would be more costly ( depending on the value being searched for, more blocks might need to be traversed ).

Is my understanding correct?

I’d be interested in a description of the mechanics of an index range scan of an index with multiple levels. Maybe a later blog post?

BTW – if you can get access to Radio 2 or Radio 6 for April 1st, Radiohead played a couple of live radio sessions and were also interviewed. See http://www.bbc.co.uk/musicevents/radiohead/. Looks like there is a free MP3 download of the session at that link.

No, a larger block would allow more index entries per block therefore the number of index blocks would decrease. However, as I explained in the Small Index Scans post, if you don’t decrease the number of blocks you still need to access, or don’t decrease it by a similar magnitude by which you increase the index block size, then you end up reading more index related data, which causes overheads and possibly reduced performance.

I explained in the height reduction 1/2 myth post how a larger block size can (and might not) decrease the index height, but it’s very much dependant on the actual block size increase and again can still result in a significantly larger resource footprint when performing an index scan.

Yes, I will describe the mechanics of an index scan in a later blog entry as it’s important in understanding why larger blocks can hurt rather than improve performance.

Thanks Richard, I agree. If we have the same data in the index and the index is rebuilt with a larger block size then the number of leaf blocks used to index this data would decrease.

This wasn’t quite what I was thinking about. But re-reading my first comment, I’ve realised I have some faulty logic in terms of my mental model of the structure of the index. It might help if I make the time to read your presentation.

The Radiohead gig was excellent – only heard this on the radio. I’ve noted that you may not be able to download the MP3. I’ll check if it’s legal (and feasible) to email over to you. If you want me to send this through, then please drop me an email.

Height – What
1 – 1 Block. Both Root and leaf. No branches.
2 – 1 Root Block which points to many leaves.
3 – 1 Root Block which point to many branch blocks. Branches point to leaves.
4 – 3 Root blocks. 1 points to the other two. Which each point to Branches, which point to leaves.
5 – Many root blocks. 1 points to the many. Which each point to Branches, which point to leaves.

And you’re right, it’s excellent. The OK Computer numbers sound just as good as they did 10 years ago which is a good sign there’s something special about them :) They always seem to play with everything in it’s right place …

However, an Oracle B-Tree index can only ever have the 1 root block. So with a height 4 index, there’s the 1 root block which points to a “few” level 1 branch blocks. These in turn then point to “many more” level 2 branch blocks which in turn point to the leaf blocks.

Note the branch blocks simply contain enough index column data so we can navigate down the index structure to find the specific leaf block in question.

With a height 5 index, we have the one root block (so 1 level is always just this one root block) which points to a “few” level 1branch blocks which in turn point down to “many more” level 2 branch blocks which in turn point to “many many more” level 3 branch blocks which in turn point to what would usually be a massive number of leaf blocks.

So the top level is always the root block, the bottom level is always the leaf blocks and any intermediate levels are branch blocks.

Most deleted space is simply reused by subsequent inserts and so has no real affect. You delete an index entry one minute, the space is reclaimed the next time an insert is made in the leaf block. In some rare cases, there are no subsequent inserts in the leaf block and so the space isn’t reclaimed, unless the entire index block is emptied.

There generally would have to be an extraordinary amount of delete activity with no subsequent insert activity in the associated leaf blocks for the index structure to be so impacted that the index height has increased prematurely and could be reduced with an index rebuild.

Hi Richard,
Just want to clarify my understanding. You mentioned “Note the index Root block remains the same throughout the entire life of the index, no matter the index height.”
But the root block also splits when it does not have enought space to hold the pointers to the leaf blocks. So the root block keeps chaning right?. or does the above sentence mean that the original root block exists with information on leaf blocks and RBA but the contents of root block can actually be spawned across multiple blocks.
Also another clarification is, whenever root block splits then there is a change in height of the index ; whenever the leaf block is full but the root block does not split then a new leaf block is added horizontally but not vertically (width increases but not the heeight). Depending on the index entry being added the width increases either on the left side of the root block or the right side of the root block..

When the root block splits, Oracle allocates 2 new blocks. Oracle then distributes the contents of the root block evenly between these 2 new blocks and replaces the contents of the root block with pointers to these 2 new blocks. Therefore, the root block remains as the root block and indeed the index height now increases with there being a new index level that at this stage consists of these 2 new blocks.

The root block splitting is the only time the height of an index increases and there is only ever the one root block in an index structure.

Correct, when a leaf block splits, the index increases in width. When the root block splits, the index increases in height.

“You delete an index entry one minute, the space is reclaimed the next time an insert is made in the leaf block. In some rare cases, there are no subsequent inserts in the leaf block and so the space isn’t reclaimed, unless the entire index block is emptied.”

Does this mean that the space, that is now empty in the leaf block due to a deletion is reused by Oracle? I heard somewhere that the whole leaf block has to be empty in order to be reused.

Thanks Richard. The question that now arises in my mind is “When do we actually need to rebuild a B*Tree Index?” How can we come to know that its time now to ‘rebuild an Index’? Or to look at it the other way, “Does an Index (B*Tree) ever needs to be rebuilt?

An invalid or unusable index is simply one which is no longer being maintained by Oracle and can not be considered by the CBO until it’s been rebuilt. There are a whole stack of reasons why an index may be in this state, generally as a result of some DDL operation (say on the partitioned parent object) or because it’s been manually set that way via the ALTER INDEX command to say speed up bulk DML operations.

the number of levels between the Root block and any/all of the Leaf blocks is always and must always be the same. Hence, an Oracle B-Tree index is always structurally height balanced, always.

As you replied to Kumar,the index height will increase when the Root block will split.I have some questions about it.Please have a look,
Q1)So the height of the index indeed increases?May be I am saying a classic “myth” only but with large depth in the index strcture from Root to Leafs,wont it be more costly to scan the index?I mean from Root ->Branch/es->Leaf/s as the levels are much more now?
Q2)I am not able to understand the last line of the reply,’
Hence, an Oracle B-Tree index is always structurally height balanced, always

What does it mean to say “height balanced” here ? And if in case there would be an imbalance(may be its not there but just for the sake of question) , what it could be?
Thanks and regards
Aman….

Q1: You used the word “much” a number of times in the question but generally it’s a very uncommon event for an index to increase in height. When it does, it needs to typically be several 100s of times larger again for it to next increase in height. So yes, there will be a logical I/O for each level in height the index needs to navigate, but if the index needs to access say 100 rows, that’s possibly 100 I/Os required to access the table so the extra cost may not be that “much” significant.

Q2. Yes, an index is always height balanced meaning it costs the same number of LIOs to access any specific leaf block in the index structure. However, it’s possible for an index to be “skewed” such that an area of an index is not as densely populated as other areas. In which case, it may require the access of more leaf blocks to read a similar number of index entries in different parts of an index.

Hi sir,
Thanks for the reply.Please bear with me as I am still struggling with the answer.
What I have understood is that oracle will increase the index.It may be not so often but with the massive amount of inserts happening,it will happen.What do you mean by saying that if the index needs to access 100 rows that’s 100s to access the table?And why this wont be significant cost?
2)So this is what we mention as “unbalanced index” for which we are recommended to create Reverse Key indexes?Is it correct sir?
Thanks and regards
Aman….

Firstly, for some reason. your comment was initially treated as spam, hence the delay in it appearing.

1) What I was trying to say, is that if you have a lots of inserts and the index does indeed grow so that it increases in height by 1, that’s an extra logical I/O required to access the index. So if the index has say a height of 4 now, that’s 3 LIOs to read down to the first leaf block and say 2 I/Os to read the index entries for 100 rows (as we can typically hold many hundreds of index entries per leaf block). So let’s say that’s 5 I/Os to read the index for a 100 row index scan.

However, depending on the clustering factor of the index, we may need to read 100 table blocks to access the 100 rows of interest, especially if the table is massive and if the CF is only average.

So that’s 5 I/Os due to reading the index vs. 100 I/Os required to visit the table blocks. The extra single I/O associated with the index increasing in height is now relatively insignificant, with it being 1 I/O out of 105 I/Os in total.

2) No, you would not typically create a reverse key index to deal with a skewed index, although it may help in that sparse deleted space can be reused for monotonically increasing indexes. Reverse key indexes solve contention issues but introduce other problems and so should only generally be considered for contention issues.

I’ve previously discussed reverse key indexes, perform a search on Reverse on this site for the related posts.

May be you covered this theme earlier, but I don’t know.. I will be appreciated for your reference..

Let’s consider not full first block of index. It is a root block, it is a leaf block. Rows entries are ordered on key values in this block, am I right?

That is, with the each new entered row Oracle rebuild this block to save the order of rows inside this block. Is it not so expensive to do it?

And the second question. How Oracle seek necessary row inside this (any) block to read a ROIWD or to find the place where to insert a new index row? Is this process looks like the search on index tree? (IE, compare taking value with a kept values by criteria of smaller-greater)

Yes, index entries are always ordered. Yes it comes at some, relatively small cost but the cost would be worse if index entries were not ordered within a block as it would make finding the required index entry much more difficult.

Oracle has to just read through the index entries within the block in order until it finds the appropriate location for the new index entry.

It depends somewhat on the type of index and why you’re getting contention. Is it a monotonically increasing b-tree index ? If so, then you may need to consider reversing the index. Else maybe converting the index to be a local index if on a partitioned table. If there’s a lot of block splits, then look at how the freelists are defined. If it’s a bitmap, then look at the locking implications.

I got list of 3 indexes in the OEM and all are B-tree index.2 of them are primary key index and values for this column are generated using sequence number and there are no deletes on all of the 3 indexes..the tables are not partioned table and tablespace is locally management with manual segment space management.

next example you put the commit inside the loop,so that it will commit after each insert,how does the the values get changed..?
SQL> SELECT lf_blks, pct_used FROM index_stats;
LF_BLKS PCT_USED
————————
36 51

When an insert is made, it needs to traverse the b-tree to ensure the insert ıs made in the correct index leaf page and if the associated branch blocks are not in memory, then they need to be read off disk like any other block. However, this is relatively unlikely as index root/branch blocks have a high caching charateristic.

I’m in the middle of a benchmarking exercise, I’m finding a lot (30% and upto 70%) of read i/o at O.S level for an application that has tables with multiple indexes and the application is primarily insert only. The inserts are simulated by a load process. So, was just trying to figure out if an insert on a table induces read i/o on a O.S level as I expected the b-tree traverse to put an additional load on cpu and the insert to induce “write” i/o than “read” i/o at O.S level. So, inserts on tables with multiple indexes should infact contribute more write i/o than read i/o due to their caching characteristic. Still trying to figure it out!

Hi Richard,
I am a big fan of your oracle index blog. Thanks for sharing your extperience through this blog.

We can get the statistics related to indexes in “User_indexes” and “index_stats”. What is the major difference between these two.
I created the table and Index, inserted 5000 Rows. After that Deleted all the rows and Gathered the statistics(& Validated the index structure).

Now In User_Indexes I am seeing Leaf_blocks=0 and Blevel=0
but Index_Stats is showing
LF_BLKS=25
Height=2

I had one scinario where the index rebuild has helped in improving the performance.
–
1) I have a table which is part of Oracle Applications product suit.
2) this table has an index on a column called “invoice_flag”
3) this invoice_flag can have only 2 values either NULL or ‘N’
4) 99% rows will have invoice_flag as NULL and 1% or lesser have ‘N’

– this column (invoice_flag) gets updated very frequently and too many times as well

this kept happening for quite sometime..and the performance of this program (the program has that update statement as well as many othe SQLs and the SQLs with explicit hint to use ) started degrading over a period of time.

to give an sample of this

update mmt set invoiced_flag=null
where
(((invoiced_flag=’N’ and transaction_source_type_id in (2,12)) and
transaction_action_id in (1,27)) and not exists (select null from
hr_organization_information hoi ,oe_order_lines_all oel ,
mtl_intercompany_parameters mip where ((((oel.line_id=
mmt.trx_source_line_id and hoi.organization_id=mmt.organization_id) and
hoi.org_information_context=’Accounting Information’) and
mip.ship_organization_id=TO_NUMBER(hoi.org_information3)) and
mip.sell_organization_id=oel.org_id)))

This fits one of the “classic” profiles for when an index rebuild can be beneficial.

The issue here is that you once upon a time had 1.6GB of index entries. In other words, previously you had a large number of records that had a value set to ‘N’. However, when you perform an update and set the flag to NULL, you’re effectively deleting the index entries from the index because an index entry that is entirely NULL is not indexed in a b-tree index. By updating all flag values from ‘N’ to NULL, you’re effectively deleting all the index entries in the index.

So you end up with an index that’s now made up of nothing but deleted entries. Now this space can be reused, leaf blocks that contain nothing but deleted entries are placed on the freelist, but you’re simply not adding in enough rows with a value of ‘N’ to reuse much of this space.

By adding in a relatively few number of index entries, you don’t make the index any bigger, but you don’t fill in and reuse much of this deleted space.

This explains why you’re index was so large but can be rebuilt to be so much smaller. You have effective “permanently” reduced the number of index entries you store in the index.

Now the problem you have is that although an index leaf block that contains nothing but deleted entries can be reused, it remains in its current location within the index structure until it gets reused. Because all the index leaf blocks had previously stored flag values of ‘N’, all branch blocks point down to leaf block that COULD house an “N” value for the flag.

Therefore, when you perform an index range scan, Oracle is forced to visit the first leaf block because it potentially could contain a “N” value of interest and has to navigate across all the leaf blocks to guarantee it will find all the “N” values in the index. Reading a 1.6GB index in this manner is going to be slow …

By rebuilding the index, you dramatically reduce the number of leaf blocks and so reading the whole index is nowhere near as expensive.

If you only insert and then subsequently update to NULL consistently a relatively few number of flag values, you will not have to rebuild the index again, because all the deleted space will be reused and the index will remain at a stable size.

If however, you were to ever insert a massive number of rows with a flag of “N” again (hence increase the overall size of the index), then update them to NULL (hence delete all the index entries), then proceed with only inserting and updating a small number of “N” values, you may need to rebuild the index at this time to permanently reduce the index size again.

This example of yours is similar to the scenario when a table is permanently reduced in size, except in the case, it’s the index, not the table, that has permanently reduced in size.

For more scenarios when index fragmentation can occur, see this presentation of mine:

How does we compute the CR at the individual steps of a tkprof report. this is more like an acedemic question than a feedback. so please ignore if you don;t have enough time.
but if you can enlight that will help many of us.

It of course depends on the specific step. However in the case you highlighted, the CRs of the unique index scan is basically calculated by the sum of the CRs for all distinct unique scans, as determined on the number of interations based on the outer table in the NL join.

So if 100 unique scan are required within the NL join, then CRs will be the sum of 100 unique scans on the inner table.

I have table in Oracle which has more than 500 mill rows and grws at high velocity. Operations ar mostly reads and updates. Deletes are rare. It is indexed good. B level idex is at 3.
It was suggested that this table should be split and unwanted rows could be moved to a history table (involves a delete and an insert) depending some conditions. on an average 150 rows would be moved over per condition. assuming this happens 10000 times a day, an average of 150 * 10000 rows would be deleted from one table and inserted into another. With this logic the new table woul have an average of 8mill rows at any given time versus the 500 mil rows and growing.
It was pointed out that the splitting of the table will not improve any performance while querying the table. Table size has no bearing on query performance as long as the table is well indexed. Would this be a true statement.
Also would this degrade the overal performance since it now involves deletes and inserts and therefore there will be increase in I/O.
Thanks for your time. Your blogs are very helpful.

Very briefly, it depends whether or not splitting the table improve performance and it depends on the queries in question.

If the queries only access a small amount of data and the index can effectively access all the data, then splitting the table will indeed make no real difference because the index is still going to access the same amount of data anyways. A query returning a couple of rows will basically perform the same on a 1 million row table as it does on a 500 million row table.

However, if the query is returning lots of data, then potentially an index might be a little better than a FTS because the FTS needs to access all the table blocks and the index may not, but if only the table contained just the data in question, then a FTS will become more efficient because it can reduce the number of blocks in needs to access while the index might not.

So the table size can have a bearing on performance if by varying the table size you can reduce the blocks you need to access via a FTS such that it make the accesses cheaper than an equivalent index. It’s effectively what partitioning pruning achieves.

If you have the storage and you have selective indexes, then generally it’s less maintanence and more flexible to keep the data online, perhaps made more flexible still with the use of partitioning where it’s might be less problematic to remove portions of a table or make history information read-only as necessary.

Thanks for taking the time to answer my question. It is helpful.
The second part of my question is however unanswered in the sense that when a i move this data to a history table it involves the deletion of the rows from the original table and inserting into the history readonly table. Does this I/O offset any gains obtained by splitting the tables making the working table smaller. The Insert/delete will be quite a frequent operation – about 10000 a day and would move an average of 10000 * 150 rows.

It depends on what “gains” you actually do obtain, if any. You don’t mention what “reads” means, whether you generally read one or a few rows at a time or whether you read many rows at a time, Whether you perform reads relatively few times a day or whether you have massive numbers of reads per day. You don’t mention what updates you perform, how frequent, how Oracle finds the rows to update, etc. Do you access and read the history data any differently to the current data, etc. You need to roughly determine:

a) cost of mantaining a history table

b) savings of each different read/update operation

c) number of each read/update operation

so you can determine whether a) is less than the product of b) and c).

From an indexing perspective, if you delete older, history table and insert newer data and if the index values monotonicaly increase and you only delete some of the older history records, then you also run the risk of fragmenting the indexes. Which in itself might not be a problem, except wasting unnecessary space, unless you often read lots of older records.

So it all depends.

But what I would recommend as I previously mentioned if you’re looking at deleting lots of older records and moving them into a history table is partitioning. Because rather than perform expensive delete operations, you may find exchanging older table partitions is a far less costing way of moving 1.5M rows per day.

Lets say the index is based a numerical ID. Let’s say the maximum ID currently in the index is 424242.

If a new index entry with a value of say 424243 is to be inserted into the index and it causes an index block split because the right hand most leaf block in the index is currently full, then because the new value 424243 is the maximum or the greatest value currently being indexed, Oracle will perform a 90-10 split.

If however, the new index entry was only 424241, then it would not have been the maximum index value (as 424242 is greater), and it would only have resulted in a 50-50 block split.

Does the volume of data in a table on which the index is built have any impact on the insert performance?
For eg, let’s say I have a partitioned table that’s subjected to high transactional inserts, in this case will I better off in terms of insert performance with locally partitioned indexes or global non partitioned index? Just want to confirm whether height of the index has any impact on dml performance.

It can potentially have “some” impact in that to insert into the index structure, Oracle has to traverse the index tree and the greater the height, the greater the LIOs. These LIOs on the index branches are very likely to be cached and so have minimal impact unless the insert rates are extremely high (in which case other factors such as associated block splits come into play as well).

Hi Richard
Very good article, Thanks for this.
I have one questions after reading your reply No. 37
How to find out that in my particular index contains lot of deleted entries and because of application behavior those space are not reused and as a result index size is growing. Is there any way to find out such index so that I can rebuild such indexes?

You mentioned in your April 29, 2008 reply that “the space isn’t reclaimed, unless the entire index block is emptied.” So when the space is reclaimed when entire block is emptied does this process happens during the deletion process?

When an entry is deleted does Oracle adjust the leaf node entries, sort of a reverse of the insert process? So in your example, if the last entry added in the root node that cause it to split is deleted will Oracle just leave things as is or will it move all the entries back to the root and free the 2 new blocks?

The short answer is no. Once an index has increased in height, it will not automatically reduce in height again following subsequent updates/deletes. There will always be at least one pointer value in each index level.