August 7, 2009

Index Fragmentation

When people talk about “index fragmentation”, what do they mean, and why do they care ?

I often see email or forum posts from people claiming that their indexes are fragmented and need to be rebuilt – but they rarely explain (even when asked) what they mean by “fragmented”, and how they have measured the “fragmentation”, and why they think they have evidence that the index needs to be rebuilt.

So if you are accustomed to talking about indexes being “fragmented”, would you let me know what you mean, and how you measure “fragmentation”. (I can think of three or four interpretations for the term – but I’m interested to hear from people who actually use it.)

I’m not using this term ;) Here is how I can interpret “fragmented index”:
1) there are many leaf blocks with “holes” which would never be reused
2) many index segment extents are under-utilized, i.e. there big number of unused free blocks
3) indexed data values are in fragments, i.e. data is sparsed when ordered, with points of very high and low density

I agree with Daniel.
Sometime ago it was a formula with DEL_LF_ROWS and LF_ROWS and IIRC index could be called fragmented if it was more then 20 % of delete rows comparing to LF_ROWS.
Now I have found in Metalink note 30405.1 that

“As a rule of thumb if 10-15% of the table data changes, then you should
consider rebuilding the index.”

From time to time I use the feedback links on Metalink notes to point out errors and omissions – but somehow there’s always another note waiting to crawl out of the woodwork.

I can’t help noticing that the note you’ve quoted was: “Checked for relevance on 13-Jun-2007″, and yet the “Related Documents” are both Oracle 8 manuals, and when you follow the links to the two metalink notes referenced you find:

So in that case if coalesce can be more efficient Metalink rule of thumb is useless, but what about deleting about 50 % of blocks ? Of course it’s depend as always in Oracle but I’m just looking for something which can be used a trigger to take a look on particular index.

ps.
Thanks God there is still google and thefreedictionary.com to translate all English idioms ;)

Marcin,
Sorry about the “crawling out of the woodwork” idiom – when writing quickly I don’t always remember to edit for EFL (English as a foreign language) readers.

I think the del_lf_rows vs. lf_rows comparison might be of some use to a new DBA (or visiting consultant) who is trying to identify interesting spots in a badly documented system – but if you have a process that has a pattern of deletes and inserts that makes that ratio useful, then you should have a proper coding strategy embedded in the routine that does the deletes anyway. (And the “validate index” command locks the table anyway, so isn’t appropriate to a lot of systems these days).

So – you need a strategy to get from ignorance to familiarity, and something this code (which has an error that I will correct eventually) is a “brute force” way of discovering interesting items – but once you know which indexes are possible threats, you shouldn’t need to use the method ever again.

I’ve often wondered if people asking such things have sql server in mind, so I googled for fragmented index sql server. It seems that it is common enough there to have magical DBA incantations to deal with it, varying by version. So I guess it must be a problem.

In sql server it is assumed to be a problem when using the index for the equivilent of range scans esspecially where results are to be ordered on the indexed values. Th eproblem I have is I don’t have the toolset to prove/disprove that there is actually a performance problem with fragmented indexes in sql server. A lot of DBA’s will rebuild/defragment sql server indexes simply because its recomended best practice.

Here is my understanding:
Index Fragmentation is free and available space within the index structure which can not be used for future inserts.
This fragmented space can be cleaned up by coalesce or rebuild.
I have used COALESCED option (can be done online with no additional space required) only and not REBUILD on indexes to reclaim free space.
For identifying candidates we can use analyze index index_name validate structure;— run during planned maintenance window if production object and see how much percentage of the allocated index space is actually used in the index_stats view.
Locally Managed Tablespaces should be used to avoid fragmentation in segments.
Keep up the good work. You are great!
Regards
Rakesh

I’ve just realised that you’ve used the “fragmented” word about tablespaces and segments – would you care to try your hand at giving a detailed description about what you mean by fragmentation in that context, why you think it matters, how you would detect the problem, and what symptoms you would expect to see that prove that it’s a problem.

I mentioned “Locally Managed Tablespaces should be used to avoid fragmentation in segments.” :

By using the Dictionary managed tablespaces and not correctly considering the extent sizes, chances of fragmentation are more in the tablespaces.
Oracle recommends use of locally managed tablespaces and Auto-allocate Extent Management/ Uniform Extent Management to avoid fragmentation.

Also, index segments get fragmented due to repeated manipulations of the balanced tree structures and incorrect setting of storage parameters like PCTFREE/ PCTUSED.

Below are some of the documents helped in my understanding of the same:

I hope you won’t think I’m picking on you – but your contribution is useful because it demonstrates how sometimes a concept gets detached from any real meaning.

In your original comment, you said of “index fragmentation” that it was:“free and available space within the index structure which can not be used for future inserts”
That’s a pretty good description – except: (a) all space with the index (leaf) structure is always available, so “and available” is redundant; (b) all space within the index (leaf) structure can be used for future inserts; and (c) the failure to re-use that space is a feature of the application, not of the index.

So, for indexes, I like to see people stop talking about “fragmentation” and start saying things like “I have an index with (a lot of) available space that my application doesn’t want to use.”

Your comments about “repeated manipulation of the balanced tree structure” look like a quote and the comments about pctused (it’s ignored for indexes) and pctfree (it applies only when you create/coalesce/rebuild the index) are particularly irrelevant. Note that every single insert or delete from an index may manipulate the tree structure.

As far as “fragmentation in segments” is concerned, you’ve now switched this to “…fragmentation [are more] in the tablespaces.” But you haven’t explained what you mean by fragmentation and why it matters. (Given the articles you list, I think I know what you have in mind – and again I’d rather not see the “F” word used as a label for it.)

I really wasn’t interested in what person A thinks person B means when person B says “my index is fragmented”. I wanted a few people (person Bs) who actually do say it to explain what they mean when they say it and, if possible, to explain how they detect the phenomenon.

I don’t like “fragmentation” in discussions about indices; there are enough arguments in previous posts.
But as there is a question about “fragmentation” per se, the 1st time I was confronted with it (in my PC-life) was fragmentation on an old DOS (I guess it was 3.0) FAT filesystem. In this context, “fragmentation” was the level of non-continuous physical blocks belonging to a file.
It’s hard to find equivalences in oracle structures: sequential vs. scattered reads (which could be a measurement) are affected by block size, segments and blocks in the buffer cache (at least), and as they are limited by segment-boundaries, it doesn’t matter if segments are in sequence or not. (Ok, on a single disk, it does matter, but I hardly can tell from within oracle DB only).
Maybe this can be used against the misuse of “fragmentation” in discussions.

When I posted the original note I wanted people to think about what they meant by talking about an index being “fragmented” – and then to realise that (as you suggest) it’s a very bad term to use about indexes.

Rakesh made the most important point – though with some errors: sometimes an index accumulates free space that is unlikely to be reused (at least in the near future) and results in queries doing more work than is really necessary. If this free space can be eliminated safely and cost-effectively then it is worth taking some time to notice the effect and do something about it.

Using an unsuitable and misleading term like “fragmented” stops people from thinking carefully about what the problem is, why it happens, and what to do about it.