November 14, 2010

Local Indexes – 2

In the previous note on local indexes I raised a couple of questions about the problems of different partitions holding different volumes of data, and supplied a script to build some sample data, reporting the values of blevel recorded across the partitions of a list-partitioned table.

Q1: What do you think Oracle will record as the blevel at the global level for the two indexes ?
A1: As one of the commentators suggested, it seems to be the highest blevel recorded for any partition – in this case 2. (It’s possible that this assumption is wrong, of course, there may be some subtle weighting calculation involved – but I haven’t yet tested that hypothesis.)

Q2: If you have a query with a where clause like “id between 100 and 400 and n1 != 5” – which is designed very precisely to exclude the last, very big, partition – what value of blevel is Oracle going to use when considering the cost of using the index t1_id to access the data ?
A2: As I pointed out in an earlier note on list partitioned tables, Oracle doesn’t recognise the (obvious to the human eye) option for partition pruning in this predicate, so it uses the global blevel in the calculations.

The second answer is the one that is causing me a problem, because I have a client system where almost all the data is in a “dead” partition: the rows have a status, stored as the partition key of a list-partitioned table, of ‘COMPLETE’, and lots of the client’s code includes the predicate status != ‘COMPLETE’. This can make the optimizer take the wrong execution path because it uses a global blevel that has been dictated by the huge volume of data that we know we don’t want to see.

The client queries are fairly complex, of course, but here’s a very trivial example demonstrating the basic problem (using the data generated by the code in the previous note – running under 11.1.0.6):
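The query (and plan request) would take roughly the following shape – a sketch, assuming the table t1 and local index t1_id created by the script in the previous note; the plan and cost you see will depend on your version and statistics:

```sql
explain plan for
select  *
from    t1
where   id between 100 and 400          -- touches only a little data
and     n1 != 5                         -- n1 is the list-partitioning key
;

select * from table(dbms_xplan.display);
```

Despite the predicate on the partitioning key, the plan shows an index range scan repeated across all the partitions (Pstart/Pstop covering the full range) rather than pruning away the big partition.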

From Oracle’s perspective it has to visit all six partitions because it can’t use the most appropriate index and do partition pruning – and the final cost of this simple query is 14 because the value used (six times, in effect) for the blevel in the calculations is two; but we have inside information that tells us that this is essentially an unreasonable cost.

If Oracle were to believe that a more appropriate blevel for this query was just one then the cost would drop significantly (although in this case the plan wouldn’t change).

Note how changing the global blevel by one makes the cost change by six – a consequence of the fact that we have six partitions with no pruning. If you’re having trouble with queries against partitioned tables that don’t use the right index, take a close look at the data volumes and values recorded for blevel at the global, partition and subpartition levels – it’s possible that you’re suffering from a bias introduced by one partition being much larger than all the rest.

If you’re wondering how I got these plans (without simply editing them) it was by using dbms_stats.set_index_stats() to change the stored statistics – see “Copy Stats” for an example of the type of code needed. In cases like this, where I have better information about the data and the intent of the code than the optimizer has, I am perfectly happy to give a “more truthful” picture of the data to the optimizer by writing scripts to adjust statistics.
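The shape of such a script is a read-modify-write pass over the stored index statistics – a minimal sketch, assuming the index t1_id belongs to the current schema and that we simply want to force the global blevel down to one:

```sql
declare
    m_numrows   number;
    m_numlblks  number;
    m_numdist   number;
    m_avglblk   number;
    m_avgdblk   number;
    m_clstfct   number;
    m_indlevel  number;
begin
    -- read the current global statistics for the index
    dbms_stats.get_index_stats(
        ownname  => user,
        indname  => 'T1_ID',
        numrows  => m_numrows,
        numlblks => m_numlblks,
        numdist  => m_numdist,
        avglblk  => m_avglblk,
        avgdblk  => m_avgdblk,
        clstfct  => m_clstfct,
        indlevel => m_indlevel
    );

    -- write them back unchanged, except for the "more truthful" blevel
    dbms_stats.set_index_stats(
        ownname       => user,
        indname       => 'T1_ID',
        numrows       => m_numrows,
        numlblks      => m_numlblks,
        numdist       => m_numdist,
        avglblk       => m_avglblk,
        avgdblk       => m_avgdblk,
        clstfct       => m_clstfct,
        indlevel      => 1,
        no_invalidate => false      -- invalidate dependent cursors immediately
    );
end;
/
```

Setting no_invalidate to false means existing cursors are invalidated so that queries re-optimize with the adjusted figure straight away.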

There are three drawbacks to such an approach, of course. First: on the next upgrade the optimizer might get smarter and make my clever little hack a liability rather than a benefit; secondly, there may be examples of application code that I haven’t noticed that might go wrong because of my hack; finally, and more importantly in the short term, I have to make sure that my code runs every time the statistics on the index are modified by any other program (such as the automatic stats collection job).
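One way to reduce (though not eliminate) the third exposure is to lock the statistics once you’ve adjusted them, so the automatic stats collection job skips the object – a sketch, assuming you’re prepared to take over all statistics maintenance for the table yourself:

```sql
begin
    -- locking the table's stats also protects the stats
    -- of its dependent indexes from the automatic job
    dbms_stats.lock_table_stats(
        ownname => user,
        tabname => 'T1'
    );
end;
/
```

The trade-off is that your own gathering scripts must then either unlock the statistics first or pass force => true to the gather calls.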

But the principle is sound – if we understand the system better than the optimizer then it’s positively important to help the optimizer in the most truthful way possible. List partitions (in a way similar to frequency histograms) are an obvious target for this type of treatment.

Comments

Armed with this knowledge, one would question whether this design, with all the data in one partitioned table, is the right design. Given that moving a row from partition 1 to partition 2 takes just as much work as moving it between tables, why not have a separate table for all the “Completed” rows?

Yes, slightly more complicated, but then you don’t have to deal with any of these problems.

That certainly makes sense given the limited amount of information I’ve supplied about the system, and in some ways it is very similar to Todor Botev’s suggestion on the previous blog entry to reduce the table to just two partitions – “old” and “new”.

In fact it might even be possible to implement your suggestion invisibly by creating a constraint-based partition view on top of the two tables and writing instead-of insert/update/delete triggers on the view to make sure the data goes into the right place – a short-term strategy for minimising the code change required. Depending on the definition of the primary key, splitting the table into two might even make it possible to define the “completed” table as a range-partitioned table so that we could make history read-only and do some compression.
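As a sketch of that strategy (all names and columns here are hypothetical – two tables with matching column lists and check constraints on status, hidden behind a view that takes the name the application code expects):

```sql
-- the two underlying tables; the check constraints are what
-- would let the optimizer eliminate a branch of the view
-- (i.e. constraint-based "partition view" elimination)
--   t1_live:      check (status != 'COMPLETE')
--   t1_completed: check (status  = 'COMPLETE')

create or replace view t1 as
select id, status from t1_live
union all
select id, status from t1_completed
;

-- route inserts to the right table so the application
-- can keep inserting into "t1" unchanged
create or replace trigger t1_ins
instead of insert on t1
for each row
begin
    if :new.status = 'COMPLETE' then
        insert into t1_completed (id, status)
        values (:new.id, :new.status);
    else
        insert into t1_live (id, status)
        values (:new.id, :new.status);
    end if;
end;
/
```

Similar instead-of update and delete triggers would be needed, with the update trigger deleting from one table and inserting into the other when a row’s status changes to ‘COMPLETE’.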