Wednesday, July 22, 2009

Oracle provides in recent releases the PLAN_HASH_VALUE information, which according to the documentation, is the following:

"Numerical representation of the SQL plan for the cursor. Comparing one PLAN_HASH_VALUE to another easily identifies whether or not two plans are the same (rather than comparing the two plans line by line)."

So according to the documentation the PLAN_HASH_VALUE can be used as a shortcut to quickly and easily determine if two execution plans are the "same".

I think that the statement "the two execution plans are the same" suggests that execution plans having the same PLAN_HASH_VALUE yield the same or at least similar performance at runtime. An interesting point that I would like to cover here in the following test cases.

This raises the interesting question, what exactly is the PLAN_HASH_VALUE based upon? Obviously it is a hash value calculated using the execution plan information as input. Apart from the fact that hash values are in theory always subject to potential hash collisions, which means that two different inputs can lead to the same hash value, the more interesting question is, which attributes of the execution plan are used as input?

Having this information at hand allows us to get a better understanding if two execution plans with the same PLAN_HASH_VALUE actually have to a yield similar execution profile (which Oracle doesn't say but I assume is a common assumption - or may be misconception?).

Let's start with a simple example to determine what makes the PLAN_HASH_VALUE different. All results shown below come from an 11.1.0.7 Win32 instance with a 8KB default block size, a MSSM 8KB LMT tablespace and default system statistics.

Note the subtle differences: We have an index on a simple table, but these are actually three different objects, once owned by a different user and once with a different definition, but all three statements get the same PLAN_HASH_VALUE.

So this simple example already raises some of the most important points:

- The OBJECT_ID of an object obviously doesn't get used for the PLAN_HASH_VALUE calculation. Although present in the V$SQL_PLAN view, checking the PLAN_TABLE definition suggests why: It's not part of the PLAN_TABLE definition. This means that e.g. replacing an index with a different one but keeping the name will result in the same PLAN_HASH_VALUE if the operations and their order remain the same, like demonstrated (Although I have to admit that doing so might be called bad practice).

- The owner of an object doesn't get used either. Why this is so, remains unclear, since it seems to be part of all related object definitions (OBJECT_OWNER attribute in PLAN_TABLE/V$SQL_PLAN), but might be an very important point if you have multiple schemas with objects of the same name but representing different data or data volume. As you can see from the example the PLAN_HASH_VALUE is the same but the number of rows estimated is different due to the different underlying object (and its intentionally manipulated statistics).

- One of the most crucial aspects is also demonstrated: The FILTER_PREDICATES and ACCESS_PREDICATES information is NOT part of the PLAN_HASH_VALUE. This means that two executions plans can have the same PLAN_HASH_VALUE but behave significantly differently at actual execution time.

- What also is obvious from this example is that the ROWS and BYTES information is not used as part of the PLAN_HASH_VALUE. More on this later.

Whereas the previous examples mainly used different SQLs (and therefore had different SQL_HASH_VALUEs or SQL_IDs) this example uses the same SQL and demonstrates the following:

- Bad application behaviour (in this case different types of bind variables) or other reasons can lead to unshared cursors, i.e. multiple child cursors for the same SQL (not actually the main topic here)

- But although the different cursors have different access predicates (in this case the implicit type conversion), as already shown, different access predicates don't lead to different PLAN_HASH_VALUEs

For this particular statement the difference in the predicates very likely doesn't represent a threat, but there are more subtle cases where these differences can lead to significant changes in behaviour. Possible reasons are different order of the predicate evaluation (e.g. when having system statistics enabled, which is the default from 10g on) which can make a significant difference in CPU usage or general resource consumption depending on the actual cost of the predicate evaluation (e.g. a costly PL/SQL function call), or even the evaluation of (filter) predicates at different steps of the same execution plan which could lead to significant differences in the number of rows generated by each operation step of the execution plan, and therefore make an execution plan much more inefficient due to the larger number of rows processed.

What about the actual estimates associated with the particular operations of an execution plan?

It's obvious that none of the ROWS, BYTES, COST nor derived information like TIME in the later releases are used to calculate the PLAN_HASH_VALUE. So again this makes clear that the same PLAN_HASH_VALUE of two statements doesn't say anything about the similarity of the runtime performance.

Another example that demonstrates this point with a slightly more complex plan:

Notice how one statement performs an effective join using appropriate join predicates whereas the other one generates duplicate records. Again, since the filter and access predicates are not evaluated, these two plans get the same PLAN_HASH_VALUE, although we can say that one of them is potentially suboptimal (and usually would result in a significantly different execution plan which in turn would have different PLAN_HASH_VALUEs but there are certainly more complex scenarios where the optimizer goes wrong for whatever reason resulting in such potentially inefficient execution plans).

What about extended execution plan information like partitioning and parallel execution?

The PSTART and PSTOP information is not used either for the PLAN_HASH_VALUE, so different execution plans accessing different partitions of the same object might get the same PLAN_HASH_VALUE, but again the runtime performance might be dramatically different.

By the way above example demonstrates that the CBO peeks at the binds and uses then the partition level statistics of the corresponding partition defined by the bind values (in case the bind values prune to a single partition), although the execution plan might be executed with different bind values actually accessing different partitions at runtime.

The example uses different parallel distribution options for the same execution plan. In 10g and later this is reflected in different operations (like PX SEND BROADCAST) and this suggests that the PLAN_HASH_VALUEs are going to be different due to the different operations.

Running a similar test case on 9.2.0.8 shows it seems that actually some of the attributes related to parallel execution are also used to calculate the PLAN_HASH_VALUE:

Although the main operations and their order of execution stays the same, the calculated PLAN_HASH_VALUE is different in all three cases.

So in summary the following conclusions can be made:

- The same PLAN_HASH_VALUE is merely an indicator that the same operations on objects of the same name are performed in the same order.

- It tells nothing about the similarity of the expected runtime performance of the execution plan, due to various reasons as demonstrated. The most significant information that is not covered by the PLAN_HASH_VALUE are the filter and access predicates, but there are other attributes, too, that are not part of the hash value calculation.

- Still it's very likely that for the same SQL statement and the same unchanged underlying data the same PLAN_HASH_VALUE indicates similar expected runtime performance, except for some particular cases where the execution plan itself already contains "conditional" execution paths which might be taken by the runtime engine and therefore lead to significantly different runtimes. Some of these examples have already been covered by Jonathan Lewis: Conditional Plan and Hierarchical Queries in some versions of Oracle.

In the second part I'll demonstrate a possible approach how to calculate your own PLAN_HASH_VALUE that covers some of the information omitted if you suspect that you might have encountered a scenario where the same PLAN_HASH_VALUE suggests similar runtime performance but there are significant differences in the execution plans that you would like to have covered by the hash value calculation to detect these plan changes easily.

5 comments:

Nice post. There is a plan_id column in sqlobj$ (which contains info about sql profiles and sql baselines in 11g). It looks like a variation on a plan_hash_value - but doesn't match the plan_hash_value for the statement that the profile or basline is attached to. Any idea how they come up with that value?

FYI: I've just discovered that plan_hash_value depends on the platform Oracle RDBMS is running. Most likely it depends on Big/Little endian format. So, having two identical plans on different platforms results in different plan_hash_value. Something I didn't even think about before I tried to compare a bunch of plans from two different servers automatically.