In Part 1 of this series I talked about the basic problem, which is that plan_hash_values are not based on the whole plan. One of the main missing components is the predicates associated with a plan, but there are other missing parts as was pointed out in Part 1 of Randolf Geist’s post on the topic. At any rate, predicates seem to be the most critical of the missing parts.

The purpose of this second post on the topic is to talk about diagnosis. Basically how do you identify when some other part of a plan has changed that doesn’t affect the plan_hash_value, specifically a predicate.

So first I thought I would show a few examples of statements with the same sql_id and plan_hash_value that have other plan differences (in the predicate section). To do this I used a method proposed by Randolf Geist a few years back in his 2nd post on the topic which covered Alternative Ways to Calculate a PLAN_HASH_VALUE In that post, Randolf shows several ways to compute a hash value on any or all of the columns in the v$sql_plan table. I wrote a simple script around one of the those methods (find_pred_mismatch.sql), and as you might guess from the name, I limited this version to not include all the columns in v$sql_plan, but to only identify statements with mismatched predicates. To be more explicit, the script will locate statements in the shared_pool that have multiple child cursors, where there are more than one set of predicates to go with a single plan_hash_value. Here’s an example:

Last week an interesting issue popped up on a mission critical production app (MCPA). A statement that was run as part of a nightly batch process ran long. In fact, the statement never finished and the job had to be killed and restarted. This particular system is prone to plan stability issues due to various factors outside the scope of this post, so the first thing that the guys checked was if there had been a plan change. Surprisingly the plan_hash_value was the same as it had been for the past several months. The statement was very simple and a quick look at the xplan output showed that the plan was indeed the same with one exception. The predicate section was slightly different.

As a quick diversion, you probably already know that the plan_hash_value is calculated based on partial information about the plan. Arguably it’s the most important parts, but there are some important parts of the plan that are not included (namely the stuff that shows up in the predicate section of the plan). Randolf Geist explained which parts of the plan are used in calculating the plan_hash_value well in a post on How PLAN_HASH_VALUES Are Calculated several years ago. His summary was this:

So in summary the following conclusions can be made:

– The same PLAN_HASH_VALUE is merely an indicator that the same operations on objects of the same name are performed in the same order.

– It tells nothing about the similarity of the expected runtime performance of the execution plan, due to various reasons as demonstrated. The most significant information that is not covered by the PLAN_HASH_VALUE are the filter and access predicates, but there are other attributes, too, that are not part of the hash value calculation.

The production issue got me thinking about several things:

Could I come up with a simple test case to show a severe performance degradation between executions of a statement with the same plan_hash_value because of a change in the predicates? (the answer is it’s pretty easy actually)

What forensics could be used to determine when this situation has occurred?

Since I’m on vacation and not “really” working, I thought I might have time to write up a quick blog post. The idea for this one was triggered by one of Maria Colgan’s presentations at Hotsos last week. Maria was talking about SQL Plan Management and Baselines and somehow got me thinking about the DBMS_XPLAN option to display plans for Baselines. This is a pretty neat feature that allows you to the see the plan associated with a Baseline (well sort of).

The 11.2 documentation (Oracle® Database PL/SQL Packages and Types Reference) says this about the DISPLAY_SQL_PLAN_BASELINE function:

This procedure uses plan information stored in the plan baseline to explain and display the plans.It is possible that the plan_id stored in the SQL management base may not match with the plan_id of the generated plan. A mismatch between stored plan_id and generated plan_id means that it is a non-reproducible plan. Such a plan is deemed invalid and is bypassed by the optimizer during SQL compilation.

But what does that mean? Well in short it means that Baselines don’t store plans, they store hints that when fed to the optimizer will hopefully cause it to come up with the desired plan. Baselines also store a plan_hash_value so it’s possible to tell whether the hints worked or not. Baselines do not actually store all the steps of a plan. So if that’s the case, then it’s obviously not possible for the display_sql_plan_baseline function to show the plan if the optimizer can’t reproduce it for some reason. When the doc’s say “it is possible that the plan_id stored in the SQL management base may not match with the plan_id of the generated plan”, that’s what they are talking about. I decided to create a test case to see what happens when the generated plan can’t match the original. Here’s the basic idea:

I modified my create_1_hint_sql_profile.sql script (which I blogged about here: Single Hint Profiles) to allow any arbitrary text sting including quote characters. This is a script that I use fairly often to apply a hint to a single SQL statement that is executing in a production system where we can’t touch the code for some reason. For example, it’s sometimes useful to add a MONITOR hint or a GATHER_PLAN_STATISTICS hint to a statement that’s behaving badly so we can get more information about what the optimizer is thinking. I recently updated the script to allow special characters in the hint syntax. This feature is useful when you want to add something like an OPT_PARAM hint that takes quoted arguments. The change makes use of the q-Quote feature which I blogged about here: q-Quote. (the original version just barfed on quotes being input as part of the hint)

In the example I turned off cell offload processing with the ALTER SESSION and ran a SQL statement that took 28 seconds. Then I used my fsx.sql script to verify that the statement was not offloaded and to find the SQL_ID. Next I created a 1 hint Profile with an OPT_PARAM hint that set the cell_offload_processing parameter back to TRUE using the new version of my create_1_hint_sql_profile.sql script. Next I used my sql_profile_hints.sql script to verify the text of the hint that was added to the Profile. It looked good including the quotes. When I executed the statement a second time it ran in 5 seconds. I then used fsx.sql again to see that the statement was offloaded for the second execution (child 1).

I ran into an interesting issue last week having to do with plan stability. The problem description went something like this:

“I have a statement that runs relatively quickly the first time I run it (around 12 seconds). Subsequent executions always run much slower, usually around 20 or 30 minutes. If I flush the shared pool and run it again elapsed time is back to 12 seconds or so.”