You are here

Todays story is about, how dynamic sampling can effect your plan stability during upgrade and how you can recover from it as long as you know what your are doing.

A little bit of background, I’m currently in the middle of migrate and upgrade cycle for 12TB warehouse from 10.2.0.3 Solaris to 11.2.0.3 (yet) Linux.
Plan stability on this system is maintained by locking the main driving table statistics and hinting the queries in case CBO does not pick the right driving table.
The system uses temporary tables a lot for staging operations and they rely on dynamic sampling Level 2 which 2 is enough. They also use parallelism for the heavy batch operations. Good part of the system is, since they use partitioning heavily they can simulate the close to real workload on cut down version of the database. What we are doing is getting the cut down version 10.2.0.3 export from solaris imported into 11.2.0.3 linux and run the batches for the close of business date with same memory and other settings. Initial load test done on OFE(optimizer_features_enable)=11.2.0.3 and we saw many parallel queries started to suffer. Our aim was not to check what is slow just collect the necessary data so we did not tune anything. Then we did another test on OFE=10.2.0.3 and saw that 10.2.0.3 did way better for parallel queries but it also suffer for other issues which is not subject of the current post. We have many other runs with different options for OFE and new stats but they will also be covered later if I come up something interesting. Todays story is why our parallel queries suddenly went bad on OFE=11.2.0.3

After this background now I can start the actual story.

Once I check the plans I saw something interesting. Oracle suddenly started to use dynamic sampling 7 without any manual setting. I did ask it on twitter and thankfully Timur Akhmadeev pointed me the right direction on the manual (RTFM rocks)

About where the dynamic sampling is used

Oracle 10GR2 documentation says

14.5.6 Estimating Statistics with Dynamic Sampling
This dynamic sampling feature is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter.
For dynamic sampling to automatically gather the necessary statistics, this parameter should be set to a value of 2 or higher.
The default value is 2. See "Dynamic Sampling Levels" for information about the sampling levels that can be set.

Oracle 11GR2 documentation says

13.6.2.2 When the Optimizer Uses Dynamic Sampling
During compilation, the optimizer decides whether to use dynamic sampling based on a number of factors,
including whether the statements use parallel processing or serial processing.
For parallel statements, the optimizer automatically decides whether to use dynamic sampling and which level to use.
The decision depends on the size of the tables and the complexity of the predicates. The optimizer expects parallel statements to be resource-intensive,
so the additional overhead at compile time is worth it to ensure the best plan. The database ignores the OPTIMIZER_DYNAMIC_SAMPLING setting
unless set to a nondefault value, in which case the value is honored.
For serially processed SQL statements, the dynamic sampling level depends on the value of the OPTIMIZER_DYNAMIC_SAMPLING parameter and
is not triggered automatically by the optimizer. Serial statements are typically short-running, so that any overhead at compile time
could have a huge impact on their performance.

Did you spot the difference ? on 10GR2 the decision of the level completely depends on the optimizer_dynamic_sampling parameter (system,session,hint level),
however on 11GR2 for parallel statements optimizer “automatically” decides whether to use dynamic sampling and which level to use.

Since 2005 where my Oracle career starts, I learned one thing, When there is something “automagically” adjusted and decided, there is always big possibility for less stability.

Below is the difference in a simple case where I create a table, do not gather statistics, keep optimizer_dynamic_sampling=2 and run explain plan with parallel hint.

As you can see Oracle adjusted my default dynamic sampling level to 6 on OFE 11.2.0.3 when the statement is parallel and keeps it as Level 2 when statement is serial or OFE is lower than 11.2.0.1. Since this is a simple case there is no effect but when you have complex joins which you are sure about the plan you can suddenly start to see changes and got heavily hit by the feature. I can assure, I had the issue with tables that has got statistics as well so it is not the case for tables that need dynamic sampling.

Since I introduced you with the feature, next question is how can we control the feature?. I think optimizer guys did a mistake here and did not give any control via an optimizer parameter. How do I know it ? If you follow what I have done in the optimizer features post you can see there is no controlling on this feature via a parameter. So we only have optimizer bugfixes as controller.
Lets check the bugfixes related with dynamic sampling.

For this I use the sql below for bugfix.sql

column sql_feaure for a34
column ofe for a10
column description for a64
select bugno,optimizer_feature_enable ofe ,sql_feature,description,value from v$system_fix_control where (sql_feature like upper('%&1%') or upper(description) like '%&1%')
order by 2 desc nulls last,1 asc;

As you can see dynamic sampling is back to the default level 2. You can use this fix control till oracle comes up with a better idea for your parallel queries where you want plan stability and you are happy with your defauly dynamic sampling level.

Attention I must also warn the a bit of unexpected behaviour that this automagical adjustment is working a bit weird. When I set the dynamic sampling to level higher than 2 Oracle accepts the new level as adjusted value like below

Update:After reading the blog posts Dominic shared in comments, I ‘m now sure feature is only active with default optimizer_dynamic_sampling=2