Plan Stability Through Upgrade-Why is my plan changed?-bugfixes-2

On a database where we upgraded from 10.2.0.4 to 11.2.0.1 , We had a sql reported to have changed its response time on 11.2.0.1 .(At this point I need to say I’m sorry that I can’t share the SQL and PLANS because of data privacy. Tried to edit the sql and plans but query is massive . )

Interestingly when I run the optimizer_features_test for the versions between 10.2.0.4 to 11.2.0.1
only change was extra nested loop. Cost is same and plan generated for 10.2.0.4 was not same as pre-upgrade version

Child number 1 on 11.2.0.1 runs with cardinality feedback and uses right plan for OR with extra nested loop make up and drops the response time from 1323 seconds to 2 seconds
Other

So what is changed and how can I catch it if changing OFE is not working ? Its pretty obvious that we have a stats problem but this query involves 10 different table
and its not always easy to gather stats on tables where all other queries are running fine on. I needed to find a solution.

When I check the query it has an OR and the new plans basically do not generate a plan for OR. I modified the query to use union all instead of OR I can get the same pre-upgrade plan which is interesting
because stats are still same so how come Oracle generates right plan when I use union all and wrong plan when I use OR ? There must be something causing this behaviour.

Since Changing OFE did not change anything in terms of response time we may look into the situation where 8.0 bugfix suddenly appeared on 11.2.0.1 but were not active on 10.2.0.4

How can we spot the bugfix ? You can use the bugfix builder that generates script to test bugfixes for all version below the given version

Script is build_optimizer_bugfix_test_below_e: Same thing but creates a script for all the bugfixes for the versions below the given version

Source code of the script

undefine optimizer_feature_version
set heading off
set feedback off
define optimizer_feature=&quot;&amp;optimizer_feature_version&quot;
set term off
spool optimizer_bugfix_test_e.sql
select 'set feedback off' from dual;
select 'set heading off' from dual;
select 'set term off' from dual;
select 'set echo off' from dual;
select 'spool optimizer_bugfix_test_results_e.log' from dual;
select 'drop table bugfix_table purge;' from dual;
select 'create table bugfix_table (bugfix number,plan_hash_value varchar2(20));' from dual;
select 'alter session set optimizer_features_enable=''&amp;optimizer_feature'';' from dual;
select 'alter session set &quot;_fix_control&quot;='''||bugno||decode(value,1,':OFF''',':ON''')||'; '
||chr(10)||'@test_e.sql'||chr(10)||
'insert into bugfix_table values ('||bugno||',(select substr(plan_table_output,18,12) from (select * from table(dbms_xplan.display)) where plan_table_output like ''Plan hash value:%''));'||chr(10)||
'alter session set &quot;_fix_control&quot;='''||bugno||decode(value,1,':ON''',':OFF''')||'; '
from v$system_fix_control
where
to_number(replace(optimizer_feature_enable,'.',''))&lt;to_number(replace('&amp;optimizer_feature','.',''))
and value in (1,0) order by bugno;
select 'set term on' from dual;
select 'set feedback on' from dual;
select 'set heading on' from dual;
select 'break on plan_hash_value' from dual;
select 'select plan_hash_value,bugfix from bugfix_table order by 1;' from dual;
select 'set feedback on' from dual;
select 'set heading on' from dual;
select 'spool off' from dual;
set feedback on
set heading on
set term on
spool off

As you can see 5129233 is again enabled on 10.2.0.4 but bugfix 3120429 does not exist and not enabled/disabled on a database running on 10.2.0.4 binaries.

Even both of them are solving the issue when we are running on 11.2.0.1, in my opinion controlling bugfix is 3120429.
Thanks to the reasonable developer who accepted to change the code using union all instead of or we did not need to enable disable a bugfix.

Since we find two bugs lets test them (I add comments to force new parse instead of flushing the sql for fresh runtime stats so sqlids are different )

On 11.2.0.1 and 11.1.0.7 plan is changed depending on the bugfix but for 10.2.0.4, 10.2.0.5 and 11.1.0.6 plan is constant for changing different bugfixes.
You can also see previous cardinality feedback plan is also slower then bugfix disabled version.

We can dig further but its not the point of this post and Since I managed to take the response time back to where it was from 1323 seconds to 0.5 seconds I can stop here.

As you can see we still spot a difference after upgrade even time travelling between releases by altering OFE did not help us at first point.

This post was to explain complicated issue and things you can face with my approach.

I also find it very usefull to have v$system_fix_control table snapped before you upgrade/patch and compared afterwards, so you won’t get shocked with newly added previos versions related bugs.