Parallel Parking – What is causing the Query to be Executed in Parallel?

July 31, 2012

I encountered an interesting OTN thread that caused me to stop and think for a couple of minutes. The OP (original poster) in the OTN thread wondered why parallel execution was used for a query when the table accessed by the query, and that table's indexes, were specified with a parallel degree of 1. For some reason, while the OP provided DBMS_XPLAN generated execution plan output, the Note section from the output was not included in the OTN post. With parallel execution enabled in the session (ALTER SESSION ENABLE PARALLEL QUERY;) the execution plan showed a cost of 2,025. With parallel execution disabled in the session (ALTER SESSION DISABLE PARALLEL QUERY;) the execution plan showed a cost of 36,504. From this, we can determine that the parallel degree for the query is 20 (36504 / 2025 / 0.9 = 20.03). The table is partitioned on a date column, apparently partitioned by month.
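
The 0.9 factor appears to come from the optimizer's parallel costing model, where the serial cost is scaled down by the degree of parallelism multiplied by a roughly 90% efficiency factor. A quick sketch of the arithmetic (not output from the OP's system):

-- parallel cost ~= serial cost / (DOP * 0.9)
-- 36504 / (20 * 0.9) = 2028, which is close to the reported parallel cost of 2025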

Interesting?

Could the parallel execution be caused by the PARALLEL_AUTOMATIC_TUNING parameter’s value? The DBMS_XPLAN output indicates that the OP is using Oracle Database 11.2.0.2, where that parameter is deprecated. What about the PARALLEL_DEGREE_POLICY parameter? When that parameter is set to AUTO, the query optimizer is free to plan a parallel execution for a query, even when the tables (and their indexes) accessed by the query are set to a parallel degree of 1.
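
A quick way to check both pieces of this puzzle in a test environment; a sketch that assumes the test table is named T1 (substitute the actual table name):

SHOW PARAMETER PARALLEL_DEGREE_POLICY

SELECT
  TABLE_NAME,
  DEGREE
FROM
  USER_TABLES
WHERE
  TABLE_NAME='T1';

SELECT
  INDEX_NAME,
  DEGREE
FROM
  USER_INDEXES
WHERE
  TABLE_NAME='T1';

A DEGREE value of 1 for the table and each of its indexes, combined with PARALLEL_DEGREE_POLICY set to AUTO, would match the scenario described by the OP.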

1,048,575 partitions… I was expecting about 100 partitions because there should be 3,000 distinct dates in column C3. Maybe I should find some time to investigate? Moving on…

To test the theory that the unexpected parallel execution might be caused by a non-default value for the PARALLEL_DEGREE_POLICY parameter, we first need to use the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure to record I/O performance metrics for the server. To save time, I found a blog article (by another OakTable Network member) that shows a back-door approach that does not require the CALIBRATE_IO procedure (the usual warnings apply here – we are technically modifying objects in the SYS schema). The back-door approach, when connected as the SYS user (note that the script first selects from RESOURCE_IO_CALIBRATE$ so that, if values are currently present, those values may be restored after testing completes):
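
A minimal sketch of what such a back-door script might look like, assuming the RESOURCE_IO_CALIBRATE$ columns START_TIME, END_TIME, MAX_IOPS, MAX_MBPS, MAX_PMBPS, LATENCY, and NUM_PHYSICAL_DISKS, and a made-up MAX_PMBPS value of 200 (the exact values are not taken from the original script):

SELECT * FROM RESOURCE_IO_CALIBRATE$;   -- note any existing values so they can be restored later

DELETE FROM RESOURCE_IO_CALIBRATE$;

INSERT INTO RESOURCE_IO_CALIBRATE$ VALUES(
  SYSTIMESTAMP, SYSTIMESTAMP, 0, 0, 200, 0, 0);

COMMIT;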

In the above, the Note section indicates that the automatic degree of parallelism was set to 2, and that dynamic sampling was performed at level 2. I have yet to collect statistics on table T1, so that explains why dynamic sampling was used when the query was executed with and without parallel execution. If you take a close look at the OTN thread that is linked to at the start of this blog article, you will see that the dynamic sampling level was set to 6, but only for the parallel execution. The simple answer for this change in behavior is that it is expected, as described in this article by Maria Colgan (another OakTable Network member). We can confirm that the cost calculation is working as expected: 59271 / 32790 / 0.9 = 2.01.

Let’s try executing the query of table T1 using a WHERE clause that is similar to what is found in the OTN thread:
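
A hypothetical version of such a query, restricting the date column C3 (the partition key) to roughly a three month range; the date literals are made up and only part of the original WHERE clause is represented:

SELECT
  *
FROM
  T1
WHERE
  C3 BETWEEN TO_DATE('01-SEP-2005','DD-MON-YYYY')
         AND TO_DATE('30-NOV-2005','DD-MON-YYYY');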

Notice anything strange… really strange about the above output? The new version of the query is limited to just 3 partitions, yet the calculated cost of the non-parallel execution is approximately 235,000 (much higher than the calculated cost of selecting from the full table without the WHERE clause) – that is just one reason why relying on cost for performance tuning purposes (as suggested by a couple of books) is potentially misleading. But wait, the plan using parallel execution with an automatic degree of parallelism of 16 has a plan cost of approximately 835,000, and the optimizer still selected that plan! I strongly suspect that a query optimizer bug caused a miscalculation of the plan cost; I could not reproduce the odd cost calculation on Oracle Database 11.2.0.1 or 11.2.0.2. Still suspicious of the cost in the execution plan, at the end of testing I dropped and recreated the table. After recreating the test table, the following execution plan was generated with a less crazy cost calculation:

I am curious about whether or not dynamic sampling might affect just the parallel execution. Let’s collect statistics on table T1 with 100% sampling (note that it could take a long time for this statistics collection to complete):
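
A minimal sketch of such a statistics collection, assuming the table is owned by the current user and the collection cascades to the table's indexes:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'T1', ESTIMATE_PERCENT=>100, CASCADE=>TRUE)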

Notice in the above that the execution plan for the non-parallel execution has a calculated cost that decreased from 235,000 to just 1,816, even though the same Pstart (69) and Pstop (71) columns are listed, and approximately the same estimated number of rows (~2,286,000 vs ~2,531,000) are expected to be returned. So, if you are attempting to tune performance by simply reducing the calculated cost of a query (not a valid approach), you might be led to believe that collecting statistics improved the performance of this query by a factor of 129.4 (the performance did improve a little because dynamic sampling was no longer required, but the performance did not improve nearly that much).

Let’s try the query again with a modified value for the PARALLEL_DEGREE_POLICY parameter:
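
A sketch of the parameter change and re-execution; the AUTO value is an inference based on the Note section described below:

ALTER SESSION SET PARALLEL_DEGREE_POLICY=AUTO;

-- re-execute the same query against T1, then display the actual execution plan:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));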

The Note section indicates that parallel execution was considered, but not used. So, after collecting statistics, the parallel degree for this query decreased from 16 (or 11 in the non-bug affected execution plan) to 1 – a non-parallel execution.

The above output was generated with Oracle Database 11.2.0.3 using the following system (CPU) statistics:

In the above execution plan, notice that the calculated cost is 4,082 (this is higher than the calculated cost returned by Oracle Database 11.2.0.3). The SREADTIM and MREADTIM system statistics were manually set to very small numbers – due to a bug in Oracle Database 11.2.0.1 and 11.2.0.2, these system statistics could be set to much larger values, such as 2,491 and 9,163, respectively. What might happen to the calculated cost if we were to increase the SREADTIM and MREADTIM system statistics by a factor of 10? Such a change would indicate to the query optimizer that each disk access will require 10 times longer to complete. Should the calculated cost of this query increase or decrease after changing the system statistics?
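
Scaling the two system statistics by a factor of 10 might look like the following sketch, which uses hypothetical starting values of 2 and 4 milliseconds (the actual values used in the test are not listed above):

EXEC DBMS_STATS.SET_SYSTEM_STATS(PNAME=>'SREADTIM', PVALUE=>20)
EXEC DBMS_STATS.SET_SYSTEM_STATS(PNAME=>'MREADTIM', PVALUE=>40)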

Notice the changes in the execution plan? The calculated cost shown in the execution plan decreased from 4,082 to 3,020, and the %CPU for the third line in the execution plan decreased from 29 to 4. Why did the calculated cost decrease when the optimizer is informed that disk accesses require 10 times as long to complete? For the answer, take a look at this blog article written by Randolf Geist (another OakTable Network member).

—

OK, so I drifted a bit from the focus of the OTN message thread that provoked this article. What do you think caused parallel execution to be used when the OP executed the query found in the OTN thread?

Thank you for pointing that out. I completely missed the missing = sign (should have been <= 1000 and <= 100000).

One oddity down, a couple more to go. It might be interesting to see if anyone else is able to trigger/duplicate the 835K cost bug for the parallel query, where the parallel execution calculated cost was higher than the calculated cost of the serial execution, yet parallel execution was still used.

This is Mustafa KALAYCI, the owner of the OTN thread. This really surprised me! I was just looking through some Oracle blogs and happened to see your article. Thanks for your tests; on my system I will also try to get some traces, and another DBA team is talking with Oracle about that.

Great that you found this blog article. The problem that you described in the OTN thread is an interesting one, and I think that Tony Sleight has provided a couple of interesting possibilities to investigate.

First, how up to date are the statistics for the HAREKET_TABLE table (keep in mind that unless NO_INVALIDATE=>FALSE is specified when collecting the statistics, there could be a roughly 5 hour delay before the new statistics will result in a hard parse of a previously hard parsed SQL statement)?
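
A sketch of a statistics collection that forces immediate cursor invalidation; the owner and remaining parameters are assumptions, not details taken from the OP's system:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>'HAREKET_TABLE', CASCADE=>TRUE, NO_INVALIDATE=>FALSE)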

Let’s investigate Tony’s suggestion that PARALLEL_DEGREE_POLICY may have been adjusted at the session level, possibly by the tool that executed the SQL statement.

There are three views that might prove to be very useful in determining what is happening. As most readers of this blog likely know, the DESC command will list the columns and datatypes found in views and tables:
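
The three views in question are most likely V$SYS_OPTIMIZER_ENV, V$SES_OPTIMIZER_ENV, and V$SQL_OPTIMIZER_ENV (an assumption based on the session-level and SQL_ID-level queries that follow); for example:

DESC V$SYS_OPTIMIZER_ENV
DESC V$SES_OPTIMIZER_ENV
DESC V$SQL_OPTIMIZER_ENV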

If we go back to my example, I still have a test session connected to the database. I need the SID of that session (as you very likely know, you can use V$SESSION to determine the SID of another session, but I will use V$MYSTAT since this is my SQL*Plus session):

SELECT
  SID
FROM
  V$MYSTAT
WHERE
  ROWNUM=1;

       SID
----------
        96

Let’s take a look at the (majority of the) optimizer parameters for the session that affect the decisions made by the Oracle query optimizer (note that I have an arrow pointing at one of the rows):
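
A sketch of such a query against V$SES_OPTIMIZER_ENV for the session identified above (SID 96); the exact column list and ordering used in the original output are assumptions:

SELECT
  NAME,
  ISDEFAULT,
  VALUE
FROM
  V$SES_OPTIMIZER_ENV
WHERE
  SID=96
ORDER BY
  NAME;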

In the above, we are able to see that the PARALLEL_DEGREE_POLICY for the session was adjusted to AUTO from the default parameter value that was in place when the database was brought online.

We are also able to see the parameters that were in effect when a SQL statement was optimized. In the above test case examples, there were two child cursors for SQL_ID 5bc0v4my7dvr5, one with the default value for the PARALLEL_DEGREE_POLICY parameter and one with the value of AUTO for that parameter. Let’s examine the optimizer parameters for this SQL_ID (again with an arrow pointing to the two rows that are of interest):
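
Again a sketch, assuming V$SQL_OPTIMIZER_ENV and restricting the output to the parameter of interest:

SELECT
  CHILD_NUMBER,
  NAME,
  ISDEFAULT,
  VALUE
FROM
  V$SQL_OPTIMIZER_ENV
WHERE
  SQL_ID='5bc0v4my7dvr5'
  AND NAME='parallel_degree_policy'
ORDER BY
  CHILD_NUMBER;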

Having gone through the same tests as you on 11.2.0.3, I did not suffer from the 835K cost bug. In all my tests parallel execution was performed only when PARALLEL_DEGREE_POLICY was set to AUTO. The OP indicated that the parallel execution happened when TOAD or SQL Developer was being used. I can only assume that these applications set PARALLEL_DEGREE_POLICY to AUTO at the SESSION level. Perhaps it is for performance enhancement? If that is the case, then RESOURCE_IO_CALIBRATE$ must have been populated. It would be interesting to see what the contents of RESOURCE_IO_CALIBRATE$ are on the OP's system.

Incidentally, Charles, regarding the second issue, the PSTOP = 1048575 value indicating 1,048,575 partitions: my execution plans showed the same; however, USER_TAB_PARTITIONS showed only 100 partitions as expected. As this is my first foray into partitions, I find it difficult to explain. I shall endeavour to investigate over the next few days.

Thanks Mustafa for a swift answer. As there is a record in RESOURCE_IO_CALIBRATE$ (created 28 March 2012), setting PARALLEL_DEGREE_POLICY to AUTO at the session level will enable parallel execution, which is what Charles’ web post has shown.

I guess the procedure DBMS_RESOURCE_MANAGER.CALIBRATE_IO must have been executed for the table entry to be created, as it is not the manual value suggested by Charles’ article.

I see, that leaves me clean out of new suggestions! Certainly a 10053 trace would show how the execution plan was reached.
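
A sketch of enabling an optimizer (10053) trace for the problem query in a test session:

ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

-- execute (hard parse) the problem query here, then disable the trace:

ALTER SESSION SET EVENTS '10053 trace name context off';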

From your post, I gather you are wishing to ensure parallel queries are not executed on your system? If so, the drastic way to ensure that would be to delete the existing record in RESOURCE_IO_CALIBRATE$.
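
A sketch of that drastic approach, connected as SYS (the earlier warning about modifying objects in the SYS schema applies, and an instance restart may be needed before the optimizer stops considering an automatic degree of parallelism):

SELECT * FROM RESOURCE_IO_CALIBRATE$;   -- record the existing values first

DELETE FROM RESOURCE_IO_CALIBRATE$;

COMMIT;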

A little further into the 10053 trace file we find the following block of text. Notice the calculated CPU cost of 20348604815 for the parallel degree (dop) of 1 – that number appears to be inconsistent with the Resc_cpu value (14984850150) above – I wonder if this is expected?

Notice in the above that the Pstop value is 7, not 1048575 as we saw with interval partitioning. Perhaps this is expected behavior?

(Edit: Aug 2, 2012, roughly 5 minutes after the comment was posted: the initial insert into table T3 failed – the insert statement had the HTML code for the less than (<) sign rather than the actual less than sign and I did not catch that error before posting the execution plan.)

Thanks Charles, that explains quite a lot. I think that with interval partitioning, Oracle does not know how many partitions will be created as it is a dynamic value dependent upon the inserts to the table and so the maximum allowed value is chosen. However, in your second example with fixed partitions, 7 were generated and that was fixed in stone for the pstop max value.

And, as I mentioned in the forum, I realized something that may be related to this subject: when I query V$DB_OBJECT_CACHE, I see that HAREKET_TABLE and all of its indexes appear as “MULTI-VERSIONED OBJECT”.
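
A sketch of such a query; the column list is an assumption, and the TYPE column is where “MULTI-VERSIONED OBJECT” would appear:

SELECT
  OWNER,
  NAME,
  TYPE,
  LOADS,
  INVALIDATIONS
FROM
  V$DB_OBJECT_CACHE
WHERE
  NAME='HAREKET_TABLE';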

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
  2    SYSDATE TODAY
  3  FROM
  4    DUAL;

TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
  SYSDATE TODAY
FROM
  DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
  2    SYSDATE TODAY
  3  FROM
  4    DUAL;

********************
Greater than and less than signs in code sections are often interpreted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: