Tuesday, February 25, 2020

Starting with version 12c Oracle obviously has introduced another parallel distribution method for direct path loads (applicable to INSERT APPEND and CTAS operations) when dealing with partitioned objects.

As you might already know, starting with version 11.2 Oracle supported a new variation of the PQ_DISTRIBUTE hint allowing more control how data gets distributed for the actual DML load step. In addition to the already documented methods (NONE, RANDOM / RANDOM_LOCAL, PARTITION) there is a new one EQUIPART which obviously only applies to scenarios where both, source and target table are equi partitioned.

In principle it looks like a "full-partition wise load", where the PX partition granule gets used as chunking method and each PX slave reads from the partition to process from source and writes into the corresponding partition of target. Therefore it doesn't require a redistribution of data and uses only a single PX slave set. Depending on the skew (partitions of different data volume) this might not be the best choice, but for massive data loads with evenly sized partitions it might give some advantage over the other distribution methods - the NONE distribution method being the closest, because it doesn't require additional redistribution either - but here all PX slaves read and write from any partition, so potentially there could be more contention.

Of course this new distributed method works only for the special case of equi partitioned source and target tables - and according to my tests only for the simple case of loading from the source table with no further operations like joins etc. involved.

Monday, January 13, 2020

I do have a demo as part of my optimizer related workshops that shows the
restriction / limitation of DBMS_STATS not supporting extended
statistics on virtual columns / group of expressions, so for
example the combination of both expressions and column groups,
like ((TRUNC(COL1)), (TRUNC(COL2))).

Surprisingly, when following a certain sequence of operation,
this starts working (to some degree) from 12.2.0.1 on.

The official documentation up to and including 19c still mentions
this as a restriction, and since it doesn't work when explicitly
referencing virtual columns (see the test case what I exactly mean by
this) I assume this is more like a side effect / unintended
feature.

Nevertheless, the optimizer happily picks up this additional information and comes up with improved estimates when having a combination of skew and correlation on expressions, for example.

The following test case shows the change in behaviour from 12.2.0.1 on:

set echo on linesize 200 trimspool on trimout on tab off pagesize 999 timing on
alter session set nls_language = american;
drop table t1;
purge table t1;
-- Initialize the random generator for "reproducible" pseudo-randomness
exec dbms_random.seed(0)
-- ATTR1 and ATTR2 are both skewed and correlated
create table t1
as
select
rownum as id
, trunc(dbms_random.value(1, 1000000000000)) as fk
, case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr1
, case when rownum <= 900000 then 1 else ceil(rownum / 10) end as attr2
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;
-- Histograms on ATTR1 and ATTR2 for representing skew properly
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2')
-- In addition histograms / extended stats on (ATTR1, ATTR2) for covering both skew and correlation
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (attr1, attr2)')
-- Assuming a query expression TRUNC(ATTR1) = x and TRUNC(ATTR2) = x create a histogram on each expression separately for representing skew properly
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc(attr1)), (trunc(attr2))')
-- In addition histogram / extended stats on (TRUNC(ATTR1), TRUNC(ATTR2)) for covering both skew and correlation on the expressions
-- Officially this is isn't allowed and errors out up to and including 12.1.0.2
-- ORA-20001: Error when processing extension - missing right parenthesis
-- Starting with 12.2.0.1 this is successful, but documentation still lists this as restriction, including 19c
-- But: It's only supported when there is a corresponding virtual column already existing for each expression used - so Oracle can map this expression back to a virtual column
-- Remove the previous step and it will error out:
-- ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')
-- Although it doesn't look like officially supported / possibly just a side effect of some bug fix, the optimizer picks up the information properly
-- Pre 12.2.0.1 don't recognize the correlation and show approx. 800K rows as estimate
-- 12.2.0.1 and later with the column group on the expression in place show approx. 900K rows as estimate which is (almost) correct
explain plan for
select
count(*)
from
t1 a
where
trunc(attr1) = 1
and trunc(attr2) = 1;
select * from table(dbms_xplan.display(format => 'TYPICAL'));
-- But: Explicitly referencing a virtual column doesn't work
-- This will error out:
-- ORA-20001: Error when processing extension - virtual column is referenced in a column expression
exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr1))')
exec dbms_stats.drop_extended_stats(null, 't1', '(trunc(attr2))')
-- Remove this and the extended stats on the virtual columns below will work in 12.2.0.1 and later (because the extension still exists from above call)
exec dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
alter table t1 add (trunc_attr1 as (trunc(attr1)));
alter table t1 add (trunc_attr2 as (trunc(attr2)));
-- This works and is supported in all versions supporting virtual columns
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, trunc_attr1, trunc_attr2')
-- This errors out: ORA-20001: Error when processing extension - virtual column is referenced in a column expression
-- even in 12.2.0.1 and later
-- But: Works in 12.2.0.1 and later if the call to
-- dbms_stats.drop_extended_stats(null, 't1', '((trunc(attr1)), (trunc(attr2)))')
-- above is removed, because the extension then already exists (!)
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, (trunc_attr1, trunc_attr2)')
-- But: This works again from 12.2.0.1 on, not explicitly referencing the virtual columns but using the expressions covered by the virtual columns instead
exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1 for columns size 254 attr1, attr2, ((trunc(attr1)), (trunc(attr2)))')

So, from 12.2.0.1 it looks like extended statistics on virtual columns are supported to some degree, when following a certain sequence of operation - first creating extended statistics on each of the expressions used, which creates corresponding virtual columns under the cover, and afterwards creating extended statistics using those expressions as part of the column group expression.

Strange enough, when explicitly creating virtual columns for those expressions creating a column group explicitly referencing those virtual columns doesn't work - but using the expressions covered by the virtual columns instead works even with those virtual columns created explicitly.