Sunday, December 7, 2008

Oracle's implementation of row level locking offers a high level of concurrency: you usually don't encounter blocking issues from concurrent modifications to different rows of a table. Some other DBMS products perform lock escalations (row locks to "page" or "block" locks, "block" locks to "table" locks etc.) when too many low-level locks are requested and a centralized lock management unit (usually a "lock manager") is unable to cope with the number of locks, and such escalations can block modifications to completely unrelated rows. Since Oracle stores the row level lock information right in the block itself, it doesn't suffer from such scalability issues.

But there are certain scenarios in Oracle that can still lead to locking issues, even if different rows of a table are modified concurrently.

The following scenarios are briefly outlined and demonstrated here, along with possible remedies and supporting information on how to identify whether such an issue is encountered:

Concurrent modifications of the same (unique) key values

If multiple sessions attempt to write the same unique key but haven't yet completed their transactions, they block each other, because only the completion of the transactions (commit or rollback) determines which of them succeed and which fail. This can be caused by updates, inserts and deletes.

There is no easy remedy in this case; if this happens frequently, it is either intended behaviour or could be called a design flaw of the application. One potential workaround is to define the primary key as deferrable, but this has other significant implications: for example, direct-path inserts might not work (they silently fall back to conventional inserts, depending on the Oracle version), and the cost based optimizer treats deferrable constraints differently, which can cause execution plan changes.
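
A minimal sketch of the scenario (the table and constraint names are made up for illustration):

create table unique_test (id number,
constraint unique_test_pk primary key (id));

-- session 1: insert a key but don't commit yet
insert into unique_test (id) values (1);

-- session 2: this insert now waits on a TX enqueue until session 1
-- commits (then it fails with ORA-00001) or rolls back (then it succeeds)
insert into unique_test (id) values (1);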

Unindexed foreign keys and modifications to parent table primary keys

If referential constraints have been defined between tables and the foreign key columns in the child table are not indexed, then modifications to the parent table's key (or deletes from the parent table) require a lock on the entire child table, which can cause significant contention.
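
Only a fragment of the original transcript has survived below; the following sketch reconstructs a comparable setup (the table names follow the fragment, the rest is assumed). Once the foreign key column is indexed, session 2's delete, shown in the surviving fragment, completes without blocking:

create table parent_table as
select distinct object_type from all_objects;

alter table parent_table add constraint parent_table_pk
primary key (object_type);

create table child_table as
select object_name, object_type from all_objects
where object_type != 'OPERATOR';

alter table child_table add constraint child_table_fk
foreign key (object_type) references parent_table (object_type);

-- session 1: uncommitted DML on the child table
insert into child_table values ('SOME_OBJECT', 'TABLE');

-- session 2: without an index on child_table.object_type this delete
-- requires a (short) full table lock on child_table and therefore blocks
delete from parent_table where object_type = 'OPERATOR';

-- the remedy: index the foreign key column, then re-run the delete
create index child_table_idx on child_table (object_type);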

SQL> -- and this in session 2
SQL> delete from parent_table where object_type = 'OPERATOR';

1 row deleted.

SQL>

As you can see session 2 is now no longer blocked.

Insufficient block space and ITL slot shortage

If multiple transactions attempt to modify the same block, they can block each other if the following conditions are fulfilled:

- There is no free ITL ("Interested Transaction List") slot available. Oracle records the lock information right in the block, and each transaction allocates an ITL entry.

- There is insufficient space left in the block to add a new ITL slot. Since each ITL entry requires some space (roughly 24 bytes), a new one cannot be created if the block doesn't have sufficient free space.

The INITRANS and MAXTRANS settings of a segment control the initial and maximum number of ITL slots per block. In recent Oracle releases the default for INITRANS is 1 for tables and 2 for indexes, and since the 10g release MAXTRANS is fixed at its default value of 255.

The following example demonstrates the issue. A block is almost full and several transactions attempt to manipulate different rows that all reside in this block.
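
The table setup is not shown in the fragment below; it could look like this sketch (an 8KB block size is assumed, and PCTFREE 0 so that the grown rows fill the block almost completely):

create table t1 (c1 varchar2(2000)) pctfree 0 initrans 1;

-- six single-character rows 'a' to 'f' that initially all fit into one block
insert into t1
select chr(96 + level)
from dual
connect by level <= 6;

commit;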

SQL> -- increase the size of the rows
SQL> -- so that no space is left in the block
SQL> update t1 set c1 = rpad(c1, 1335, c1)
  2  where length(c1) = 1;

6 rows updated.

SQL> commit;

Commit complete.

SQL> select substr(c1,1,1) from t1 where c1 like 'a%' for update nowait;

S
-
a

SQL> -- second session
SQL> select substr(c1,1,1) from t1 where c1 like 'b%' for update nowait;

S
-
b

SQL>

SQL> -- third session
SQL> select substr(c1,1,1) from t1 where c1 like 'c%' for update nowait;
select substr(c1,1,1) from t1 where c1 like 'c%' for update nowait
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

The remedy in this case requires a rebuild of the table segment with a higher INITRANS setting, so that more ITL slots are created initially. Note that this reduces the space available for row data, so the size of the segment might increase.

Note that a rebuild using an appropriate PCTFREE setting also alleviates the problem, but unanticipated row growth caused by updates can then trigger the issue again. So the only change that helps independently of row growth is an appropriate INITRANS setting.
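
In sketch form, using the table from the example above, the remedy could look like this:

-- rebuild the table with more ITL slots per block
alter table t1 move initrans 10;

-- a move renders existing indexes unusable, so each index
-- would need a rebuild afterwards, e.g.:
-- alter index t1_idx rebuild initrans 10;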

Bitmap indexes

A single bitmap index entry can cover a lot of rows (rowid ranges); this is one of the reasons why a bitmap index can be much smaller than a corresponding b*tree index.

But in terms of concurrency this can cause issues if different sessions attempt to modify the same bitmap index entry:
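
A sketch of the scenario (all names are made up for illustration):

create table bitmap_lock_test as
select rownum as id, mod(rownum, 2) as flag
from all_objects
where rownum <= 1000;

create bitmap index bitmap_lock_test_idx on bitmap_lock_test (flag);

-- session 1: updating the indexed column locks the bitmap index entry
-- (a rowid range) covering this row
update bitmap_lock_test set flag = 1 where id = 2;

-- session 2: a different row, but potentially covered by the same
-- bitmap entry - this update blocks until session 1 ends its transaction
update bitmap_lock_test set flag = 1 where id = 4;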

Low MAXTRANS setting

This scenario only applies to pre-10g databases, where the MAXTRANS setting can be set to a value lower than the default of 255. 10g and later ignore an explicitly specified MAXTRANS value and use 255 anyway.

The issue is basically the same as in the previous case: An ITL slot needs to be allocated, but cannot be created. In this case this is due to the artificial limitation caused by the low MAXTRANS setting.
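
On a pre-10g database the scenario could be forced like this (a sketch; table name made up):

-- pre-10g only: artificially limit the number of ITL slots per block
create table maxtrans_test (id number) initrans 1 maxtrans 2;

insert into maxtrans_test
select rownum from all_objects where rownum <= 10;

commit;

-- sessions 1 and 2 occupy the (at most) two ITL slots of the block
select id from maxtrans_test where id = 1 for update;  -- session 1
select id from maxtrans_test where id = 2 for update;  -- session 2

-- session 3 cannot allocate a third ITL slot and waits,
-- although it locks a completely different row
select id from maxtrans_test where id = 3 for update;  -- session 3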

Monday, November 3, 2008

When enforcing a primary key or unique constraint using a non-unique index, 11.1.0.6 allows the use of direct-path inserts (append mode), in contrast to 10.2.0.4, which silently falls back to conventional inserts in this particular case. For a demonstration, see here.

But this enhancement in 11.1.0.6 seems to allow a situation where the enabled and validated constraint can be violated without any error message. This looks like a bug. If you simply set the non-unique index to "unusable" and then attempt to insert duplicates using direct-path insert mode, you'll actually succeed and therefore end up with duplicate data in your table although you have an enabled primary key/unique constraint.
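
A sketch of how the complete test could look (the index name and exact DDL are assumed; the surviving transcript below continues from here):

create table append_test (a number,
constraint append_test_pk primary key (a)
using index (create index append_test_idx on append_test (a)));

insert into append_test values (1);

commit;

-- render the non-unique index unusable
alter index append_test_idx unusable;

-- with skip_unusable_indexes = true (the default from 10g on)
-- this direct-path insert of a duplicate key succeeds in 11.1.0.6
insert /*+ append */ into append_test
select 1 from dual;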

SQL> -- this generates an error ORA-12838
SQL> -- and therefore shows that this
SQL> -- was a direct-path insert
SQL> select * from append_test;
select * from append_test
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

SQL> -- now we have non-unique data
SQL> -- in the table
SQL> -- although the primary key
SQL> -- constraint is enabled and
SQL> -- validated
SQL> commit;

Wednesday, October 1, 2008

A bitmap join index is a special kind of index: it contains values of columns from table(s) joined to the indexed table, but holds bitmaps that point to the rows of the indexed table itself. Under certain circumstances this allows Oracle to avoid join operations when you restrict the results on your table by filtering on an attribute of a joined table.

For further information regarding bitmap (join) indexes you may want to have a look at the following note by Jonathan Lewis, which contains links to three Word documents that describe the basic principles of bitmap (join) indexes.

As can be seen in Jonathan's document, Oracle internally supports these bitmap join indexes through various constructs - amongst these, a "virtual column" is added to the column list of the "indexed" table in the data dictionary, representing the values from the joined table, obviously based on the data stored in the index.

So in theory gathering statistics on this "virtual column" would allow the optimizer to get a very good estimate for the number of rows that correspond to a particular value of the attribute of the joined table, which would be very helpful especially if the "joined" value distribution of this attribute is skewed.

In data warehouses, rows of a fact table are sometimes assigned to a special value of a dimension, like "deleted", "not mapped", "default" etc., which might at some point become a predominant value when joining the dimension to the fact table. So without the additional information stored in the bitmap join index, the optimizer at parse time can hardly tell from the available table and column statistics how many rows a join will return if you filter on a particular attribute of a dimension table. Although the statistics can tell quite precisely how many rows of the dimension table will satisfy the filter condition, this says nothing about the number of rows of the fact table that will join to the corresponding primary key values of the filtered dimension table.

However, using the data that is stored in the bitmap join index this information is available as the index can tell exactly how many rows of the fact table correspond to a particular value of an attribute of a dimension table.

Unfortunately all tested Oracle versions (9.2.0.8, 10.2.0.4 and 11.1.0.6) do not allow gathering statistics on these virtual columns corresponding to bitmap join indexes, and even if manually crafted statistics including histograms are applied to the virtual column using "dbms_stats.set_column_stats", the optimizer does not consider them for the cardinality estimates of the corresponding row sources.

So in the future this might be a useful extension of the statistics framework to fully leverage the potential power of bitmap join indexes.

A small test case run on 11.1.0.6 (Windows 32-bit) shall demonstrate the issue. Note that 9.2.0.8 and 10.2.0.4 show similar results regarding the optimizer estimates.

First suitable tables are created that represent a fact table and its corresponding dimension table.
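
The DDL is not reproduced in this excerpt; a comparable setup could look like this sketch (all names and the skew are made up: 10,000 fact rows and 100 dimension values, i.e. 100 rows per value on average, but heavily skewed towards one value):

create table dim_table as
select rownum as dim_id, 'ATTR' || to_char(rownum) as attribute
from all_objects
where rownum <= 100;

alter table dim_table add constraint dim_table_pk primary key (dim_id);

-- 10,000 fact rows, heavily skewed towards dim_id = 1
create table fact_table as
select rownum as id,
case when rownum <= 9901 then 1 else mod(rownum, 99) + 2 end as dim_id
from all_objects
where rownum <= 10000;

-- the bitmap join index stores dim_table.attribute values,
-- but its bitmaps point to rows of fact_table
create bitmap index fact_table_bji on fact_table (d.attribute)
from fact_table f, dim_table d
where f.dim_id = d.dim_id;

-- a query that can be satisfied via the bitmap join index without joining
select count(*)
from fact_table f, dim_table d
where f.dim_id = d.dim_id
and d.attribute = 'ATTR1';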

As we can see the bitmap join index is used to avoid the join operation but the optimizer estimate is still the average value, although the bitmap join index information could be used to determine the skew.

Now let's try to get statistics information on the "virtual" column created for the bitmap join index.

Interestingly the custom column statistics for the "virtual" column are accepted by DBMS_STATS.SET_COLUMN_STATS although DBMS_STATS.GATHER_TABLE_STATS is not able to analyze this column.
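
The attempt could look like this sketch. The name of the virtual column is system-generated (e.g. something like SYS_NC00003$) and has to be looked up first; the histogram is abbreviated to two values here, a real test would enumerate all distinct values:

-- look up the system-generated virtual column of the bitmap join index
select column_name
from user_tab_cols
where table_name = 'FACT_TABLE'
and hidden_column = 'YES';

-- craft column statistics including a frequency histogram (abbreviated)
declare
  srec     dbms_stats.statrec;
  charvals dbms_stats.chararray;
begin
  charvals := dbms_stats.chararray('ATTR1', 'ATTR100');
  srec.epc := 2;
  srec.bkvals := dbms_stats.numarray(9901, 1);
  dbms_stats.prepare_column_values(srec, charvals);
  dbms_stats.set_column_stats(
    ownname => user,
    tabname => 'FACT_TABLE',
    colname => 'SYS_NC00003$',  -- assumed name, see query above
    distcnt => 100,
    density => 1 / 10000,
    nullcnt => 0,
    srec    => srec,
    avgclen => 6);
end;
/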

But no difference regarding the estimates can be seen even with a suitable histogram in place. This shows that the optimizer doesn't evaluate the information available from the "virtual" column of the bitmap join index.

Finally let's see what happens when querying one of the other dimension values.

The estimate is still the same (the average of 100), so in summary the optimizer is not (yet) able to take advantage of the potential added value represented by the "virtual" column created for the bitmap join index.

Monday, September 1, 2008

When attempting to display only the first n rows of an ordered result set, the Oracle optimizer offers special optimizations for these so-called top-n queries. These optimizations are based on the fact that Oracle in this case only needs to keep the top n rows of the result set sorted rather than sorting the complete set, which can make a significant difference if the result set is large. If the optimizer was able to recognize this you can see it in the execution plan: it shows a "SORT ORDER BY STOPKEY" operation.

Tom Kyte has written some nice notes about these top n and pagination queries. You can find the details here and here.

If you have created a suitable index, Oracle can also use this index to avoid a sort operation altogether, by retrieving the rows already in the requested sort order. You can notice this in the execution plan by the absence of any ORDER BY operation although you specified an ORDER BY in your statement.

Note, however, that in case character columns are used to order the result set, a normal index might not be used depending on your session's NLS_SORT setting, since the index is by default based on the "NLS_SORT = binary" sort order, whereas your session might use a different sort order.

There are two options in this case to still take advantage of the NOSORT operation: use the "NLS_SORT = binary" setting in your session, or create a function-based index using the NLSSORT function. The latter should work fine if all or most of your clients share the same NLS_SORT session setting, but keep in mind that it is a client-controlled setting and therefore could theoretically be different for each of your clients.

A short sample script run on 10.2.0.4 (32-bit Windows) shall demonstrate the issue.
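
The setup isn't reproduced in full here; it could look like the following sketch. A German linguistic sort order is assumed purely for illustration, and the ordering columns are made NOT NULL so that the indexes can be used for a NOSORT plan:

create table nosort_order_by_test as
select * from all_objects;

alter table nosort_order_by_test modify (owner not null, object_name not null);

-- a normal index, implicitly based on binary sort order
create index nosort_order_by_test_ix1
on nosort_order_by_test (owner desc, object_name);

-- a function-based index matching a non-default NLS_SORT setting
create index nosort_order_by_test_ix2
on nosort_order_by_test (nlssort(owner, 'nls_sort=german') desc,
nlssort(object_name, 'nls_sort=german'));

exec dbms_stats.gather_table_stats(null, 'NOSORT_ORDER_BY_TEST')

-- the session uses a non-default linguistic sort order
alter session set nls_sort = german;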

Elapsed: 00:00:00.01
SQL> set autotrace traceonly
SQL> -- this apparently should use the index nosort_order_by_test_ix1 but it doesn't due to the NLS_SORT setting
SQL> select * from (select * from nosort_order_by_test order by owner desc, object_name) where rownum <= 10;

Elapsed: 00:00:00.01
SQL> -- now this works as supposed using the function based index nosort_order_by_test_ix2
SQL> select * from (select * from nosort_order_by_test order by owner desc, object_name) where rownum <= 10;

Note that the first attempt to use the index fails due to the non-default NLS_SORT session setting. Although the SORT ORDER BY STOPKEY operation is used, we still scan the whole table rather than taking advantage of the index.

If the NLS_SORT is set to the default value of "binary" the index is going to be used.

If you have a common, non-default setting of NLS_SORT you can still use an index by creating a suitable function based index.

Note that similar issues regarding index usage can arise when using non-default NLS_COMP session settings. See here for a note by Jonathan Lewis covering this and some other issues.

Monday, August 4, 2008

Oracle's exchange partition feature is a handy method for loading mass data. Data is usually loaded into an "exchange" table that mirrors the attributes of a single partition or subpartition of the actual table.

Before 10g it probably didn't matter whether you gathered statistics on the exchange table before performing the exchange or afterwards on the actual table, because the "SIZE AUTO" option of the method_opt parameter, already available in 9i, was rarely used.

But starting with Oracle 10g the "SIZE AUTO" option is the default when gathering table and column statistics. "SIZE AUTO" works in a two-step approach: in the first step it uses "column usage monitoring", another feature introduced in Oracle 9i, to record which columns could potentially benefit from histograms because they are used in WHERE clauses with certain kinds of predicates, e.g. equality comparisons. In the second step a histogram is generated to detect whether the column values are skewed or have significant gaps (which raised an interesting question about what "gaps" in VARCHAR based data are), and if they do, the histogram is stored in the statistics (this part corresponds to the "SIZE SKEWONLY" option behaviour).

So if you want to take advantage of the new default "SIZE AUTO" feature, it is crucial that the statistics are gathered on the actual table, because otherwise the "column usage monitoring" won't work as expected: it needs the workload of the actual table, not that of the exchange table.

As already mentioned, from 10g on the SIZE AUTO option is the default if you haven't changed it using DBMS_STATS.SET_PARAM (or DBMS_STATS.SET_*_PREFS in 11g).

SQL> -- flush the monitoring info, so we can see it in col_usage$
SQL> -- otherwise it may take up to 15-20 minutes to see the latest changes
SQL> -- DBMS_STATS.GATHER_*_STATS will flush this as well
SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

SQL> -- So if you exchange now you won't benefit
SQL> -- from any histograms potentially being generated
SQL> -- due to column usage and skewness
SQL> alter table size_auto_test exchange subpartition pkey_1_101 with table exchange_test;

So it can be seen from this sample that gathering statistics on the exchange table uses the workload of the exchange table, and therefore the SIZE AUTO option doesn't work as expected. Gathering statistics on the actual partitioned table makes use of the column workload information and generates histograms based on both workload and skewness.
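
In sketch form, using the names from the sample above, the order that makes SIZE AUTO work as intended would be:

-- 1. exchange first ...
alter table size_auto_test exchange subpartition pkey_1_101 with table exchange_test;

-- 2. ... let the representative query workload run against the actual table,
--    so that sys.col_usage$ reflects its predicates ...

-- 3. ... and only then gather statistics on the actual table
exec dbms_stats.gather_table_stats(null, 'SIZE_AUTO_TEST', granularity => 'ALL', method_opt => 'for all columns size auto')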

Another interesting observation is that the column usage in sys.col_usage$ is maintained at table level rather than at partition or subpartition level. Although I've explicitly queried at subpartition level, the monitoring info refers to the table object and not to the (sub-)partition subobjects. This means that if the data in your partitions is used differently by queries, then depending on the way you gather statistics you might end up with unnecessary histograms on particular partitions. This can have an impact both on the time it takes to gather the statistics, since each histogram requires a separate gather step/query, and on the execution plans, since cardinality estimates can be influenced by the existing histograms.
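
The recorded column usage can be inspected with a query like this sketch (it requires access to the SYS-owned base tables):

select o.name as table_name,
       c.name as column_name,
       u.equality_preds,
       u.range_preds,
       u.like_preds,
       u.timestamp
from sys.col_usage$ u, sys.obj$ o, sys.col$ c
where o.obj# = u.obj#
and c.obj# = u.obj#
and c.intcol# = u.intcol#
and o.name = 'SIZE_AUTO_TEST';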

Tuesday, July 1, 2008

If you check the official Oracle documentation regarding the restrictions of direct path inserts you'll find the following information (taken from the "SQL Language Reference - INSERT" 11gR1 manuals):

"Direct-path INSERT is subject to a number of restrictions. If any of these restrictions is violated, then Oracle Database executes conventional INSERT serially without returning any message, unless otherwise noted:

You can have multiple direct-path INSERT statements in a single transaction, with or without other DML statements. However, after one DML statement alters a particular table, partition, or index, no other DML statement in the transaction can access that table, partition, or index.

Queries that access the same table, partition, or index are allowed before the direct-path INSERT statement, but not after it.

If any serial or parallel statement attempts to access a table that has already been modified by a direct-path INSERT in the same transaction, then the database returns an error and rejects the statement.

The target table cannot be index organized or part of a cluster.

The target table cannot contain object type columns.

The target table cannot have any triggers or referential integrity constraints defined on it.

The target table cannot be replicated.

A transaction containing a direct-path INSERT statement cannot be or become distributed."

"The following restrictions apply to parallel DML (including direct-path INSERT):

Intra-partition parallelism for UPDATE, MERGE, and DELETE operations require that the COMPATIBLE initialization parameter be set to 9.2 or greater.

The INSERT ... VALUES statement is never parallelized.

A transaction can contain multiple parallel DML statements that modify different tables, but after a parallel DML statement modifies a table, no subsequent serial or parallel statement (DML or query) can access the same table again in that transaction.

This restriction also exists after a serial direct-path INSERT statement: no subsequent SQL statement (DML or query) can access the modified table during that transaction.

Queries that access the same table are allowed before a parallel DML or direct-path INSERT statement, but not after.

Any serial or parallel statements attempting to access a table that has already been modified by a parallel UPDATE, DELETE, or MERGE, or a direct-path INSERT during the same transaction are rejected with an error message.

Parallel DML operations cannot be done on tables with triggers.

Replication functionality is not supported for parallel DML.

Parallel DML cannot occur in the presence of certain constraints: self-referential integrity, delete cascade, and deferred integrity. In addition, for direct-path INSERT, there is no support for any referential integrity.

Parallel DML can be done on tables with object columns provided you are not touching the object columns.

Parallel DML can be done on tables with LOB columns provided the table is partitioned. However, intra-partition parallelism is not supported.

A transaction involved in a parallel DML operation cannot be or become a distributed transaction.

Clustered tables are not supported.

Parallel UPDATE, DELETE, and MERGE operations are not supported for temporary tables.

Violations of these restrictions cause the statement to execute serially without warnings or error messages (except for the restriction on statements accessing the same table in a transaction, which can cause error messages). For example, an update is serialized if it is on a nonpartitioned table."

I think there is still at least one restriction missing from the list, which will throw this error message:

ORA-12839: cannot modify an object in parallel after modifying it

Cause: Within the same transaction, an attempt was made to perform parallel modification operations on a table after it had been modified. This is not permitted.

Action: Rewrite the transaction or break it up into two transactions: one containing the parallel modification and the second containing the initial modification operation.

This means that you are not allowed to perform parallel DML on an object that has already been modified within the same transaction (by serial DML). Probably this can be explained by the fact that the parallel slave sessions are not able to see the modifications already applied to the object in the main session.

Interestingly this restriction does not apply to serial direct-path inserts; these can be performed after an object has been modified by serial non-direct-path DML.

Considering that an object that has been modified by direct-path or parallel DML cannot be read or modified afterwards within the same transaction, it is quite obvious that a deferrable constraint cannot be enforced in this mode: I assume that in order to check the constraint the object has to be re-read, and that is not possible before a commit has taken place.

So if any deferrable constraints are enabled on the object to be modified, no direct-path or parallel DML operation will be performed; they silently fall back to serial conventional DML.
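
A sketch of the ORA-12839 restriction and the serial direct-path exception (the table name and row source are made up):

create table pdml_test as
select * from all_objects
where rownum <= 1000;

alter session enable parallel dml;

-- serial, conventional DML first ...
update pdml_test set object_id = object_id;

-- ... then an attempt to run parallel DML on the same object
-- within the same transaction raises ORA-12839
update /*+ parallel(pdml_test 2) */ pdml_test set object_id = object_id;

rollback;

-- a serial direct-path insert after serial conventional DML is allowed
update pdml_test set object_id = object_id;

insert /*+ append */ into pdml_test
select * from pdml_test where rownum <= 1;

commit;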

So this small script shall demonstrate the latter two points. First the results from 10.2.0.4:

It's quite interesting to note that in 10.2.0.4 the direct path insert didn't work when the primary key was enforced using a non-unique index, although the constraint was not deferrable, whereas in 11.1.0.6 the direct path insert mode was used in this case, which suggests that the 10.2.0.4 behaviour wasn't actually a feature.

Friday, June 27, 2008

I'm starting here a loose series applying the test cases used in the book "Cost-Based Oracle - Fundamentals" by Jonathan Lewis to 10.2.0.4 and 11.1.0.6.

The book covers 8i (8.1.7.4), 9iR2 (9.2.0.6) and 10gR1 (10.1.0.4), so I was keen on running the same tests on 10gR2 and 11gR1.

In the course of doing this I'll attempt to point out if there are any results that are noteworthy.

Before I begin here are the preparation steps and configuration details required to reproduce the test cases.

The INIT.ORA parameters relevant to the optimizer used for 10.2.0.4 and 11.1.0.6 have been taken from the sample INIT.ORAs provided as part of the test case code depot. The INIT.ORAs used look like the following:

So no surprises here, works as expected, and the results correspond to those of 9iR2 and 10gR1.

The next script comes from chapter 1 ("What do you mean by cost?") and is called "first_rows.sql". It shows potential problems that may arise from the approach the optimizer takes when using the (deprecated) FIRST_ROWS optimizer mode.

rem
rem     Script:    first_rows.sql
rem     Author:    Jonathan Lewis
rem     Dated:     June 2002
rem     Purpose:   Problems with FIRST_ROWS optimisation
rem
rem     Adapted to 10g and 11g by Randolf Geist
rem
rem     Last tested
rem        11.1.0.6
rem        10.2.0.4
rem        10.1.0.4
rem         9.2.0.6
rem         8.1.7.4
rem
rem     The FIRST_ROWS optimizer does not have a sensible strategy
rem     for dealing with optimisation of an ORDER BY (and possibly
rem     other variants such as GROUP BY) when it finds it can use
rem     an index to do the ORDER BY "free of charge".
rem
rem     This example demonstrates the point. Without the in-line
rem     view, Oracle does a full scan on the primary key index to
rem     return the 100 required rows in order, at a total cost
rem     and total execution time that is much higher than that of
rem     using the required index.
rem
rem     Of course, it is arguably the case that this is correct
rem     behaviour if we assume that the time to the first row is
rem     important, and therefore we avoid collecting a large number
rem     of rows and sorting them.  In practice, this is perhaps not
rem     really likely to be the case.
rem
rem     Bug number 670328 applies
rem
rem     Parameter _sort_elimination_cost_ratio affects the break
rem     point between optimising for the WHERE clause and optimising
rem     for the ORDER BY clause.
rem
rem     If the parameter is zero, the ORDER BY clause takes precedence
rem     If the parameter is non-zero, then the index is used if the cost
rem     of using it is less than the cost of doing the sort multiplied
rem     by the value of the parameter.
rem
rem     Special note for 10g:
rem     The parameter still has the same effect in general, but zero
rem     now means zero, not (effectively) infinity.  The default of
rem     zero will now effectively ignore the index option unless it
rem     is actually cheaper than the non-index option. A non-zero
rem     value will behave as it always used to
rem

The results in general correspond to those of the previous versions, but one interesting oddity showed up when I did the first test runs. The cost reported by my 10gR2 and 11gR1 systems for the query variations performing an ORDER BY was significantly higher than what is shown in the results that have been provided by Jonathan as part of the code depot.

It looks like the difference comes from the higher cost estimated for the ORDER BY sort. Further investigation showed that Jonathan probably performed the test runs using a manual workarea_size_policy with a 1M sort_area_size, whereas I did the first test runs using the instance default setting of a pga_aggregate_target of 200M (see the INIT.ORA parameters above).

The odd thing is that when turning CPU costing off ("_optimizer_cost_model" = "io"), then for that particular test case performing that tiny sort operation, the cost of the sort actually increases as the amount of memory available for sorting increases. In other words, the more memory is available, the higher the cost of the small sort operation, because the "IO cost per pass" in the optimizer trace file (10053 event) is higher if more memory is available, and therefore so is the final IO cost of the sort.

I was able to reproduce this strange behaviour in 9.2.0.8, 10.2.0.4 and 11.1.0.6. Due to the higher total cost when running the test case with more memory available, I had to adjust the "_sort_elimination_cost_ratio" parameter used in the "first_rows.sql" example accordingly to get the same results. See the script for further explanation of this parameter.

All other parameters that I tried (manual/automatic PGA sizing, "all_rows" optimizer_mode instead of "first_rows") didn't change the outcome. When using the "io" cost model I got, in this particular case, a consistent increase in IO cost per pass when increasing the memory available for sorting.

I won't put further effort into this issue for the time being, as there is a whole chapter (chapter 13) coming up later that covers sorting related issues in depth, so I'll see if I'm able to dig further there.

The final script from chapter 1 is called "view_merge_01.sql" and demonstrates the complex view merging transformation that can happen to your statement as part of the optimisation steps.

Conclusion for part 1: for the first three scripts I couldn't detect any significant differences to the previous versions. A small oddity showed up regarding the IO cost of a tiny sort operation when disabling CPU costing, but that could also be reproduced in 9.2.0.8, so it's not an actual difference to previous versions.

Because I have to use this silly dot character instead of space in order to keep a reasonable formatting in this blogger environment, I'll provide an updated code depot that contains the scripts and results I used for this series.

You can download it from here: My homepage (SQLTools++, an open source lightweight SQL Oracle GUI for Windows)

I'll update it from time to time when I do significant changes, and each time I add a new part to the series you'll find the updated code depot there as well.

The original code depot (still maintained by Jonathan) can be found here.