Patrick's Oracle DBA Blog

We’ve been experiencing some issues with multiple child cursors for a given SQL statement, and I’ve just spent some time working on building a reproducible testcase of one problem that I thought I’d share in the hopes of documenting the behavior.

It is a Python script that connects to an Oracle Database (tested against 12.2 and 19.6), so it requires the cx_Oracle Python module.

I’m not sure why it’s necessary to close and re-open the connection between the first three executions (when it becomes bind aware) and the subsequent executions, but it is. If both the nchar columns are char (or nvarchar2) the issue doesn’t reproduce.

It seems the database is not correctly handling the Cursor Selectivity Cubes:

We upgraded our first database from 12.2 to 19c last week, and encountered a nasty issue with ORDS. Credit goes to my colleagues Mingda Lu and Au Chun Kei for doing the hard work in understanding what was causing the issue.

The error stack from ORDS seems to give some clues as to what’s going on.

WARNING: The database user for the connection pool named |apex|pu|, is not authorized to proxy to the schema named HR
oracle.dbtools.common.jdbc.ConnectionPoolConfigurationException: The database user for the connection pool named |apex|pu|, is not authorized to proxy to the schema named HR
at oracle.dbtools.common.jdbc.ConnectionPoolExceptions.from(ConnectionPoolExceptions.java:46)
at oracle.dbtools.common.jdbc.ConnectionPoolExceptions.from(ConnectionPoolExceptions.java:53)

Caused by: oracle.dbtools.common.ucp.ConnectionLabelingException: Error occurred when attempting to configure url: jdbc:oracle:thin:@//localhost:1521/orcl.ords with labels: {oracle.dbtools.jdbc.label.schema=HR}
at oracle.dbtools.common.ucp.LabelingCallback.handle(LabelingCallback.java:147)
at oracle.dbtools.common.ucp.LabelingCallback.proxyToSchema(LabelingCallback.java:210)
at oracle.dbtools.common.ucp.LabelingCallback.configure(LabelingCallback.java:76)

From our testing, any service with a name with the format <pdb_name>.<any_text> exhibits the problem. We have used service names with such a format in 12.2 without issues, so it seems this is new behaviour introduced in 18c or 19c.

We’ve also noticed that when checking v$services, the value for con_id for the ‘problem’ service is 1 which may give a clue as to what’s going on, although it only seems to cause a problem for ORDS.
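A minimal sketch of that v$services check, assuming a cursor from an already-open cx_Oracle connection is passed in. The function name is my own, and the service name in the usage note is simply the one from the error stack above.

```python
# Sketch only: look up which container (con_id) a service is registered under.
# The cursor is assumed to come from an existing cx_Oracle connection.
SERVICE_CON_ID_SQL = """
select name, network_name, con_id
  from v$services
 where lower(name) = lower(:service_name)
"""


def service_container_ids(cursor, service_name):
    """Return (name, network_name, con_id) rows for the given service name."""
    cursor.execute(SERVICE_CON_ID_SQL, {"service_name": service_name})
    return cursor.fetchall()
```

Calling `service_container_ids(cursor, "orcl.ords")` should return one row per matching service. A con_id of 1 in that row is the symptom described above.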

This is an issue I always hit when setting up a test environment using Oracle REST Data Services (ORDS). Using the SYSTEM user to call ords.enable_schema throws ORA-06598. According to the documentation it should succeed (note the SYSTEM user has the DBA role).

Only database users with the DBA role can enable or disable a schema other than their own.

Warning: please don’t blindly follow the steps here without doing your own analysis of the risks involved, and ideally not without getting Oracle Support involved. I was hesitant to publish this, but I’ve been in contact with someone else and it helped them work around (to some extent) an issue they were having, so I think it’s worth putting out there.

We had a problem during a Data Guard switch-over (luckily a planned switch-over for patching rather than a disaster situation) where Grid Infrastructure (clusterware) was unable to bring up one of the databases. It kept throwing “ORA-01017: invalid username/password”. Starting the database the ‘traditional way’ using “sqlplus / as sysdba” had no such problems.

Reviewing Oracle Support, particularly Doc ID 2313555.1, we identified some non-standard configuration in the Oracle home used for this database, but even after resolving it, the error persisted.

At times like these you realize (or at least I did) how little is published about the internals of how clusterware and the oracle databases it manages interact.

I suspected that restarting the entire clusterware stack would resolve the issue but that was difficult as this node also managed a production database which we didn’t want to take down.

However I guessed that restarting the clusterware agent for the oracle user might fix the problem. The executable is oraagent.bin and the process owner is oracle. I believe this is the process clusterware uses to actually start the database (you’ll probably also notice a similar process owned by grid, and orarootagent.bin running as root).

I killed the oracle agent process and crossed my fingers. Luckily clusterware re-spawned this process and afterwards we were able to restart the problem instance without any problems.

Please re-read the first paragraph if you are considering applying this work-around, and don’t blame me if you break anything. If it helps though, I’m happy to take the credit!

This is something to file under the (admittedly rather large) category of things that I wasn’t aware that the Oracle database could do.

While tuning a query, I wanted to use a common technique of adding fields to an index to eliminate a “Table Access by Index RowID” operation. However, this particular case was complicated by the fact that the index was supporting the primary key, and the table was large and frequently accessed.

This is probably easiest demonstrated by the (much simplified) example below:

Now it is possible for a primary key to be supported by I_SINGLES_COVERING, but initially I thought I’d have to choose between dropping and re-creating the primary key to use the new index, or leaving the system in the non-optimal state of having the two indexes.

However I came across this blog post from Richard Foote, which referenced another post from Jonathan Lewis. It describes the following technique of modifying the constraint to use the new index without needing to re-create it. It’s worth noting that the index SINGLES_PK that the database automatically created to initially support the primary key gets dropped during this operation.
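The technique can be sketched as two statements, using the object names that appear elsewhere in this post (SINGLES, SINGLES_PK, I_SINGLES_COVERING). The helper functions are hypothetical, and the covering index is deliberately created non-unique, since a unique one is rejected with ORA-14196.

```python
# Sketch only: build the two statements that move a primary key onto a
# covering index. Helper names are illustrative, not from the original post.
def covering_pk_statements(table, constraint, index, columns):
    """Return the DDL to create a non-unique covering index and repoint the PK."""
    column_list = ", ".join(columns)
    return [
        # Non-unique on purpose: a unique index on (id, artist) cannot
        # enforce a primary key on id alone.
        f"create index {index} on {table}({column_list})",
        f"alter table {table} modify constraint {constraint} using index {index}",
    ]


def apply_statements(cursor, statements):
    """Run each statement in order on an open DB-API cursor (e.g. cx_Oracle)."""
    for statement in statements:
        cursor.execute(statement)
```

For the example table this would be `covering_pk_statements("singles", "singles_pk", "i_singles_covering", ["id", "artist"])`.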

One thing I observed in my testing was that if I created i_singles_covering as a unique index (id is unique as it’s the primary key, so obviously the combination of id & artist must also be unique) then the database was unwilling to use this index to support the primary key:

SQL> create unique index i_singles_covering on singles(id, artist);
Index I_SINGLES_COVERING created.
SQL> alter table singles
2 modify constraint singles_pk
3 using index i_singles_covering;
ORA-14196: Specified index cannot be used to enforce the constraint.
14196. 00000 - "Specified index cannot be used to enforce the constraint."
*Cause: The index specified to enforce the constraint is unsuitable
for the purpose.
*Action: Specify a suitable index or allow one to be built automatically.

This case is documented by Oracle Support Document ID 577253.1 which states:

We cannot use a prefix of a unique index to enforce a unique constraint. We can use a whole unique index or a prefix of a non-unique index to do that. This is the way Oracle was designed.

However I can’t off-hand think of any technical reason for this limitation.

A tip I picked up from Nigel Bayliss concerns purging SQL Plan Directives. I’ve been using it a lot recently and can’t see it documented elsewhere.

As some background these records, exposed via the DBA_SQL_PLAN_DIRECTIVES view, are cardinality corrections created and used when running with Adaptive Statistics enabled. There is a job that should automatically purge all records unused for longer than the value of SPD_RETENTION_WEEKS, but we’ve experienced occasions when this job doesn’t work as expected.
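To be clear, the following is not the tip itself. It is only a rough sketch of one way to find and drop stale directives by hand, assuming the session has the privileges needed to query DBA_SQL_PLAN_DIRECTIVES and to execute DBMS_SPD. The function name and the dry_run flag are my own, and the retention period is passed in rather than read from the database.

```python
# Sketch only: list directives not used within a retention window and,
# optionally, drop them one by one with DBMS_SPD.DROP_SQL_PLAN_DIRECTIVE.
STALE_DIRECTIVES_SQL = """
select to_char(directive_id), type, state, last_used
  from dba_sql_plan_directives
 where last_used < systimestamp - numtodsinterval(:weeks * 7, 'DAY')
 order by last_used
"""

DROP_DIRECTIVE_PLSQL = """
begin
  dbms_spd.drop_sql_plan_directive(to_number(:directive_id));
end;
"""


def drop_stale_directives(cursor, retention_weeks, dry_run=True):
    """Return the ids of stale directives; drop them unless dry_run is set.

    Ids are fetched as strings because they can be very large numbers.
    Directives that have never been used (last_used is null) are not matched.
    """
    cursor.execute(STALE_DIRECTIVES_SQL, {"weeks": retention_weeks})
    directive_ids = [row[0] for row in cursor.fetchall()]
    if not dry_run:
        for directive_id in directive_ids:
            cursor.execute(DROP_DIRECTIVE_PLSQL, {"directive_id": directive_id})
    return directive_ids
```

Running it first with the default `dry_run=True` only reports what would be dropped.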

I’ll start this blog-post by posing a question. Is it possible to have multiple records in v$sql for a given sql_id and child_number combination? While the title of this blog post may give you some clues, I’ll admit I’d always assumed that those values uniquely identified a child cursor.

As a bit of background we had a database availability situation this week, which we narrowed down to SGA issues, specifically bug 15881004 “Excessive SGA memory usage with Extended Cursor Sharing”. Some of our more complex SQL Statements were getting many (more than 700) child cursors. The reported reason for the child cursors was “Bind mismatch(33)”, probably caused by bug 14176247 “Many child cursors using Adaptive Cursor Sharing with binds (due to BIND_EQUIV_FAILURE)”, although that is listed as fixed in 12.1 and this instance is running on 12.2.

We resolved the immediate issue by flushing the shared pool (admittedly not a great solution, but sometimes you got to do what you got to do), and created SQL Plan Baselines for those problem SQL statements so they would each just get one plan and child cursor.

We plan to monitor more closely for any SQL statements that do have many child cursors, however we need to make sure that even if that does happen it doesn’t break the system again. One thing that seemed promising is the _cursor_obsolete_threshold parameter. We had already reduced this parameter down to 1024 from its default of 8192 based on Mike Dietrich’s blog post, but with this incident we were considering reducing it further. I think it’s wise to be wary of messing too much with underscore parameters, but per Doc ID 2431353.1 Oracle Support say “the … parameter can be adjusted case-to-case basis should there be a problem”. For sure we had a significant problem with the setting at 1024, so we plan to reduce it further to 512.

We involved super consultant Stefan Koehler to review our findings and action plan. He was broadly in agreement, even recommending a further reduction of the parameter value to 256. However, one thing still puzzled me, so I asked him: “What actually happens if the number of child cursors hits the value specified by this parameter?” His answer: “Well what happens is this … if your parent cursor got more than _cursor_obsolete_threshold child cursors it invalidates the parent (and in consequence all childs) and it starts from 0 again”.

I was skeptical. My expectation was that Oracle would just invalidate the oldest unused child cursor and then re-use that child number. Another thing puzzling me was what happens if some of the child cursors are still held open. Time to test this out for myself…
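For the monitoring part, a simple starting point is to list statements whose version count is above some limit. This is only a sketch: the function name and the idea of passing the limit in as a bind are my own choices, and the cursor is assumed to come from an open cx_Oracle connection with access to v$sqlarea.

```python
# Sketch only: report statements with more child cursors than a chosen limit.
HIGH_VERSION_COUNT_SQL = """
select sql_id, version_count, loaded_versions, sharable_mem
  from v$sqlarea
 where version_count > :min_versions
 order by version_count desc
"""


def statements_with_many_children(cursor, min_versions):
    """Return (sql_id, version_count, loaded_versions, sharable_mem) rows."""
    cursor.execute(HIGH_VERSION_COUNT_SQL, {"min_versions": min_versions})
    return cursor.fetchall()
```

Picking a limit well below the _cursor_obsolete_threshold setting should give some warning before a statement gets close to it.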

First let me demonstrate how I can get 4 child cursors for a given SQL Statement using different values of optimizer_index_cost_adj as a quick hack.
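In Python terms the hack looks roughly like this. It is a sketch, not the original demo: the function name and the particular values of optimizer_index_cost_adj are assumptions, and any query text can be passed in. Each distinct value changes the optimizer environment, so the same SQL text gets a new child cursor.

```python
# Sketch only: run the same query under several optimizer_index_cost_adj
# settings so that each execution ends up with its own child cursor.
def force_child_cursors(cursor, sql, cost_adj_values=(100, 90, 80, 70)):
    """Execute `sql` once per setting on an open DB-API cursor."""
    for value in cost_adj_values:
        # int() guards against anything other than a number being interpolated.
        cursor.execute(f"alter session set optimizer_index_cost_adj = {int(value)}")
        cursor.execute(sql)
        cursor.fetchall()
```

The SQL text must be byte-for-byte identical on every execution, otherwise the statements get different sql_ids rather than sharing one parent.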

Whoah…. each combination of sql_id and child number has two entries (not what I was expecting to see). To get a more full picture we need to look at a couple of additional fields, namely ‘address’ and ‘is_obsolete’.

Although we tend to use sql_id as our handle for the parent cursor, Oracle actually uses the ‘Address’ field, and when the _cursor_obsolete_threshold value is exceeded, Oracle allocates a new parent cursor with a new ‘Address’. This explains how Oracle copes when old child cursors are held open, they still stay in the shared pool, keeping their address, but are marked as obsolete, able to be aged out when they are no longer in use.
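A small sketch of how to see this for a given statement: fetch the address and is_obsolete columns alongside sql_id and child_number, then group the children by parent address. The query uses standard v$sql columns. The grouping helper is my own.

```python
from collections import defaultdict

# Sketch only: the query to run against v$sql for one sql_id.
CHILD_CURSOR_SQL = """
select sql_id, child_number, rawtohex(address) as parent_address, is_obsolete
  from v$sql
 where sql_id = :sql_id
 order by parent_address, child_number
"""


def group_children_by_parent(rows):
    """Map each parent address to its list of (child_number, is_obsolete)."""
    parents = defaultdict(list)
    for _sql_id, child_number, parent_address, is_obsolete in rows:
        parents[parent_address].append((child_number, is_obsolete))
    return dict(parents)
```

With rows fetched via `cursor.execute(CHILD_CURSOR_SQL, {"sql_id": ...})`, a repeated child_number shows up under two different parent addresses, one of them flagged as obsolete.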

There are a couple of other lessons here. Firstly, Stefan knows his stuff. Secondly, whenever someone tells you something, don’t just take it on trust: it’s normally easy to validate for yourself, and you may learn something about how Oracle works along the way.

Recently I’ve been working with SQL Server and while it’s not all bad, sometimes it does help to highlight some of the neat features available in Oracle. One of these is IMPLICIT cursors, which I shall demonstrate.

First I’ll show how to populate some data in a table and then loop over it in TSQL (SQL Server’s equivalent of PL/SQL):

By the way, note the neat way it’s possible to insert 3 records with a single INSERT statement. I didn’t say there weren’t some things that SQL Server does a little better 🙂

Next check out the equivalent SQL statements and PL/SQL code in the Oracle Database. Note my Oracle demos are running on an Autonomous Transaction Processing Database in Oracle Cloud, although they should work in all versions including Oracle XE, the free to use database.

Already I prefer a few things about the Oracle solution: the ability to use a cursor %ROWTYPE rather than having to define and use variables for individual columns, the fact that only one fetch command is required, and the use of the %NOTFOUND cursor attribute rather than the somewhat arbitrary @@FETCH_STATUS = 0 check.

A few things to note. We’re down from 15 to 8 lines of code which makes this easier to write, and just as importantly with less chance of bugs. No need to worry about defining rowtypes, or opening or closing cursors, Oracle just does the right thing under the covers including tidying up in case exceptions are thrown.
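For reference, an implicit cursor FOR loop has roughly the shape below, here held in a Python string so it can be run through cx_Oracle. The table demo_items and its columns id and name are hypothetical stand-ins, not the table from the original demo.

```python
# Sketch only: a PL/SQL implicit cursor FOR loop. There is no cursor
# declaration, no OPEN/FETCH/CLOSE and no %ROWTYPE variable: the loop
# record `r` is declared implicitly. Table and column names are made up.
IMPLICIT_CURSOR_LOOP = """
begin
  for r in (select id, name from demo_items order by id) loop
    dbms_output.put_line(r.id || ': ' || r.name);
  end loop;
end;
"""


def run_plsql_block(cursor, plsql=IMPLICIT_CURSOR_LOOP):
    """Execute an anonymous PL/SQL block on an open DB-API cursor."""
    cursor.execute(plsql)
```

The output goes to DBMS_OUTPUT, so it only becomes visible if the client has enabled and reads that buffer.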

I’ve been researching ORDS over the summer, particularly trying to optimize performance on Tomcat. Trying something a little different, I’ve created a Vagrant box that should allow anybody interested to verify my findings, find mistakes I’ve made or identify performance optimizations I’ve missed.

You can clone or download the Vagrant box from my GitHub page. Hopefully the instructions are clear: you need to download the Oracle 18c XE and ORDS releases and put them into the software directory. I’ve allocated 6GB RAM and 4 CPUs to the virtual machine, so you may need to adjust these values depending on your test machine resources. Doing “vagrant up” should automatically configure the database, and configure ORDS running in Tomcat with some reverse proxies. It will also generate a self-signed certificate and configure the SSL handling in both Tomcat and the reverse proxies. Most of the database and ORDS configuration scripts were taken from the Oracle Vagrant boxes or Tim’s Vagrant boxes.

The bench-marking tool I am using is Siege. There are many alternatives available but I chose Siege for a few reasons. Firstly it is Free and Open Source software. Secondly it is easy to configure: simply populate a file, urls.txt, with the URLs to hit and then run the executable with suitable parameters. Finally it is lightweight, being written in C, whereas many other similar tools are written in Java. This is important because I am running the bench-marking tool on the same virtual machine that hosts the software components I’m trying to measure.
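As a rough sketch of that workflow, the helpers below render a urls.txt file and build a Siege command line. The helper names, the concurrency and duration values, and any URL you pass in are assumptions for illustration. The -c, -t and -f options are Siege's flags for concurrent users, test duration and URL file.

```python
from pathlib import Path


# Sketch only: prepare the inputs for a Siege run.
def render_urls_file(urls):
    """Return the urls.txt content: one URL per line."""
    return "".join(f"{url}\n" for url in urls)


def write_urls_file(urls, path="urls.txt"):
    """Write the URL list to disk and return the path used."""
    target = Path(path)
    target.write_text(render_urls_file(urls))
    return target


def siege_command(urls_file="urls.txt", concurrent=10, duration="30S"):
    """Build the argument list for a Siege run (values here are examples)."""
    return ["siege", "-c", str(concurrent), "-t", duration, "-f", str(urls_file)]
```

The resulting list can be handed to `subprocess.run` once urls.txt has been written.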

Once the vagrant machine is up, you can connect to it via “vagrant ssh” and then type “ords-demo” to run the entire test-suite. I’ll go through the individual tests in the following blog posts and share my findings.

In a previous post I showed that by default when authentication_ldap_simple communicates with a Windows Domain Controller (or any other LDAP service), then the password is transmitted unencrypted during authentication.

This time I’ll demonstrate how to close this loophole. A pre-requisite is that the Domain Controller needs to be configured to accept secure connections. This is done by installing a certificate. The process is well documented elsewhere so I won’t repeat it here.

For simple LDAP authentication, whether connections by the plugin to the LDAP server are secure. If this variable is enabled, the plugin uses TLS to connect securely to the LDAP server.

In both cases we have to set authentication_ldap_simple_ca_path to point to the certificate authority file used when securing the domain controller. (Pro-tip: ensure that the file attributes of both this certificate and the directory it sits in are such that the mysql process is able to access it. You won’t believe how long I wasted due to this.)

Of the two methods, I have been informed that the TLS method is optimal, so that is what I will demonstrate. Note I have found that it’s better to load the plugin and set the variables in the mysql configuration file (my.cnf) and restart the service rather than setting them dynamically (it seems that otherwise the values do not correctly propagate to the appropriate processes due to LDAP connection pooling), so that’s what I’ll show you.
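As an illustration of what such a my.cnf section might contain, the sketch below renders one from a few inputs. The option names are the plugin's documented system variables. The function name and every value passed in (host, base DN, certificate path) are placeholders you would replace with your own.

```python
# Sketch only: render a [mysqld] section that loads the simple LDAP plugin
# and enables TLS. All values supplied by the caller are placeholders.
def render_ldap_tls_config(server_host, base_dn, ca_path):
    """Return my.cnf text enabling authentication_ldap_simple over TLS."""
    settings = [
        ("plugin-load-add", "authentication_ldap_simple.so"),
        ("authentication_ldap_simple_server_host", server_host),
        # Quoted because a DN contains '=' and ',' characters.
        ("authentication_ldap_simple_bind_base_dn", f'"{base_dn}"'),
        ("authentication_ldap_simple_tls", "ON"),
        ("authentication_ldap_simple_ca_path", f'"{ca_path}"'),
    ]
    lines = ["[mysqld]"] + [f"{name}={value}" for name, value in settings]
    return "\n".join(lines) + "\n"
```

After adding the rendered section to my.cnf, the service needs a restart for the settings to take effect, as noted above.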