I’ve noticed at a few clients with data warehouses recently that the Developers and, upon occasion, Business Users have a real fondness for hinting the SQL they are producing with one particular hint. PARALLEL.

As any fule kno, this IS the magic go-faster hint. PARALLEL(2) is obviously twice as fast as serial execution. PARALLEL(4) is amazing and PARALLEL(64) like Usain Bolt on Red Bull.

The problem is that, like all database features, parallel query comes with a cost.

When you specify /*+ PARALLEL(n) */ you are telling the optimizer that it has a lot more resources to use to complete this particular query. Not a single thread but many. PARALLEL(10) will use 21 processes to complete its execution – 20 Parallel Execution Server (10 producers, 10 consumers) and a coordinator (which is your connections shadow process) which will deal with any aspects of the parallel plan which cannot be parallelised.

Allowed free reign to use PARALLEL, devs and users will quickly consume all of the resources available on a given server, causing contentions which will inevitably slow down the overall execution of every piece of code executing on there. To illustrate this, I’d like to use an example I came across a while ago to show how excess PARALLEL of a single statement can be problematic itself.

Lets say I have a single server with 16 cores, lots of memory and a decent SSD array so the problem will centre around the CPU. Inevitably your 16 cores will be hyperthreaded. This then looks to Oracle like you have 32 cores. Whilst Oracle knows you have 16 hyperthreaded cores, you get CPU_COUNT=32

NOTE: 16 cores hyperthreaded DO NOT have the power of 32 cores, especially when dealing with databases. Some database workloads are actually WORSE with hyperthreading enabled (e.g. Data Warehouse systems on SQL Server). Inevitably the server admins will have enabled it unless you can provide cast-iron evidence to have it disabled.

I have a statement which the users are complaining about. It starts with the following code: SELECT /*+ PARALLEL */ (complex multi-table join)

So what does this unrestricted (and therefore DEFAULT!) degree of parallelism (DOP) do in this case?
The default DOP is defined as PARALLEL_THREADS_PER_CPU x CPU_COUNT=2 x 32 = PARALLEL(64)
On RAC it is PARALLEL_THREADS_PER_CPU x CPU_COUNT x INSTANCE_COUNT!

Lets have a look at the ACTIVITY of this PARALLEL(64) query:

You can see from the screenshot that Oracle knows there are 16 cores but it has gone PARALLEL(64), using 128 parallel exection slaves and fully expecting to have the available resources to run PARALLEL(64) efficiently. The execution plan is calculated around this assumption. There are 64 parallel execution slaves attempting to work on this at the same time. It’s worth looking at the metrics associated with this query.

Peaks of 2GB/s disk, 140GB of TEMP and 32 CPU’s.

The query took 36.9 minutes to complete.

I had the query changed to inject a modicum of realism into the available resources at the time of the run, and restricted the DOP to PARALLEL(8).

Oracle is restricted to the limited amount of resource, which is availble. The execution plan is different, to reflect the lower amount of available resources. Looking at the metrics:

Peaks of 1GB/s, 3GB of TEMP and 12 CPU’s.

The query took 10.3 minutes to complete. 3 times quicker!

It is worth noting that testing the query in isolation with PARALLEL(16) took 7 minutes to complete, but that DOP would have resource-starved the server as a whole causing everything else currently executing to slow down, and was discounted as an option.

With PARALLEL, less can be better.
Using PARALLEL for everything can be counter-productive.
Co-ordinating PARALLEL across multiple RAC nodes can be disasterous without careful design to take advantage of this feature (using PARALLEL_LOCAL_FORCE=TRUE will restrict parallel processing to a single node). Oracle recommend you don’t do this. Other opinions are available and I generally recommend setting this to TRUE.

We have a limited amount of resources on our servers. Working within those resource limitations will provide substantial benefits.

No so long ago I was having fun creating SQL Plan Baselines in a old 11.2.0.3 database due to be decomissioned but which needs to keep running for a while (no doubt several years) – so minimal time/money to be expended on it. Then, one day, I couldn’t create a baseline and needed to figure out why…

We get the occasional painful plan change, so pinning down an acceptable historic plan using a Baseline is becoming a regular occurrence. The standard route for this is:

1. Notice plan isn’t good and that we have been running with a good plan historically
2. Identify an plan hash value which is acceptable
3. Load it from the library cache if that plan is in there (unlikely)

4. If not, create a SQL Tuning Set from AWR with that specific plan and convert the SQL Tuning Set into a Baseline. You will need the begin-and-end snap id’s containing the plan, the sql_id and the plan_hash_value for the specific plan you want to baseline:

Please ensure you have suitable licensing before running any of the following code!

You may not be aware but Oracle have been talking about their new ProActive support with user groups like the UKOUG. It’s basically trying to show you best practices you should be implementing before stuff goes wrong, and many other useful bits of information.

It covers a lot of areas and is really worth a look, whether you’re using Oracle Database, Linux, MySQL, EBS, JDEdwards or indeed most Oracle products.

However, I do have to wonder just how up to date some of the content is given the “Trending” topic in the Oracle Database area is about planning your move to Oracle 12.1. Really Oracle? Really? You might want to dedicate a bit of time to getting this changed to 18 or 19?

Like this:

Since Oracle Open World 2018, Oracle have been trumpetting a few cool new features in the Oracle 19C database, the headline two for administrators being “Automatic Indexing” and “Real-Time Statistics“.

With the release of Oracle 19.2 on Exadata (on-premises – not yet on Cloud!) this week, we also got the documentation released which allow us to answer a very important question: on which platforms will we be able to use these 2 (and other) new awesome features. We may not like the answers – Oracle has decreed that we cannot have them on-premises for SE2, Enterprise Edition or on an ODA.

This is confined purely to Exadata on-prem and the cloud offerings. That’s it. There are no technical reasons why this should be the case – it’s just code – so the restrictions can only be marketing-based (like Hybrid Columnar Compression).

I’m disappointed, but it does mean that the over 80% of Oracle database clients still not living in the cloud should continue to need that aspect of my services!

I needed to monitor a SQL statement in 11.2.0.3 (the limits mentioned below are the same in 12.1, 12.2, 18.4 and 19C) to determine what is was doing and why it was slow.

Usually I would use SQL Monitor [NOTE: You need to license the Oracle Tuning Pack to use SQL Monitor] for this but the SQL was not appearing in there, despite running for over 5 seconds, and being a parallel SQL (both of which qualify to be included in SQL Monitor). So I asked Twitter why, and thought I’d share the output here.

There is a hidden parameter “_sqlmon_max_planlines” which states that any SQL with a plan in excess of 300 lines should not be monitored (see below for SQLMon hidden parameters – and change them at your own risk, preferably with the backing of an SR from Oracle Support). This execution plan had well over 300 lines. The solution is to change either the session or the system to allow monitoring to happen when the plan is over 300 lines.

e.g.
alter system set "_sqlmon_max_planlines"=500 scope=memory sid='*';
or
alter session set "_sqlmon_max_planlines"=500;

The negative side effect it that the monitoring will use more resources (primarily memory and CPU), which is why there are default limits on this feature. You might want to change it back when you’re finished to conserve resources.

Note that if you change the system parameter whilst the SQL is running, it will start to monitor the SQL at that point, so you will only get a partial picture of what is taking place, which is less valuable.

select ksppinm, ksppstvl, ksppdesc
from sys.x$ksppi a, sys.x$ksppsv b
where a.indx=b.indx
and lower(ksppinm) like lower('%sqlmon%')
order by ksppinm;
KSPPINM KSPPSTVL KSPPDESC
------------------------- --------- --------------------------------------------------------------------------------
_sqlmon_binds_xml_format default format of column binds_xml in [G]V$SQL_MONITOR
_sqlmon_max_plan 640 Maximum number of plans entry that can be monitored. Defaults to 20 per CPU
_sqlmon_max_planlines 300 Number of plan lines beyond which a plan cannot be monitored
_sqlmon_recycle_time 60 Minimum time (in s) to wait before a plan entry can be recycled
_sqlmon_threshold 5 CPU/IO time threshold before a statement is monitored. 0 is disabled

You may also notice a few other parameters in there. The “_sqlmon_recycle_time” hows the amount of time that the SQLMon plan will be guaranteed to be retained. Any retention time after that will be a bonus and depend upon the amount of SQL needing to be monitored. I see monitoring plans disappearing after 2-3 minutes in some systems, so you need to be quick, and you should save the plans down to disk.

The mad thing is that I was aware of this restriction before I posted by request for help on Twitter but I’d completely forgotten about it. So here’s the blog post to help me remember!

When configuring a physical standby database for Oracle using Data Guard, you need to create Standby Redo logs to allow the redo to be applied in (near) real time to the Standby. Without standby redo logs, Oracle will wait for an entire Archive Log to be filled and copied across to the standby before it will apply changes, which could take quite a while.

Which leads me to the problem I encountered a while ago, and due to being forgetful, still encounter today when creating standby redo logs in Oracle 12C and 18C.

I have a Primary database with 3 Redo logs, and I have created a Physical Standby database with 3 Redo logs and then 4 Standby Redo logs created as follows:

Well, Dataguard works as expected for a while – applying change immediately – then it lags for ages and you get an OEM alert that it’s lagging. Then it works fine for a while again and then another lag.

The problem is that when I created the Standby Redo log files, I did not specify the Thread. But why do I need to specify the thread? This is not RAC. It does not have multiple threads for redo – one for each node. We have Thread 1 and that’s it.

When I used to do this in Oracle 11.2, the Standby Redo logs got Thread 1 by default. Always Thread 1.

From Oracle 12.1, and also in 12.2.0.1, 18C (12.2.0.2) and probably 19C (12.2.0.3), Oracle creates the Standby Redo logs with Thread 0 by default, although sometimes it creates them on Thread 1 (although this seems to be only in 12.1 but let me know if you know better). I don’t know why.

Standby Redo log MUST belong to the SAME Thread as the Online Redo logs for them to be used. Having (in this case) 1 x redo on Thread 1 and 3 x Redo on Thread 0, I only have 1 x Standby Redo usable for my online Redo.

The solution is stop the Apply, DROP the Standby Redo on Thread 0, and create new Standby Redo Logs explicity stating Thread 1, and restart the Apply.

It is worth saying that MOS note 1595034.1 gets the corrective advice EXACTLY WRONG, with the incorrect snippet reproduced here:

"The standby redo logs can't be used, since they are assigned to thread 0.
Standby RedoLogs should not be assigned to a specific Thread.
Please use fallowing statements:...
SQL> ALTER DATABASE DROP STANDBY LOGFILE GROUP <Group#>; and create new SRLs withSQL> ALTER DATABASE ADD STANDBY LOGFILE GROUP <Group#> SIZE ;"

No MOS analyst! Standby Redo Logs MUST be assigned to a specific thread. Thread 0 Standby Logs are ignored!

And don’t forget, if you are using RAC you need to have Standby Redo Logs for EVERY thread accessible to EVERY instance, just like the Online Redo logs.

If you have single instance, ensure you specify Thread 1, and ensure you have 1 more Standby Redo log available than Online Redo log.

As a final note on Data Guard, from Oracle 12.1 you really should be using the Data Guard Broker to configure and manage Data Guard. It’s simple and it’s much better and safer than configuring Data Guard manually and has some nice validation and monitoring features too.

Addendum: For clarity, not specifying a Thread means that Oracle is supposed to auto-adjust the thread to match what thread is needed, using the Thread 0 as a pool so you presumably need less redo defined in RAC environment, but it just doesn’t work very well (or not at all) leading to the fast/slow/fast/slow lag symptoms. It’s also a pretty pointless saving as redo logs are small relative to the database size. If you have a LOT of nodes in the cluster meaning a lot of standby redo logs, and they aren’t all committing redo heavily (when they will all need all of their standby redo logs anyway) why do you have so many nodes? And validating the setup will report that you don’t have enough standby redo logs. Looks like a solution to something that isn’t a problem for anyone.

So this is an undocumented problem-introducing enhancement in Oracle 12 solving no practical real-world problem.

I’ll let you know if it is fixed in Oracle 19C.

Here’s some real-world output showing the apply lag with Thread 0 SRL’s, and how it disappears when Thread 1 SRL’s are explicitly created. Please note the units and date. I can’t provide the alert log output in this blog to prove my annotations.

The excellent DOAG Oracle conference is currently taking place in Nuremberg in Germany.

I’ve been thinking about the UKOUG Conference due to take place in Liverpool on 3rd-5th December 2018. About 2 weeks away.

Why would you bother paying to attend these conferences? Why should I, or my company, use our money to attend? Oracle provide free conferences and 1-day events around the world talking about the Oracle product set. Oracle Partners also provide free events with their speakers too. They are FREE! FREE! FREE!

I only need to convince my boss to take time out of the office. They don’t need to raid their budget – they just need to let me out!

BUT! Those events are effectively a sales pitch. If you are not paying in cash you are paying in other ways, with partners and Oracle looking for you to adopt them and pay them via licensing, support and consultancy. I’m not saying those events are not without worth; learning about Oracle’s new product and partner offerings certainly has value and I would encourage people to attend those events. But what are they not offering? Running those events isn’t free and large corporations are not known for their philanthropy.

An Independent User Group, like the UKOUG, doesn’t do sales pitches. It has speakers from around the world coming to share their SUCCESSES AND FAILURES! They talk about what worked, and what didn’t. They talk about the limitations and the excellence of the software and services which you are being sold.

If you want to know what happens AFTER you’ve bought the software. AFTER you’ve seen the amazing new features and where they ACTUALLY WORK or FAIL, you need to talk to people who are doing it, not those who are selling it.

And you can talk to them, and me, at the UKOUG. There are networking events; you can look into the eyes of the speaker who just told you something was great and ask them a question about how they know it’s great and how it could be great for you! And the networking events are, above all, FUN. It’s not all work. It’s also play. Have a drink, a meal, a chat, a coffee, a beer with like-minded individuals.

The event is in Liverpool, well known as somewhere to enjoy yourself as well as seeing the historic buildings and great sports teams. It’s just over 2 hours to get there on the train from London. Not as far as you might think!