It feels like it’s been imminent for ages as we slipped from February 2011 to September then through October to November 2011, February 2012, March 10 2012 and latterly to March 25 2012.

On the whole, these slippages have been more to do with affairs of big, corporate IT rather than any issues with the testing of the application under 11g, for example, red tape, higher priority changes and procurement for the whole 11g development estate.

When we actually go live, I’d be surprised if we don’t see some significant issues and oversights in a relatively short timeframe.

However, before then I just want to go through a quick whirlwind of some of the highlights and challenges undertaken/encountered:

Migration from 9.2.0.8 on Solaris to 11.2.0.3 on Linux via an in situ upgrade to 10g on Solaris as an interim step to faciliate use of datapump as opposed to imp/exp
– Change of hardware.
– Change of OS.
– Change of db version.
– We do not use “incremental change”.

Moving from a 20G sga with 17G buffer cache to a 70G sga and a minimum 25G buffer cache – in reality nearer 60G – using sga_target and db_cache_size.

Not using AMM (memory_target) yet not using hugepages!

Async and concurrent IO.

Moving from a statistics strategy of ‘FOR ALL INDEXED COLUMNS SIZE AUTO’ fixed with the two-monthly release cycle to the default stats job (running in a weekend window).

Removal of a weekend maintenance job that rebuilt lots of indexes for undocumented historical reasons.

Application issues:

No (very limited) use of sql plan baselines.

Trial of cursor_sharing = force for modules with poor shared sql principles, abandonment of trial on 11.2.0.2 due to ORA-07445/ORA-00600 from pro*c code.

Manual tuning of any code that exhibited any performance degradation – plenty of that (not surprising given that the oldest comments in the code date back to 1992).

Here’s a little observation on some slightly odd behaviour with MERGE and some apparent inconsistency of error ORA-30926 (prompted by a question on the otn forums).

If you’re not familar with ORA-30926:

ORA-30926: unable to get a stable set of rows in the source tables

then this is expected with MERGE if the set of SOURCE rows (identified in the USING clause) that you’re merging into the TARGET (i.e. the MERGE INTO <target>) contain duplicate entries for the key columns concerned, i.e. if there are duplicates, which one do you want to be preserved by the MERGE.

Perhaps we get an insight into how the check that results in ORA-30926 is implemented?

It seems to be not as simple as a question of whether there are duplicates in the SOURCE but whether the MERGE would update the same row twice – i.e. an UPDATE that results in an UPDATE not just updating it to the existing value.

The bottom line is that you should never be merging in multiple updates to the same target row.
If there are such incoming data, you have to have logic to filter that down.

Just a short note to report on the impact that some config changes have made on IO times in a specific environment for a specific workload.

I mentioned previously that I’m working on an upgrade of an application from 9.2.0.8 to 11.2.0.2

There are changes in pretty much every area – new hardware, different OS, etc, etc.

We’re using a full-volume UAT environment within the new set-up to compare new against old (production) and that will form the main basis for performance changes to the application required for this upgrade.

It’s pretty much an apples vs oranges comparison and not helped by the fact that UAT runs on tier 2 storage to be compared against the current tier 1 storage – UAT IO is slow.

In summary, not exactly best practice upgrade approach – but that’s just how it is sometimes in the real world.

Anyway, anyway, anyway… we’ve been waiting for recommendations and official go-ahead from the database engineering group who run the tests and control the builds of the machines and the following config changes have been made to the following:

Veritas VxFS CIO

Linux Deadline scheduler

Ideally such changes would be made individually to gauge their individual impact, however, as mentioned, it’s not always like that is it?.

And on the UAT environments above, based on a before-flashback-after run of the main application workload, the following IO times were observed:

Wait Event

Average wait time (before)

Average wait time (after)

db file sequential read

10 ms

7 ms

db file scattered read

21 ms

11 ms

db file parallel read

38 ms

60 ms

direct path read

95 ms

64 ms

direct path read temp

32 ms

9 ms

direct path write temp

34 ms

8 ms

log file sync

9 ms

3 ms

I’m not convinced that we have a level of control over the whole environment and time to deliver change incrementally to read too much into a single before/after comparison of the same workload, however these initial findings were better than I expected.
(Bottom line is that it’s still apples vs oranges)

Downtime shouldn’t be too much of an issue. It’s not like it needs to be done in an hour or anything demanding like that. As long as it can be done within a day.

Within the client, there’s no official build for 11g on Solaris, so I think that might rule out solutions like an in situ upgrade to 11g on Solaris followed immediately by the move & conversion to Linux.