RAC GV$SQL type Bugs August 28, 2009

This week, I have mostly been hitting obscure bugs with Oracle RAC and GV$SQL/GV$SQL_PLAN.
ORA-00600 with GV$SQL plus ORA-12085 with GV$SQL_PLAN and ORA-03113 with V$SQL_PLAN. I’ve listed them in the hope anyone else hitting them will pick up this Blog entry (they are obscure but, even so, there is not a lot “out there” about them.)

I’m basically looking at getting a handle on overall Oracle SQL performance variation. This is driven by two areas of consideration:

SQL suddenly going rogue (IE a good plan swapping to a very bad plan)

The system-wide impact of changes, like gathering system statistics or changing initialisation parameters.

In an ideal world you would have a fully developed test system to try out the latter and your monitoring tools (like OEM/AWR or Toad or whatever you set up) would pick up the former. Back in reality, you can find yourself lacking both and I have had the idea for a looooooong time to set up something light-weight that checks the V$SQL… views directly to pick these up and now I am implementing something to do this. And finding out some of the pains and aggravations in doing so…

Enough meandering, these are some bugs some other poor sods are going to hit and these are the answers I have found. They may not be the best answers but it is all I have to offer :-).

Select count(*) from GV$SQL or select count(*) form GV$SQLAREA gives an ORA-00600 error and the usual trc file to wonder over. Happens in 10.2.0.2 to 10.2.0.3 on any platform and is fixed on 10.2.0.4. No backport I am aware of. Check out doc ID 357016.1 (Sorry if this is wrong but I am having major cut n paste issues over crap Virign wireless Broadband sevices, amongst other crap Virgin Wireless Broadband issues). Inclusion of the SQL_FULLTEXT (CLOB) column is suggested as the issue.

Our solution was to select the exact columns we wanted and this excluded the SQL_FULLTEXT column. The error went away. I did not pursue any further as it is a known but generally “swept under the Oracle Carpet” error. Heck, it only impacts those nutters using RAC… plus multibyte character set, like al32utf8.

Next up, selecting * or a set of columns from GV$SQL_PLAN on linux on 10.2.0.3 gives an ORA-12805 error.

ORA-12805: parallel query server died unexpectedly
ORA-06512: at “MWPERF.PKG_XXX_MONITOR”, line 311
ORA-06512: at line 1

Now, we really did NOT expect that as we had turned off parallel processing across the DB but I figure if you query against a GV$ view it is going to run across all nodes and compile the results, so parallel is forced. Well, on 10.2.0.3 on HP hardware/Redhat linux it barfs.

I reproduced the error by pulling the insert statement driven off GV$SQL_PLAN into a simple insert statement, it gave the same error.

Just using V$QL_PLAN you get ORA-03113 end-of-file on communication channel. I think this is occuring on the remote nodes and passing back the 12805 error.

INSERT INTO ad_sql_runtime_plan
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

Anyway, google and metalink both threw up no 10.xxxxx errors when I searched but did reveal a 9i errorr when trying to look at FILTER_PREDICATES column, fixed in 10.1 (Bug 3545517).
Also, I found bug 5166445 linking to 5933643. Which is no bloody {sorry} use to anyone as 5933643 is one of those oh-so-common oracle bug/issue numbers that links to a page saying “you can’t see this as it is private” which is just an insult to people trying to solve their own problems. Bad Oracle Corp, Bad Boy.
{Yes, I am annoyed by this, it is just simply poor by Oracle to reference an SR on public metalink that is not publically viewable and TOO MUCH is not publically viewable – I’ve know incidents when my own damn SR is not publically viewable and I had no idea why}.

It looks like there was a known and specific bug with FILTER_PREDICATES that impacted 9.2.0.3 to 9.2.0.4. Maybe to commemorate it Oracle introduced a similar bug on the same table for 10.2.0.3 to 10.2.0.4? On a different column of course.

SO, the solution. Well, I managed to narrow it down to one column. I could select count(column_name) from GV$SQL_PLAN for every column except OTHER_XML which is a Clob and PROJECTION, which is the problem. THat was the only column that errored and it gave the error codes I stated above.

Do select * or include PROJECTION in your column list and selects against GV$SQL_PLAN and V$SQL_PLAN may well go wrong under 10.2.0.3 on linux or wherever you get the problem. Remove the column and you could be OK. If not, test by using “select count {column-name} from V$SQL_AREA;” for all the other columns one by one and see what errors for you.

I did check to see if GV$sql_plan resolved to a complex view on many x$ objects but it did not, all of the columns (including PROJECTION) come off the X$KQLFXL internal object.

Now, the question is, having solved the problem for my own specific requirement on a RAC/linux 10.2.0.3 platform and done a Blog entry, is it worth spending half a day raising this obscure bug with Oracle Corp…?

Despite all “improvements” to ML, the use of it can still be an uphill struggle. We have to keep naging the benevolent vendor on that, preferably with carefully worded praise for their valliant efforts.
To be fair: the ML analysts probably get a good amount of non-relevant trivia and downright BS as well.

The last SR on metalink took me days and days of clarifying (repeating the same question in more and more simple wordings – surely my mistake all the way, I am after all a non-native writer of english, and I should also have listed all the ML-notes I had already scrutinized: they didnt contain the answer).

The frustrations got documented, in gentle oracle-positive wordings, on my pdvfirstblog a few months ago.
Mainly I persisted because I did need some relevant answer for my client. For peronsal research or in case of a “found workaround”, I would have given up much earlier. My stubborness did result in an answer from the third analyst, the one who finally understood the issue (and then needed to speak to others to formulate a sensible and oracle-politically correct answer)…

When the answer came, Note:368252.1 got updated as a result of my question (insert broad grin here).

My Advice:
When using Metalink: keep at it! If we give up, they have won.