Another Day, Another Obscure Oracle Error September 11, 2009

I seem to be spending my life hitting odd bugs or errors in Oracle’s performance-related areas at the moment. I guess part of it is because I am using features not widely touched, part is I’m not working as the SYS or SYSTEM user {in my experience a lot of people working on such things do it when connected as sysdba or at least as SYSTEM} and part is that Larry Ellisson personally hates me {he fills in for God when it comes to Oracle}.

I’m seeing this a lot today,and it seems virtually unknown:-

ERROR at line 1:
ORA-20003: Unable to gather system statistics
ORA-06512: at “SYS.DBMS_STATS”, line 15882
ORA-06512: at line 1

I’m also seeing this, but this error is documented:-

ERROR at line 1:
ORA-20000: Unable to gather system statistics : insufficient privileges
ORA-06512: at “SYS.DBMS_STATS”, line 15822
ORA-06512: at line 1

The rest of this post will explain my path to only partial enlightenment. If you have hit the above errors and found this page via Google, this might help you out and you might like to skip towards the end.
If you just read my blog, regard this as a very long “tweet” or whatever the hell they are. Again, feel free to skip to the end. Or just skip.

The task I’m doing which is causing me to hit these issues is that I’m trying to assure myself of the ability to role back gathering system statistics before getting the DBA team to do it on a production system. {I am not in the DBA team on this one}.

No system statistics have been gathered yet so I have the default set, I’m sure many of you will recognise them:

Two records created, first is the header record about what this statid is all about, the second contains the stored info. I’ve highlighted the relevant records.

All well and good, I can save my stats. Why do I want to? Because we may gather system statistics on an infrequent but regular basis, and I want a record of them. The automtically stored history only goes back a month {by default}.

Of course, there is no need for me to explicitly export the stats to the table, it can be done as part of the gathering process, by stating the statistics table, owner and an ID when I do the gather, so I did:-

Well, it’s a test box, there may be too little worload to detect anything – like none! I checked the SGA for recent code and the only SQL was that for recording the stats gathering :-).
So, repeated with me running some workload in another window for 15 minutes.

Still no change, still no system stats. Double Damn!

I remembered from reading around that if MREADTIME is less than SREADTIME the stats gathered could be ignored so I tried again.
And again.
And then decided it would not magically work, there was a problem.

I’ll gather the system stats with START and STOP and really hammer the box for a while, ensuring with autotrace
exec dbms_stats.gather_system_stats(gathering_mode => ‘START’,stattab=>’AD_BF_STATS’,statown
=> ‘DWPERF’ ,statid=>’SYSSTAT’||TO_CHAR(SYSDATE,’YYMMDDHH24MI’))

Oh Hell. Well, at least I now know that the chances are my interval systenm stats collection failed with this, but silently.

I bet if I dug around I would find a few failed jobs recorded where the INTERVAL version of the stats collection failed with this. I have not checked, I am just up to my eyes right now.

I found nothing at all via Google or Metalink about this exact error for gathering system stats. BUT maybe I lacked the correct priveleges on my non-SYS account. So I granted ANALYZE AND DICTIONARY and ANALYZE ANY to my user, as those or the priveleges mentioned for many of the DBMS_STATS management procedures.

Nope, did not help.

Blow it, If DBMS_STATS won’t let me keep the stats in my table, I’ll just rely on the default store of them, and not state my stats table in the call:

That works, because I am not involving a stats table. If you look back at the two errors, they come from different (but very similar) line numbers. I’d missed that for a couple of hours, I blame dyslexia.

Conclusion:

Gathering and storing System Stats is as buggy and random as the rest of the dbms_stats functionality. {yes, I am peeved about this}. It Seems..

If you gather system stats with the interval method, it can silently fail. This might only be if you involve a stats table.

You need the DBA role to gather system stats, otherwise you get ORA-20000, via ORA-06512 at line 15822

analyze and dictionary and analyze any are not enough to allow a non DBA account to gather system stats.

If you try and store the previous system stats in your stats table as part of the gather you get ORA-20003, via ORA-06512 at line 15882 {different line}, even if you have DBA role.

If you just rely on Oracle preserving the previous version of the stats, you can gather system stats fine.

You can work around the issue by exporting the system stats to your stats table as a single step first, then gathering system stats without involving your stats table.

There is no metalink for the ORA-20003 via line 15882 in metalink and no google hit for it, so I reckon it is a damned rare event. And since the last time I blogged about a damned rare error, I see 1 or 2 hits on that blog page every day since :-).

Damn, you caught me, I forgot to put the version and it is of course very relevant when blogging about odd behaviour.

It was 10.2.0.3. on linux (red hat), Enterprise Edition, single node.

As for privileges, without an exhaustive list (and if anyone wants a list, send me an email) then I had connect, resource, select any dictionary, execute any procedure, select any table and a few other standard “high access but not DBA” privs. I guess “select any dictionary” is a DBA-type privilege but it is needed to do performance work.

As I say in the blog, I added ANALYZE ANY & ANALYZE ANY DICTIONARY privs and DBA role as I worked through the issue.

Thx for that one Martin, I remember again why I always use SYS or “/” to do this sort of work.

And for any stats gathering, system or objects: Trust but Verify. Do the number make sense? And do they still make sense in a few weeks time?
System stats should remain more or less stable as long as the underlying hardware doesnt change, whereas object-stats can vary wildly depending on the content of tables and indexes and the sampling applied. But in all cases, I tend to be in favour of locking stats, and avoid unexpected changes, especially in OLTP systems.

As for you stats: did you notice how close your CPUSPEED seems to be to the NW Default ? Have you tried to gather them multiple times and compare them for consistency? Christian Antognini as a good chapter on this issue in his book.

I used to be very wary of the values in Object Stats when DBMS_STATS first came out but I found, so long as the sample size is reasonable, the figures are reasonable , even under 8. Much more of an issue is gathering Object Stats at “the right time” {which is a whole topic or three}.

I’ve not really checked the truth of system statistics before, it is part of what I am doing at the moment. I’m more interested in looking at how the mreadtime and sreadtime match the information in the rest of Oracle, but as that is where Workload System Stats are supposed to come from, they should be very similar.

So what are your thoughts Piet (or anyone else) on gathering different System Stats for different work periods? Worth it or not?

So try not to gather too often: I would value stability over “total accuracy”.

I would presume the CPUSPEED should not differ much (but it Does on a VM machine, if the VM or the underlying platform is stressed, I noticed).

The other stats are mostly IO related, and I would perfer to gather/set/keep them at conservatively (low?) values to assume the worst. Hence I favour CPU over IO, and try to avoid (physical) disk access.

There is a little knowledge-nugget out there regarding system-stats. For those with metalink access, have a look at 368252.1, point 7.
It seems Oracle EBS also values stability. To the point where they dis-recommend gathering System Stats when running EBS. You may remember my metalink rant from June, well, the long-term result was that oracle added this section to the note…

CPU speed varies on a VM machine? Now why would a faked-up computer on a system that could be running some, none or lots of other things depending on if you have itunes playing and you are surfing Youtube vary in CPU performance 🙂 But it’s good to point out that VM adds a whole new layer of complexity.

I understand your caution, like most people who deal with performance issues, I’ve had to work around odd behavious, hidden issues and down-right bugs with CBO and with the DBMS_STATS. IN my experience, especially with DBMS_STATS.

However, I personally go for “as accurate as possible” as opposed to your suggestion of “conservatively (low?) values”. If you ask Oracle to use values for I/O lower than your system is generally using, then it will only give the CBO slightly off figures to start from. It would be like asking me to tune a SQL statement manually but telling me the tables are smaller than they really are – I’ll be likely to come to sub-optimal decisions.
If you start from “slightly off” basic figures, that just can’t be good in my book.

The bug is still persistant in Oracle 11.2.0.3.0 , just stumbled upon it on a SUSE Linux Enterprise Server 11 (x86_64) VERSION = 11 PATCHLEVEL = 2.
And your blog is still one of the best hints towards the error.

What happened?
We just installed our 1st Oracle 11R2-Server and created an instance.
Then we created the schemas for our application.
After this i wanted to (re-)gather the system stats, we got a script for it:
connect / as SYSDBA;
execute dbms_stats.gather_dictionary_stats;
execute dbms_stats.gather_fixed_objects_stats;
execute dbms_stats.gather_system_stats(‘INTERVAL’,60);
exit;

There is no reference to ORA-20003 in the error messages book, neither for 10.2 nor for 11.2 😦
Based on your researches i got some closer to the bug.
execute dbms_stats.gather_system_stats(‘INTERVAL’,60); most times failes, but sometimes works, unpredictable.
However when i
execute DBMS_STATS.DELETE_SYSTEM_STATS;
before
execute dbms_stats.gather_system_stats(‘INTERVAL’,60);
it always worked for me.

So the bug still remains a mystery, but i found at least a workaround.

Martin, regarding the ‘Unable to Gather System Statistics’ when using INTERVAL, doc 277098.1 might shed some light – that error is (also?) thrown when an INTERVAL gathering is already occurring. That was my case – albeit I hadn’t run that command for a couple of years 😀 – and “DBMS_STATS.GATHER_SYSTEM_STATS (gathering_mode => ‘STOP’)” solved it. HTH 😉