Stats Need Stats to Gather Stats February 16, 2010

Did you know that you sometimes need good stats so that the stats-gathering package can gather stats in an efficient way? This is a recent, quite extreme example.

I’ve been forced to learn a lot about gathering Oracle stats using DBMS_STATS over the last 4 or 5 years. But no matter how much I learn about the “challengingly bizarre” way in which it works, it seems every week or two there is a new oddity. I plan a whole series on the topic “soon”.

This particular example is from a 10.2.0.3 system.

I am gathering partition-only table stats, as we are using Oracle's ability to roll up partition stats to global stats under certain specific conditions. One of the conditions is that you need stats for every partition; further, to get global column stats, each partition must have stats for each column. Some of our partitions lacked stats or had very bad ones.
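Finding the partitions that need attention is straightforward. A sketch of such a check, using the example table that appears later in this post (partitions never analysed will have a null LAST_ANALYZED):

```sql
-- Partitions with no stats at all (never analysed)
select table_name, partition_name
from   dba_tab_partitions
where  table_owner = 'ERIC'
and    table_name  = 'W_ACTIVITY_FAILURE'
and    last_analyzed is null
order  by partition_position;
```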

So I quickly knocked up a script-generating script to create DBMS_STATS.GATHER_TABLE_STATS statements that collected, for those partitions:

- ONLY partition stats
- NO cascade down to indexes
- BLOCK sampling, so it is fast {and poor, but there you go}
- an ESTIMATE_PERCENT of 2, which is quite low for block sampling
- histograms, as we decided the devil of having them was better than the devil of not having them

The above is not ideal for getting "good stats", but it is quick and gets OK stats, which is what we need right now. An example statement is:
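The original example statement is not shown here, but given the options listed above it would have looked something like this (a sketch; the partition name matches the one queried further down):

```sql
begin
  dbms_stats.gather_table_stats(
    ownname          => 'ERIC',
    tabname          => 'W_ACTIVITY_FAILURE',
    partname         => 'DY07092008',
    granularity      => 'PARTITION',  -- partition stats only, no global roll-up here
    cascade          => false,        -- do not cascade down to the indexes
    block_sample     => true,         -- block sampling: fast but cruder
    estimate_percent => 2,            -- quite low for block sampling
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO');  -- histograms where Oracle sees fit
end;
/
```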

I’ve collected statistics for a few thousand partitions over the last couple of days and the time taken is anything between just under half a second and 10 seconds per partition, with the odd unusually large partition taking a minute or so. {I believe it takes half a second to gather stats on an empty partition, on our system at least, due to the time it takes for the internal housekeeping, including copying the old statistics information to the SYS.WRI$_OPTSTAT_… tables to support restoring stats}. Sorry, I drift away from my main point.
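As an aside, those WRI$ tables are what makes restoring old statistics possible, and the restore itself is a one-liner. A sketch, using the same example table:

```sql
begin
  dbms_stats.restore_table_stats(
    ownname         => 'ERIC',
    tabname         => 'W_ACTIVITY_FAILURE',
    as_of_timestamp => systimestamp - interval '1' day);  -- stats as they were a day ago
end;
/
```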

After 10 minutes it was still running. Why? I quickly checked the number of rows in the partition and then the size of the partition segment, in case either was much larger than I expected. Neither was.
select count(*) from eric.W_ACTIVITY_FAILURE partition (DY07092008);

  COUNT(*)
----------
     42182

From dba_segments:

     BYTES     BLOCKS
---------- ----------
   2621440        320

There is one advantage of a DBMS_STATS statement running for a long time: you can grab from the SGA the actual code being executed on its behalf. I saw this.
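For reference, pulling the in-flight SQL out of the SGA goes something like this (a sketch: find the session running the gather, then look up its current statement; the username filter is hypothetical):

```sql
-- Find the gathering session and the SQL it is currently executing
select s.sid, s.sql_id, q.sql_text
from   v$session s
       join v$sql q
         on  q.sql_id       = s.sql_id
         and q.child_number = s.sql_child_number
where  s.username = 'ERIC'   -- hypothetical: whoever is running the gather
and    s.status   = 'ACTIVE';
```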

By this time 3 partitions of this table had been processed by my code, each taking around 15 minutes. Incredibly slow.

I did a very quick 0.01% sample size DBMS_STATS.GATHER_INDEX_STATS on that index, which took about a minute. As soon as the partition DBMS_STATS.GATHER_TABLE_STATS statement that was in flight finished, the following similar statements on that table's partitions took under 3 seconds each. I'll buy a pint for the first person to guess (within 10 minutes) WHEN I collected the global index stats {you can collect it from any pub in central London if you give me a few days' notice 🙂}.

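The quick index gather was along these lines (a sketch; the index name is hypothetical, as the original statement is not shown here):

```sql
begin
  dbms_stats.gather_index_stats(
    ownname          => 'ERIC',
    indname          => 'W_ACTIVITY_FAILURE_IDX',  -- hypothetical index name
    estimate_percent => 0.01);                     -- tiny sample, just to get *some* stats
end;
/
```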


There is a little-known feature of dbms_stats.gather_schema_stats/gather_database_stats where you can call it with OPTIONS=>'LIST AUTO'/'LIST STALE'/'LIST EMPTY' and it will pass out a list of the objects it would gather into an OUT parameter, OBJLIST, of type ObjectTab. So you could run that, dump the list out, and then run the actual gather.
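A sketch of that LIST call (the ObjectTab collection type and the objlist OUT parameter are part of the DBMS_STATS package; the schema name is an assumption):

```sql
set serveroutput on
declare
  l_list dbms_stats.objecttab;
begin
  dbms_stats.gather_schema_stats(
    ownname => 'ERIC',
    options => 'LIST STALE',   -- only list the stale objects, gather nothing
    objlist => l_list);
  for i in 1 .. l_list.count loop
    dbms_output.put_line(l_list(i).objtype || ' ' ||
                         l_list(i).objname || ' ' ||
                         nvl(l_list(i).partname, '-'));
  end loop;
end;
/
```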

But to see what the stats job is really collecting, you need to identify the process and then get the current SQL_ID out of v$session. With that you can pull out the SQL itself. OEM (or TOAD or any other DBA GUI) would make that easier to do, of course.

Thanks Martin.
v$session is what first came to my mind, but then I thought there might be a better way to achieve the same.
As for OEM (or any other GUI tool), I have turned into a text-based tools advocate ever since I started working on Oracle. 🙂