A question on the OTN forum has prompted me to quickly knock up a demo on the possible dangers of the default behaviour in 10g with regard to the METHOD_OPT option in DBMS_STATS.

When collecting statistics with DBMS_STATS in 9i, the default value of METHOD_OPT was 'FOR ALL COLUMNS SIZE 1'. This basically tells Oracle to collect only basic column statistics (min, max, distinct values etc.) and not to collect histograms on these columns. For columns that are evenly distributed and for columns that are not referenced in SQL statements, this is perfectly adequate. If a column was unevenly distributed and detrimentally impacted the CBO's costings of an execution plan, then one could generate histograms for those particular columns separately.
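In 9i terms, an explicit call equivalent to that default might look like the following sketch (the schema and table names are hypothetical):

```sql
-- Gather basic column statistics only: SIZE 1 means one bucket, so no histograms
BEGIN
   DBMS_STATS.GATHER_TABLE_STATS(
      ownname    => 'SCOTT',       -- hypothetical schema
      tabname    => 'SALES',       -- hypothetical table
      method_opt => 'FOR ALL COLUMNS SIZE 1',
      cascade    => TRUE);
END;
/
```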

However, this default behaviour changed in 10g and IMHO this change is possibly the most significant and problematic difference when migrating to 10g.

The new default value of METHOD_OPT in 10g is 'FOR ALL COLUMNS SIZE AUTO'. This basically means that Oracle will automatically decide for us which columns need histograms and which don't, based on what it considers to be the distribution of values within a column and on the "workload" associated with the table (basically, whether there are any SQL statements running in the database referencing columns which might need histograms for those statements to be costed correctly).
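To see which columns the AUTO option has actually decided to give histograms after a gather, a query along these lines can be used (the table name is hypothetical):

```sql
-- NUM_BUCKETS > 1 indicates a histogram was created on the column
SELECT column_name, num_distinct, num_buckets, density
FROM   user_tab_columns
WHERE  table_name = 'SALES'        -- hypothetical table
ORDER  BY column_name;
```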

This sounds like an ideal scenario, just let Oracle work it out for us.

However, the problem is that Oracle in many cases doesn’t do a particularly good job at determining when it should generate a histogram and when it shouldn’t. In fact, the likelihood is that Oracle will actually generate many many many unnecessary histograms while still missing out on some columns that should have them.

In environments with few tables and with few users executing few distinct SQL statements, the impact of some unnecessary histograms may be minimal. However in environments with many tables and columns (potentially many thousands) with many users executing many different SQL statements, the ramifications of potentially suddenly having thousands of additional histograms can be disastrous.

Note also that by having a histogram, Oracle changes the manner in which the DENSITY statistic for a column is calculated (as stored in DBA_TAB_COLUMNS). This is often used by Oracle to determine the selectivity of predicates so the impact of suddenly having additional unnecessary histograms can be wider and more significant than one might initially imagine.

Of course, the impact on the shared_pool and the row_cache and its associated latches in particular can be extremely ugly indeed if Oracle suddenly has to deal with thousands of new histograms when parsing statements.

This silly little demo, "Dangers of default METHOD_OPT behaviour in 10g", creates a simple little table with three columns. The first column has an outlier value and, as previously discussed here, a histogram might be required to correctly cost range scans. The second column is perfectly distributed: it has 10 distinct values with 100,000 occurrences of each. The third column is also perfectly distributed but it's a special example in that it has only 1 distinct value.

As you can see from the results of the demo, Oracle has got it wrong one way or the other, in varying degrees, in all three examples. It hasn't created a histogram when one was needed and has created histograms when they weren't needed, impacting the Density column statistic as a result.

My advice: just be very careful when using the default METHOD_OPT 'FOR ALL COLUMNS SIZE AUTO' behaviour in 10g.
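One hedged way of following that advice is to gather with SIZE 1 by default and then add histograms only on known problem columns; the schema, table and column names below are purely hypothetical:

```sql
-- Default gather: basic column stats everywhere, no histograms
BEGIN
   DBMS_STATS.GATHER_SCHEMA_STATS(
      ownname    => 'SCOTT',                       -- hypothetical schema
      method_opt => 'FOR ALL COLUMNS SIZE 1',
      cascade    => TRUE);
END;
/

-- Then add a histogram only on a column known to be skewed and referenced
-- in problematic predicates (hypothetical column)
BEGIN
   DBMS_STATS.GATHER_TABLE_STATS(
      ownname    => 'SCOTT',
      tabname    => 'SALES',
      method_opt => 'FOR COLUMNS STATUS SIZE 254');
END;
/
```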


“However in environments with many tables and columns (potentially many thousands) with many users executing many different SQL statements, the ramifications of potentially suddenly having thousands of additional histograms can be disastrous.”

That is the core issue! Bingo.

It’s a perpetual trade-off between sub-optimal plans and unneeded overhead.

An exploration of "intelligent histogram creation" would be great. You could use AWR to correlate the SQL to the objects, avoiding histograms that are never used, and develop a method to only create histograms that "make a difference".

As mentioned, the default for METHOD_OPT has changed from 9i to 10g, but I think it is worth noting that because workloads evolve, so do the stats and, in this case, the potential for histograms to be collected. New histograms can be collected the first time stats are gathered in 10g (from a 9i upgrade), as well as any time the SQL workload and data change to trigger this.

I am not sure that I agree with your view that a histogram should be collected on column ID1. There is only one value that is outside the contiguous range of values and the query you used has an equality predicate, not a range predicate. If there were 1,000 distinct values ranging from 1 to 100,000,000 and none of them were contiguous, or they were in several contiguous ranges, and the workload used range predicates, it might be more plausible that queries could benefit from a histogram.

For example, if you take your same example and run this update statement:
update hist_test set id1 = id1+50000 where id1 between 500 and 999;
And add this query to your workload:
select * from hist_test where id1 between 1 and 999;
and then gather stats again using FOR ALL COLUMNS SIZE AUTO, you should see a height-balanced histogram of 254 buckets.

Yes, very good point, my demo (which I have to admit I put together in a bit of a rush after I saw the OTN thread last night) didn’t produce a very good workload for the first column.

I've now changed the demo to illustrate how problematic the query on the first column could be with a range scan that doesn't perform at all well due to the outlier value, and how the AUTO option still doesn't generate the histogram which would help.

Hopefully this makes it a lot clearer what I was trying to highlight and how the AUTO option doesn’t deal particularly well with outlier values.

I've now updated the demo yet again, making the table somewhat larger to highlight the differences in costs a larger table may have with all this, and to show how a histogram would have been useful in dealing with an outlier value by creating one manually at the end.

I am not a big fan of this new behaviour either. Oracle’s implementation seems sub-optimal and it is turned on by default when you go to V10 (though I think ‘strongly encouraging’ regular stats collection generally is a positive step forward).

Histogram stats and large numbers of partitions can also lead to your SYSTEM tablespace increasing in size under V10, suddenly growing from the usual 1/2GB or so to a couple of GB. I'll need to be back in the office to back this up with a couple of references though.

The issue seems to be that, with the new default behaviour under 10 of gathering histogram stats where Oracle decides it is sensible, all partitions now get far more histogram data. Each time a partition is DBMS_STATS.GATHER’d (ensured by the nightly/weekend DBMS_STATS_PROC_JOB when the object becomes stale) the old stats are stored away so you can return to them.

Lots of partitions, lots of column histograms, lots of stats gathering as the table grows… It creates a lot of data. The increase of the SYSTEM tablespace may catch you out, but so can anything looking at the histogram stats on tables with lots of partitions. I've not yet dug into the actual impact on the system of all this data but it makes my hand-cut scripts to look at histogram stats run very slowly :-(.

It does become less of a problem as you get beyond a month past migrating to ten as Oracle trims the stored stats gathering information but it is something worth knowing about if you are migrating a large system with many, many partitions to V10.

I think the biggest issue with collecting histograms unnecessarily is the subsequent impact it has on the parsing of statements and the additional stress and overheads it puts on the shared pool and the row cache in particular.

The golden rule with regard to histograms (or at least my golden rule) is to create them only when they’re required and don’t create them when they’re unnecessary.

The AUTO option breaks this rule because it doesn’t necessarily create them when they’re required but more importantly, it creates way way way too many unnecessarily.

And lots of unnecessary histograms just add overheads for no benefit, which is never a good thing.

A possibly interesting observation: of course, density is used for the cardinality estimation (of equality predicates) only for unpopular values, and since the Frequency Histogram on ID2 shows only values whose multiplicity is > 1, they are all popular. Hence if you select an existent value, the cardinality estimation is exact, otherwise, if you select a non-existent value, the cardinality estimation is 1 (actually 0.5 rounded up to 1) – as exact as it can be:

Without a histogram, the cardinality for "6.1" would be 100K as well, not accurate at all, and producing a useless FTS.

So it might seem that the histogram is doing a good job – generating a more accurate plan for all values.

But, enter bind variables and bind variable peeking: no problem if the value of the bind variable is always an existent value (or always a non-existent one) – big problem if the application selects a mix of existent and non-existent values (or, a value that is not represented in the histogram since it was added after the stats collection) thus generating a plan that is optimal only for a part of the statements. It would be probably better, in this case, to get rid of the histogram, since a single non-optimal range scan of 100K rows (thus issuing potentially 100K consistent gets on the table) is probably enough to consume a noticeable share of the resources/latches … oh, unless the application almost always selects non-existent values …

So, it really depends on the data, the workload, whether the statements use bind variables or not … and only a (competent) DBA or Developer has all the information needed to make a good decision.

Just tying this up with my previous discussion on constraints: if I had a table with a column such as ID2, which only has a smallish number of possible values (eg. 1,2,3,4,5,6,7,8,9,10) that were evenly distributed, I would have no histograms, as they would be somewhat useless for values that exist, AND a validated, non-deferrable check constraint which would protect us from values that don't exist (10g onwards).

But even so, it all still depends on bind variables, the workload etc. as you correctly stated.

One other thingy that has been sneaked into 10g is the default for NO_INVALIDATE. In 9i the default is FALSE, but in 10g the default is DBMS_STATS.AUTO_INVALIDATE – this would enable Oracle to automatically decide whether to invalidate dependent cursors or not. One more thing to watch out for.
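If the 9i behaviour of immediately invalidating dependent cursors is preferred, it can still be requested explicitly; a sketch (schema and table names hypothetical):

```sql
-- Force immediate invalidation of dependent cursors (the 9i-style behaviour)
BEGIN
   DBMS_STATS.GATHER_TABLE_STATS(
      ownname       => 'SCOTT',    -- hypothetical schema
      tabname       => 'SALES',    -- hypothetical table
      no_invalidate => FALSE);
END;
/
```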

This might be very basic…!
We have some tables where stats are collected only for indexed columns with no histograms… (method_opt => 'for indexed columns size 1').
Currently, num_distinct, low_value, high_value, density and num_nulls are all zero for all columns except those that are indexed.
My understanding is that it is necessary to have basic stats on all columns, not just the indexed ones, even when they are not part of the predicates, as they would influence the optimiser's decision on the selectivity of those unindexed columns.
But I am not sure about this.
Could this anomaly influence the execution plan?

You really should collect the minimum statistics on all your columns, not just the indexed ones. Without them, Oracle will be unable to get accurate cardinality estimates for the various steps in your execution plans not involving indexed columns, such as the cardinality of unindexed predicates and many of your join results. If these are inaccurate, then the order and position of subsequent joins could be sub-optimal, resulting in poor execution plans.
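One way to spot columns that have been left without any basic statistics is a query along these lines (the schema name is hypothetical):

```sql
-- Columns with no basic statistics gathered at all
SELECT table_name, column_name
FROM   dba_tab_columns
WHERE  owner = 'SCOTT'             -- hypothetical schema
AND    num_distinct IS NULL
ORDER  BY table_name, column_name;
```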

I know I am late to the table; we have just finally upgraded to 10g. Performance was bad until we made sure all instances had good system stats.

We have a couple of schemas which still perform badly. One is a warehousing schema which contains many partitioned tables. I am not sure why, but when we gather stats manually on the affected tables (those performing badly during data loads), performance returns to normal. It seems the auto stats process is not gathering good stats on these tables. The script we have been using did not set METHOD_OPT, so we were using the 10g default.
The second schema is an OLTP system which performs many updates to quantity columns; only a small percentage of the other attributes change. Oracle has gathered many histograms on this schema.
I would like to change the METHOD_OPT parameter to the 9i default. However, since the auto stats gathering process also gathers stats on system tables, I am concerned that changing the default METHOD_OPT will impact stats on the system tables. Is there a danger in setting METHOD_OPT to 'FOR ALL COLUMNS SIZE 1' for the gathering of stats on the system tables?

We have been using all the defaults with disastrous results using OEM to view tablespaces and files. Response is pitiful against LMTs (as much as 5 minutes) but great against the SYSTEM tablespace, which is still a DMT (1-2 seconds). This applies only to 10g databases. We still have a few 9.2.0.7 databases because 3rd party vendors are not ready. A TAR with Oracle did not help; I gave up after more than a month of running queries and submitting trace file after trace file.

Yes, we also had issues with execution plans with partitioned objects as well, several of which Oracle Support were not able to sort out.

We solved most issues by manually calculating globalpartition level stats and by using stored outlines on any remaining nasties.

My suggestion would be to use SIZE 1, even for system statistics. Only use histograms when you need to, on a case by case basis. You simply don't need or desire 1000s of unnecessary histograms, which is the danger of the auto options, which can be just as disastrous with system generated tables and code.

Updated: That should have read manually calculating partition level stats

The schema has about 4,000 objects with 21,000 indexes and is a packaged application. The SQLs are generated dynamically.
The performance has been acceptable until now and we have been using METHOD_OPT => 'FOR ALL COLUMNS SIZE AUTO' and ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE for stats collection. All of a sudden last week, due to one single SQL, the application crawled, bringing it to a standstill. Finally, we identified that a single table had caused the performance issue; we computed statistics on it and now the response is flying.

can you throw some light on what could have been wrong and how to correct these issues?

The fact you collected stats and things went bad and then recollected full stats and things went good again suggests there was a problem with the accuracy of your stats. The danger of collecting sample statistics.

Always a good idea before you make such changes to try and fix things that you try and quickly capture some information while things are bad. The previous stats and the previous execution plan and costings would have been useful in seeing what was wrong. The new stats and the new execution plan will then confirm how things need to be and the differences between the two would have told you much.

It only takes one bad SQL to bring a system down to its knees, it only takes one bad statistic to make a SQL statement go belly up.

My recommendation is not to use Size AUTO, it’s problematic and dangerous in my experience.

Hi Richard,
I faced an issue with the generation of histograms in Oracle 10.2.0.4 with the METHOD_OPT parameter defaulting to AUTO.
We got:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
We then reverted to the Oracle 9i default value of FOR ALL COLUMNS SIZE 1.
Oracle Support reports it as a bug.
I want to know whether using AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT will be problematic?

Sorry for the delay in the comment appearing but the comment was originally treated as spam and nearly got lost !!

I don’t have time to respond in detail but a couple of quick comments:

1) 90000 histograms sounds like way way too many for most applications. Oracle going mental with the auto option again.

2) Histograms with bind variables and no bind peeking are somewhat useless as the CBO takes a stab at the selectivity (eg: 0.25% for bounded predicates, 5% for unbounded predicates, etc.) without considering histograms.

3) With bind peeking, then it all depends on the values being peeked, although 11g is much better at dealing with these scenarios.

Thanks for your input. The maintenance window that we have is very short and we have tables ranging from 1m to 185m rows. Computing statistics is not possible and we can go with an estimate percent between 10 and 25 max. In addition, we have started exporting the existing statistics at schema level right before gathering fresh statistics. This would help us in restoring the statistics just in case something goes wrong. What are the general guidelines for using METHOD_OPT? I have seen 'FOR ALL COLUMNS SIZE AUTO' be dangerous sometimes. If I had to choose between other options, I could either use FOR ALL COLUMNS SIZE SKEWONLY or explicit numbers. The tables have skewed column data; how can I find the appropriate bucket size for Oracle to generate histograms?
I guess SKEWONLY would take a few extra hours when compared to AUTO.

I wasn’t suggesting you compute stats, we also don’t have the time or resources and in most cases it’s unnecessary.

However, there’s always the risk …

SKEWONLY is even worse than AUTO because at least AUTO restricts histograms to just those columns that are referenced in parsed SQL statements, whereas SKEWONLY will collect histograms on any column Oracle deems appropriate, regardless of any workload.

Until Oracle gets it right more consistently, I recommended SIZE 1 for all columns, except those columns where it actually makes a difference to collect histograms. That’s where knowing the applications, knowing the data and historical references comes into play.

I disagree with your demo. The reason the index wasn't picked up, irrespective of whether a histogram was present or not, is that you're not collecting statistics for the index on the table. I mean to say the CASCADE => TRUE option was missing in your demo.

I just did the same demo including the CASCADE option while collecting stats, and the index is picked up by the optimizer.

You've missed 2 important facts here that have caused you to misread the situation.

Firstly, the index does in fact have stats in my demo. Since 10g, index stats are automatically collected when an index is created. When the PK is added and the index created, Oracle has actually collected the stats behind the scenes. Re-run my demo and check the index stats just after creating the PK and you’ll see the stats are indeed there.

Therefore the difference you’re seeing has nothing to do with missing index stats …

The second point you’ve missed and what has caused the difference is that in my demo I compute the stats (estimate_percent is null) whereas you only collect sample stats. Here lies the key difference.

In my demo, Oracle is fully aware of the outlier value and has an endpoint_value of the outlier value. However, in your example, because you only collected sample stats, Oracle has missed out on collecting the outlier value and thinks the max value is only 999735.

Therefore, when you run your query, Oracle doesn’t think there are any values above 999735 and uses the average selectivity of a column value to estimate the number of rows to be retrieved (ie. 1 row). Therefore of course the CBO selects the index.

When I run my query, Oracle is aware of the outlier value and therefore assumes the vast majority of rows will be retrieved with my query because it assumes even distribution between my min / max value ranges and doesn’t pick up the fact there’s actually only the 1 row greater than 1000000.

The purpose of the demo was to highlight issues with outlier values, but the CBO isn't aware of the outlier value in your example. If you were to compute stats fully, you would get the same results as in my demo. You made the classic mistake of changing two things (explicitly collecting stats on the index and only collecting sample stats on the table) and assumed the differences were due to one of the changes (the index stats) when in fact they were caused by the other change (the sample table stats).

You are right: the index stats are automatically collected in 10g. My mistake, oh no! But your demo and the answers to my comments force me to ask a few questions. Bear with me :-)

In reality, it's impossible to collect/compute stats for all tables using a 100% sample size. If I use AUTO_SAMPLE_SIZE (which is the only option I have, considering 3,000+ tables and 100s of tables having 100+ million rows), the outlier value is always going to be incorrect.

Oracle documentation says that the AUTO option of METHOD_OPT force Oracle “to determine the columns to collect histograms based on data distribution and the workload of the columns.”

I could understand the workload part on columns and it's straightforward. But what drives Oracle not to create a histogram on column ID1?

<>

From the above explanation, I understood that the histogram on ID1 doesn't influence the optimizer in making decisions on execution plans for SQLs with a predicate greater than the outlier value, as it uses the average selectivity. This would be the case in real-world databases where it's impossible to collect stats with a 100% sample size.

Note that an outlier value could be picked up with sample stats; it depends on whether the sample used includes any outlier values. So it's not "always" incorrect. In fact, with 11g, the auto sampling method is likely to pick up such values.

Unfortunately, the processing that takes place to decide whether or not to collect histograms is imperfect. In some cases (many in fact), it collects histograms unnecessarily; in other cases it doesn't collect them when they could be useful.

Therefore, I don’t much like the auto method_opt option.

My thoughts are to generally collect stats with no histograms and only collect them specifically when necessary. In some cases, not having the outlier value in your stats is actually a good thing but is one of the reasons why things can suddenly go sour when they suddenly do get collected during a sample stat collection.

Life wasn’t meant to be easy and despite the docs, life can still be very unpleasant even with auto collected stats.

Yes, but the ID1 column has an outlier value so that the distribution of data between the min and max values is not even.

Yes, an equality search condition will result in the correct cardinality estimate by the CBO, but a range scan predicate will not.

In the example, when searching for data that has an ID1 value > 1000000, there is actually only 1 row that meets that condition; however, the CBO is estimating that the vast majority of rows will be returned because it uses the following formula to determine the proportion of rows that will be returned:

(max_value – predicate_value) / (max_value – min_value)

Because the max value is so massive and because the data isn’t evenly distributed between the max and min values, the estimated rows returned is way wrong.
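Plugging some representative numbers into that formula shows how wrong it can go; the figures below are purely illustrative, not taken from the demo:

```
-- Suppose min = 1, max = 10,000,000 (the single outlier),
-- and almost all values actually lie between 1 and 1,000.
-- Predicate: id1 > 1,000,000

selectivity = (10,000,000 - 1,000,000) / (10,000,000 - 1)
            ≈ 0.9

-- With 1,000,000 rows in the table, the CBO estimates ~900,000 rows
-- returned, when in reality only the single outlier row qualifies.
```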

You might indeed need a histogram, even if there are no duplicates, in such a scenario.

You suddenly introduce thousands of histograms that need to be loaded into the row cache and indeed the associated latches can burn red from overuse.

Providing the histograms are accurate they shouldn’t generate unwanted range scans, they simply result in Oracle having to do a lot more work to get to the same conclusion as it would have come to had the histograms not been there with the data evenly distributed.

Can you please confirm if both these methods are the same in 10g: ALL INDEXED COLUMNS and ALL INDEXED COLUMNS SIZE AUTO? Do both of them generate histograms? What is the best method to gather schema stats if SIZE AUTO is not recommended, given Oracle creates histograms on required columns for us? How do we decide the best method_opt and also the sampling size? We can certainly decide whether to have a histogram on a column or columns for a few SQLs. But at the whole schema level, what is the best approach to collect stats? We know that skewed data needs histograms. Stating this, generally speaking, are histograms best for OLTP databases or Data Warehouse databases?

Yes, they’re both effectively the same as both would use an Auto sample size.

Just a couple of quick comments.

1) I wouldn't recommend ALL INDEXED COLUMNS. Non-indexed columns can benefit from stats and histograms as well as indexed columns, as the stats can provide important cardinality estimates of steps within an execution plan that can influence join orders and when steps are performed.

2) The other way around, not all skewed data necessarily needs a histogram if the column is not used in such a manner as it governs how the CBO costs a plan (eg. if it’s only in the select list)

3) IMHO, in order to do the “best” thing by the CBO, you must know your data, you must know your applications and you must know how your data is being accessed. If you don’t know these things, you can’t accurately determine when a histogram should be collected and you’re forced to miss some you need or create (many) you don’t need. The best approach is to only collect histograms when they’re needed.

4) Generally speaking, histograms are just as useful with both OLTP and DWH applications. But with OLTP, bind values can be problematic, although gradually less so with each new release of Oracle.

Thanks for the valuable information.
When we are tuning complex queries with multiple join and filter conditions and huge multi-million row tables, is the suggested approach to take the predicate column, run it separately and get the data distribution to determine if it would benefit from histograms? Also, regarding the number of buckets in the following syntax:
method_opt => 'FOR COLUMNS ID1 SIZE 254'
estimate_percent => null

For method_opt – will 254 still be valid if the number of distinct values is > 254? Is there any recommended value for SIZE based on the distinct number of values?

estimate_percent – we usually specify 10% for huge tables, so will this still be valid and work OK for multi-million row tables?

1. Histograms are a real problem in a DW I support (10.2.0.4). Can I change the Histograms parameter in OEM > Manage Optimizer Statistics > Global Statistics Gathering Options from “FOR ALL COLUMNS SIZE AUTO” to “FOR ALL COLUMNS SIZE 1″ to prevent histograms from being generated? Do you recommend any other changes to the Global Statistics Gathering Options?

2. If I want to disable the daily gather_stats_job (it never finishes by 6am) and create gather_schema_stats for each schema, does the following command look correct? Does ‘GATHER AUTO’ override ‘FOR ALL COLUMNS SIZE 1’?

1) If histograms are causing you problems, then yes, ‘FOR ALL COLUMNS SIZE 1′ will get rid of them. Note though histograms are generally more useful in a DW so save your stats first (eg. with export_schema_stats) to make sure getting rid of them all doesn’t cause more problems than it solves. Then reintroduce any that really are needed.

2) Yes it does, the method_opt will be ignored. Get rid of the options clause to make method_opt have an effect.
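Saving the current statistics before such a change can be sketched as follows (the schema and stats table names are hypothetical):

```sql
-- Create a stats table and save the schema's current statistics into it
BEGIN
   DBMS_STATS.CREATE_STAT_TABLE(ownname => 'SCOTT', stattab => 'MY_STATS');
   DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'SCOTT', stattab => 'MY_STATS');
END;
/
-- If the change backfires, the saved stats can be brought back with
-- DBMS_STATS.IMPORT_SCHEMA_STATS.
```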

I found this thread very useful for my problem with the SYSTEM tablespace. I had to disable the auto gather job because of frequent growth of the SYSTEM tablespace. I have been asked by Metalink to set METHOD_OPT to 'FOR ALL COLUMNS SIZE 1'. However, how can I determine what the current value for METHOD_OPT is?

If you look at the number of buckets used for any given column (eg. in dba_tab_columns): if it's a height-based histogram, you'll know the value, and if it's frequency-based, you know that at least this value was specified for method_opt.
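In 10g the HISTOGRAM column in the dictionary makes this easier to check; a query along these lines shows what is currently in place (the schema name is hypothetical):

```sql
-- HISTOGRAM is NONE, FREQUENCY or HEIGHT BALANCED in 10g
SELECT table_name, column_name, num_buckets, histogram
FROM   dba_tab_columns
WHERE  owner = 'SCOTT'             -- hypothetical schema
AND    histogram != 'NONE'
ORDER  BY table_name, column_name;
```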

1) We were facing an issue with one of our Siebel application databases. The plan kept changing for one of the SQLs and froze the application three times. After analysis we found some unanalyzed indexes on the table; once we analyzed them, the SQL went to a better plan and the issue never recurred. We also saw lots of other unanalyzed indexes and planned to analyze all of them, but found performance dropped drastically and abandoned the idea. This was done in staging.

2) There was one more occurrence where we had a plan change and Siebel recommended not to collect statistics for zero-row tables; we are doing the testing for that.

3) Now the same plan change issue keeps coming up for some SQL or other. The plan gets changed, of course, after a weekly analyze job. When the application reports slowness, we flush the shared pool and either the previous better plan comes back or sometimes a new plan which is very efficient comes in. We do a default analyze of 10% for tables and 10% for indexes and had been running without issues for the last 2 years after the 10g upgrade. The application team, as usual, says the data is being loaded as it has been for the last 3 years. Do you have any comments on where the issue could be?

Firstly, you need to capture the before and after plans, make comparisons and determine why the CBO has made a different decision.

Without much to go on, it sounds like you could have a bind peeking issue, with plans changing depending on the different values you may have for bind variables at different points in time.

If you go with the default settings, you likely have lots of histograms and hence more likely to run into these sorts of issues.

If some of your tables/indexes have stats and some don't, then this is usually not a good thing, as the CBO needs to either make guesses or perform dynamic sampling on the fly, which can produce different stats at different times.

That said, for tables that are empty when you collect stats but are populated later, having no stats is actually a good idea as dynamic sampling is far better than the CBO thinking a table is empty when it’s not.

I would ensure all segments (that are not empty) have stats, that you check the number of histograms, and that you check for bind peeking issues (capture v$sql_bind_capture at different times).
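Checking the peeked bind values can be sketched like this; the view and columns are as per 10g, while the SQL_ID value is purely a placeholder:

```sql
-- Captured bind values for a given statement (SQL_ID is hypothetical)
SELECT sql_id, name, position, datatype_string, value_string
FROM   v$sql_bind_capture
WHERE  sql_id = 'abcd1234efgh5'
ORDER  BY position;
```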

This is a very useful article followed by very useful discussions. I have a question. We are now on 9.2.0.8. As you suggested, and many people opined, it is not a good idea to gather histograms on all columns, nor even on all indexed columns. We run gather stats on all tables using DBMS_STATS once every three weeks. My question is: is there a way to figure out for which tables I need to gather histograms using method_opt => 'for all indexed columns size auto'?

There’s no easy way. It really comes back to knowing your applications, knowing your data and spotting which execution plans are problematic due to skewed data (users are good at letting you know which parts of the application run poorly, checking statspack/v$sql for high resource queries etc.).

Once an application has been around for a while, these problem columns rise and bubble to the surface and can be dealt with via a histogram as necessary.

The day will come when we can simply just collect histograms on everything but in most cases, we’re not at that point yet.

Need your suggestion on the below. We have a Production Database which I have started looking into recently, where the 10g auto stats gather job (GATHER_STATS_JOB: daily as per the maintenance window) and additionally a manual stats gathering job (DBMS_STATS.GATHER_SCHEMA_STATS: three times a week) are both running.

Problem is these jobs are taking more than 12 hours, which means they are running into business hours as well.

My questions are

1. Is it required to run both jobs, and if not, which one is recommended?
2. Below is the job which runs three times a week; does it need to be modified/tuned?

If you’re not comfortable with how best to collect stats, then the automatic job and the 10% stale mark is probably the way to go.

If you are comfortable with how to best collect stats, you know your data and applications, then probably collecting stats manually with a default size of 1 is best, collecting histograms as necessary.

It also depends somewhat on the size of the database, number of columns in the database, load, number of users, distinct queries, amount of system resources, etc. If these metrics are low (eg. few users, fewish columns, lowish load), then the automatic job is likely fine.

Additionally, if it’s taking too long (and 12 hours is too long), then look at the sample size and at how you’re collecting partition level stats as well. A lower sample means less time and resources, and possibly still accurate enough stats.
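A manual schema-level collection along these lines might look as follows. This is only a sketch: the schema name (APP_OWNER), sample size and degree are assumptions to be tuned to your environment:

```sql
-- Manual gather: basic column stats only (SIZE 1), modest sample,
-- limited to stale objects; histograms added separately as needed
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname          => 'APP_OWNER',
    estimate_percent => 10,
    method_opt       => 'FOR ALL COLUMNS SIZE 1',
    degree           => 4,
    cascade          => TRUE,
    options          => 'GATHER STALE');
END;
/
```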

We have a 10.2.0.4 database, a small datawarehouse, and I have a 32gig table that is taking 21 hours to regather stats after deleting stats. I use a 20% estimate and a method_opt of ‘FOR ALL COLUMNS SIZE SKEWONLY’. I’ve read many articles on 10g stats and they say to grab the biggest estimate% possible. What are your thoughts? Do you think the SKEWONLY clause is slowing me down? Also my plan is to run this initial gather of ALL but then nightly just regather STALE with the same stats settings. Does this sound like a good approach?

20% is a very large sample, you might consider a smaller sample size (say 10% initially, then to 5% if there are still no issues, etc.) and see if the stats are good enough while reducing your stats gathering times.

I would certainly caution against SKEWONLY, it will likely produce many more histograms than are necessary in your environment. AUTO is bad enough but SKEWONLY is even worse as it doesn’t even consider whether columns are being used in SQL to warrant a histogram in the first place. So yes, it’s very likely slowing you down AND generating unnecessary histograms in the process. Only collect histograms when necessary (which might not be so easy to determine initially with tons of histograms around).

These two steps alone should significantly reduce gathering times. Also, if you have partitioned objects, you might need to consider how partition stats are gathered as well.
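For the partitioned case, one option is to gather stats only for the partition that changed rather than re-scanning the whole table. A sketch, where the owner, table and partition names (DW, BIG_FACT, P_2009_06) are hypothetical:

```sql
-- Gather stats for a single freshly-loaded partition only,
-- with a modest sample and no histograms
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'DW',
    tabname          => 'BIG_FACT',
    partname         => 'P_2009_06',
    granularity      => 'PARTITION',
    estimate_percent => 10,
    method_opt       => 'FOR ALL COLUMNS SIZE 1',
    cascade          => TRUE);
END;
/
```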

I understand a Frequency Histogram (FH) is used when there are up to 254 distinct values and that beyond this a Height-Balanced Histogram (HBH) will be generated? But I’ve seen a HBH even when there are fewer than 254 distinct values, so how is that created please, and could I somehow tell Oracle to generate the type of histogram that I want please?

Oracle uses a FH when there are fewer distinct values than there are buckets available, as this means each value can have its own bucket and be effectively 100% accurate (based on a 100% sample size of course).

The default number of buckets is 75, so you can get a HBH even if there are fewer than 254 distinct values, if not enough buckets have been defined.
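You can check which type of histogram a column currently has and explicitly request enough buckets for a FH. A sketch, with SCOTT, ORDERS and STATUS as placeholder names:

```sql
-- See histogram type and bucket count per column
SELECT column_name, num_distinct, num_buckets, histogram
FROM   dba_tab_col_statistics
WHERE  owner = 'SCOTT'
AND    table_name = 'ORDERS';

-- Request up to 254 buckets so a column with fewer than 254
-- distinct values gets a FH rather than a HBH
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SCOTT',
    tabname    => 'ORDERS',
    method_opt => 'FOR COLUMNS STATUS SIZE 254');
END;
/
```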

Answer given in case someone else has read your question but not the doco :)

Thanks for the blog post Richard, very interesting findings. We had a similar issue with histograms in our environment. We had gather stats running every weekend. Everything had been working fine for months until one day everyone was suddenly screaming that their queries were taking a long time to run. We tried everything, even added an additional CPU to the server because we suspected a capacity issue, which was a bad move. Later on we found out that the problem was with the histograms. Disabling them for some indexes did resolve the issue.

By the way, I am still not able to view your demo. Would it be possible to ask you to add me to your blog or to the privacy page so that I can download your demos? It seems like the new WordPress guidelines on privacy of media are making it difficult to share things.

I read the comments on the blog and they are very helpful to me. I have a question. Assume I have gathered stats using method_opt=>’FOR ALL COLUMNS SIZE 1′. After collecting stats with no histograms, can I then add histograms to the already collected stats one by one, based on AWR reports? I am using version 10.2.0.4 of Oracle. Please clarify.