If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

- Severely Skewed distributions
If you have a lookup code that can be 1 of 5 values, an even distribution would have 1/5th, or 20% of the records containing each value. Severely skewed statistics would be more like:
Value1 - 60%
Value2 - 35%
Value3 - 3%
Value2 - 1%
Value1 - 1%

In such circumstances, it is almost impossible for the optimizer to choose an execution plan that will be optimal for all values. A Histogram will give it the information it needs to make the proper choices for each different value.

- 'Extreme' Data Values
This is simply an extrapolation of the first case.
Say, for example, you have an indexed date field and 99.9% of the values are within the past ten years. Now what happens if there is a single date that has the year 1895? Such an entry will severely affect the optimizer's view of the data in that field. The ANALYZE process finds the lowest and highest values and stores these stats. The optimizer assumes that the distribution between these low and high values is 'flat'. That extreme date value will make the optimizer think that you have just as many 1899 entries as 1999 entries, which is certainly not the case. Again, the optimizer will make bad choices based on this incomplete data. In particular, it will affect ranged queries, which are quite common in date fields, hence my example. ie WHERE DateColumn BETWEEN '1/1/00' and '2/1/00'

In both of these cases, creating a histogram will provide the optimizer with enough information to make 'good' decisions when building the execution plans for statements using these fields.

Now, there are several problems with histograms.

- They *do not* work with binds. The values *must* be hard-coded in the statement. This means that they also will not work with CURSOR_SHARED=FORCE.

- Hard-coding these values will mean more statements in your shared pool and more parsing.

- Histograms must be re-built after every ANALYZE, which wipes them out.

- I hit a problem that I cannot find any corroboration on with exporting and re-importing stats with histograms via the DBMS_STATS package.

- Histograms go out the window with distributed queries.

However, given all that, they were still a life-saver for performance on my current project where many columns were already using hard-coded values, several columns have 'extreme' date values and *every* bloody column has severely-skewed distributions.

Originally posted by chrisrlong IMHO, - They *do not* work with binds. The values *must* be hard-coded in the statement. This means that they also will not work with CURSOR_SHARED=FORCE.

- Hard-coding these values will mean more statements in your shared pool and more parsing.

If your system is OLTP or running 24x7 and/or has heavy load by many SQL with hardcoded values and it depends on Oracle version as well (some say 816 is quite ok), then the point above can be real performance killer (even cursor_shared=force will consume 25% of CPU more per parse).
The ixora site and OAUGnet-DBA mailing list has A LOT OF problem descriptions regarding that and my experience is that while on development system you may not experience the problem on a real system it may mean system almost freezing.

If your system suffers from that the only solutions folks found are NOT TO USE hardcoded values(this is a stringly recomended solution ) or run alter system flush shared pool on some regular basis, e.g. before starting work day.

The real problem is, as noted by some authors, that Oracle does contradicts itself: binding of variables is ok for shared pool performance, but not for optimizer, hardcoding of values can be a killer for shared pool but is good for optimizer.

What is Histogram ?

Histogram===>These are statistics collected by ANALYZE command which is stored in data dictionary views and used ny the CBO to tune the execution of SQL statement.

CBO==>Cost Based Optimizer.

For Uniformly distributed data the CBO accurately guesses the cost of executing a particulat statement.However when the data is not uniformly distributed the optimizer cannot accurately estimate the selectivity of a query .In this casesHISTOGRAMS are necessary to describe the distribution of a particular columns.

When to USe Hostograms
-----------------------------

For columns frequently used in WHERE clauses, i.e indexed columns and moreover having highly skewed data distribution.

When Not to USe Hostograms
----------------------------------
On columns used in predicate with bind variables
On uniformly distributed data columns
on unique cols used in equality predicates