Stabilize Oracle 10G’s Bind Peeking Behaviour

I wrote this post because I feel there is a great need for it. The number of people struggling with unstable query plans due to bind peeking in Oracle 10G is enormous, to say the least. More than that, solutions like disabling bind variable peeking are driving us away from understanding the root cause of the problem and applying the right fix to it.

What are the causes of unstable plans due to bind variable peeking?

There are three things that might put you at risk of unstable plans due to bind variable peeking. Those are histograms, partitions, and range-based predicates. I’ll cover last two in upcoming blog posts.

Here comes my déja vu. If you tell me that you disabled bind peeking, my immediate response will be “do you have a lot of unnecessary histograms?” On an OLTP system, there is no way that you need histograms on a third of your tables (and I can hardly think of any DSS system where this amount of histograms can be justified).

That is, Oracle is able to choose an index range scan when N=1 (returns only one row) and do an FTS when N=0 (returns 9900 rows) — perfect execution plans given the conditions. What does this have to do with bind peeking you ask? Well, imagine that you’ve submitted the following query:

select * from t where n=:n;

We used a bind variable in place of a literal. On a hard parse, Oracle will peek at the value you’ve used for :n, and will optimize the query as if you’ve submitted the same query with this literal instead. The problem is that, in 10G, bind variable peeking happens only on a hard parse, which means that all following executions will use the same plan, regardless of the bind variable value. This is easy enough to demonstrate:

Both queries would use an index range scan if I had started my example with exec :n:=1 instead of exec :n:=0.

Now, if 99% of your queries are not interested in querying the above table with :N:=0, then you want the plan with an index range scan because it will provide optimal performance most of the time, while resulting in suboptimal performance for only 1% of executions. With the histogram in place, one day you will be unlucky enough to have Oracle hard parse the query with a bind variable value of 0, which will force everyone else to use an FTS (as was demonstrated above), which in turn will result an abysmal performances for 99% of executions (until the next hard parse when you might get lucky again). And if you have a system where a third of the tables have histograms on them then — I think you probably get the idea now.

What to do?

Well, just get rid of any histogram that does nothing but messes up your execution plans. That’s easy enough:

No more surprises! This also means that you’ll get the same plan regardless of the bind variable value during a hard parse. However, the real question is . . .

What is making all those histograms I didn’t ask for?

It is a default behavior. With the declaration of RBO’s obsolescence in 10G, we were also presented with a default gather stats job in every 10G database. This jobs runs with a whole bunch of AUTO parameters, but one parameter is of particular interest to us:

The SIZE ... part controls histograms collection. You can get the definition of AUTO in the Oracle documentation:

AUTO: Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.

Well, as it says — Oracle will decide on which columns to collect histograms. Not you. As a result, every day when this jobs runs, you might be presented with something new to make your life more interesting.

Why SIZE AUTO is doing such a bad job?

Because coming up with optimal parameters for statistic gathering involves many more variables than the DBMS_STATS package can ever have. As a result, AUTO tends to be a “lowest common denominator”.

For example, if you have an OLTP system, chances are you don’t need histograms at all (apart from on a couple of tables, maybe). But how can DBMS_STATS know which type of a system do you run? OLTP, DWH, or mixed? Or maybe you’re doing OLTP five days per week and DWH on a weekend? DBMS_STATS tries to use some heuristics to come up with an answer, but that’s why two days of DWH on a weekend can completely screw up your OLTP activity for the rest of the week. DBMS_STATS just doesn’t have enough information. We humans do.

What could we use instead of AUTO?

Because to the above, there is no single answer since it involves “know your data” and “apply your domain knowledge”. However, there is one option that works particularly well for most environments. I’m talking about REPEAT.

REPEAT: Collects histograms only on the columns that already have histograms.

That is, Oracle will no longer make histograms you didn’t ask for. This will be your first step in stabilizing your bind peeking behaviour:

What to do with existing histograms?

Dealing with them depends on the situation. Chances are, however, you have many more histograms than you’ll ever need. That means that starting from scratch to figure out when do you need histograms is usually a much simpler task compared to the clean-up of existing onces. If this is your case, then it might be a good idea to wipe out all histograms in a database (gather your stats with FOR ALL COLUMNS SIZE 1 clause), and manually add them when you decide that you really need one.

The number of times I have had to go back to add histogram after a complete wipe out is surprisingly low — much lower than number of surprises histograms were causing on these systems, and those systems never had any bind peeking surprises due to the excessive amount of unnecessary histograms.

Are there any other sources of unnecessary histograms in my system?

Absolutely. People are the next source of unnecessarily histograms. It constantly surprises me how many people treat histograms as a kind of a silver bullet. For example, someone recently communicated to me that select count(*) from table was running slow, and added that, “maybe we should collect histograms on that table.” How on earth will a histogram help you to run this query faster? Many histograms on your system might be a result of a complete misunderstanding how histograms work, what they do, and, more important, what they do not.

I have a misbehaving query due to incorrect peeking caused by histogram, and I need to fix it right now. What do I do ?

Don’t hurry to flush your shared pool. First, as a result of a complete brain cleaning, your instance will have to hard parse everything it had in the library cache, causing tons of redundant work. Second, and this is much more important, these hard parses might well result in an incorrect peeking for some other queries. So you might end up in a worse situation than you were in before.

The right way is to get rid of a histogram by collecting stats on required table, with METHOD_OPT => 'FOR COLUMNS X SIZE 1' and NO_INVALIDATE => FALSE. This will cause all dependent cursors to be invalidated immediately after stats have been gathered.

Sometimes, however, you don’t have enough time to understand what caused a problem (or you simply don’t have time to regather the stats) and, if probability theory is on your side (chances for a good peeking are much higher), all you have to do to invalidate dependent cursors is to create comment on a table:

The above will invalidate cursors that depend only on a specific table, thus significantly decreasing the risk of side effects.

Isn’t that much more work compared to simply turning off bind variable peeking?

Actually it is not. If you are starting a new system, all you have to do is to modify the default parameter from AUTO to REPEAT and you are done. You’ll have to create all required histograms manually but that’s our intentional goal: to do histogram creation in a controllable and predictable manner.

For existing systems that are plagued by gazillions of histograms, you’ll have to figure out what to do. Wiping out histograms for entire database will do for queries with bind variables pretty much the same as turning off bind peeking. I still think, however, that disabling bind peeking in this situation is the wrong choice, since (a) you still run the risk of getting unpredictable results from queries with literals; and (b) you will be doing tons more work during statistics collection, since histogram computation is expensive.

Is there anything else wrong with disabling bind peeking?

Running Oracle Database with an underscore parameter makes you different from the rest of the world, and this is not how Oracle Database was tested and intended to be run in a first place (think bugs). While disabling bind peeking seems to be relatively safe, it also very easy to avoid doing so.

Get histograms under control

Making sure new histograms appear in a controllable and predictable manner will be your first step in building a predictable environment.

Excellent!
Would like to mention that there is a patch out (10.2.0.3 ; linux ; p5579764) which makes it easier to delete histograms rather than
SQL> exec dbms_stats.gather_table_stats(user, ‘t’, method_opt => ‘for columns n size 1’, cascade => true);

Controlling histograms is good, but there are a couple of higher level problems with that approach.

1. How often do you go and analyze your databases and decide which columns need histograms?
2. What would you do when you have a mix of values that are bound and skewed data? Simply don’t gather histograms? Isn’t that the same as turning off peeking?
3. How would queries with literals cause unstable plans?

One thing I found to be a bit unreliable is gathering histograms with sampling. That is just asking for trouble, I agree.

For me, bind peeking as implemented until 10g (11g has a new feature for this) is just a quick and dirty fix to the inability to use histograms with binded sql. To me, it never looked like a good idea.

And finally, and most importantly bind peeking will also cause problems without any histograms at all. The min/max values can cause you enought trouble with peeking.

Notice the consistent gets, this is actually an index range scan. Why? Because the value we passed as parameter is above the “max” value for that column. Oracle assumes there will be no data. So far so good, bind peeking makes our stuff run faster !

Note, this is still a range scan. Ooops, not so good now, accessing the hole table via index. Note that I had to add the number column and have the index on 2 columns, so that I dont access the table top to bottom, but in a controlled random way. I.e. Reading block 1, then block 101, then 201 and etc.

1. When business tells me that something is running slow and we can benefit from a speed up. If histogram is a solution when I’ll apply it (but it may as well be something else).
2. I found this to be a rare case. I’ll cover my favorite solution in the upcoming post about range based predicates.
3. Well, you don’t have a histogram today and you do have tomorrow. So quires with literals against this table may change their plans. I’m pretty sure 100 of those will run faster but there might be one which will run slower. And you know… this will be the only one they (users) remember.

1. Ah, see that’s reactive tunning. What about proactive? Reducing the overall footstep of the application? Why waiting for something to break before trying to fix it? If process A is slow, but no one cares, and process A consumes a resource (cpu,memory,io) and causes process B to be slow, that happens to be critical?
We need to be proactive and make our systems run stable and as efficient as possible.

2. Looking forward.

3. Too specific to the system. I cant argue :)

Seems like we both agree that peeking can cause problems. You are emphasizing that the solution should be to not use histograms not turning off peeking.

[…] Fatkulin also looks at the CBO, particularly at strange explain plans and tears apart bind variable peeking. He offers us real options on how to deal with it. He says, “The number of people struggling […]

I think that you missed one thing !!!!
I know that it is “DESIRABLE” to share cursors, particularly in an OLTP environment, for all the good reasons however, we should think whether or not bind variable should be use for critical SQL statements.
Here is excerpt from Oracle 10.2 Database Performance Tuning Guide manual Chapter 13 or metalink note 387394.1.

“When bind variables are used in a statement, it is assumed that cursor sharing is intended and that different invocations are supposed to use the same execution plan. If different invocations of the cursor would significantly benefit from different execution plans, then bind variables may have been used inappropriately in the SQL statement.”

I personally rather to use literals or tune SQL with hints or code application to use proper version of execution plan as preferred methods rather than deletion of histogram or turning off bind peeking for critical SQL statements because neither of the suggested methods are free of side effect and 100% answer.

we are actually talking about the same thing. Replacing bind variables with literals are subject to “do the math” side of things as well.

Imagine you have a table with a lots of different values and you are queering on multiple columns each column with a skewed data. If every column has index on it, then you perfect execution plan will be to use one of these indexes depending on actual value. However, by simply replacing bind variables with literals you can easily trash your shared pool memory.

Replacing bind variables with literals are subject to “do the math” side of things as well.

I plan to stop by on a “catch 22” sort of things in upcoming blog posts.

I agree for most part with Christo Kutrovsky. I think that the problem are not histograms, but CBO: CBO and bind variables are enemies. We have an OLTP application and until 9iR2 we have used RBO without problems. Majority of CBO features are good for BI/DWH related queries. What effectively you demonstrate is that RBO is enought to do a good job. Unfortunately Oracle has choosen to use only CBO and this is a pity for me (until the we don’t make analitycal queries).

Question –
So If I remove the unnecessary histogram and bind peeking remains enabled, then during a hard parse when Oracle is peeking at my bind var, what data will the CBO use to make a plan decision since the histogram has been removed? It almost sounds like if I remove the histogram then I’m kind of disabling bind keeping for that query also. What do you think?

Eddie, you don’t disable BVP by removing histograms – it still happens. By removing histograms you just avoid most common scenario when bind variable peeking can cause different execution plans. There are other scenarios like range scan proximity to min/max value and partitioning (I wrote about the latter a while ago).

[…] my previous post, I described the most common cause for unstable plans due to bind peeking — histograms. It is now time to move forward and take a look at another case, namely range-based predicates. […]

I still prefer the optimizer to make a decision. The advantage of letting the optimizer to do the right job is that you are fixing not only the current query but potential future problems (including these you haven’t discovered yet). Outline fixes the immediate issue without actually addressing the root cause.

Addressing the root cause when dealing a system able to make decision is very a optimistic assertion. The question is not if you are good DBA or not, but only what will be the consequence if you made a mistake in the assumtion you solved the problem.

Once one recognize a problem, acknowlege a solution, acknowlege that everything that is not the solution may only cause harm, why on earth do you want to discover, post mortem, the reason the optimizer diverged?

We deal with a online banking transaction. the master SQL is run hundreds time per second with 9 gets. Shall it be 100 it is still good. But bind peeking may send it on FTS partition and that’s 27000 gets and machines down on its knee and transaction fall in timeout cancel third parties: a disaster it is.

Beside this SQL all others are fine and statistics are run with gather method_opt ‘auto’. the DB, 10.2.0.3, has created several hundred histogram. Good not good, why should you bother to analyze the perticence of any of them as long as the system is fast but weird behaviour on the master SQL (most run and primary SQL in the application).

So here your are with a system quite good out of the box but sometime a bind peeking may send the master SQL on the wrong path. Metling with the system gather statistics algorithm was not found to be a promising way since the problem is local to only one SQL.

The stored outlines is most appropirate in this case and you will find hard time to convince anybody in the department that we should leave even 0.0001% latitude to the optimzer to think otherwise in the futur.

You serie is interresting but the solution you propose is irrealistic in term of stability.
Manager want absolute stability, not adaptability.

Having said that, I am waiting to read on your third post in the serie :p

given that you have only one SQL in a dozen you may find it is well appropriate to use outline. I wouldn’t argue on that one, however, we need not to forget that “system being run fine for months, nothing changed, then one day..” actually means what your single SQL statement is what you hit by far and you just don’t know about the other ones (yet).

Well, there is no absolutely best approach aka silver bullet. Some environment are better off disabling bvp completely and some are fine just controlling histograms.

My most favorite would be to disable all workarounds unless there is very good reason to use them such as:
– _optim_peek_user_binds=false
– cursor_sharing=exact
– “FOR ALL COLUMNS SIZE 1” followed by regular “FOR ALL COLUMNS SIZE REPEAT”
etc.

The first two workarounds in the list is for bad software while the third one is for “unusual” data.

So both BVP and histograms are evil but there is an exception for every rule. ;-)

I’ve found your article very informative but there is still one question that is remained not answered for me and it also bothers me in my actual work is – how can I manually set the values of histograms?

I’m talking about something like DBMS_STATS.SET_COLUMN_STATS…

It’ll be very helpful in cases that I know the right skewed distribution of data in some column and don’t want to take a chance that during statistics gathering the distribution of the values in that column may be unusual due to irregular high/low load or error.

Currently am facing a case where we get a bad plan when Oracle peeks an out-of-range varchar ( out-of-range on the low side ) – this is without histograms – it’s as if the out-of-range code path doesn’t handle this case well. I really don’t want to disable bind variable peeking for the whole user community, but I can’t get to the application code either. Just thought I’d add the extra complication of out-of-range values – usually with bind variable peeking we talk about skewed data. Our data is plenty skewed, but I got rid of the histograms, and when that out-of-range value shows up the plan goes south.

Bind variable peeking can also be confused by partitioning. If you have both small and large partitions for the same table, you could end up with a bad access path because of that, even without histograms.

PYTHIAN®, LOVE YOUR DATA®, and ADMINISCOPE® are trademarks and registered trademarks owned by Pythian in North America and certain other countries, and are valuable assets of our company. Other brands, product and company names on this website may be trademarks or registered trademarks of Pythian or of third parties. Use of trademarks without permission is strictly prohibited.