Oracle 10053 Trace Analysis Essay

There are two trace files that I use when tuning. Cary Millsap @CaryMillsap and Method-R @MethodR have a great tool for analyzing the 10046 trace file. There is the profiler and my personal favorite mrskew http://method-r.com/store .

If you are an Oracle DBA, Developer or Performance Engineer, these should be in your tool box.

Like this:

LikeLoading...

Related

This entry was posted in Cost Based Optimizer, Database Stuff and tagged 10053 10046 by rlockard. Bookmark the permalink.

About rlockard

Robert Lockard is a professional Oracle Designer, Developer and DBA working in the world of financial intelligence. In 1987 his boss called him into his office and told him that he is now their Oracle Wizard then handed him a stack of Oracle tapes and told him to load it on the VAX. Sense then, Robert has worked exclusively as an Oracle database designer, developer and Database Administrator. Robert enjoys flying vintage aircraft, racing sailboats, photography, and technical diving. Robert owns and fly’s the “Spirit of Baltimore Hon” a restored 1948 Ryan Navion and lives in Glen Burnie Maryland on Marley Creek

View all posts by rlockard →

10053 to the rescue again!

I had a curious incident the other day. I had two databases, each containing an identical copy of a table. Previously, queries referencing the table in one database had run in the same time as queries referencing the table in the other database. Now, however, one of the databases ran the query painfully slowly.

When I took a look, the execution plans for the queries were not the same, so Of course, the first place I looked was in the statistics. What I found was quite interesting - the "fast" table had been analyzed with a sample of 10%, whereas the "slow" table had been analyzed 100%. When I re-analyzed the table with a 10% sample size, hey presto the original (fast) plan was once again chosen.

So I started taking a look at HOW the CBO had come up with the respective plans, and I did this with a 10053 trace. (Although the query was quite complicated, I've simplified it right down in this post, for clarity.) What I'll do here is explain first what the problem was, and then show how it affected the decision of the CBO...

The problem was that in the million row table, there was one row which had an incorrect value. It was a date of 13-JUL-8569. It was just wrong. Very wrong. There shouldn't have been anything higher than SYSDATE plus about a month. Here's what I saw (table and column names changed to protect the innocent!):

My_table has an index on my_date.

In the query we had the predicate WHERE my_date &LT SYSDATE. So without a histogram, the optimizer was assuming a uniform distribution of dates between 1996 and 8569, meaning that the likelihood of a date being &LT SYSDATE was pretty small. In fact, without this bad date, the range would be between 1996 and SYSDATE+30, meaning that the likelihood of a date being &LT SYSDATE was extremely high.

What had happened was that in sampling at 10%, we had not sampled the bad date, and hence calculated a sensible cardinality, but as soon as we sampled 100% we included the bad date and the cardinality was stupidly low. (Needless to say, the solution was to fix the bad data rather than to gamble on the sample missing it!)

To illustrate this I have recreated a simplified version below.

So you can see that the index range scan looks preferable when the bad data is analyzed. The operation is assigned a falsely low cost, and you can see the far greater number of consistent gets required.

As I mentioned, the original query was a quite complicated, and involved joining several tables, so when I started looking at the problem, it wasn't quite a "needle in a haystack" but it was, shall we say, not obvious. It was the 10053 trace which led me to find the cause of the problem. I did this as follows:

The trace files were in my udump directory, and this is what I saw:

When bad data excluded, query runs fast using Full Table Scan -

.When bad data included, query runs slowly using Index Range Scan -

.Here you can see here the falsely low computed cardinality, which in the case of the predicate I used, is NUM_ROWS * (value-low / high-low).

I don't propose to reproduce it all here, but the paper by Wolfgang Breitling entitled "A look under the hood of CBO" is a good place to start reading up on this stuff if you are not familiar, and of course you can't discuss this without mentioning Jonathan Lewis' excellent "Cost Based Oracle Fundamentals".