Another blog discussing Oracle. I will try to post technical and non-technical items related to Oracle and my job as a oracle DBA.

Thursday, August 10, 2006

I am in for it now

I have been having, "heated discussions" with a few (male, read on) OCP's from a subsidiary company. Arguing over some Oracle B*tree index myths and other things they "know" about oracle. These fellows are 10g OCP, but all straight out of university and very gung ho. ADF over oracle, .NET over oracle (shudder) and no need for any business logic in the database. Very smart and knowledgeable fellows who are an asset to the company. I just keep slamming them on their database interaction work as I do the performance monitoring on their databases and they are a little sick and tired of ticket after ticket after ticket coming their way of "Bad SQL" or "Inefficient SQL" and the one that started this "Unused Indexes" and thought they would speak up. All very friendly of course.

We came to a agreement that I would put together a nice document blowing away their index theories. I will work on the rest of their theories later. When I have successfully debunked their myths, they will wear skirts,bobby socks, a blouse and makeup to work for a period of no less than 4 straight hours covering at minimum of two of the following - Morning arrival,Morning coffee break, lunch, afternoon coffee break or 5pm departure. If I fail, I will have to wear the same outfit. We have manager approval and backing for this and odds are it will end up in the company newsletter. Not entirely fair as they have 9-12 people at their location. I have 200+ people in this office, but I think I have the legs to pull it off :).

I personally believe these to be myths, well one because Thomas Kyte says so, and two I did some testing and checked it out in the past.

The following are the 3 index myths we decided on, well statements I told them were wrong and they argued the point. They are going to put together tests to prove me wrong. In a event of a tie or unbreakable test cases, a simple toss of a coin will decide our fates. Best out of 3 wins.

Myth #1

You must use a index, full tablescans are always less efficient and slower.

Myth #2

If you index each individual column the optimizer will use multiple indexes in a query if the predicate has any combination of the columns, so build a index on every column.

Myth #3

You must build a calculated column to properly index a date field. So to store a date, the best way to do it is with a VARCHAR field to improve DML to the table.

Over the next few days, I will be putting together the tests and posting the results. Since we never spoke about the ability to look for outside corroboration or help, I thought I would do the tests, and then run them by my blog to see if any readers can find fault in them. Due date for the tests is the end of the month. Management didn't want us using too much company time for this childishness. I know I have 2 and 3 in the bag.. #1 is going to be hard to prove, I think in one of Tom's books he does something similar. I will be reading those chapters over again tonight.

Parameters are Oracle 10gR2 on windows (shudder) with the following somewhat abbreviated init.ora (some parameters removed):

6 comments:

I don't think you have anything to worry about. If nothing else it goes to prove that an OCP certification is no substitute for a good understanding of the workings of the database...

Myth 1 says that if there is an index, you should use it. Well what if you need to return every row from the table? Are your colleagues seriously suggesting that you should use the index to locate each row? How would this result in less i/o? It wouldn't, of course. That in itself should be all that is necessary to disprove Myth 1.

There is still a widely held "belief" that there is a magic 5% threshhold where if you are selecting more than 5% of the rows from a table it suddenly becomes more beneficial (in terms of i/o) to do a FTS rather than an index lookup. In reality, it depends on the length of the row and how the rows are distributed - the clustering factor. The higher the clustering factor, the more ordered the rows are in the index. In other words, entries in a given index leaf block will be likely to point to rows in the same data blocks.

So, in an index range scan, you walk the tree to locate the first leaf block, and then you go from leaf block to leaf block (along the linked list) getting all the rowids you need. The amount of i/o you will be doing therefore depends on the likelihood that as you go from one rowid to the next, that row is physically in the same table block as the one you have just read.

This means you can deliberately set up a situation where you have 2 identical tables each with an identical index, and fill one with ordered data and one with disorganized data. You can then see that when the data is disorganized, the use of the index results in more i/o than simply performing a full table scan. This is all you need to do to prove Myth 1 wrong.

There's a good example of exactly this in Tom Kyte's Expert One On One, on page 280.

I think your biggest 'threat' to myth 1 is full index scans. If all the required columns are in an index (and at least one is not null) then you'd probably get better performance from the index scan than the tablescan.Of course that would mean that if you want all the columns, they would all need to be in the index in which case you maybe want a index-organised table.