The first query is using a nested loop and indexes.
While the second one is doing a hash join and full table scan.
The only difference is value of HD_PERSON_ID.
Now when I force index hint on second query and check the plan the cost is much more.
The first question, how can same query have different cost for two different person ids.
Second, how can I enforce the plan of first query for the execution of other.

tsharma@healthdialog.com wrote:
The first question, how can same query have different cost for two different person ids.
Second, how can I enforce the plan of first query for the execution of other.

Cost is an estimate of how expensive a query will be to execute, based on a lot of different factors. The method to generate the hash join you mentioned will be different from that for the nested loops, and one will probably be more efficient than another. Nested loops work well when a small number of rows are looked up with indexes from one of the tables in a join. Hash joins work well when a large number of rows match in both tables using full table scans.

Tubby pointed out that if you have histograms the optimizer can intelligently decide when to use indexes for the lookup, and in the second case it is possible that the hash join is a better option.

You can use hints to try to employ the nested loops access method and see what happens.

I agree with the other posters - the difference in cost will be due to a histogram existing for that column. Check the data dictionary view USER_TAB_COLUMNS for that column in that table and see if HISTOGRAM is set. There are two types of histogram - height balanced or frequency. You can read up about the differences between the two.

If you have a frequency histogram then it stores the matching row count in the histogram for each value that exists when statistics are gathered on that table. Thus the Optimizer will have what it believes is a good estimate of the number of rows that would match your constraint on that column, and that will determine the cost of access and which is the cheapest access method.

A height balanced histogram is less 'accurate' as it counts the number of rows in certain value ranges. It can still estimate the number of rows that match your constraint, but to less absolute accuracy.

Either way it is the presence of the histogram that leads to the different cost estimates, and the use of a different and 'lower cost' execution plan for the other query.

If you want the same execution plan for both queries, and all such queries, then drop the histogram. Then it will assume that all values occur equally often in that table, and use the same execution plan for each query. Drop the histogram by gathering statistics on that table with the histogram size set to 1, which really means none.

It's not the total number of rows from the final query that makes the difference, but the number of rows for each step within the execution plan. If you look at the execution plan (or explain plan) you should see that the estimates for the number of rows for the step that is different in the two queries does have a significantly different row count.

By making the Optimizer try and be more 'intelligent' Oracle have also made it more 'sensitive'. Thus small changes between queries and constraints can cause it to choose radically different execution plans as a result of the statistics it has available to it on the tables and columns in the database.

tsharma@healthdialog.com wrote:
Well, number of records for first query (using nested loop and index) are 2600
and for second (using full table scan) are 2000.

2600 rows isn't that many. Assuming you are using indexes I would imagine the nested loops lookup to be the better method, but that is hard to say with certainty with the information I have on your system.

If necessary use the use_nl hint to try to persuade the optimzer to use that access method. Be warned that hints are called "hints" and not "directives" for a reason - the optimizer may choose to disregard them if it wants to. l have seen posts here on OTN that declare hints WILL be used if specified but that is not my experience - the optimizer will ultimately do what it wants to do regardless of hints.

If necessary use the use_nl hint to try to persuade the optimzer to use that access method. Be warned that hints are called "hints" and not "directives" for a reason - the optimizer may choose to disregard them if it wants to. l have seen posts here on OTN that declare hints WILL be used if specified but that is not my experience - the optimizer will ultimately do what it wants to do regardless of hints.

From what I understand, Oracle's hints are actually directives - the optimizer must follow the hint, unless the hint is invalid.

Hints may be invalid if the wrong alias is used in the hint, the hint is malformed, the hint is incompatible with another hint, or the query was transformed by the optimizer into a form that is incompatible with the hint before the optimizer applied the hint. I probably left out a couple reasons why Oracle would apparently ignore a hint.

Jonathan Lewis has an interesting blog post that shows the optimizer (apparently) ignoring a hint:
http://jonathanlewis.wordpress.com/2009/10/02/quiz-night-3/

Hints may be invalid if the wrong alias is used in the hint, the hint is malformed, the hint is incompatible with another hint, or the query was transformed by the optimizer into a form that is incompatible with the hint before the optimizer applied the hint. I probably left out a couple reasons why Oracle would apparently ignore a hint.

Or if using the hint would return the wrong answer, or if you happen to be unlucky enough to hit a bug.

tsharma@healthdialog.com wrote:
Well, number of records for first query (using nested loop and index) are 2600
and for second (using full table scan) are 2000.

It's not the number of rows that actually appear that matters, it's the number that the optimizer thinks will appear.

It's possible to do a detailed analysis of the stats involved - but a simple starting point would be to give us the full execution plans for the two queries. (See the note below about posting code in fixed font)

From what I understand, Oracle's hints are actually directives - the optimizer must follow the hint, unless the hint is invalid.

Nonetheless, on multiple occasions in multiple circumstances I have seen the cost-based optimizer refuse to take a hint that was coded properly. Usually this takes the form of refusing to use a specified index in preference to another index which in operation offers inferior performance.

From what I understand, Oracle's hints are actually directives - the optimizer must follow the hint, unless the hint is invalid.

Nonetheless, on multiple occasions in multiple circumstances I have seen the cost-based optimizer refuse to take a hint that was coded properly. Usually this takes the form of refusing to use a specified index in preference to another index which in operation offers inferior performance.

This is rare but I have seen it.

I saw this happen on a client site on one occasion. Examination of the 10053 trace file showed that Oracle had "lost" the index.

I wasn't allowed to keep the trace file, and I don't remember who the client was. I think it was some version of 10.2, and I think there was a query transformation involved and Oracle lost the first index of the alphabetical list (as if it had created an array of indexes from subscript zero, but only considered them from subscript 1) - it was clearly a bug.

Jonathan:
Thanks for joining the discussion and providing 2 more reasons why hints are potentially ignored. Riedelme's reply reminded me of another that I have seen addressed somewhere online.

riedelme wrote:

From what I understand, Oracle's hints are actually directives - the optimizer must follow the hint, unless the hint is invalid.

Nonetheless, on multiple occasions in multiple circumstances I have seen the cost-based optimizer refuse to take a hint that was coded properly. Usually this takes the form of refusing to use a specified index in preference to another index which in operation offers inferior performance.

This is rare but I have seen it.

Riedelme:
I believe what you are saying - that it can happen. I think that Jonathan's provides an idea at what might be the cause, although it might be nice to see a test case where it happens. I have a vague memory of Jonathan mentioning such a problem online in the past where the 10053 trace indicated that Oracle forgot that an index existed, where he actually showed that fact by presenting part of a 10053 trace.

Your post reminded me of another case where Oracle might ignore a specific index hint, and select to use another index instead. Here is a simple test case which creates a table with 100,000,000 rows with 2 indexes:

The above created the table T15, the composite index IND_T15_C1_C2 on the columns C1 and C2, and the index IND_T15_C3 on just the C3 column. The table is large enough due to column C4 that Oracle will probably will select to use an index, when possible, rather than performing a full table scan. Now, let's see what happens when we try to determine the number of rows in table T15:

Oracle still used the IND_T15_C1_C2 index, even though I hinted to use the IND_T15_C3 index. But wait, there is a problem. The table is actually T15, not T1. I have included this example, as it is an easy mistake to make when typing SQL statements. Let's try again with a correctly formed hint in the SQL statement:

Oracle still ignored the hint to use the IND_T15_C3 index, and selected to use the IND_T15_C1_C2 index instead (again). Oh, Oracle does not index NULL values in a non-composite B*Tree index (or when alll values are NULL in a composite index), so using that index may yield the wrong result. We can fix that problem:

The case you present is covered by Jonathan's statement "Or if using the hint would return the wrong answer" earlier.

If the column is not defined as a NOT NULL Oracle cannot be sure that it does not contain NULLs -- even if you are sure that they are no NULLs. It is safer and correct for Oracle to not use the index , else using the index would lead to the wrong results.

The concept that I had in mind was the influence of potential NULL values in columns causing Oracle to (correctly) treat a hint as being invalid - not necessarily an index hint, but it was a little easier to demonstrate a "fix" for the invalid hint with an index on a nullable column. I agree with you that this should probably fall under Jonathan's statement "if using the hint would return the wrong answer". Then there is the case of a typo as I demonstrated, but that seems to be a repeat of my statement "if the wrong alias is used in the hint".

By the way, nice summary of the reason why the optimizer would find the index hint as invalid.