
Tuning High-Volume SQL

I was at the supermarket the other day waiting my turn at the checkout behind another guy. The checkout-chick (I'm sure there is a more PC term, I just don't know what it is) had just finished scanning his groceries when he asked her to wait until his wife returned with a few last-minute items. I've done it before, so steam didn't start coming out of my ears - yet. Fifteen seconds later she ran up - sorry, sorry - and checked out the last few items, so I wasn't really inconvenienced.

As it turned out, their technique was efficient. If they had returned to the aisles with the shopping trolley to get the last few items, they would almost certainly have been behind me at the checkout; 30 seconds of extra items could have cost them minutes at the checkout.

How does this technique scale though? What if he had walked into the supermarket and reserved a place at the checkout whilst his wife ran back and forth with groceries, one aisle at a time? It is inconceivable that she could fetch groceries as fast as the checkout-chick could check them out, so inevitably there would be considerable waiting time between checking-out each armful of groceries. Also, the collection itself would take longer because of the extra ground covered. Although I haven't performed a study, logic tells me the technique will be slower. By lots!

So the technique makes sense if you only need to go back for one or two more things, but not for a whole trolley-load. It's hardly surprising that you don't see people shopping this way, right?

But here's the thing. I DO see it all the time. Not at the supermarket with groceries; but in Oracle with data. The database equivalent of running back and forth to the checkout is called Nested I/O. Nested I/O is really efficient for small volumes, but it doesn't scale. It is the cause of almost every poorly tuned high-volume SQL I have ever seen.

Consider a sub-optimal SQL (SQL1) that takes 10 minutes to run and its optimal tuned equivalent (SQL2) that takes 1 minute to run. They perform the same logic and return the same results, but one is faster than the other. There can only be one reason why SQL1 is slower: it is doing more work. And since they produce the same results, this extra work is clearly redundant.

So the secret to tuning High-Volume SQL (indeed - tuning any SQL) is to eliminate the redundant work, leaving only the minimum effort required to generate the result. In the context of SQL tuning, "More Work" comes in two basic forms:

Full table scans - A Full Table Scan reads all of the rows in a table or table partition. If you don't need every row to satisfy your SQL, then some of them have been read redundantly. The solution is generally to create an index so that only the required rows are read. This is a characteristic of a low-volume SQL. Low volume SQLs read small proportions of large tables, so full scans that discard heaps of data become very expensive. Conversely, high-volume SQLs tend to read large proportions of large tables; a Full Table Scan discards much less data and becomes pretty efficient.
Indexes are not the solution to high-volume SQL performance problems. That is not a mis-print. Generally speaking, a high-volume SQL will produce its optimal performance without using a single index.

Nested I/O - Nesting is when you perform the same operation for every element in a data set: For each A do B. Nested I/O is when the nested operation (B) is an I/O operation. Like at the supermarket, it is much more efficient to do all your I/O in one hit rather than bit-by-bit. There's a whole 'nuther article on that concept alone, but to give you some idea consider the following:

Consider a nested operation in which, for each of 1,000,000 SALES transactions, we look up the customer in the CUSTOMER table, which contains 10,000 rows. For each sales transaction, we must:

Read the root block of the index.

Navigate to the leaf block of the index, reading intermediate branch blocks along the way. This will vary depending on the size of the indexed column(s) and the size of the table, but an extra 2 blocks is not uncommon.

Use the ROWID in the leaf block of the index to look up the customer row in the table.

That's a total of 4 blocks to process a single sales transaction. Over 1,000,000 sales transactions, that's 4,000,000 block reads just to pick up the customer. Consider that the CUSTOMER table may fit perhaps 50 customers in a single block (probably more, but let's err on the conservative side). This means that the entire table occupies no more than 200 blocks in total. Yes, that's right: we read 4 million blocks from a 200 block table!

By nesting the customer lookup, we have increased the number of blocks read 20,000-fold. In other words, that's 2 million percent more I/O.

OK, this is a bit of an over-dramatisation because many of those 4,000,000 block reads will be cached. But make no mistake, even reading cached blocks is not free. This overhead is real and very expensive.
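In SQL terms, the two shopping strategies look something like this (table, column, and index names here are illustrative, not from a real schema):

```sql
-- Nested I/O: for every SALES row, probe the CUSTOMER index and table.
-- The hints force the nested plan counted above.
SELECT /*+ ORDERED USE_NL(c) INDEX(c) */
       s.sale_id, s.amount, c.cust_name
FROM   sales s
JOIN   customer c ON c.cust_id = s.cust_id;

-- Set-based alternative: read each table once and join in memory.
SELECT /*+ USE_HASH(c) */
       s.sale_id, s.amount, c.cust_name
FROM   sales s
JOIN   customer c ON c.cust_id = s.cust_id;
```

With 1,000,000 sales rows, the first form performs the 4,000,000 block visits counted above; the second reads CUSTOMER's ~200 blocks exactly once.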

Eliminating Nested I/O is the key to tuning High-Volume SQL.

Sounds simple, but it doesn't really help. If I have a slow SQL, how do I tune it? I never asked it to perform nested I/O, so how do I ask it NOT to? This is where it gets a little complex because there are a number of common problems that cause nested I/O; there is a separate section below for each case.

Nested Loops Joins are the most common and straightforward type of nesting in Oracle. When joining two tables, for each row in one table Oracle looks up the matching rows in the other table. Consider the following SQL Trace output from TKProf.
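The trace output is not reproduced here, but the traced statement was a simple EMP-to-DEPT join carrying the Rule Based Optimiser hint discussed below; something of this shape (the exact column list is an assumption):

```sql
-- The RULE hint forces the Rule Based Optimiser, which prefers the
-- index on DEPT and therefore a Nested Loops join.
SELECT /*+ RULE */ e.empno, e.ename, d.dname
FROM   emp  e,
       dept d
WHERE  d.deptno = e.deptno;
```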

Starting with the inner-most step of the plan, we see a full table scan of EMP returning 14 rows, and a unique index lookup on DEPT for each of those rows. So we see some wastage already; we have retrieved 14 rows from DEPT even though the table contains only 4 rows in total.

So how can we stop this happening? Keen observers will note the /*+RULE*/ hint in the above SQL. The Cost Based Optimiser is not stupid; it knows that nested loops is a wasteful plan for this SQL. The Rule Based Optimiser is stupid however; one of its rules is to use an index if it is available. This problem will fix itself when we remove the hint.

This was actually a bit unexpected. Oracle will usually prefer a HASH join over a MERGE for equi-joins. This is almost certainly a result of the very small tables involved. Let's see what Oracle would do in a real case of high-volume SQL:

Now that it thinks there are a million employees and ten-thousand departments, it uses a HASH join. Hash joins are the preferred method for large data volumes, because their nested operations are performed in memory, not on disk, and they scale better than MERGE joins. Interestingly, Oracle was right to choose the MERGE join for the small volumes; note that MERGE performed only 11 buffer reads, whilst HASH performed 15, and NESTED LOOPS performed 26 - the worst of all.

Although the optimiser is very reliable on simple queries such as this, the real world contains much more complex queries that are beyond the capability of the CBO to optimise. In some cases the CBO will wrongly believe that a high-volume join is actually low-volume, and wrongly choose a Nested Loops join.

Tuning such queries is a complex process and outside the scope of this article. To find out if there is a problem, take a look at the Explain Plan output for the query and see if there are any NESTED LOOPS or FILTER operations with two child steps (FILTER steps with one child do not nest). Look at the first child step and determine from your knowledge of the data how many rows that step will return; this is the number of times the second step will be executed. If this number is greater than 10% of the total rows in the table of the 2nd child step, then you almost certainly have a nested I/O problem. Even greater than 1% could be a problem, although not so serious.

Oracle generally makes these mistakes because it thinks the first child step will return fewer rows than it really does. Look at the row estimate in the Explain Plan to see if it is accurate - it almost certainly won't be. Use DBMS_STATS.GATHER_TABLE_STATS() if statistics are stale. If that doesn't help, use CARDINALITY hints as shown above to provide a more accurate estimate, or if this fails, use ORDERED, USE_HASH or NO_USE_NL hints to advise the preferred join order and method.
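As a sketch of placing those hints (the names are illustrative; CARDINALITY is an undocumented but widely used hint):

```sql
-- Tell the optimiser the driving row source is really high-volume.
SELECT /*+ CARDINALITY(s 1000000) */
       s.sale_id, c.cust_name
FROM   sales s
JOIN   customer c ON c.cust_id = s.cust_id;

-- Or dictate the join order and method outright.
SELECT /*+ ORDERED USE_HASH(c) */
       s.sale_id, c.cust_name
FROM   sales s
JOIN   customer c ON c.cust_id = s.cust_id;
```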

Personally, I find sub-query expressions in the SELECT clause insidious and annoying. They serve almost no useful purpose and are the only guaranteed way to kill the performance of a high-volume SQL. There is simply NO WAY to tune them; they must be avoided at all costs.

Consider the following query and TK*Prof output. The query selects all employees from EMP and picks up the department name from DEPT in a correlated sub-query expression.
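The statement described is of this shape (the exact SELECT list is an assumption):

```sql
-- DEPT is probed once for every row that EMP returns.
SELECT e.empno,
       e.ename,
       (SELECT d.dname
        FROM   dept d
        WHERE  d.deptno = e.deptno) AS dname
FROM   emp e;
```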

But this is extraordinary, n'est-ce pas? Like the Nested Loops plan above, this plan shows that all 14 rows were read from EMP, and that a unique index scan was used to pick up matching rows in DEPT. But look at the row count against DEPT! 3 rows! How is that possible? There were 14 rows in EMP and therefore 14 lookups to perform.

Something very different to a standard Nested Loops join is going on here. It is definitely nesting (for each EMP get DEPT) but the buffer count of 13 tells us that it is doing less work than the Nested Loops join. Nested Loops has no cleverness whatsoever; if it tries to lookup a row that it has just retrieved a moment earlier, it still reads the blocks. Those blocks may well be in Oracle's buffer-cache (so you don't need to get them from disk), but Oracle must still access the buffer cache and find the desired row amongst the potentially hundreds in the block. This is cheaper than disk I/O, but it's not free. That's why SQL Trace records the buffer cache reads and reports them in TKProf.

What has happened here is that Oracle has performed 14 nested operations to retrieve a DEPT row, but only 3 of those operations have been recorded. What happened to the other 11? What type of operation did Oracle perform? In my database there are 4 rows in DEPT of which only 3 are referenced by EMP.DEPTNO, so this kind of makes sense; the three operations that it recorded were the first lookup of each of the three departments. The other 11 lookups were satisfied by some type of cache other than the buffer-cache. Oracle is a bit cagey about this; I can find no reference to it in the Performance Tuning manual.

So how magical is this other cache? Is it the same as the Hash Area? Hash area retrievals in hash joins are not recorded in SQL Trace, so it could be the same thing. This would be awesome, because the Hash Area can be made quite large, and hits against it are much cheaper than buffer-cache hits. Let's up the ante and find out; the following script creates BIGEMP and BIGDEPT 100-times larger than EMP and DEPT.
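That script is not reproduced here; one way to build such scaled-up copies is to cross-join with a row generator (the scale factor and offset arithmetic are illustrative):

```sql
-- Offset copies of DEPT and EMP, keeping the foreign key consistent.
CREATE TABLE bigdept AS
SELECT d.deptno + (r.n * 100) AS deptno, d.dname, d.loc
FROM   dept d
CROSS JOIN (SELECT level - 1 AS n FROM dual CONNECT BY level <= 100) r;

CREATE TABLE bigemp AS
SELECT e.empno  + (r.n * 10000) AS empno,
       e.ename,
       e.sal,
       e.deptno + (r.n * 100)   AS deptno
FROM   emp e
CROSS JOIN (SELECT level - 1 AS n FROM dual CONNECT BY level <= 100) r;

-- The unique index supports the nested lookup being tested.
CREATE UNIQUE INDEX bigdept_pk ON bigdept (deptno);
```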

Woops! We were expecting (hoping?) to see 1400 rows retrieved from BIGEMP resulting in only 300 lookups on BIGDEPT, but it has all gone horribly wrong. Instead of 300 lookups there are 388. Our seemingly magical cache has topped out and cannot hold even 300 rows. At around 30 bytes per row, that's around 10Kb at best. The block-size on this database is 8K. Coincidence? Who can tell?

So we don't really know exactly what's going on here. Obviously there is some kind of results-cache in action, but equally obvious is that it is not terribly useful for high volume SQL (these tests were performed on 11g R1).

Look at that! HASH processed seemingly more rows (400 departments instead of 388) but it read only 109 buffers whereas the subquery expression read 589. That is a 400% overhead. Remember that we are still dealing with relatively small volumes here; both tables and the index are in the buffer-cache. This effect multiplies even further when the buffer-cache fills and the query starts performing some disk I/O.

One final note: un-correlated subquery expressions do not suffer the same problem. Since they do not reference any columns in the outer query, they can be performed once only, and the results applied to every row.
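The statement traced in the next paragraphs is not reproduced here; from the description it computed a departmental average salary in a correlated sub-query expression, roughly as follows (a sketch; the select list is an assumption):

```sql
-- EMP is fully scanned once per outer row to compute the average.
SELECT e.ename,
       e.sal,
       (SELECT AVG(i.sal)
        FROM   emp i
        WHERE  i.deptno = e.deptno) AS dept_avg_sal
FROM   emp e;
```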

See what's happening? For each of the 14 rows in EMP (line 2 of the plan), Oracle performs a FULL scan of EMP to get the departmental average salary. Look at the 4th line of the plan; 59 rows returned from the sub-query, but that does not tell you how many rows were scanned and filtered out. The real number is much higher!

Even so, 59 rows read from a 14 row table. There's a bit of wastage going on there.

The Oracle UPDATE statement permits updates not only on tables, but also Views and Inline Views. If we can define a view that presents both the current value of a column and the intended new value, then it permits the following syntax:

UPDATE view_name
SET curr_col_val = new_col_val
WHERE ....

Note that this syntax does not contain a sub-query expression; if the view can be defined in a way that avoids nesting, then the update can be performed without nesting. As we shall see, this technique is limited by a condition called Key Preservation. Let's look at it in action:

Here is a SQL that gives us the raw data required for the update, without nesting
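That statement is not reproduced here; it would have been of roughly this shape, joining each employee to a one-pass departmental aggregate (the column choices are assumptions):

```sql
-- One scan of EMP for the rows, one scan for the aggregate; no nesting.
SELECT e.empno, e.sal, d.avg_sal
FROM   emp e
JOIN  (SELECT deptno, AVG(sal) AS avg_sal
       FROM   emp
       GROUP  BY deptno) d
ON     d.deptno = e.deptno;
```

Wrapping this in an UPDATE, however, raises ORA-01779 (cannot modify a column which maps to a non key-preserved table).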

This is one of the limitations of updateable join views: when updating a table via a join-view, it may only join to other tables using their primary or unique key. This concept is discussed in more detail in the Oracle manuals. The problem here of course is that the EMP table is joined to an inline view that cannot carry a unique/primary key even though the GROUP BY guarantees uniqueness of deptno.

One way around this is to use a Global Temporary Table instead of an inline view, since a GTT can be created with a primary key.
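A sketch of that approach (names are illustrative):

```sql
-- One-off DDL: the GTT carries a primary key, so EMP is key-preserved
-- when joined to it.
CREATE GLOBAL TEMPORARY TABLE dept_avg (
  deptno  NUMBER PRIMARY KEY,
  avg_sal NUMBER
) ON COMMIT DELETE ROWS;

-- At run time: one pass to build the aggregate...
INSERT INTO dept_avg
SELECT deptno, AVG(sal)
FROM   emp
GROUP  BY deptno;

-- ...then one key-preserved, un-nested update.
UPDATE (SELECT e.sal, d.avg_sal
        FROM   emp e
        JOIN   dept_avg d ON d.deptno = e.deptno)
SET    sal = avg_sal;
```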

There are some additional overheads to this method not shown above. Whilst the creation of the GTT is quite efficient, it must also build an index and sample the results to generate statistics. Overall, the nested subquery expression is more efficient over such small volumes (14 rows). Let's take a look at a sample 100 times larger:

Remember that 1400 rows is still a relatively small sample; real life updates are much, much larger. The TK*Prof snippet above for the Global Temporary Table shows the session totals in order to account for all of the overheads. Even so, we can see that it performs substantially less IO (note the disk, query, and current columns) than the nested version. This effect is magnified as the sample gets larger.

A brilliant solution for unnesting bulk updates was introduced in Oracle v9i: the MERGE statement. The idea of MERGE is to apply a series of transactions to a table by updating rows if they exist and inserting them if they do not. By careful construction of the statement, you can engineer MERGE to only update.

Note the disk, query, and current columns are all significantly less than the nested example for 1400 rows shown earlier, and are comparable to the Updateable Join View.

The WHEN NOT MATCHED THEN INSERT clause was made optional in 10g, but is mandatory in v9i. The 1/0 value causes a divide-by-zero error that aborts a badly designed statement and prevents unwanted inserts.
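Putting that together, an update-only MERGE for the same departmental-average example might look like this (a sketch):

```sql
-- Every target row matches a source row, so the INSERT branch never
-- fires; under 9i the 1/0 aborts the statement if it ever does.
MERGE INTO emp e
USING (SELECT deptno, AVG(sal) AS avg_sal
       FROM   emp
       GROUP  BY deptno) d
ON    (e.deptno = d.deptno)
WHEN MATCHED THEN
  UPDATE SET e.sal = d.avg_sal
WHEN NOT MATCHED THEN
  INSERT (e.sal) VALUES (1/0);
```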

Extreme care should be taken using stored functions in high-volume SQL. Functions are often used to encapsulate complex code that can then be re-used in many places. The problem with this approach is that when the function is used in a high volume SQL, the complex code is executed for every row encountered. An otherwise simple-looking SQL that returns 100,000 rows will - behind the scenes - execute 100,000 (or more!) SQLs.

Not all functions are problematic:

Functions called in the WHERE clause with no parameters, or with constant values as parameters, will execute only once. The value returned will then be used like a constant or bind variable in the SQL.

Functions called in the SELECT clause with no parameters, or with constant values as parameters, will execute only once if they have been defined as DETERMINISTIC. Non-deterministic functions (see the Oracle PL/SQL reference) in the SELECT clause will execute once for every row selected.

Functions that contain only PL/SQL code are much more efficient than those that include SQL statements. Such functions are not ideal, but they do not tend to result in nested IO.

Importantly, correlated function calls (where a table column value is passed as a parameter) will execute once for each row encountered. This is somewhat mitigated in 11g with its results cache but is still worthy of caution.
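The distinction looks like this (both function names are hypothetical):

```sql
-- Constant argument: executed once per query and the result reused,
-- provided tax_rate() is declared DETERMINISTIC.
SELECT ename, sal * tax_rate('2009') AS tax
FROM   bigemp1;

-- Correlated argument: dept_name() executes once per row returned;
-- if it contains a SELECT, that's one nested query per row.
SELECT ename, dept_name(deptno) AS dname
FROM   bigemp1;
```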

Keen observers will notice the use of table bigemp1 as a copy of bigemp to avoid mutating updates. I could have demonstrated the principle with a pure SELECT, but this example ties in nicely with the previous section.

Note the 1400 separate calls to the function and the high query buffer IO. These figures are nearly identical to the nested update shown in the previous section.

Notwithstanding the exceptions above, functions containing SQL statements should never be called in high-volume SQL. The only alternative is to re-code the logic of the function into the SQL statement itself whilst avoiding the other nested IO pitfalls (especially subquery expressions in the SELECT clause!).

Triggers bearing the FOR EACH ROW clause, not surprisingly, execute once for each row inserted / updated / deleted. The profile of such a statement is exactly the same as that of the correlated function above.

As with functions, avoid triggers containing SQL statements on tables subject to bulk DML. Any SQL performed in triggers must be migrated to the application code, taking care to avoid all other nested IO issues described in this article.

The case of sub-queries and nesting used to be simple: with the exception of uncorrelated IN subqueries, all sub-queries would nest. Period. This was great for low-volume SQL, rubbish for high-volume SQL. Easy: use joins instead for high-volume SQL.

Things started to change in v7.3 with Anti Joins that permitted NOT IN to be unnested by the optimiser (effectively converting them into an outer join that discards matched rows). In 8i, Semi-Joins allowed correlated EXISTS sub-queries to be unnested; but more importantly the /*+UNNEST*/ hint internally converted IN subqueries into EXISTS and NOT EXISTS subqueries into NOT IN so that Anti- and Semi-Joins could unnest almost any sub-query.

The revolution was completed in 9i with automated unnesting; the optimiser determines whether unnesting would be beneficial (any high-volume SQL) and applies it automatically. No more hints! As a result, sub-queries are almost never a problem for high-volume SQL any more. But when they are a problem they catch you completely off-guard.

Run your High Volume SQL through Explain Plan. Look for any NESTED LOOPS lines or FILTER lines with two child steps. Nested Loops joins typically relate to join statements, but they can also come from sub-queries re-written by the optimiser. They usually mean that the sub-query is still nesting, even though the optimiser has rewritten (but not unnested) the query. FILTER steps with one child step are usually related to complex queries with views or aggregation but they are relatively harmless. FILTER steps with two child steps come from sub-queries; they mean that the sub-query is nesting.

The optimizer can unnest almost any uncorrelated sub-query and even most simple IN, NOT IN, EXISTS, and NOT EXISTS sub-queries.

If the subquery of a high-volume SQL is simple or uncorrelated and is still nesting, try:

Gather statistics on all tables and indexes using DBMS_STATS.GATHER_TABLE_STATS()

Check you are using the Cost Based Optimizer, not the Rule based Optimizer.

Try adding an /*+UNNEST*/ hint to the sub-query.

If your Explain Plan shows a step NESTED LOOPS (SEMI) step then the optimiser is unnesting a sub-query into a Nested Loops semi-join; this may not improve the situation. The /*+UNNEST*/ hint will not help because the sub-query is already unnested; it may be necessary to add a /*+HASH_SJ*/ (for correlated sub-queries with = predicates) or /*+MERGE_SJ*/ (for correlated sub-queries with > or < predicates) hint.
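A sketch of placing those hints (the predicate is illustrative; the hint goes inside the sub-query):

```sql
-- Force the unnested semi-join to use a hash rather than nested loops.
SELECT e.empno, e.ename
FROM   bigemp e
WHERE  e.deptno IN (SELECT /*+ HASH_SJ */ d.deptno
                    FROM   bigdept d
                    WHERE  d.loc = 'SYDNEY');
```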

If your sub-query is complex and correlated, you will have to code it another way. Perhaps materialise some interim results in a Global Temporary Table.

You should never, ever do any of the following in the correlated sub-query of a high-volume SQL:

Nest two or more levels deep, and correlate a column not in the immediate parent block

Correlate a HAVING clause

Use a set operator, analytic function, MODEL clause, or CONNECT BY in the sub-query.

Correlate a non-scalar sub-query with a comparison operator (WHERE col > (SELECT col ... correlated sub-query ...)). Even = conditions will nest. Use MAX/MIN/AVG to make the sub-query scalar - even if selecting on a unique key.
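The last point in action, using an EMP manager lookup as an illustration:

```sql
-- Nests: the optimiser will not treat the sub-query as scalar,
-- even though EMPNO is unique.
SELECT e.ename, e.sal
FROM   emp e
WHERE  e.sal > (SELECT m.sal
                FROM   emp m
                WHERE  m.empno = e.mgr);

-- Wrapping the select list in MAX() makes it provably scalar and
-- gives the optimiser a chance to unnest.
SELECT e.ename, e.sal
FROM   emp e
WHERE  e.sal > (SELECT MAX(m.sal)
                FROM   emp m
                WHERE  m.empno = e.mgr);
```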

All we've discussed here is the perils of nesting. Is tuning high-volume SQL that simple? The answer is: "sort of".

It's not quite as cut and dried as I've made out; not all nesting is bad. If you have a query that joins 500,000 rows from a 1,000,000 row table (high volume) to 500,000 rows from a 100,000,000 row table, then nesting might be the way to go. A full scan of 100 million rows is a pretty expensive way of reading the half-million that you need.

Rather than thinking in terms of high- and low-volume queries, you need to evaluate each table on its merits. How many rows are there? How many do I need?

Another exception - as John Watson (OCM) identifies in the comments below - is a nested loop on an index without a table lookup. If the inner loop of a nested loop join can get all the columns it needs from the index, then even in a high-volume scenario, the nested loop will probably out-perform the alternatives.

The simple don'ts in this article are easy to avoid; functions, scalar sub-queries, triggers, etc. It gets much more complex when you eliminate all of these things but are still left with a nested step in the Explain Plan. Like many things, the first step is to recognise that there is a problem; hopefully this article helps in that regard.

Try it and find out. Make sure you trace and use Tkprof. Ensure that you get a hash join in the outer join option, and post the results here.

Outer joins attract a kind of mystical fear and loathing, as if they were somehow less performant than inner joins. I usually have a reasonable insight into the origin of these myths, but I have no idea how that one came about.

The truth is that Oracle supports four join methods: Nested Loops, Sort-Merge, Hash, and Cluster. All of them work equally well as inner joins and outer joins. Correlated Scalar Sub-Query Expressions are the functional equivalent of a Nested Loops join, so their use merely restricts your available toolset.

Keen observers will notice the use of table bigemp1 as a copy of bigemp to avoid mutating updates.

Could you explain, please?

3) Conclusion

I think you're rather contradicting yourself. If

Quote:

...
It's not quite as cut and dried as I've made out; not all nesting is bad. If you have a query that joins 500,000 rows from a 1,000,000 row table (high volume) to 500,000 rows from a 100,000,000 row table, then nesting might be the way to go. ...

means that we can use scalar sub-queries (more precisely, correlated scalar sub-queries) because, as you are saying, nesting is not always bad.

1) Agreed, a left outer join is the more appropriate comparison. However the principle remains the same: Oracle is able to Hash Left Outer Join from both directions (use either the inner or outer table as the hash table), so performance-wise the example stands.

2) If my function had referenced the same table that was being updated it would have raised a mutating table error, so I made the function and UPDATE reference different tables. Obviously this is not a real-life example, but it does what I wanted, which is to demonstrate the nature of function calls.

3) Look at the example again. I thought I was being clear, but obviously not. Full Table Scans are faster row-for-row than index scans. Of course, we often scan more rows in a Full Table Scan. Is a full table scan of 1,000,000 rows faster than an indexed scan of 500,000 rows? Absolutely yes! Is a full table scan of 100,000,000 rows faster than an indexed scan of 500,000 rows? Probably not. This last example is an index scan of 0.5% of the table; whilst this is not "low-volume SQL" by any stretch of the imagination, it is processing only a very small proportion of the table.

Interesting article, and one I will have to keep in mind, as a counter to my usual experience. My usual experience being, when there is a query which is taking hours, and it should take minutes, it is because Oracle is doing a hash join, and the fix is make it do a nested loops join. Obviously, one size does not fit all.

Results cache is interesting - it should narrow the gap for scalar function calls. Even though it will result in fewer function calls, those calls will still be nested and will result in less efficient IO over a large data set. I haven't done any benchmarking, but I don't expect to be changing my position on the issue of Functions.

If Nested Loops is out-performing Hash Join / Full Scan, it's because you are dealing with Low Volume SQL, not High Volume SQL (the subject of this article). I usually draw the line anywhere between 1% and 10% of a table. If your SQL requires more than 10% of a large table to produce its results, then I contend Indexed Nested Loops will never out-perform Hash Join / FTS except in extraordinary cases deliberately designed to confound hash joins. Conversely, if your SQL needs less than 1% - Indexed NL will always out-perform FTS / HJ if the index is well designed. In-between 1 and 10% - benchmark.

I came across your article while trying to develop a little thought experiment: "what is the most efficient way to determine which rows are common to two row sets?" You emphasise the importance of avoiding correlated subqueries, as used by the iterative operations nested loop join and filters with two children. Generally, I would agree - but in this case I'm not sure.
My example is the OE demo schema: which products are stocked by both the warehouse in Toronto (warehouse_id=5) and Sydney (warehouse_id=9)? I have five equivalent SQLs, and the results surprised me.
First, the obvious solution which is to use a compound query:

Both have been re-written to a join and given the same cost, which is far lower than my baseline. I do not understand why it is so low: the nested subquery which has to iterate 114 times is costed at zero. I had expected both these to be worse than the compound query.
Then I tried two versions of, in effect, manually coding the INTERSECT:

The first is re-written to the join with its correlated subquery and gets the same cost, with an extra 1 for removing duplicates. Fair enough. The second has done exactly what I asked it to do (no transformation) but it is still slightly cheaper than the INTERSECT.
My conclusion: perhaps nested subqueries are not as bad as one might think. The CBO is pretty good at transformations nowadays.
This was using release 12.1.0.2.
--
John Watson
Oracle Certified Master DBA
http://skillbuilders.com

The exception to this rule - there's always one - is when all of the columns referenced in the SQL are contained in the index. If Oracle does not have to access the table then there is no break-even point; it is generally quicker to scan the index even for 100% of the rows.

In your example, the loop is accessing only the index - this is very quick indeed, especially when the entire index is cached. If you conspired to create a high-volume SQL where the inner loop hit the index AND the table, I feel confident that one of your join alternatives would fare better.

For clarity, I'll go back up into the article and note the exception, giving you credit for finding me out.