I ran into a really ugly SQL statement last week. It was the most expensive statement running on the system for the last several weeks. On top of the fact that the statement ran for hours, it also had a number of “issues”. The statement had the following characteristics:

Several DISTINCT Operators – these tend to show up in statements where developers have left off a join condition

ANSI Join Syntax – generally Oracle converts this syntax back to Oracle join syntax under the covers. I realize this is standard SQL syntax, but it still makes me nervous due to previous experience with problems caused by this conversion process.

Functions Used in WHERE Clauses (DECODE and NVL in particular) – can disable indexes

UNION – forces a sort of the union’ed result sets so that it can eliminate duplicates (which is often not necessary). Depending on how the statement is coded, UNIONs can also sometimes result in multiple accesses of the same objects (i.e. if the developer has used the UNION in the place of a more robust where clause).

Numerous != and <> Comparison Operators – Not necessarily a problem but often can be coded in a more efficient manner

Several OR’s – OR’s are tricky. I get suspicious when there are numerous other problems in a statement as well.

I’ll refrain from publishing the actual statement to protect the guilty, but one of the most interesting bits of the statement looked something like this:

So this is an extremely strange way to write the statement and clouds the intention considerably, but does it hurt performance? Well actually no. The optimizer is smart enough to know that this DECODE will always result in the same literal value and so it appears that Oracle does not run the DECODE statement for each row. Here’s a quick example from a 10.2.0.4 database. (note that I used dplan.sql and fss.sql in this code snippet)

You know the cost calculation that the cost based optimizer (CBO) uses to determine which execution plan to choose for a SQL statement, right? If you don’t, you should immediately stop reading this and pick up a good novel instead. Ah, you’re still here? Well I got an interesting email today from one of my co-workers saying he had to kill a query yesterday. Actually that’s a big part of his current job. Killing runaway queries – apparently that job takes most of his time between 8 and 5. Anyway, he sent me this execution plan today, no comments, “just have a look at this”, he said.

So I had a look. Yes – that’s a 65P in the cost column. I’ve seen worse (but not in a production system). Cost is not always a good indication of run time, by the way. It’s just a sort of normalized estimation after all. But the estimate for the number of rows and bytes (18E and 15E) are very impressive as well. This query ran for several hours before my buddy finally killed it. As you might expect, the query was missing a join condition between a couple of large tables (7M and 32M).

So it looks like the cost tops out at 18E as does the estimated number of rows. Oddly the number of bytes appears to top out at 15E. So the production query had maxed out the rows and bytes estimate although the cost was significantly under the max. Still 65P is the biggest cost I’ve seen in a production system. Anyone seen a bigger one?

P.S. I have two categories for SQL related posts. “Developer Tricks” and “Wall of Shame”. This one gets both tags.

Ha. This one was a little surprising. I ran across a SQL statement that gets fired off hundreds at a time in rapid succession. Each execution taking several seconds – too long for thousands of executions in a row. The statement looked like this (cleaned up to protect the guilty):

Yes – that’s a first rows hint with about a billion as the number of rows to optimizer for.

The reason I noticed it is that it runs for 15 hours before getting a Snapshot Too Old error. The attempted solution was to restart it the next day (thinking maybe it will run better the second time I guess). The table has roughly 100M rows. There was no index on PYMET which is unfortunate as the two values requested account for only about 0.15% (not 15%, 0.15%). The optimizer chooses an index on MANDT, BURKRS, AUGST, FAEDN and as you might expect, it doesn’t work very well (see the NDV comments I added to the statement).

Funny things:

The First_Rows hint is requesting the Oracle optimizer to return the first billion records as fast as possible (even though there are only 100M rows).

The documentation for the First_Rows hint in 11g looks like this:

The FIRST_ROWS hint instructs Oracle to optimize an individual SQL statement for fast response, choosing the plan that returns the first n rows most efficiently. For integer, specify the number of rows to return.

For example, the optimizer uses the query optimization approach to optimize the following statement for best response time:

In this example each department contains many employees. The user wants the first 10 employees of department 20 to be displayed as quickly as possible.

So I can see where the developers might have interpreted this as the ever elusive “Go Fast” hint.

The developers also added “and rownum < 999999999" to the where clause which limits the amount of rows that can be returned. I'm not sure whether they knew it or not, but this clause also has the same affect as the hint. That is to say that the clause causes the optimizer to modify it's calculations as if the first_rows_N hint had been applied. Maybe the developers weren't getting the "go fast" behavior they expected from the hint and after doing some research found that the "rownum Continue reading ‘Funny Developer Tricks – first_rows(999999999)’ »

1. Obviously turns off any indexes on acct_number (unless they had a function based index).
2. Looks like they are probably storing numeric data in a character data type (or implicitly converting a number to character string).
3. Not using bind variables so they are not helping themselves from a parsing standpoint.
4. Finally, they’re making darn sure they take care of any mixed case numbers!