Wednesday, April 20, 2011

Yet Another Slow Query

by Enrique Aviles

As the database performance contact I get to troubleshoot slow queries. Typically when users experience a slow application screen an incident is submitted and the issue ends up on my plate. Recently, I had to investigate a report that was taking a little over 10 minutes to generate results. Although the application is an OLTP system some screens allow users to generate a report that can be sent to the printer. Normally these reports take a few seconds to appear on the screen but this one was way over the application’s SLA requirements.

Below is a sample query construct that captures the syntax of the actual query that was executed by the application. After examining execution plans and generating extended SQL trace I identified the parts highlighted in red as the ones contributing to the query’s poor response time.

The query took more than 13 minutes and over 65 million consistent gets to select one row. This is obviously unacceptable so it is clear why users were not happy with that particular report. What could be causing the query to generate so much work?

The complex view is composed of twelve tables and 40 lines in the WHERE clause, five of them OR conditions. I noticed that removing the subquery that gets the maximum LOG_DATE from the complex view helped the main query to complete in just a few seconds. Obviously, results were incorrect without the subquery so I had to figure out a way to preserve the logic that gets the maximum LOG_DATE while having the query complete in a matter of seconds.

Examining the complex view data showed there were no duplicate LOG_DATEs so the MAX aggregate function will always return the absolute maximum LOG_DATE for a given PART_ID/PARAMETER_ID combination. Finding that characteristic in the data led me to consider using a scalar subquery to get the USAGE value from the complex view. In the process I also wanted to select from the complex view in one pass so I decided to use the ROW_NUMBER analytic function to get the maximum LOG_DATE and eliminate the need for a self-join via a correlated subquery. Having devised that plan, I executed the following query to test what would become the scalar subquery:

Only 0.02 second and 738 consistent gets to select the USAGE value from the complex view! It looks like the plan is coming together. I proceeded to replace the correlated subquery with the scalar subquery so the main query becomes:

Notice the complex view is not part of the FROM section and there are no joins to the complex view in the WHERE clause. I executed the new and improved query and got the following error:

ERROR at line 17:ORA-00904: "T4"."ID": invalid identifier

The excitement was extinguished for a brief moment while I realized my mistake. How come T4.ID is an invalid identifier when I know ID is a valid column on TABLE_4? The problem is that TABLE_4 is not visible inside the inline view of the scalar subquery. The complex view is two levels deep so I can’t join CV and T4. How can I hide the logic of the scalar subquery in a way that allows me to join the complex view and TABLE_4? A view that implements the core logic of the scalar subquery achieves the desired result so I created the following view:

The new query returned the same data as the original query while performing a fraction of the work. Although 28,316 consistent gets is a bit high to return one row, this value is quite handsome when compared to the original value of 65 million consistent gets. This translates into a 99.96% improvement. Regarding response time, the original query took 13 minutes to complete while the new query only requires 0.3 second to generate results, also a 99.96% improvement.

Combining the right mix of database objects and SQL features helped me achieve such dramatic improvement. I hope my experience will help you consider creative solutions when faced with challenging SQL performance issues.

2 comments:

where's the "like" button on your blog? this is great one ... I'm looking forward to coming back and studying it in depth ... at the moment I am particularly interested to learn what you mean by "scalar subquery" (I think that was the phrase that caught my eye) ... good work Chet ... see you soon.

Michael, thanks for your comments. A scalar subquery is simply a SELECT that returns only one column from one row. It's used where you would normally list a table or view column. There are a few examples in Google.