Monday, January 19, 2009

Today I gave a session about PL/SQL to colleagues who mostly work with Tibco in our Enterprise Application Integration (EAI) group. They know how to program and they know SQL. The challenge was to cover most of the main features of PL/SQL in a workshop in only 8 hours. The session contained lots of little exercises. Topics were architecture, datatypes, main statements, exception handling, appearances of PL/SQL, SQL in PL/SQL, cursors, collections and recordtypes, bulk processing and dynamic SQL.

For those who are interested and who can read Dutch: you can download the document, presentation and exercises here:

If developing SQL queries is part of your job, then it is definitely worth knowing the SQL model clause. Once you’ve gotten over the initial fear of the new syntax and learned the basics, you can save yourself writing procedural code and creating auxiliary database objects even more often, by doing more in SQL itself. You may end up with shorter, more readable and sometimes faster code. This article will show a couple of uses of the model clause for fairly common type of queries and will discuss in each case whether or not the model clause solution is suitable compared to alternative queries - if any - in terms of readability and performance. By the end you will more easily recognize situations in your daily work where you can apply the model clause.

Background / Overview

The SQL model clause was introduced with the release of Oracle10g Release 1, back in 2003. According to the documentation, the model clause can replace PC-based spreadsheets by creating a multidimensional array from query results and then apply formulas (called rules) to this array to calculate new values. However, judging from the various Oracle forums, it is not used very often yet. And when it is used, people are not doing spreadsheet like calculations with it. The types of queries the model clause does get used, are discussed in this article. These types of queries are:

Forecasting queries

Row generation

Variable number of calculations based on calculated values

Complex algorithms

String aggregation

Of course, the possibilities of the model clause are certainly not limited to this list.

This type of query is used throughout the aforementioned chapter 22 of the Oracle Database Data Warehousing Guide. The basis for this type of query is typically a sales table or view and the question to be answered is a forecast of how future sales will look like. For example, let’s have a look at the SALES table in figure 1.

Using the model clause for such a query is not mandatory; you can achieve the same result by doing a UNION ALL and calculate next year’s sales in a separate query with its own extra table access. Or you can use grouping sets. But if you write these statements, you'll see they become messy and complicated quite fast. And what if you want the sales forecast for 2009 as well, for example by multiplying the sales of 2008 by 2? With the model clause, you just add an extra rule:

amount[2009] = amount[2008] * 2

So forecasting queries are the example in chapter 22 of the Data Warehousing Guide for a reason, as the SQL model clause is able to calculate them without much ado.

Row generation

For many years, you generated rows by selecting from a table or view for which you knew it contained at least the amount of rows you wanted to generate. Usually the all_objects view was used for this, like this:

select rownum
from all_objects
where rownum <= 100000

Then Mikito Harakiri came up with this simple row generating query against dual:

The model clause query for generating rows is faster than the good old query against all_objects, but much slower than the the "connect by level" query. It is therefore not worthwhile to use the SQL model clause for pure row generating. However, for inferred problems of row generation, the model clause can certainly be an option. An example of such an inferred problems is splitting periods into individual days in a calendar, for example when 1 row containing a startdate January 1st and enddate January 5th, have to be split into 5 individual rows. Another example is splitting strings into individual words:

You can see how they are based on the pure row generating queries described earlier. In these solutions "regexp_count(str,' ')+1" calculates the number of words, by calculating the number of spaces and adding one. "regexp_substr(str,'[^ ]+',1,n)" gets the n-th word out of the string.

Especially for the "connect by level" query, more variants are possible. All of them look pretty cryptic, making them hard to maintain. The SQL model clause query maintains its readability by using the partition clause.

With a 10,000 times enlarged table t, the tkprof file looks like this:

Again, the "connect by level" trick wins, but this time it's much closer. So you may let readability prevail.

Variable number of calculations based on calculated values

There are queries out there that just seem impossible to do with just SQL, where a model clause can provide a solution elegantly. This is a class of problems where calculations have to be done on calculated values repeatedly. An example says more than a thousand words, so here is one:

The question is: "What's the balance at the end of each year?". For customer 1 at the end of 2003 it is: 1000 * 1,05 = 1050. At the end of 2004 this is (1050 + 200) * 1,032 = 1290, et cetera. This is what I mean by calculations based on calculated values, that you have to do a variable number of times. Using the model clause, this query becomes relatively easy, like this:

For doing this type of calculations in SQL, there is hardly an alternative, except a very slow one using advanced XML functions. Before Oracle10 you'd probably resort to a procedural language such as PL/SQL. But nowadays you don't need to anymore.

Complex algorithms

With the SQL model clause, it has become possible to solve algorithms, that previously could not be done in SQL. Even some of the most complex ones. An example is a query to distribute tablespaces evenly among data files of a database, in such a way that the free space in each data file remains as equal as possible. Or even a single query to solve each sudoku puzzle when supplied with a string of 81 digits. Although it can be great fun to write, you'll have to admit that these will be a nightmare to maintain if you are not the original author. Regardless of reduced lines of code or small performance gains, a (pipelined) PL/SQL function seems much more appropriate here.

String aggregation

String aggregation is really just a special case of the category "variable number of calculations based on calculations". But it is such a frequently asked question on Oracle forums, that I will treat it separately.

String aggregation is the act of aggregating several rows into one row and "stringing up" all column values, possibly separated by a special character like a comma. An example with the well known EMP table is to produce the following output:

There are lots of ways to do string aggregation:
• using a user defined aggregate function, for example Tom Kyte’s stragg function,
• a hierarchical query with sys_connect_by_path
• with XML-functions
• using the COLLECT function, a SQL type and a user defined function
• using the undocumented (and therefore not recommended) wmsys.wm_concat

Just one query, no auxiliary objects and faster than all other previously mentioned alternatives. Nothing comes for free: the price to pay is a higher PGA memory footprint. Usually this is not a problem, if sufficient memory is available of course. If this is not the case, you'll notice the wait events 'direct path read temp' and 'direct path write temp' and the performance gain quickly turns around and becomes a performance loss.

Conclusion

For a number of problems, the SQL model clause is a very useful extension of the SQL language. This is especially true for forecasting queries and queries with a variable number of calculations based on calculated values, including string aggregation. The model clause can also be used for solving complex algorithms in SQL and for row generation. But this compromises maintainability in the first case and performance in the second case. Nevertheless, if you want to leverage SQL's capabilities, the model clause is absolutely worth studying and using.