News, views, and items of interest on IBM's Db2 database management system and mainframes.

Tuesday, February 25, 2014

Dynamic SQL - Let's Do The Math

We've come a long way in the world of DB2 in the past decade or so. I remember way back when it was common for DBAs to say "If performance is an issue, dynamic SQL should not be used... always use static SQL!" But today, in 2014, it is usually the case that dynamic SQL is the predominant form of new development.

Now a lot of things have changed to make this the case. Particularly that most new applications are being developed for distributed and web applications, instead of traditional mainframe, COBOL applications. And dynamic SQL is the default way to access DB2 from these type of apps.

But you know what? Even if you are developing traditional mainframe COBOL programs, dynamic SQL can be a better solution for you.

The Basics

Before we go on, let's tackle a few of the basics. What
makes dynamic SQL different than static SQL?Well, static SQL is optimized prior to program execution.Each and every static SQL statement in a
program is analyzed and optimized during the DB2 Bind process.During this process the best access path is determined
and coded into your DB2 package.When
the program is executed, the pre-formulated access path is executed.

Dynamic
SQL, on the other hand, is optimized at run time. Prior to the dynamic SQL statement being
executed, it must be processed by the DB2 Optimizer so that an optimal access
path can be determined. This is called
the Prepare process. Prepare can be
thought of as a dynamic Bind.

We will not go into the details of dynamic statement caching and its ability to improve dynamic SQL performance here. Suffice it to say, dynamic SQL these days can be tuned using caching. For additional details on dynamic statement caching (and REOPT parms) check out my article, Dynamic SQL Performance, on TOAD World.

Now let's turn our attention to traditional dynamic SQL development. There
are four types of dynamic SQL:

EXECUTE
IMMEDIATE

Non-SELECT

Fixed-List
SELECT

Varying-List
SELECT

EXECUTE IMMEDIATE dynamic SQL ­will (implicitly) prepare and execute
complete SQL statements embedded in host-variables. Its drawbacks are that it can not be used to
retrieve data using the SELECT statement and the PREPARE is implicit within the
EXECUTE IMMEDIATE; so optimization must occur every time the statement is
executed.

Non-SELECT
Dynamic SQL can be used to explicitly prepare and execute SQL statements in an ­application
program. The PREPARE and EXECUTE are
separated so that once a statement is prepared, it can be executed multiple
time without re-optimization. However,
as its name implies, Non-SELECT dynamic SQL can not ­issue the SELECT
statement.

Fixed-List SELECT can be used to explicitly prepare and execute SQL SELECT statements ­where
the exact columns to be retrieved are always known in
advance.The columns to
be retrieved must be known at the time the program is being coded and they can
not change during execution.This is
necessary ­in order to create the proper working-storage declaration for ­host-variables
in your program.

If
you do not know in advance ­the exact columns that are to be accessed, you can
use Varying-List SELECT dynamic SQL. In
this case, pointer variables are used to maintain the list of selected
columns. Although Varying-List SELECT is
the most complicated type of dynamic SQL, it also provides the most flexibility
for dynamic SELECT statements. Changes
can be made "on the fly" to tables,­ and columns, and
predicates. Because everything about the query­ can change during one invocation of the
program, the number and­ type of host-variables needed to store the retrieved
rows cannot­ be known beforehand. This
will add considerable sophistication and complexity to­ your application
programs.

A Mathematical Reason to Reconsider Dynamic SQL

Even
if the decreasing cost of dynamic SQL and the newer performance improvements like dynamic statement caching do not compel you to use dynamic SQL, there is at least
one situation where dynamic SQL should almost always be chosen over static
SQL: when numerous combinations of
predicates can be chosen by a user at run-time.

Consider
the following: What if, for a certain
query, there are twenty possible predicates.
The user of the program is permitted to choose up to six of these
predicates for any given request. How
many different static SQL statements need to be coded to satisfy these
specifications?

First,
let's determine the number of different ways that you can choose six predicates
out of twenty. To do so we need to use
combinatorial coefficients. So, if n is
the number of different ways then:

n = (20 x 19 x 18 x 17 x 16 x 15) /
(6 x 5 x 4 x 3 x 2 x 1)

n = (27,907,200) / (720)

n = 38,760

38,760
separate static SELECTs is quite a large number, but this is still not
enough! This number shows the total
number of different ways we can choose six predicates out of twenty if the
ordering of the predicates does not matter (which for all intents and purposes
it does not)[1].
However, since the specifications clearly state that the user can choose
up to six, we have to modify our
number. This means that we have to add
in:

the number of
different ways of choosing five predicates out of twenty

the number of
different ways of choosing four predicates out of twenty

the number of
different ways of choosing three predicates out of twenty

the number of
different ways of choosing two predicates out of twenty

the number of
different ways of choosing one predicate out of twenty

Figure 1. Combinatorial Coefficients Calculations

Ways to Choose Six
Predicates Out of Twenty

(20 x 19
x 18 x 17 x 16 x 15) / (6 x 5 x 4 x 3 x 2 x 1) = 38,760

Ways to Choose Five
Predicates Out of Twenty

(20 x 19
x 18 x 17 x 16) / (5 x 4 x 3 x 2 x 1) = 15,504

Ways to Choose Four
Predicates Out of Twenty

(20 x 19
x 18 x 17) / (4 x 3 x 2 x 1) = 4,845

Ways to Choose Three
Predicates Out of Twenty

(20 x 19
x 18) / (3 x 2 x 1) = 1,140

Ways to Choose Two
Predicates Out of Twenty

(20 x
19) / (2 x 1) = 190

Ways to Choose One
Predicate Out of Twenty

20 / 1 =
20

Total
Ways to Choose Up To Six Predicates Out of Twenty

38,760 +
15,504 + 4,845 + 1,140 + 190 + 20 = 60,459

This
brings the grand total number of static SQL statements that must be coded to
60,459. The calculations are shown in
Figure 1. In a situation like this, if
static SQL is being forced upon us, we have one of two options:

I
can guarantee that 99.99% of the time the second option will be chosen. My solution would be to abandon static SQL
and use dynamic SQL in this situation.
How would this ease the development situation? Consider the following:

With dynamic
SQL, the twenty predicates need be coded only once (in working storage)

As the program
runs, the application logic can build the complete SQL statement based upon
user input

The size of the
DBRM will decrease dramatically.The
DBRM for the static SQL program will be huge if it contains all of the 60,459
SQL statements.Even if a compromise
number is reached, chances are that the DBRM will be large.And guaranteed it will be larger than
the DBRM for the dynamic SQL program.

Although there
will be additional run-time overhead to build the SQL and perform the dynamic Prepare,
performance may not suffer. Queries issued against non-uniform data, may actually experience improved access paths and perform better.

So When Should You Seriously
Consider Dynamic SQL?

When the
nature of the program is truly changeable, such as the example given in the
text above.

When the
columns to be retrieved can vary from execution to execution.This is similar to the example given where
multiple combinations of predicates can be chosen, but in this case, multiple
combinations of columns can be chosen.

When benefit
can be accrued from interacting with other dynamic SQL applications.

When the SQL
must access non-uniform data.

You can find some additional guidance for helping you to evaluate when to use dynamic versus static SQL in my Data Perspectives column Dynamic vs. Static SQL.Synopsis

Dynamic
SQL is not always bad... and it is already pervasive in distributed and web applications. In this day and age, dynamic SQL should be considered as a
viable option even for traditional mainframe applications that are not distributed or web-based.

[1] It is true that for performance reasons you
may want to put the predicate with the highest cardinality within each type of
operator first, but we will not concern ourselves with this in this blog post.