Motivation

Often users write scripts to monitor the database in a host language such as Perl or bash. But these host languages have some downsides:

The language environment must be installed on the server. This creates a dependency on a specific environment and one more moving part.

The environment may not be portable. For example, if you move the database from Windows to Linux, a shell script is unlikely to work unchanged.

If a lot of data needs to be processed in the script, performance may be bad. Shuffling data from the server to the client just to perform logic is very inefficient. This is true even if the client is local to the server.

So clearly there is a desire for portable, server-side scripting.

Scripting using SQL procedures

Traditionally the preferred way to perform server-side scripting has been through the use of SQL stored procedures. A SQL stored procedure provides several benefits:

Minimal overhead in executing logic based on data returned from SQL within. DB2 does not need to pass output variables or result sets through DRDA to a client. Instead, DB2's Procedure Virtual Machine (PVM) can look at the data locally.

Code is portable across all environments supported by DB2 for LUW and, to a large extent, to other DB2 family members. So there is no cost to port scripts. Furthermore, copies of the same procedure can be used everywhere.

A procedure can be executed from any client that supports an SQL API. So logic can be pre-canned and driven by a tool, an application, or a DB2 shell simply by invoking the procedure.

A procedure encapsulates access rights. It is possible for an administrator to write a set of logic using elevated privileges, yet still safely allow a wider audience to execute that logic.

Here is an example of a stored procedure which performs all the tasks of adding a generated column, as described in my last post on generated columns.
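The original procedure is not reproduced here; a minimal sketch of what such a procedure might look like follows. The procedure name, parameters, and the SET INTEGRITY step are illustrative assumptions, not the original post's code:

```sql
-- Hypothetical sketch: adds a generated column to an existing table.
-- Names and parameters are assumptions for illustration.
CREATE OR REPLACE PROCEDURE add_generated_column(
  IN tabname    VARCHAR(128),
  IN colname    VARCHAR(128),
  IN expression VARCHAR(1000))
BEGIN
  -- Add the generated column; the table enters set integrity pending state.
  EXECUTE IMMEDIATE 'ALTER TABLE ' || tabname ||
    ' ADD COLUMN ' || colname ||
    ' GENERATED ALWAYS AS (' || expression || ')';
  -- Compute the column values for existing rows and re-validate the table.
  EXECUTE IMMEDIATE 'SET INTEGRITY FOR ' || tabname ||
    ' IMMEDIATE CHECKED FORCE GENERATED';
END
```

Note that the dynamic SQL (EXECUTE IMMEDIATE) inside the body is exactly the kind of logic that inline SQL PL compounds cannot express, as discussed below.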

Compiled compound statements (aka anonymous blocks)

Procedures do have one big disadvantage though:
In order to create a procedure we need sufficient privileges to execute DDL and a schema in which to place the procedure.
Even if we have the authorization to execute DDL, DBAs may, on principle, not like to see random tools deployed on a database.
This is even worse if the script is to be called by a tool.
That is, when you install a tool on a client machine you do not expect that tool to require procedures to be deployed on every database just to operate.

For many years DB2 has supported the execution of scripts using dynamic compound statements (BEGIN ATOMIC ... END). But these blocks had to be written in inline SQL PL, which has severe limitations.

For example, dynamic SQL itself (as used in the above procedure) is not allowed within a dynamic compound.

In DB2 9.7 compiled compound statements were introduced. A compiled compound statement is a BEGIN ... END block that stands on its own without being part of a trigger or routine.

Compiled compounds can be dynamically prepared and executed using PREPARE/EXECUTE or EXECUTE IMMEDIATE. They can be used everywhere a CALL statement can be used. Another name for a compiled compound statement is anonymous block.

To show how they work we will redo the example above, this time using an anonymous block instead of a procedure.
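The original block is not reproduced here; a sketch along those lines might look like this (the table name, column name, and expression defaults are assumptions for illustration):

```sql
-- Hypothetical sketch: the generated-column logic as an anonymous block.
-- Inputs are gathered into local variables at the top; values are illustrative.
BEGIN
  DECLARE tabname    VARCHAR(128)  DEFAULT 'MYSCHEMA.T1';
  DECLARE colname    VARCHAR(128)  DEFAULT 'C2';
  DECLARE expression VARCHAR(1000) DEFAULT 'C1 * 2';
  -- Same dynamic SQL as in the procedure, but no DDL artifact is left behind.
  EXECUTE IMMEDIATE 'ALTER TABLE ' || tabname ||
    ' ADD COLUMN ' || colname ||
    ' GENERATED ALWAYS AS (' || expression || ')';
  EXECUTE IMMEDIATE 'SET INTEGRITY FOR ' || tabname ||
    ' IMMEDIATE CHECKED FORCE GENERATED';
END
```

When run from the CLP, a block like this needs a non-default statement terminator (for example @) so the semicolons inside the body are not misread as the end of the statement.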

Since anonymous blocks do not have a formal parameter list, I have used local variables to gather up all the inputs in an easy-to-find place.
But just like with any other dynamic SQL statement, extracting literals from the statement is good practice.
You can use regular parameter markers or named parameter markers here.
CLPPlus works well with named parameter markers.
So let's create some CLPPlus variables and set them. Ironically, we must use an anonymous block for that.
But this also serves to show how you can pass output parameters:
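A hypothetical CLPPlus session along these lines might look as follows; the variable names, types, and the catalog query are illustrative assumptions, and the exact VARIABLE type keywords may differ:

```sql
-- Hypothetical CLPPlus session: bind variables by name, one as output.
VARIABLE tabname VARCHAR(128)
VARIABLE colcnt  NUMBER
BEGIN
  -- :tabname is an input; :colcnt is set inside the block and
  -- passed back to CLPPlus as an output parameter.
  SET :tabname = 'TABLES';
  SET :colcnt = (SELECT COUNT(*) FROM syscat.columns
                 WHERE tabschema = 'SYSCAT' AND tabname = :tabname);
END
/
PRINT colcnt
```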

Generally, for best readability, I recommend copying parameter markers into local variables at the beginning of the compound statement. That way, when looking at the statement text, you can immediately see which parameters are expected.

Returning result sets

I have demonstrated above how you can SET output parameter markers to pass information back to the client.
In addition to returning values of simple base types, you can also return more complex values such as ARRAY, ROW, or REFCURSOR.
This capability depends, of course, on the ability of the client to receive these types.
The following example shows how to return a result set to CLPPlus using a REFCURSOR:
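A sketch of what this could look like; the variable name and query are assumptions, not the original example:

```sql
-- Hypothetical sketch: hand an open cursor back to CLPPlus.
VARIABLE rs REFCURSOR
BEGIN
  -- Assign a cursor value to the named output parameter marker.
  SET :rs = CURSOR FOR SELECT tabname, card FROM syscat.tables;
END
/
-- PRINT fetches and displays the result set held by the REFCURSOR.
PRINT rs
```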

Conclusion

Anonymous blocks provide a powerful tool for DBAs, application developers, and tools.
Using anonymous blocks you can execute complex, portable scripts on the server without requiring any server-side DDL artifacts.
In addition to what I have shown above, DB2 10 also introduces local types and local procedures, which can be used in anonymous blocks.
I will blog about those in a future post.

Motivation

Some years ago I visited a business partner to help them overcome some performance problems on DB2. They had a rather simple quality-assurance test workload which ran on a simple Windows PC.

On another DBMS that workload executed without a problem and with acceptable throughput. On DB2, however, the same workload completely overwhelmed the machine and throughput was, frankly, abysmal.

Needless to say, the partner was not too amused. We quickly discovered that the CPU was pegged at 100%. We further discovered that DB2 saturated the CPU through compilation of SQL statements.

A dump of the package cache showed that it was flooded with simple update statements, each identical to the next except for the input values.

That in itself could not explain why the CPU usage was so high; after all, the workload was identical to the one run against the competitor's DBMS.

It turned out that there was a recursive update trigger using inline SQL PL. So what looked like a simple statement was actually quite complex. The other DBMS did not have the concept of inline SQL PL, so it did not have to pay for trigger compilation for each statement.

Once the problem was known the fix was simple: using parameter markers, there was no need to keep compiling SQL statements. The inline SQL PL then provided an additional boost, propelling DB2's performance past the other DBMS.

While the outcome was good, the partner needed to do work to achieve acceptable performance.

Lesson learned: DB2 must run really bad SQL really fast.

In DB2 9.7 two features were added which address the issue we encountered:

Compiled SQL PL triggers reduce the complexity of UPDATE, INSERT, and DELETE statements to that of the statement itself. I discussed inline SQL PL vs. compiled SQL PL in two blog entries in the past.

The statement concentrator detects patterns of similar statements to reduce compilations. This is the topic I will discuss today.

A Problem Scenario

It is a best practice when working with DB2, or most other DBMSs, to use parameter markers or host variables when submitting dynamic SQL.

DB2 maintains a cache of recently executed SQL statements. Every time a new SQL statement is submitted, DB2 will match it against the existing cache entries. If a match is found, there is no need to parse and compile; DB2 can execute the SQL statement right away. Unfortunately, many application developers do not follow that practice.

Even more unfortunate is that several abstraction APIs which generate SQL under the covers also do not use parameter markers. Let's take a look at what happens here by setting up a simple experiment.
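The experiment code is not reproduced here; a minimal stand-in, assuming a one-column table T1(C1 INT) and an anonymous block as the driver, might look like this:

```sql
-- Hypothetical stand-in for the experiment: 50000 updates, each embedding
-- new literals, so every statement text is distinct and must be compiled
-- and cached separately.
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  WHILE i < 50000 DO
    EXECUTE IMMEDIATE 'UPDATE t1 SET c1 = ' || i ||
                      ' WHERE c1 = ' || (i - 1);
    SET i = i + 1;
  END WHILE;
END
```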

So 50,000 executions of this simple SQL statement take about 42 seconds. And each statement was compiled and inserted into the package cache.

The best fix

The above scenario represents bad SQL programming. The correct way to write SQL is to use parameter markers and bind the values when the query is executed.
What happens when we use a parameter marker?
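A sketch of the parameterized version, again assuming table T1 and the variable names txt and stmt (illustrative, not the original code):

```sql
-- Hypothetical stand-in: the statement text is now constant, so every
-- PREPARE after the first finds a match in the package cache.
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE txt VARCHAR(100);
  DECLARE stmt STATEMENT;
  SET txt = 'UPDATE t1 SET c1 = ? WHERE c1 = ?';
  WHILE i < 50000 DO
    PREPARE stmt FROM txt;      -- re-prepared each iteration, but cached
    EXECUTE stmt USING i, i;    -- values are bound, not embedded as literals
    SET i = i + 1;
  END WHILE;
END
```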

That's a 20-fold speedup when using parameter markers!
Here DB2 will match the txt content with the previously generated plan and simply assign the stmt handle.
We can do even better by doing that ourselves: we pull the PREPARE outside the loop:
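A sketch of that variant (same illustrative assumptions as before):

```sql
-- Hypothetical stand-in: prepare once, execute many times.
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE stmt STATEMENT;
  PREPARE stmt FROM 'UPDATE t1 SET c1 = ? WHERE c1 = ?';  -- prepared once
  WHILE i < 50000 DO
    EXECUTE stmt USING i, i;  -- no cache lookup needed inside the loop
    SET i = i + 1;
  END WHILE;
END
```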

A finer-grained measurement would be better here.
We could use a technique discussed earlier in this blog to subtract timestamps exactly.
But this last part isn't really the point of the investigation. What we want to achieve is to speed up this test without changing the application!

Using the Statement Concentrator

The statement concentrator is a feature introduced in DB2 9.7.

When the concentrator is turned on, every dynamic SQL statement text which cannot be matched to an entry in the cache will undergo some very basic parsing.

In this parsing DB2 will find constant literals such as 'Hello' and 5 and replace them with typed parameter markers. So:

VALUES (1)

gets replaced with the equivalent:

VALUES CAST(? AS INTEGER)

DB2 will then attempt to match the modified SQL statement text once more. If a match is found, DB2 will execute the cached statement and supply the extracted constant literals as the parameter values.

If no match is found DB2 will compile the statement, place it in the cache and then execute it.

The easiest way to turn on the statement concentrator is through the database configuration.

This is online and will affect any subsequent compilations from any session:
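Assuming a database named SAMPLE, the CLP command might look like this (the database name is an assumption):

```sql
-- Hypothetical CLP session: enable literal replacement database-wide.
UPDATE DB CFG FOR sample USING STMT_CONC LITERALS
```

Setting STMT_CONC back to OFF disables the concentrator again.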

As you can see, the load executed many times faster now.
But the statement concentrator is no complete substitute for using parameter markers:
there is still overhead for the shallow parsing.

Rules

There are several rules to consider when dealing with the statement concentrator:

Since the statement concentrator replaces literals with parameter markers, the optimizer loses the ability to use those literals to derive plans based on the actual values. For that reason, for complex queries such as those used in data warehousing, the statement concentrator is generally not recommended.

If there is a parameter marker present within the original query, no further substitutions will be made by the concentrator. The assumption is that the remaining literals are there for a reason. This includes output parameter markers, as in SET ? = (SELECT ...) constructs.

The database configuration parameter STMT_CONC LITERALS is only used if there is no explicit client-side setting for statement concentration. Some such settings are:

the SQL_ATTR_STMT_CONCENTRATOR attribute in CLI/ODBC,

the setDBStatementConcentrator method in JDBC,

or the statementConcentrator connection property.

It is generally safe to enable statement concentration in OLTP workloads that do not use parameter markers. The reason is that the query plans for typical OLTP queries are unlikely to depend on the actual literals provided.

Detrimental effects of literal replacement can, to some degree, be mitigated through the use of REOPT(ONCE). With this bind option, the constant literals of the first query to be inserted into the cache serve as representative values for the optimizer.