Tuesday, April 28, 2015

Although this blog usually focuses on DB2 issues, I sometimes use it to focus on other IT issues, usually mainframe-related. The primary purpose of today's blog post is to promote a webinar I'm conducting this Thursday called Managing Your z/OS Software Bill. The webinar is sponsored by Data Kinetics, the North American distributor of a product called AutoSoftCapping (or ASC for short) that can be used to help control the rolling four-hour average and thereby reduce monthly software bills.

Cost containment is of critical importance for IT departments in this day and age of financial austerity, especially so in the mainframe world. Every decision regarding your computer resources is weighed not only on the value it can deliver to your organization, but also on its cost to procure, implement, and maintain. And, in most cases, if a positive return on investment cannot be calculated, the software won’t be adopted, or the hardware won’t be upgraded.

An important opportunity for mainframe cost containment is to better manage the peak monthly capacity of your mainframe on an LPAR-by-LPAR (logical partition) basis. The pricing model for most mainframe software is based on the capacity of the machine on which the software will run. Note that this pricing model reflects potential usage based on the capacity of the machine, not actual usage. Some vendors offer usage-based pricing. You should actively discuss this with your current ISVs, as it is becoming more common, more accurately represents fair usage, and can save you money.

IBM offers several subcapacity pricing models for many of its popular software offerings, including z/OS, DB2, IMS, CICS, MQSeries, and COBOL.

By tracking MSU usage by LPAR, you can be charged based on the maximum rolling four-hour (R4H) average MSU usage instead of full capacity. Most organizations with mainframes have shifted to some form of subcapacity pricing model, but not all of them understand how all of the "moving parts" work together. Thursday's webinar will help to clear that all up!

Managing mainframe software costs by adopting subcapacity pricing, soft capping techniques, and software like Data Kinetics' AutoSoftCapping can help your company ensure a cost-effective IT organization. In today’s cost-cutting, ROI-focused environment, doing anything less than that is short-sighted.
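To make the rolling four-hour average concrete, here is a minimal Python sketch that finds the peak average over any four consecutive hourly samples for a single LPAR. The hourly sampling, the function name peak_rolling_average, and the MSU figures are all hypothetical; real measurement and reporting (for example, through IBM's Sub-Capacity Reporting Tool) use finer-grained intervals and are more involved than this.

```python
def peak_rolling_average(msu_samples, window=4):
    """Return the highest average MSU value over any `window` consecutive samples.

    Assumes evenly spaced samples (hourly here, so window=4 means four hours).
    """
    if window <= 0 or len(msu_samples) < window:
        raise ValueError("need a positive window and at least `window` samples")
    # Total of the first full window, then slide the window one sample at a time.
    window_total = sum(msu_samples[:window])
    peak_total = window_total
    for i in range(window, len(msu_samples)):
        window_total += msu_samples[i] - msu_samples[i - window]
        peak_total = max(peak_total, window_total)
    return peak_total / window


# Hypothetical hourly MSU consumption for one LPAR, with a short spike to 400 MSUs.
hourly_msus = [100, 120, 400, 380, 150, 110, 90, 100]
print(peak_rolling_average(hourly_msus))  # 262.5
```

Even though consumption briefly reaches 400 MSUs, the peak four-hour average in this example is 262.5 MSUs, which is why smoothing or capping short spikes can lower the bill.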

Tuesday, April 21, 2015

You may recall that this is a subject I've written about before, but I think it is important enough to cover briefly in this series on SQL performance basics: if you want to optimize your applications, avoid black boxes.

So what is a black box? Simply stated, a black box is a database access routine that sits in between your application programs and DB2. Instead of coding embedded SQL in your application code, you make a call to the access routine (or black box).

The general idea here is that development is easier because programmers do not need to know how to write SQL. Instead, programmers call the black box to request data. SQL statements become calls – and every programmer knows how to code a call, right?

This approach is commonly referred to as a “black box” approach because the data access interface shields the developers from the “complexities” of SQL. The SQL is contained in that black box and programmers do not need to know how the SQL works – just how to call the black box for data. But there are a number of reasons why this approach is not sound:

It is unwise to have developers who do not know SQL writing DB2 applications without understanding the SQL behind their data access requests. What may seem like a simple request to a programmer unfamiliar with SQL may actually involve very complex and inefficient SQL “behind the scenes” in the black box.

The black box coders will tend to take shortcuts, such as combining multiple types of SQL requests into one, which causes more data to be returned than is needed... and then sending only the needed data back to the requester. This violates Part 1 of this series.

The black box is an access routine to DB2 data, but SQL is already an access method to DB2 data, so another one is superfluous.

The black box will add CPU cycles to your application because it consists of extra code that is not needed if the SQL is embedded right into your application programs.
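To illustrate the shortcut of combining requests, here is a minimal Python sketch with SQLite standing in for DB2. The EMP table, its rows, and both function names are hypothetical. The "black box" runs one catch-all query and filters in host code, while the embedded version asks the database for only what it needs; each function returns the names found and the number of rows the database sent back.

```python
import sqlite3


def build_db():
    # Hypothetical EMP table and rows, with SQLite standing in for DB2.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, LASTNAME TEXT, WORKDEPT TEXT, SALARY INTEGER)"
    )
    conn.executemany(
        "INSERT INTO EMP VALUES (?, ?, ?, ?)",
        [
            (1, "ADAMS", "A00", 50000),
            (2, "BROWN", "D11", 40000),
            (3, "LUTZ", "D11", 42000),
            (4, "JONES", "E21", 38000),
            (5, "PARKER", "E21", 36000),
        ],
    )
    return conn


def black_box_names_in_dept(conn, dept):
    # Catch-all query: every row and column comes back, then host code filters.
    rows = conn.execute(
        "SELECT EMPNO, LASTNAME, WORKDEPT, SALARY FROM EMP ORDER BY EMPNO"
    ).fetchall()
    return [row[1] for row in rows if row[2] == dept], len(rows)


def embedded_names_in_dept(conn, dept):
    # Embedded SQL: request only the needed column, and only for the needed rows.
    rows = conn.execute(
        "SELECT LASTNAME FROM EMP WHERE WORKDEPT = ? ORDER BY EMPNO", (dept,)
    ).fetchall()
    return [row[0] for row in rows], len(rows)


conn = build_db()
print(black_box_names_in_dept(conn, "D11"))  # (['BROWN', 'LUTZ'], 5)
print(embedded_names_in_dept(conn, "D11"))   # (['BROWN', 'LUTZ'], 2)
```

The answer is identical, but the black box moves every row and column across the interface to produce it.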

Suffice it to say, avoid implementing SQL black boxes... they are just not worth the effort!

Thursday, April 16, 2015

Another pervasive problem permeating the DB2 development community -- and indeed most relational DBMSes -- is the “flat file” development mentality. By this I mean a programmer accessing data in a relational database the same way he would access data in a flat file.

DB2 is ‘relational’ in nature and, as such, operates on data a set at a time, instead of the record-at-a-time processing used against flat files. In order to do justice to DB2, you need to change the way you think about accessing data.

To accomplish this, all users of DB2 need at least an overview of relational database theory and a moderate to extensive amount of training in SQL. Without such a commitment your programmers are sure to develop ugly and inefficient database access code – and who can blame them? Programmers are used to working with files, so they are just doing what comes naturally to them.

SQL is designed so that programmers specify what data is needed, but they cannot specify how to retrieve it. SQL is coded without embedded data-navigational instructions. The DBMS analyzes SQL and formulates data-navigational instructions “behind the scenes.” This is foreign to the programmer who has never accessed data using SQL.

Every SQL manipulation statement operates on a table and results in another table. All operations native to SQL, therefore, are performed at a set level. One retrieval statement can return multiple rows; one modification statement can modify multiple rows. This feature of relational databases is called relational closure.
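Both halves of that idea can be seen in a few lines. This minimal Python sketch uses SQLite as a stand-in for DB2; the EMP table and its rows are hypothetical. A single UPDATE changes every qualifying row, and the result of one SELECT is queried as a table by another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, WORKDEPT TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [(1, "A00", 50000), (2, "D11", 40000), (3, "D11", 42000), (4, "E21", 38000)],
)

# One modification statement, many rows: a 10% raise for everyone in D11.
cur = conn.execute("UPDATE EMP SET SALARY = SALARY * 11 / 10 WHERE WORKDEPT = 'D11'")
print(cur.rowcount)  # 2

# The result of the inner SELECT is itself a table, which the outer SELECT queries.
rows = conn.execute(
    """
    SELECT WORKDEPT, TOTAL
    FROM (SELECT WORKDEPT, SUM(SALARY) AS TOTAL FROM EMP GROUP BY WORKDEPT) AS DEPT_TOTALS
    WHERE TOTAL > 45000
    ORDER BY WORKDEPT
    """
).fetchall()
print(rows)  # [('A00', 50000), ('D11', 90200)]
```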

When accessing data, a programmer needs to think about what the end result should be and then code everything possible into the SQL. This means using the native features of SQL – joins and subselects and functions, etc. – instead of coding procedural host language code (whether in COBOL, C, Java or whatever) that, for example, opens up a cursor, fetches a row, and then uses a fetched value to open up another cursor. This is processing DB2 like a set of flat files... better to join the data!
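Here is a minimal sketch of that contrast, again with SQLite standing in for DB2 and hypothetical DEPT and EMP tables. flat_file_style opens a cursor, fetches a row, and uses the fetched value to open another cursor; set_style codes the same request as one join. Each function returns its rows and the number of SQL statements it issued.

```python
import sqlite3


def build_db():
    # Hypothetical DEPT and EMP tables, with SQLite standing in for DB2.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DEPT (DEPTNO TEXT PRIMARY KEY, DEPTNAME TEXT);
        CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, LASTNAME TEXT, WORKDEPT TEXT);
        INSERT INTO DEPT VALUES ('A00', 'ADMIN'), ('D11', 'DESIGN');
        INSERT INTO EMP VALUES (1, 'HAAS', 'A00'), (2, 'BROWN', 'D11'), (3, 'LUTZ', 'D11');
        """
    )
    return conn


def flat_file_style(conn):
    """Record-at-a-time: fetch a department, then open another cursor for its employees."""
    result, statements = [], 1
    for deptno, deptname in conn.execute("SELECT DEPTNO, DEPTNAME FROM DEPT ORDER BY DEPTNO"):
        statements += 1
        inner = conn.execute(
            "SELECT LASTNAME FROM EMP WHERE WORKDEPT = ? ORDER BY EMPNO", (deptno,)
        )
        for (lastname,) in inner:
            result.append((deptname, lastname))
    return result, statements


def set_style(conn):
    """Set-at-a-time: one join; the DBMS decides how to navigate the data."""
    rows = conn.execute(
        """
        SELECT D.DEPTNAME, E.LASTNAME
        FROM DEPT D JOIN EMP E ON E.WORKDEPT = D.DEPTNO
        ORDER BY D.DEPTNO, E.EMPNO
        """
    ).fetchall()
    return rows, 1


conn = build_db()
print(flat_file_style(conn))  # ([('ADMIN', 'HAAS'), ('DESIGN', 'BROWN'), ('DESIGN', 'LUTZ')], 3)
print(set_style(conn))        # ([('ADMIN', 'HAAS'), ('DESIGN', 'BROWN'), ('DESIGN', 'LUTZ')], 1)
```

The results match, but the cursor-within-cursor version issues one extra statement per department, and that gap grows with the data.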

Educating programmers in how to use SQL properly -- at a high level -- is probably the single most important thing you can do to optimize performance of your DB2 applications.

Saturday, April 11, 2015

Sorting is a very costly operation that you should strive to avoid if at all possible.
Indexes are very useful for this purpose. DB2 can choose to use an available index
to satisfy the sorting requirements of an ORDER BY, GROUP BY, or DISTINCT clause.
Therefore, it can be beneficial to create indexes that correspond to these clauses
for frequently executed SQL statements.

For example, if a query specifies ORDER BY LAST_NAME, TITLE and an index on
LAST_NAME and TITLE exists, DB2 can use that index to avoid sorting. Because the
data is retrieved in order from the index, sorting becomes unnecessary.
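SQLite's EXPLAIN QUERY PLAN offers a quick way to watch an index remove a sort. This is only an analogy: SQLite's optimizer is not DB2's, and in DB2 you would instead check the sort columns (such as SORTC_ORDERBY) in the PLAN_TABLE after an EXPLAIN. The EMPLOYEE table and the index name XEMP_NAME_TITLE are hypothetical, and the query selects only the indexed columns so SQLite can read everything from the index.

```python
import sqlite3


def plan_details(conn, sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the readable description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMPNO INTEGER PRIMARY KEY, LAST_NAME TEXT, TITLE TEXT)")
query = "SELECT LAST_NAME, TITLE FROM EMPLOYEE ORDER BY LAST_NAME, TITLE"

before = plan_details(conn, query)  # no suitable index yet
conn.execute("CREATE INDEX XEMP_NAME_TITLE ON EMPLOYEE (LAST_NAME, TITLE)")
after = plan_details(conn, query)   # index matches the ORDER BY

print(before)
print(after)
```

Typically `before` includes the step "USE TEMP B-TREE FOR ORDER BY" (SQLite's sort) and `after` shows a scan using the index with no such step; the exact wording varies between SQLite versions.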

You can use this information to tune your SQL further to avoid sorting. When
ordering is required by only certain columns, sometimes adjusting the ORDER BY,
by adding a column or by removing a redundant one, can enable indexed access to
avoid the sort.

Consider a query that selects DEPT_CODE, MANAGER, and DEPT_NAME from the
DEPARTMENT table with ORDER BY MANAGER, DEPT_CODE, and say there is a unique
index on MANAGER. DB2 will probably use this index to satisfy the request.
However, a sort will be done after retrieving the data to put the results into
DEPT_CODE order within MANAGER. But since we know our data, we are able to change
this situation. Because MANAGER is unique, we know that the following SQL
statement is equivalent to the prior one:

SELECT   DEPT_CODE, MANAGER, DEPT_NAME
FROM     DEPARTMENT
ORDER BY MANAGER;

In this case, DB2 can use the index on MANAGER to avoid the sort. The extra
column, DEPT_CODE, is removed from the ORDER BY clause. But, since MANAGER
is unique, there can be at most one DEPT_CODE per MANAGER. Therefore, the
sort order will be equivalent. Because we knew our data we removed a sort from
the process and made our query more efficient!
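A quick test against sample data can help you sanity-check an equivalence like this before relying on it, although it is the uniqueness rule, not the test, that guarantees it. In this hypothetical SQLite sketch, MANAGER is declared NOT NULL UNIQUE, and both forms of the query return the same rows in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DEPARTMENT ("
    "DEPT_CODE TEXT PRIMARY KEY, MANAGER TEXT NOT NULL UNIQUE, DEPT_NAME TEXT)"
)
# Hypothetical rows, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO DEPARTMENT VALUES (?, ?, ?)",
    [("D11", "000060", "MANUFACTURING"), ("A00", "000010", "ADMINISTRATION"),
     ("E21", "000100", "SOFTWARE SUPPORT"), ("B01", "000020", "PLANNING")],
)

with_dept_code = conn.execute(
    "SELECT DEPT_CODE, MANAGER, DEPT_NAME FROM DEPARTMENT ORDER BY MANAGER, DEPT_CODE"
).fetchall()
manager_only = conn.execute(
    "SELECT DEPT_CODE, MANAGER, DEPT_NAME FROM DEPARTMENT ORDER BY MANAGER"
).fetchall()
print(with_dept_code == manager_only)  # True
```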

One final note on sorting: although most of you probably know this, it cannot be
stated too strongly (or too frequently). Always code an ORDER BY if the order of
the results of your query is important. The ORDER BY clause is the only foolproof
method of ensuring appropriate sort order of query results. Simply verifying that
access is via an index is not sufficient to ensure the order of the results.

Another simple, yet effective, means of enhancing SQL performance is to
understand how UNION, EXCEPT and INTERSECT work. Let's start with UNION
because it has been around in DB2 the longest. UNION takes the results of
multiple SELECT statements and combines them together. It does this, as part
of the UNION operation, by sorting the results and removing any duplicates.
UNION ALL, on the other hand, will not sort and will not remove duplicates.

If you know that the results of the queries being unioned together are distinct
(that is, no duplicates will be encountered), then you can use UNION ALL instead
of UNION, and thereby enhance performance by avoiding the sort. Additionally,
if you do not care whether duplicates are returned, always use UNION ALL.
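The difference is easy to see with a small, hypothetical EMP table in SQLite (standing in for DB2). When the two SELECTs overlap, UNION removes the duplicate and UNION ALL keeps it; when they cannot overlap, both return the same rows, so UNION ALL gives up nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, WORKDEPT TEXT, JOB TEXT);
    INSERT INTO EMP VALUES (1, 'A00', 'PRES'), (2, 'D11', 'DESIGNER'),
                           (3, 'D11', 'MANAGER'), (4, 'E21', 'MANAGER');
    """
)


def run(sql):
    return [row[0] for row in conn.execute(sql)]


# Overlapping results: employee 3 is in D11 and is also a manager.
overlap_union = run("SELECT EMPNO FROM EMP WHERE WORKDEPT = 'D11' "
                    "UNION SELECT EMPNO FROM EMP WHERE JOB = 'MANAGER' ORDER BY EMPNO")
overlap_union_all = run("SELECT EMPNO FROM EMP WHERE WORKDEPT = 'D11' "
                        "UNION ALL SELECT EMPNO FROM EMP WHERE JOB = 'MANAGER' ORDER BY EMPNO")

# Disjoint results: no employee can be in both departments.
disjoint_union = run("SELECT EMPNO FROM EMP WHERE WORKDEPT = 'A00' "
                     "UNION SELECT EMPNO FROM EMP WHERE WORKDEPT = 'E21' ORDER BY EMPNO")
disjoint_union_all = run("SELECT EMPNO FROM EMP WHERE WORKDEPT = 'A00' "
                         "UNION ALL SELECT EMPNO FROM EMP WHERE WORKDEPT = 'E21' ORDER BY EMPNO")

print(overlap_union, overlap_union_all)    # [2, 3, 4] [2, 3, 3, 4]
print(disjoint_union, disjoint_union_all)  # [1, 4] [1, 4]
```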

Tuesday, April 07, 2015

Did you know that the order in which you code your predicates can have an impact on query performance? The impact is usually minimal, but it may buy you a couple of microseconds for a very performance-critical query. To use predicate ordering to your advantage, however, you need to be armed with some basic information about how DB2 evaluates predicates as it processes your SQL.

So, before we continue, let's review the order in which DB2 evaluates predicates at execution time. DB2 will evaluate indexable predicates first, matching predicates before non-matching ones. Next come the other Stage 1 predicates, and finally the Stage 2 predicates. Within each of these four groups, DB2 will evaluate equal predicates first, then range predicates (such as BETWEEN) and IS NOT NULL predicates, and finally any other predicates. If more than one predicate exists within a group, DB2 will evaluate them in the physical order in which they are coded in the SQL statement.

Re-ordering predicates to take advantage of this behavior should be considered only as a last resort. When implemented, the technique will usually shave only a little from the query's execution time. It is also important to note that predicate order will not impact a query's access path: it will remain unchanged (as shown in the PLAN_TABLE).
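The evaluation order described above can be modeled as a sort on a two-part key (group, then predicate type), with a stable sort preserving the coded order among ties. This Python sketch is only a model: the group and type labels, the Predicate class, and the example predicates and their classifications are hypothetical, and in practice it is DB2 that classifies each predicate.

```python
from dataclasses import dataclass

# Rank of each predicate group, in the evaluation order described above.
GROUP_RANK = {"indexable_matching": 0, "indexable_nonmatching": 1, "stage1": 2, "stage2": 3}
# Rank of each predicate type within a group.
TYPE_RANK = {"equal": 0, "range_or_not_null": 1, "other": 2}


@dataclass
class Predicate:
    text: str
    group: str
    kind: str


def evaluation_order(predicates):
    # sorted() is stable, so predicates with the same group and type keep their coded order.
    return sorted(predicates, key=lambda p: (GROUP_RANK[p.group], TYPE_RANK[p.kind]))


# Predicates in the order they are coded in the (hypothetical) WHERE clause.
coded = [
    Predicate("SALARY > 50000", "stage1", "range_or_not_null"),
    Predicate("SEX = 'M'", "stage1", "equal"),
    Predicate("TITLE = 'MANAGER'", "stage1", "equal"),
    Predicate("WORKDEPT = 'D11'", "indexable_matching", "equal"),
]
print(" -> ".join(p.text for p in evaluation_order(coded)))
# WORKDEPT = 'D11' -> SEX = 'M' -> TITLE = 'MANAGER' -> SALARY > 50000
```

Only the two stage 1 equal predicates are affected by how they are coded, which is exactly the situation the next example exploits.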

Now, how can we use this to our advantage? Consider a query whose WHERE clause codes two predicates in this order: SEX = 'M' AND TITLE = 'MANAGER'.

For the purposes of this discussion, no index exists on either of the columns coded in the predicates. They are therefore the same type: stage 1 and equal predicates. Furthermore, we know our data: in our organization, there is approximately a 50-50 split between males and females, and 15% of all employees are managers.

To optimize this query, then, we can swap the two predicates to achieve better performance, coding TITLE = 'MANAGER' AND SEX = 'M' instead.

Why should this query outperform the previous version? Well, assume we have 100,000 employees. If DB2 retrieves 50% of the rows (SEX = 'M') and then retrieves 15% of those 50%, we will have processed 57,500 rows:

( 100000 * 0.5 ) + ( ( 100000 * 0.5 ) * 0.15 ) = 57,500

But if, instead, DB2 were to retrieve 15% of the rows (TITLE = 'MANAGER') and then 50% of those, we will have processed only 22,500 rows:

( 100000 * 0.15 ) + ( ( 100000 * 0.15 ) * 0.5 ) = 22,500

Obviously, it is better for fewer rows to qualify early, thereby reducing the answer set and the number of rows that will have to be subsequently scanned.
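The arithmetic above can be reproduced with a small Python sketch. rows_processed follows the post's own accounting (rows qualifying after each predicate, in coded order); predicate_evaluations is an additional, hypothetical way of counting that tallies how many rows each predicate is applied to. The totals differ, but both favor coding TITLE = 'MANAGER' first. Integer percentages keep the results exact.

```python
def rows_processed(total_rows, pcts):
    """The post's accounting: sum the rows that qualify after each predicate in coded order."""
    processed, remaining = 0, total_rows
    for pct in pcts:
        remaining = remaining * pct // 100  # rows passing this predicate
        processed += remaining
    return processed


def predicate_evaluations(total_rows, pcts):
    """Alternative count: each predicate is applied only to rows that passed the ones before it."""
    evaluations, remaining = 0, total_rows
    for pct in pcts:
        evaluations += remaining
        remaining = remaining * pct // 100
    return evaluations


print(rows_processed(100_000, [50, 15]))         # SEX = 'M' first: 57500
print(rows_processed(100_000, [15, 50]))         # TITLE = 'MANAGER' first: 22500
print(predicate_evaluations(100_000, [50, 15]))  # 150000
print(predicate_evaluations(100_000, [15, 50]))  # 115000
```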

Wednesday, April 01, 2015

It is technically possible to learn how to write SQL statements without having an in-depth knowledge of the data. However, the better you know your data, the better your application performance will be. Let's look at a simple example.

By reducing the number of predicates on your SQL statements you may be able to achieve better performance by:

Reducing BIND (and REBIND) time because fewer options will probably need to be examined by the DB2 Optimizer.

Reducing execution time due to a smaller path length caused by the removal of search criteria from the optimized access path. DB2 will always make sure that it processes each predicate coded for the SQL statement. Removing predicates removes work -- and less work equals less time to process the SQL.

Of course, you have to make sure that you can actually remove predicates without impacting the result set of your query, right? But sometimes, if you know your data, you can eliminate predicates.

Consider a query whose WHERE clause codes two predicates: one on TITLE that selects vice presidents, and a second, GRADE_LEVEL >= 10. This statement retrieves all rows for vice presidents who are at a grade level of 10 or above. But what if we know more about our data? Say, for example, that the starting grade level for vice presidents in our organization is 10. Then it is impossible for anyone with a lower grade level to hold the title of VP, which makes the second predicate redundant in this case. Removing it will not logically change the results, and with less checking of the data required (DB2 won't have to check for GRADE_LEVEL >= 10), performance may improve.

It is important though that you truly do "know your data." For example, it is not sufficient to merely note that for current rows in the EMPLOYEE table, no vice presidents are at a grade level below 10. This may just be a coincidence. Do not base your knowledge of your data on the current state of the data. You must truly know your business criteria to determine that a correlation between two columns (such as between GRADE_LEVEL and TITLE) actually exists. And only then should you modify your SQL. Failure to do so can result in incorrect results being returned.
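One way to make such a business rule explicit, rather than inferring it from current data, is a check constraint; the post does not prescribe this, so treat it as an assumption. In this hypothetical SQLite sketch, the 'VP' title value, the EMPLOYEE rows, and the constraint itself are illustrative. With the rule declared, the query with and without the GRADE_LEVEL predicate returns the same rows, and a row that breaks the rule is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical business rule, declared rather than assumed:
# anyone whose title is VP must be at grade level 10 or above.
conn.execute(
    "CREATE TABLE EMPLOYEE (EMPNO INTEGER PRIMARY KEY, TITLE TEXT, GRADE_LEVEL INTEGER, "
    "CHECK (TITLE <> 'VP' OR GRADE_LEVEL >= 10))"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(1, "VP", 10), (2, "VP", 12), (3, "MANAGER", 8), (4, "MANAGER", 11)],
)

both_predicates = conn.execute(
    "SELECT EMPNO FROM EMPLOYEE WHERE TITLE = 'VP' AND GRADE_LEVEL >= 10 ORDER BY EMPNO"
).fetchall()
title_only = conn.execute(
    "SELECT EMPNO FROM EMPLOYEE WHERE TITLE = 'VP' ORDER BY EMPNO"
).fetchall()
print(both_predicates == title_only)  # True

# A row that breaks the rule is rejected, so the equivalence cannot silently erode.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (5, 'VP', 9)")
    rule_enforced = False
except sqlite3.IntegrityError:
    rule_enforced = True
print(rule_enforced)  # True
```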

Also, if the predicate was already there and you are removing it, comment it out rather than deleting it, and be sure to document in the code exactly why you are doing so... that way, when somebody else looks at it later, they'll know what happened and why.