This is the blog of Robert Catterall, an IBM Db2 for z/OS specialist. The opinions expressed herein are the author's, and should not be construed as reflecting official positions of the IBM Corporation.

Friday, August 30, 2013

DB2 for z/OS: GETPAGEs are not ALWAYS the Key Determinant of SQL Statement CPU Time

In my DB2 for z/OS-related work over the past 25 years, I've done a lot in the way of performance analysis, looking at individual queries and at DB2-accessing application programs (both transactional and batch in nature). Time and time again, sometimes in reviewing trend data and sometimes in examining DB2 monitor output to gauge the impact of performance tuning actions, I've seen that the CPU time consumed in SQL statement execution rises or falls with the associated level of GETPAGE activity (a GETPAGE is essentially a request by DB2 to examine a page from a table or an index). The correlation I observed between CPU cost and GETPAGEs for SQL statements was so high that I came to see GETPAGE activity as the number one determinant of a query's CPU consumption in a mainframe DB2 system (and I declared as much in a blog entry that I wrote a few years ago while working as an independent DB2 consultant).

Well, a lot of rules have exceptions, and that turned out to be the case for this rule of thumb. I got some information from a DB2 for z/OS DBA that turned my notion of the relationship between GETPAGEs and CPU cost for SQL statements on its head. The data I saw seemed at first to be very strange, but as I dug deeper it started to make sense. I still see GETPAGEs as being, more often than not, the most important factor with regard to a query's CPU cost, but it's not the only factor, and sometimes other aspects of SQL statement execution exert a greater influence on CPU consumption. In this blog entry I'll show you why GETPAGEs are not always the strongest flavor in the mix of determinants that influence CPU consumption for DB2 for z/OS-accessing queries.

Interestingly, the case about which I'm writing involved data compression. That's a DB2 for z/OS feature that's been around a LONG time (around 20 years or so), and which is VERY widely used. Data compression is super-popular at mainframe DB2 sites because it saves a ton of disk space and it's virtually free, overhead-wise. Why is DB2 for z/OS data compression so CPU-efficient? Two reasons: first, it gets a hardware assist from System z servers (big difference versus a software-only implementations of data compression). The second reason is GETPAGE-related: because compression often allows you to get three to four times as many rows in a page versus a non-compressed table space, the number of GETPAGEs required to access data frequently drops considerably when a table space is compressed. Yes, even hardware-assisted data compression has a cost, but many DB2 for z/OS-using organizations have found that the reduction in GETPAGE activity that goes along with compression largely offsets the CPU cost of compressing and decompressing data rows; thus, the near-zero overhead effect of DB2 compression observed in many systems.

Imagine my surprise, then, when the aforementioned DBA presented me with DB2 monitor data showing the effect of data compression on a DB2 for z/OS-accessing query. I saw that the number of GETPAGEs associated with execution of the query dropped by 65% after the target table space was compressed. No surprise there. I also saw that the query's elapsed time dropped by a whopping 92%. Not a huge surprise there, either. The lion's share of elapsed time for many a query is wait-for-I/O time, and with many fewer pages having to be read from disk to obtain the query's result set, a much-reduced run time would be expected. Then came the not-expected result: the query's CPU time went up by 31% following compression of the target table space.What?!? How could that be? How could you have a 65% reduction in GETPAGEs and and a 31% increase in CPU time? Sure, I wouldn't expect CPU time to go down by the percentage of GETPAGE reduction, because some CPU cost does get added when COMPRESS YES is put into effect for a table space. Still, though, I would have expected either a modest decrease in the query's CPU cost, or at worst a break-even situation, with the lower level of GETPAGE activity cancelling out the CPU expense of decompressing result set rows. This just didn't make sense to me.

My confusion started to dissipate as I learned more about the query in question, and thought more about what DB2 was doing in executing the query. The query targeted a table with about 3 million rows, and 600,000 of those rows had to be examined in order to generate the result set. The large number of rows needing evaluation was due to the fact that the primary search argument -- the one that really filtered out a lot of rows -- was a non-indexable LIKE predicate used to find a particular character string that could occur anywhere within a certain column of the table (the % in front of the LIKE made the predicate non-indexable; an index on the column could have been used in executing the query had there been only a % after the LIKE, versus the %LIKE% that enabled location of the character string anywhere in the referenced column). Another predicate in the query -- an indexable one -- filtered out 80% of the table's rows, but the remaining 600,000 rows had to be individually examined for the existence of the required character string in the referenced column, as noted.

Here's where the compression effect kicked in. See, when rows in a compressed page are evaluated against a query search argument, they are individually decompressed, one at a time. Sometimes (very often, in fact), an index can be used to identify the small number of qualifying rows that must be examined in a given table page (or simply retrieved from the page, if result set row qualification can be fully determined at the index level), and decompressing those few rows in the page will be a relatively small component of the SQL statement's overall CPU cost. Other times (and this was the case for the query that the DBA discussed with me), an index can be used to reduce the number of table pages requiring access in query execution, but in a given page there might be 30 or 40 or more rows that have to be decompressed and evaluated (and maybe retrieved). When decompression cost is driven by the expansion of 30 or 40 rows in a page, versus 3 or 4 rows, that cost can rise to the point that it exceeds the cost reduction associated with diminished GETPAGE activity for a compressed table space. That's exactly what happened in the situation I've described. Was the compression performance result reported by the DBA good or bad? That depends on whether the performance objective for the query was improved response time (in which case the big reduction seen for the query's elapsed time with the compressed table space would be a positive outcome) or lower CPU consumption (this particular query consumed less CPU time when the target table space was not compressed).

It all goes to show that rules of thumb are best thought of as guidelines, as opposed to absolutes. You still have to use your brain in analyzing query performance, and you have to be ready to dig below the surface of a situation. GETPAGE activity is indeed a very important factor in determining a query's CPU cost in a DB2 for z/OS system, and tuning actions that reduce GETPAGEs are generally beneficial with regard to SQL statement CPU efficiency. What you have to remember is that GETPAGE activity is one of several determinants of query performance. When DB2 has to do something to a lot of rows in each page accessed in the execution of a query -- things like decompression, value transformation via scalar functions, casting of one data type to another, arithmetic operations, etc. -- the CPU cost of GETPAGEs can become a relatively smaller part of the query performance big picture. Consider what DB2 has to do with the rows on a page -- and with how many of the rows on that page -- once it gets the page. There will be times when improving the CPU efficiency of a query is is not just a matter of reducing the query's GETPAGE count. Also, have a good understanding of your query tuning objective: is it reduced elapsed time, or reduced CPU consumption, or both?Rules of thumb can be useful in guiding your query tuning efforts. Don't let them become blinders that prevent you from thinking outside of the box.

Hi Robert,Very interesting article.I have a question. I noticed for some jobs ( performing Delete/Updates)that number of Page updates for Buffer is much higher than overall number Getpage requests , like this :Buffer Pool: BP8K1

Based on my understanding, it is no surprise, in some cases, to see buffer updates in excess of GETPAGEs. If a page contains 20 rows that are all updated by an application process, that would be 1 GETPAGE and 20 buffer updates. Note that "update" here does not necessarily mean SQL UPDATE statements - INSERT and DELETE statements drive buffer update activity, as well. Think of "buffer update" as "buffer change."