Index Decluttering Opportunities in DB2 for Linux, UNIX, and Windows

Fred Sobotka

How long has it been since you took a good look at your indexes? -- Photo by Stacy Brunner

This month marks the second birthday for DB2 10.1, which delivered an impressive set of new engine capabilities and enhancements. Depending on your organization’s upgrade policies, IBM’s recent release of Fix Pack 3 for DB2 10.5 may make that version a suitable upgrade target, too. That said, many database shops I’ve worked with are still largely satisfied with DB2 9.7--some are even happy at 9.5--and don’t feel compelled to upgrade for showcase features such as time travel query or BLU acceleration. This article shifts the spotlight onto some easily overlooked aspects of DB2 10.1 and 10.5 that can accelerate index performance while improving manageability.

You only think you’re stuck with those indexes

For production databases, index definitions may only undergo DBA review when creating or altering a table, or while tuning a long-running query. Even then, indexes are rarely ever dropped, since application vendors tend to swear up and down that all of their indexes are necessary, and in-house developers may respond with a shrug and “you never know” when asked about dubious indexes in homegrown applications. A superstition-based approach to database object management can be frustrating and discouraging to DBAs, especially when they’re attempting to streamline problematic indexes, but the potential for fewer, smaller, and faster-scanning indexes in DB2 10.1 and 10.5 arguably makes the index battle worth fighting.

Single-column indexes

A single-column index is a one-trick pony, and if the column has low cardinality (very few distinct values), the trick may not even be all that good. Are there any additional columns that would make the index useful to other queries? Even if a single-column index is unique, it can still be re-created as a composite (multi-column) index that includes extra columns not involved in the unique part of the key. For example, instead of creating a primary key that implicitly creates a single-column index on(ORDERKEY), you could first create a unique index on (ORDERKEY) INCLUDE (ORDERSTATUS), which would also benefit queries that filter on the status column when joining to that table. Now that the unique index already exists, DB2 will reuse it when you define the PK via ALTER TABLE...ADD PRIMARY KEY.

Redundant indexes - (c,d,e) vs. (c,d,e,f)

DB2 will automatically reject the creation of a new index if the specified key is identical to that of an existing index (same table, key length, column names and positions), but it won’t prevent you from creating a redundant index whose key is narrower but otherwise identical to another index. For example, an index on (ORDERKEY, CUSTKEY, ORDERSTATUS) is just wasting storage and I/O if that table also has an index on (ORDERKEY, CUSTKEY, ORDERSTATUS, TOTALPRICE). In those situations, keep the wider index and drop the narrower one.

Deciding what to do about a group of composite indexes built on different permutations of the same columns can be tricky. They may seem redundant at first glance, but this is often where the indexing superstition creeps in to raise some doubts. There could be a reasonable explanation behind the indexes on(PARTKEY, SALEDATE, CUSTKEY, ORDERKEY) along with (CUSTKEY, ORDERKEY, PARTKEY, SALEDATE) and so on, or it could just be someone’s futile attempt to appease the IXSCAN deity with a plethora of options. What’s different in DB2 10.1 is the jump scan operation, a technique the optimizer uses to avoid scanning index gaps (parts of the index that don’t satisfy the query predicates).

When it works, the jump scan operation can extend the versatility of composite indexes so you don’t need to create as many of them. The challenge is designing a composite index in a way that will persuade the optimizer to use a jump scan instead of a traditional index scan. If this sounds like a bit like voodoo, Scott Hayes has explored this topic in depth and offers some useful rules of thumb.

Star schema considerations

The jump scan operation is a building block of another new optimizer trick in DB2 10.1, the zigzag join. To the DBA, a typical data warehouse query such as a star join involving a large fact table and a few tiny dimension tables could look relatively simple, but the optimizer has been known to incur inefficient index scans when predicates in the WHERE clause filter more on the dimension table(s) than on the fact table itself. For example, filtering predicates such as (COLORCODES.COLORNAME LIKE 'ORANGE%' AND PAYMENTCODES.PAYMENTTYPE = 'CASH'), could be highly selective, but the fact table only contains dimension keys such as PARTKEY, COLORKEY, PAYMENTKEY,etc.

The zigzag join method handles this problem by automatically detecting a star join condition and doing some extra legwork to convert the query’s dimension values to dimension keys before operating on the fact table. After the zigzag join retrieves the necessary dimension keys, the optimizer can make better use of the composite index on the fact table, ideally with an inexpensive jump scan that jumps more than it scans. In our sample query, a successful zigzag join will scan the composite index of dimension keys, bypassing the irrelevant portions as if the query predicates had been written directly against the fact table (SALESFACT.COLORKEY IN (7,8,9) AND SALESFACT.PAYMENTKEY = 4).

Less frequent index reorganization

Sequential prefetching is still the fastest option for scanning large quantities of index pages--and data pages, for that matter--but the readahead prefetching technique introduced in DB2 10.1 provides an alternate approach that doesn’t incur as much of a penalty when scanning sparsely-populated index pages. The optimizer’s new smart index prefetching behavior watches for a significant quantity of low-density pages while scanning an index, and automatically switches from sequential prefetching to readahead prefetching in an effort to maintain tolerable read performance. After the sparse index is eventually reorganized by the DBA, smart index prefetching will detect the dense pages and stick to sequential prefetching when scanning that index.

“Mainframe-style” expression-based indexes

Before this feature arrived in DB2 10.5, creating an index on a built-in expression such as UPPER() required a GENERATED ALWAYS column containing that expression. If you used this approach and somehow managed to keep the queries focused on the underlying expression and not the generated column, then upgrading to DB2 10.5 may present you with an opportunity to declutter some tables, too.

Although expressions that reference UDFs cannot be indexed, the types of reformatting and other column-level transformations typically done by UDFs can often be recreated as a chain of built-in functions which are eligible for indexing in DB2 10.5. As indexed expressions become more complex, there is an increased risk that queries could specify the expression in a way that the optimizer doesn’t see as a suitable match for the index.

We finally get UNIQUE WHERE NOT NULL

DB2 10.5 not only delivers another piece of SQL-92, but also boosts performance by shrinking sparse indexes. Specifying the EXCLUDE NULL KEYS clause on any index (unique or otherwise) will prevent an all-null key from being inserted into the index at all, as if the row doesn’t exist. Enabling this option on indexes that tend to have lots of null keys will result in smaller index objects, faster scans, and better compatibility with migrated Oracle applications (whose unique indexes were probably designed that way in the first place).

Conclusion

Whether you’re considering a bit of spring cleaning on your existing indexes or suiting up for hunting season, the evolutionary improvements in DB2 10.1 and 10.5 provide DBAs with a game-changing set of index management options.