2010's 10 Most Popular Data Replication Topics for DB2

These are the ten topics that I felt were the most popular with the sales teams and customers I worked with in 2010. As with last year's list, this year's based solely on my own experience. No way I'm going to try to reconcile my colleagues' opinions and come up with a composite list :) There are just too many opinions around for that. I'll stick with mine, start at the top, and work down.

1. Is Q Replication Free?

In October, IBM announced that two-site Q Replication would come at no additional cost with DB2's new Advanced Enterprise Server Edition and all editions of InfoSphere Warehouse. This quickly became the hottest replication topic to come my way. I got a lot of questions about when you actually need to buy Q Replication now. So I set up a post over at developerWorks called, surprisingly enough, "Is Q Replication Free?" I'll update the info there as questions come in.

Backing up a step... why is this free-Q move being made? It's part of DB2's high availability and active-active database story, which brings us to the next most popular replication topic of 2010...

2. High Availability and Active-Active Databases

You can use Q Replication as the main component of a database availability solution for two or more systems (typically two these days). Q is not a substitute for an availability solution like DB2 data sharing. Instead, Q is more likely to be used to complement other availability solutions. For example, Q may be the basis for an active-active solution for two highly-available DB2 sites or to keep two DB2 servers in sync for load balancing and failover. If you're not familiar with how Q works for these solutions, I have an introductory video here on ChannelDB2.

Better than my video were this year's excellent sessions about Q Replication at IBM's IOD conference in Las Vegas. If you missed them, you can still access the presentations at ibm.com. These show what people are doing with Q. They include:

This is a web-based health and performance monitor for IBM's Q Replication. It combines live graphical monitoring with information about status, performance, data conflicts, and errors. It makes Q Replication highly accessible, especially compared to the near black box experience you get from some data replication products.

A new version provided a variety of enhancements in 2010, including a (1) Performance Advisor (shown below) and (2) a Recovery Advisor to assist you with high availability, active-active, and DR scenarios.

And, yes, this photo shows the Dashboard on a Nexus S phone running Google Android 2.3 and Adobe Flash 10.1. Don't get too excited though :) My feeling is that the Dashboard really needs a mobile variation of its website to be practical from a small touchscreen interface. Another release or two of Android and Flash might not hurt either.

4. DB2 pureScale

With 9.8 fix pack 2, DB2 enabled it's log read API for pureScale. This means pureScale can now be both a source and target for Q Replication (before FP2, it could only be a target). There's a lot of interest in using Q as part of active-active sites and DR solutions for pureScale. I got a lot of questions around whether we require the source and target to have the same number of members (no) and if the source could be a pureScale and the target something else (yes). I'll work on a post to answer the most frequently asked questions later.

In the mean time, I also get questions about whether you have to buy Q Replication for pureScale. I will be answering that in my developerWorks post about Licensing Tips for Q and SQL Replication. However, in a nutsell, if you have two sites both running DB2 LUW's, then here's what you need for the site(s) running pureScale:

pureScale must be purchased for DB2 AESE but two-site Q is included at no additional cost

both pureScale and Q must be purchased for DB2 ESE, with Q purchased via InfoSphere Replication Server or the IBM Homogeneous Replication Feature (HRF).

pureScale is included in DB2 Workgroup, but two-site Q must be purchased via InfoSphere Replication Server (no HRF is available)

I'll throw in some pictures when I post on developerWorks.

5. Setting Up WebSphere MQ

New Q Replication users on UNIX and Windows are sometimes new to WebSphere MQ. They worry about whether MQ is difficult to set up. It isn't. But, they don't know what's required to set it up quickly. To make it easy for them, a new command was added to the replication command line and script processor, asnclp. The command generates small operating system scripts (shell for UNIX, bat for Windows) that will create the MQ objects you need for Q Replication on source and target systems.

For example, the following asnclp script generates scripts for queue managers and queues for unidirectional subscriptions between two databases called 'sourcedb' and 'targetdb':

Once the scripts are generated, run the one for the source database on the source system and the one for the target database on the target system. That's it. If you prefer, you can even have the command run the scripts. Just run the same asnclp script on both source and target systems. That makes MQ set up even easier.

Note that your asnclp must be running at the 9.7 fix pack 3a level or higher. Your Q Replication can be at any level.

6. InfoSphere Replication Server 10 for z/OS

It was announced the same day as DB2 10 for z/OS, but there's no dependency between the two. Replication Server 10 supports DB2 8, 9 and 10 for z/OS while both current versions of Replication Server on z/OS, 9 and 10, support DB2 10 for z/OS. That's true for both Q and SQL Replication.

The most popular new feature so far is Q Replication's ability to replicate Alter Column Datatype DDL automatically. It updates subscriptions and alters the target table to match. Note that both your DB2 source and your Q Replication must be at V10 to use this function.

7. The IBM Smart Analytics Systems

Both SQL and Q Replication are built into the Smart Analytics Systems. The Q technology is exactly the same as found in InfoSphere Replication Server and the IBM Homogeneous Replication Feature. The SQL technology is what's built into DB2 LUW and InfoSphere Warehouse. That only leaves two questions unique to the Smart Analytics Systems:

Which release of replication can I use?

Do I need to buy it?

I have posts to answer these questions for the 7700, 7600 and 5600. Other models follow the same pattern. You can read the one about the 7700 as an example for understanding what's what for the other models. The only recent change is that the 7700's software stack now includes InfoSphere Warehouse 9.7.2. Why is that important? That's the first release of InfoSphere Warehouse where two-site Q Replication is included at no additional cost.

8. Comparing the Data in Source and Target Tables

When you replicate data, you'll eventually have a day where someone changes a target table in such a way that it's out of sync with its source table. What do you do? The only solution from some replication technologies is to reload the target table with source data. That's not the case with Q and SQL Replication. For many years, DB2's had utilities, asntdiff and asntrep, that compare and repair the data in the source and target tables of a replication subscription. They can be an alternative to reloading a target with source data.

You can also use the compare utility, asntdiff, to do periodic comparisons to verify table differences don't persist over time.

9. Statistical data for replication

From year to year, I find this one of the most under considered tools in the data replication world. Many people go through a product's trial or proof of concept without asking what data is available to help them tune, monitor, and manage what they're evaluating.

For example, what kinds of data are available? How do I verify when my peak changed data volumes happen? How do I know if latency is consistently meeting requirement defined in my organization's Service Level Agreements (SLAs)? How do I access this data or provide reports on it?

You could wind up being sorely disappointed once you go into production. However, with IBM's Q and SQL Replication, you have access to a wide variety of historical statistical data about replication performance and workloads processed. It's easily retrieved via queries since it's stored in relational tables. For example, see this Q Apply table. Tables also makes it easy to integrate with in-house monitoring and reporting software. Or, if you're using Q Replication, you can access all of this data through the Q Replication Dashboard.

How often do you get a row of statistics? By default, data is generated once a minute (one every five minutes in older releases) by each Capture or Apply program and stored for seven days. Of course, you can change these values to meet your needs.

z/OS customers love to offload work onto zIIP processors. However, replication technologies don't tend to offload to zIIP. The good news is that, if your replication technology is a DB2 application that uses SQL, a percentage of its DB2 workload can be zIIP eligible if it accesses DB2 for z/OS via DRDA over a TCP/IP connection. That pretty much describes SQL Replication.

When the SQL Apply program runs on the target system, you would be hard pressed to set it up any other way than to have it connect to a DB2 z/OS source via DRDA over TCP/IP. SQL Apply then uses SQL to pull changed data to the target. That SQL can include column transformations or even source views with transformations and joins. The benefit is even more interesting when you have a data distribution with an SQL Apply running on each target system. Throw in not-logged table spaces for CD tables, and you have a noticeably lower impact on source MIPS.

To be clear, the zIIP eligibility of an SQL workload and the percentage offload are determined by DB2. See DB2's information to understand what it offers in your environment.