You may have seen a recent announcement on ibm.com stating that IBM will no longer be marketing its older data replication products in 2013. That includes InfoSphere CDC. Why?
And what happens to the CDC technology?

Over the years, IBM provided its data replication technologies through a number of different products. For example, IBM used to offer two major data replication products at the same time: InfoSphere CDC and InfoSphere Replication Server. That was a little confusing, even to some IBM people. To simplify the situation, IBM consolidated all its replication technologies into a single product called IBM InfoSphere Data Replication (IIDR). Once IIDR was available, the older products no longer needed to be sold to new customers, which is why the end of marketing was announced. However, the replication technologies themselves - CDC, Q Replication, and SQL Replication - are still alive and well. You can continue to use them as you always have.

IIDR 11.4.0.0-5032 for Microsoft SQL Server
* Added support for GEOMETRY and GEOGRAPHY datatypes

IIDR 11.4.0.0-5032 for Oracle
* Added support for Oracle 12cR2
JR58209 CDC CAN SKIP CERTAIN ARCHIVE LOGS IF THOSE LOGS WERE CREATED MUCH LATER THAN OTHER ARCHIVED LOGS WITH LATER SEQUENCE
JR58247 IIDR ORACLE AGENT FAILS TO IDENTIFY RESTORED ARCHIVE LOGS.

IIDR 11.4.0.0-5031 for Kafka

* Added support for using a broker list in place of a ZooKeeper server connection

I have a requirement to deploy an IIDR CDC solution for continuous replication of around 100 tables from DB2 LUW (10.1 on Linux x86 / RHEL 6) to a target Oracle 11g database.

I have tried to figure out most of it myself, but I cannot find any documents in the community groups or Redbooks for this scenario (DB2 LUW to Oracle). So I would like to ask the gurus here for help with the high-level setup, the required installation steps, and so on.

IIDR 11.3.3.3-59 for Oracle
JR57167 IIDR FAILS TO START, WITH INVALID PARAMETER BINDING ERROR.
JR57162 IIDR IS NOT ABLE TO GET NEXT ONLINE LOG FILE TO READ WHEN PRODUCT IS CONFIGURED TO READ FROM ORACLE RAC SYSTEM.

IIDR 11.3.3.3-44 for DB2 LUW
* Modified the PureScale transaction ordering logic from VTS-based ordering to one based on LFS and LSN.

Note: Deployments of IIDR for DB2 LUW on PureScale should avoid DDL changes during the upgrade steps. Otherwise, the DDL procedure must be completed prior to upgrading to this version of IIDR CDC for DB2 LUW.

IIDR 11.3.3.3-44 for Event Server
JR56064 WHEN REFRESHING/MIRRORING DATA TO A EVENT SERVER MQ TARGET, THE XML MESSAGE DATA SHOULD CONTAIN A TAG FOR THE TABLE NAME.

IIDR 11.3.3.3-44 for Netezza
JR55991 NETEZZA: ERROR RETURNED WHEN MAPPING A TABLE IN A MULTI-SCHEMA DATABASE.

IIDR 11.3.3.3-43 for all engines
* Installer updates to address security vulnerability on Windows, see security bulletin http://www-01.ibm.com/support/docview.wss?uid=swg21984310.
APARs
JR55891 JOURNAL CONTROL FIELDS ARE NOT BEING CORRECTLY POPULATED IN A LIVE AUDIT MAPPING FOR THE 'CLRPFM' AND 'STRJRNPF' COMMANDS.

IIDR 11.3.3.3-43 for Oracle
* Updated JDBC driver
APARs
JR55262 IIDR FOR ORACLE 12 AS A TARGET: APPLYING CHANGES TO LOB, CLOB OR XML COLUMNS MAY THROW AN ARRAYINDEXOUTOFBOUNDEXCEPTION.
JR55858 PREFERRED ARCHIVE DESTINATION IS NOT CONSIDERED IF THE ARCHIVED LOG NAME IS "NULL" IN V$ARCHIVED_LOG

The following rules determine which versions of Management Console (MC), Access Server (AS), and CDC agents (engines) will interoperate.

These rules apply to any CDC 6.x or higher release.

1) The MC and AS must be at the exact same release level
2) The CDC source and target agents (engines) can be at different release levels
3) The MC version must be >= the most recent CDC source or target agent (engine)
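To make the three rules concrete, here is a small sketch (my own illustration, not an IBM tool) of how a compatibility check could be expressed:

```python
# Hypothetical sketch of the interoperability rules above -- not an IBM tool.
# Versions are compared as tuples of integers, e.g. "11.4.0" -> (11, 4, 0).

def parse_version(v):
    """Turn a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def interoperable(mc, access_server, agents):
    """Apply the three rules: MC == AS, agents may differ,
    and MC must be >= the most recent agent."""
    mc_v = parse_version(mc)
    if mc_v != parse_version(access_server):
        return False  # Rule 1: MC and AS must match exactly
    # Rule 2: agents may be at different levels, so no check among them.
    # Rule 3: MC must be at least as recent as the newest agent.
    return mc_v >= max(parse_version(a) for a in agents)

print(interoperable("11.4.0", "11.4.0", ["11.3.3", "11.4.0"]))  # True
print(interoperable("11.3.3", "11.4.0", ["11.3.3"]))            # False (rule 1)
print(interoperable("11.3.3", "11.3.3", ["11.4.0"]))            # False (rule 3)
```

In other words, you can mix agent levels freely, but always upgrade MC and Access Server first.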

* Using 'Standard' replication achieves much higher throughput than using 'Consolidation' or 'Summarization'.
  - Standard replication can perform optimizations such as arraying and commit grouping that cannot be performed when using the other replication methods.
  - Note that some optimizations are also disabled when using Adaptive Apply or Conflict Detection & Resolution.

* Be aware of the implications when you park tables or subscriptions.
  - An inactive (not currently replicating) subscription that contains tables with a replication method of Mirror will continue to accumulate change data in the staging store, from the current point back to the point where mirroring was stopped. For this reason, you should delete subscriptions or remove tables that are no longer required, or change the replication method of all tables in the subscription to Refresh, to prevent the accumulation of change data in the staging store on your source system.
  - The same is true for a parked (idle) table: ensure that its replication method is set to Refresh.
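To make the "commit grouping" optimization mentioned above concrete, here is a toy sketch (my own illustration, not CDC internals) of batching many small source transactions into fewer target commits:

```python
# Toy illustration of commit grouping -- not actual CDC internals.
# Many small source transactions are applied to the target in
# fewer, larger units of work, reducing per-commit overhead.

def group_commits(transactions, group_size):
    """Combine consecutive source transactions into target commit groups."""
    groups = []
    current = []
    for txn in transactions:
        current.extend(txn)          # append the rows of this transaction
        if len(current) >= group_size:
            groups.append(current)   # one target commit for the whole group
            current = []
    if current:
        groups.append(current)       # flush any remaining rows
    return groups

# 6 single-row source transactions become 2 target commits of 3 rows each.
source_txns = [["row%d" % i] for i in range(6)]
print([len(g) for g in group_commits(source_txns, 3)])  # [3, 3]
```

Consolidation and summarization must evaluate each change against prior state, which is why batching of this kind is not available there.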

For InfoSphere CDC z/OS, there is no command available. This is generally not a requirement, as most z/OS shops keep logs around for 10 days. If required, you can use the earliest open position indicated in the event log when InfoSphere CDC z/OS starts replication.

You need to consider and accommodate cases where replication will be down for a period of time.

Rule of thumb:

Successful implementations typically retain 5+ days of logs.

If you do not have sufficient log retention, be prepared to refresh tables if something unexpected happens in your environment.
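As a back-of-the-envelope sketch of the rule of thumb above (the numbers are illustrative assumptions, not IBM guidance), you can compare your retention window against the longest outage you want to survive:

```python
# Back-of-the-envelope check of log retention vs. tolerated outage.
# All numbers here are illustrative assumptions, not IBM guidance.

def retention_ok(days_of_logs_retained, max_outage_days, safety_margin_days=1):
    """True if retained logs cover the longest outage plus a safety margin."""
    return days_of_logs_retained >= max_outage_days + safety_margin_days

# With the 5+ day rule of thumb, a 3-day outage (e.g. a long weekend
# plus time to notice and react) still leaves a margin.
print(retention_ok(5, 3))  # True
print(retention_ok(2, 3))  # False -- be prepared to refresh tables
```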

The IBM Redbook titled "Smarter Business: Dynamic Information with IBM InfoSphere Data Replication CDC" is now available at: http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247941.html?Open

This Redbook covers a wide range of topics, from InfoSphere CDC use cases, solution topologies, features, and functionality to performance, environmental considerations, and automation. It is a great source of information if you are wondering how best to set up InfoSphere CDC, how to fit it into a resilient environment, and so on.

IIDR 11.4.0.0-5045 for Db2 LUW
JR58470 ROW FILTERING FAILS WHEN BIGINT DATATYPE USED WITH LARGE NUMBER GREATER THAN 10 DIGITS

IIDR 11.4.0.0-5045 for Kafka
* Now supports Kafka custom operation processors
* You can write audit records in comma-separated values (CSV) format by using the KcopLiveAuditSingleRowIntegrated Kafka custom operation processor.
* You can write records in JSON format by using the KcopJsonFormatIntegrated Kafka custom operation processor. No schema registry required!
* You can specify topic names and set up mappings between multiple source tables and target topics by using the KcopTopicMappingIntegrated Kafka custom operation processor.
* The KcopMultiRowAvroLiveAuditIntegrated Kafka custom operation processor can write audit records in Avro format and register the schema in a Confluent schema registry or a Hortonworks Schema Registry Service that uses its Confluent emulation mode.
* You can develop a user-defined Kafka custom operation processor to suit your needs using the samples provided.
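As a loose illustration of consuming such records downstream, a consumer might parse each JSON change record like this. Note that the field names and operation codes below are assumptions for illustration, not the documented KcopJsonFormatIntegrated schema:

```python
import json

# Hypothetical shape of a JSON change record; the real
# KcopJsonFormatIntegrated output may name these fields differently.
sample_message = json.dumps({
    "table": "INVENTORY.PRODUCT",
    "operation": "U",                      # I=insert, U=update, D=delete (assumed)
    "before": {"ID": 42, "QTY": 10},
    "after":  {"ID": 42, "QTY": 7},
})

def describe_change(raw):
    """Summarize a change record for logging or routing."""
    rec = json.loads(raw)
    return "%s %s: %s -> %s" % (
        rec["operation"], rec["table"], rec.get("before"), rec.get("after"))

print(describe_change(sample_message))
```

In practice the raw message would come from a Kafka consumer subscribed to the target topic rather than from a literal string.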

IIDR 11.4.0.0-5044 for Microsoft SQL Server
* Now supports Microsoft SQL Server 2017 as a source or a target

IIDR 11.4.0.0-5044 for Oracle
* Now supports replicating from source tables that use the Oracle Interval Partition feature, where partitions are automatically added. On Oracle 12cR2, IIDR now supports Auto-List Partitioning as well.

IIDR 11.3.3.3-66 for Oracle
JR58247 IIDR ORACLE AGENT FAILS TO IDENTIFY RESTORED ARCHIVE LOGS.
JR58448 REQUESTED TO REMOVE A UOW THAT WAS NOT THE ACTIVE UOW
JR58464 SUBSCRIPTIONS IN ARCHIVED_LOG_ONLY MODE NOT FAILING WHEN REQUIRED ARCHIVE LOGS DO NOT EXIST.

IIDR 11.4.0.0-5024 for all LUW Engines is available on Fix Central
JR58069 TABLE MAPPING FAILS WITH CHC9696E ERROR

IIDR 11.4.0.0-5024 for Oracle
JR58034 SUBSCRIPTION SHOULD NOT FAIL WHEN ONE OF THE RAC NODES IS DOWN
JR58054 CDC IS REPLICATING COLUMN DEFAULT VALUE FOR NON-NULLABLE COLUMN WHEN THIS COLUMN IS NOT CHANGED AS A PART OF UPDATE

IIDR 11.4.0.0-5017 for Oracle
JR57440 EXCEPTION AT CONTINUE READING ERROR "ROW ALREADY EXIST IN INDEX TS_ORACLE_ARCHIVES_PK ON TABLE TS_ORACLE_ARCHIVES"
JR57469 REPLICATION STOPS WITH ARRAYINDEXOUTOFBOUNDS EXCEPTION WHEN PARSING OVERFLOW RECORD FOR A TABLE THAT HAS LOB COLUMNS.
JR57587 TABLE NAME WITH TRAILING SPACES CANNOT BE FOUND IN DATABASE
JR57642 HISTORY CLEANUP IS NOT DONE IF SHARED SCRAPE IS DISABLED
JR57710 CDC FAILS TO READ THE ARCHIVED LOG IN THE PHYSICAL STANDBY MODE.
JR57798 EXCEPTION "OPCODE T IS NOT MAPPED TO A DATADEFINITIONOPERATION TYPE" WHEN TRUNCATING ORACLE TABLE

JR57452 COMMITS ARE FILLING UP ASM AUDIT TRACES BECAUSE OF DEFAULT AUTOCOMMIT SETTING ON ORACLE EXADATA.
JR57581 A NECESSITY EXCEPTION "REPORT_POSITION_ID IS BEFORE BOOKMARK" MAY END REPLICATION IN AN ORACLE RAC WITH AN INACTIVE NODE.

IIDR 11.4.0.0-5003 for Oracle
* JR57120 DMSHOWLOGDEPENDENCY CAN RETURN A NULL POINTER EXCEPTION
* JR57121 DMSHOWLOGDEPENDENCY DOES NOT RETURN THE FULL LIST OF REQUIRED LOGS
* JR57125 DMSHOWLOGDEPENDENCY COMMAND CAN RETURN COMPLETED LOGS IN THE REQUIRED LOG LIST AS NOT AVAILABLE, MISSING
* JR57162 IIDR IS NOT ABLE TO GET NEXT ONLINE LOG FILE TO READ WHEN PRODUCT IS CONFIGURED TO READ FROM ORACLE RAC SYSTEM.
* JR57167 IIDR FAILS TO START, WITH INVALID PARAMETER BINDING ERROR.

IIDR 11.3.3.3-58 for Oracle
* JR57085 CDC IS NOT PROCESSING ANY FILE IN MANUAL LOG SHIPPING WHEN FILE IS DELETED FROM ORIGINAL LOCATION
* JR56720 UPDATE IS FAILING WITH ERROR ORA-01407 WHEN COLUMN IS DEFINED AS NOT NULL WITH DATABASE DEFAULT VALUE
* JR57125 DMSHOWLOGDEPENDENCY COMMAND CAN RETURN COMPLETED LOGS IN THE REQUIRED LOG LIST AS NOT AVAILABLE, MISSING
* JR57120 DMSHOWLOGDEPENDENCY CAN RETURN A NULL POINTER EXCEPTION
* JR57121 DMSHOWLOGDEPENDENCY DOES NOT RETURN THE FULL LIST OF REQUIRED LOGS

IIDR 11.3.3.3-47 for all LUW engines can be found on Fix Central.
* Enhanced collection of file system characteristics in dmsupportinfo
* JR56205 "CURRENT TIMESTAMP" & "CONSTANTS" ARE NOT BEING POPULATED IN AUDIT MAPPING TYPE FOR THE 'CLRPFM' AND 'STRJRNPF' COMMANDS.

IIDR 11.3.3.3-45 for Oracle
JR55859 IIDR fails with ORA-04030 error when refreshing a table with XMLType column which has BINARY as storage option
JR56384 dmarchivelogavailable and dmarchivelogremove return errors that null value is not allowed or log to deregister is not valid

There are many deployment models available for InfoSphere Data Replication's CDC technology of which DataStage integration is a popular one. The deployment option selected will significantly affect the complexity, performance, and reliability of the implementation. If possible, the best solution is always to use CDC direct replication (i.e. do not add DataStage to the mix).

CDC integration with DataStage is the right solution for replication when:

* You need to target a database that CDC doesn't directly support and that is not appropriate for CDC FlexRep
* Complex transformations are required that cannot be handled natively by CDC, such as complex table look-ups
* You are integrating with MDM

Cons of replicating from CDC through DataStage to an eventual target database:

* Performance going through DataStage (no matter which integration option is chosen) will be significantly slower than applying directly to the database via a CDC target.
  - The exception to this rule is when targeting Teradata: if you use DataStage flat-file integration, the throughput will be higher than CDC direct to Teradata.
* The maximum number of tables per CDC subscription is lower when targeting DataStage.
* CDC External Refresh does not work when targeting DataStage. A separate process would have to be put in place to remove the duplicate records produced during the "in-doubt" period of a refresh (the captured changes that occurred while the source data was being refreshed).
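One way such a de-dup process could work is sketched below. It assumes each landed record carries a primary key and a capture timestamp; the actual columns in your DataStage job will differ:

```python
# Sketch of de-duplicating rows captured during the "in-doubt" window
# of a refresh. Assumes each record is (primary_key, capture_ts, payload);
# a real DataStage job would read these fields from the landed data.

def dedup_in_doubt(records):
    """Keep only the latest record per primary key."""
    latest = {}
    for key, ts, payload in records:
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, payload)
    return {k: payload for k, (ts, payload) in latest.items()}

records = [
    (1, 100, "refreshed row"),
    (1, 105, "change captured during refresh"),  # duplicate of key 1
    (2, 100, "refreshed row"),
]
print(dedup_in_doubt(records))  # key 1 keeps the later version
```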

IIDR 11.3.3.3-39 for all engines can be found on Fix Central.
JR55631 UPDATE FAILURE WHEN "MINIMIZE NETWORK LOAD" ENABLED
JR55714 EXPLICIT ENCODING CONVERSION DOESN'T WORK WITH CDC FOR DB2 FOR IBM I
JR55795 DIFFERENTIAL REFRESH FAILS WITH 'NO ROWS WERE FOUND'

IIDR 11.3.3.3-39 for Event Server
JR55517 INCORRECT VALUES IN EVENT MESSAGE FOR RECORDS RECEIVED AND APPLIED AT THE END OF REFRESH

When I first joined IBM in 2007 it seemed somewhat anachronistic that the Toronto Software Lab was managed by the leader of the Sensors and Actuators group. Now it seems prescient. As we consider the Internet of Things and see that all the physical objects around us have a useful place in the world of information, we see that our information assets can be viewed from a more traditionally physical perspective as well.

Databases are one of the most important assets that we have in an organization, certainly equal to our physical assets. As we consider the value in all the physical sensor information available, telling us who entered and exited every building, showing us through RFID tags what components flowed through an assembly line and so on, we should recognize the value of sensors on our databases as well.

I’ll focus on a particular type of sensor, one that provides a stream of the data changes occurring in the database. IBM’s Cloudant database provides a REST API that delivers a sensor stream of changes. IBM InfoSphere Data Replication can provide a sensor stream of changes from your distributed and mainframe-based relational databases, as well as from non-relational databases such as IMS and VSAM.

The original role for data replication technology was to enable low impact and low latency data movement. Data replication technology captures the changes occurring on the source database quickly and with minimal impact on that database and without requiring any changes in the database application. InfoSphere Data Replication captures changes from the database recovery logs. These traits make it ideal as a sensor.

Data replication has always had a role as an audit tool. Government regulations require certain industries to maintain an audit trail for their key data. Traditionally, data mining was rarely done on these audit trails (let's call them database sensor logs). The database sensor logs were kept primarily to meet the regulatory requirements.

Over time some industries have begun performing analytics on these sensor logs. Banks are using machine learning techniques to identify potential fraud events. Cell phone companies have been using streaming analytics to identify upsell opportunities. This use of analytics will grow as the Internet of Things continues to drive better analytics tools and create more data scientists experienced at working with sensor data.

I am often talking with clients as they begin to create an exploratory zone. They all understand the importance of having a copy of their database data in this exploratory zone and are interested in data replication technology as a way of maintaining a current copy of that data. For exploratory zones built around Hadoop, it is easy to explain the advantages of using a database sensor log to provide that data, as it suits the natural processing model of HDFS and Hive. Data replication can provide the sensor log as a series of files stored in HDFS, and the data scientist can create Hive views over those files that let them see either the entire audit trail or collapse it to just the latest contents. Access to an audit trail is essentially a free side effect of the most practical method of providing data scientists with a current copy of the data, and it suits the general philosophy that one should not discard data on the way into your exploratory zone.
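The "collapse the audit trail to the latest contents" idea can be sketched in a few lines. I use Python here purely for illustration; in practice this logic would be expressed declaratively as a Hive view over the HDFS files:

```python
# Illustration of collapsing an audit trail to current state.
# Each entry is (sequence, operation, key, row); a Hive view over
# the HDFS files would express the same logic declaratively.

def collapse(audit_trail):
    """Replay inserts/updates/deletes in order to get the latest rows."""
    state = {}
    for seq, op, key, row in sorted(audit_trail):
        if op in ("I", "U"):
            state[key] = row        # insert or update sets the current row
        elif op == "D":
            state.pop(key, None)    # delete removes the row from current state
    return state

trail = [
    (1, "I", "k1", {"qty": 5}),
    (2, "U", "k1", {"qty": 3}),
    (3, "I", "k2", {"qty": 9}),
    (4, "D", "k2", None),
]
print(collapse(trail))  # {'k1': {'qty': 3}}
```

The full audit trail stays on disk untouched; the "current state" view is just a different query over the same files.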

Most of our clients are just beginning the process of discovering the valuable questions that can be answered using this sensor log. An interesting difference between a database sensor log and a conventional physical sensor log is that the physical sensor log is often the primary source for both the current state of the physical object and the history of that state. You may learn both the current temperature of the engine block and the changes in that temperature over time. Many of the ideas discussed around the Internet of Things, such as the connected car, primarily leverage information about the current state. This sort of analytics around the current state is already in place for databases. If you want to look to the Internet of Things to seed your thinking about what you may be able to get from database sensor logs, focus on the ideas that depend on the history, not just the current state.

The use of personal fitness trackers to identify when a person with mobility issues may have fallen is an example that requires history. It seems quite similar to the fraud detection example that is already being done with database sensor logs. Some aspects of the connected car do depend on history, tracking the changes over time between two different sensors, say RPM and oil pressure, to ensure they maintain the expected relationship as they change. This might be comparable to comparing the database sensor log with the click stream from your application to confirm how many clicks it is taking to make specific types of updates to your system of record.

I think we are just scratching the surface here. I’m interested to see what other answers we will find. I encourage you to add a database sensor log to the assets you make available to your data scientists.

IIDR 11.3.3.3-27 for Oracle
JR54995 WITH AN ORACLE SOURCE, THE AFTER IMAGE SENT TO THE TARGET IS DIFFERENT FROM THE ACTUAL SOURCE VALUE
JR55186 SUBSCRIPTION MAY FAIL WITH ERROR: INDEXOUTOFBOUNDSEXCEPTION INDEX: -52000, SIZE: 2000 WHEN USING ASM

IIDR 11.3.3.3-27 for SQL Server
* The engine now checks whether the SQL Server version is supported.
* Improved handling of scraping logs while a backup is in progress

APARs:
JR54946 IF SMTP SERVER IS NOT CONFIGURED PROPERLY, IIDR CAN BE UNRESPONSIVE
JR55016 DMSET COMMAND RETURNS ERROR WHEN RUN ON IIDR LUW 11.3.3.2_RELEASE 19

IIDR 11.3.3.2-25 for Netezza
APARs:
JR54816 TRUNCATE WHEN CONFIGURED WITH "ON CLEAR/TRUNCATE ==>DO NOT DELETE" IS NOT WORKING AS EXPECTED IN NETEZZA ENGINE

IIDR 11.3.3.2-25 for DataStage
APARs:
JR55037 THE INFOSPHERE DATASTAGE PROPERTIES ROW THRESHOLD VALUE IS NOT BEING APPLIED DURING REFRESH TO DATASTAGE TARGETS

IIDR 11.3.3.2-25 for Oracle
* IIDR for Oracle now supports the timezone conversion derived expression %TODIFFERENTTIMEZONE
APARs:
JR55111 SUBSCRIPTION FAILS AFTER CERTAIN DDLS ARE EXECUTED IN ORACLE
JR54995 WITH AN ORACLE SOURCE, THE AFTER IMAGE SENT TO THE TARGET IS DIFFERENT FROM THE ACTUAL SOURCE VALUE
JR55079 IIDR CDC FOR ORACLE 12C FAILED TO RETRIEVE TABLE STRUCTURE FOR TABLE WHEN MC WAS LOADING TABLE MAPPING DETAILS
JR54985 ROLLED BACK INSERTS IN ORACLE CAN BE SENT TO THE TARGET TO BE APPLIED

APARs:
JR54961 EXCEPTION WHILE TRYING TO MODIFY DDL STATEMENTS THROUGH A DDL USER EXIT
JR54841 DUPLICATE MESSAGES SENT BY REPLICATION AGENT

IIDR 11.3.3.2-23 for Netezza
New Functionality:
* Improved performance of support assistant collection

APARs:
JR54762 SUPPORT ASSISTANT HANGS. THIS PREVENTS THE SUBSCRIPTION FROM EXECUTING AN ABORT/SHUTDOWN AFTER IT HITS AN UNCAUGHT EXCEPTION
JR54809 NULL POINTER EXCEPTION GENERATED WHEN AUTOCOMMIT USED ON SOURCE WITH CDC NETEZZA TARGET

IIDR 11.3.3.2-23 for DataStage

New Functionality:

* More robust start-up for Direct Connect auto-start subscriptions after replication was ended.

APARs:
JR54468 UPDATE TABLE DEFINITION DOES NOT AUTOMAP NEWLY ADDED COLUMNS WITH THE SAME NAME

Previously released in IF 4:

- Replicated VARGRAPHIC data is halved if there is a column with a LOB datatype in the table.
- Corrected refresh of DBCLOB CCSID 13488 to BLOB.
- Fixed a failure to add a subscription with a 16-character password on an MS SQL Server target.

IIDR 11.3.3.2-19 for DataStage

IIDR 11.3.3.2-19 for Event Server

APARs:
JR53811 CDC Event Server: fixes an issue where, if a transaction with multiple operations has a CCID value larger than can be contained in a 31-bit integer, the first operation in the transaction will have a non-zero CCID value

IIDR 11.3.3.2-19 for Oracle

APARs:
JR54714 LOB COLUMNS MAY BE UPDATED WITH VALUES FROM A DIFFERENT ROW

IIDR 11.3.3.2-19 for Sybase

IIDR 11.3.3.2-19 for DB2 LUW

IIDR for DB2 LUW will not automatically map identical columns during table assignment if the column name is longer than 30 characters.
APARs:
JR54344 IF SCHEDULED END IS ISSUED WHILE THE SUBSCRIPTION IS JOINING SHARED SCRAPE THE SUBSCRIPTION MAY STALL OR NOT END

IIDR 11.3.3.2-19 for Teradata

* TPUMP apply is no longer available.

IIDR 11.3.3.2-5258 for Management Console and Access Server can be found on Fix Central.

New functionality (released in 11.3.3.2-5):
* Now supports configuring an instance on a PureData System for Analytics appliance with multiple schema support enabled.
* Now supports the VARBINARY datatype
* Performance improvements in bookmark processing

The IBM developerWorks data replication communities are being merged into this community. As such, "The CDC (Change Data Capture) Forum" is being renamed to better represent the expanded content that will now reside in this community.

For those who have been using the CDC developerWorks community, you will notice a change to the look and feel of the entry page. This new menu-driven mechanism will provide extra flexibility as significant new content gets added to this community. I have tested it on various browsers, so if you experience an issue, please send me an email at gsakuth@ibm.com and I will investigate.

Note I will also be cleaning up some old blog entries where questions were asked. Questions should continue to be asked on the forum versus the blog.

There are a large number of usability enhancements and new platform/database support such as SQL Server 2014, along with technology previews/betas of many key new technologies such as Cloudant apply, WebHDFS support for Hadoop, and many others.

I have updated the look and feel of the wiki to make it easier to navigate and find what you need.

Samples are now organized in their own table and can be found by clicking the "Samples" icon on the home page.

All "how-to" documents can now be found by clicking on the "Documents" icon on the home page.

A new section has been added to the wiki that describes the features added in all the recent IIDR CDC releases. In the near future I will be adding presentations with details on the features in this section... so please check back.

I've added three new videos to my channel. They walk through configuring, operating and monitoring data replication using the CDC Management Console. This is basically the same thing you'd get if you came by the InfoSphere demo room at Information On Demand (now Insight) and agreed to let me show you a quick demo of CDC.

I'm recording some videos where I provide technical background about data replication. I've created a YouTube channel to collect them all together. The channel is "James Talks about Data Replication". Here's a link:

I've uploaded two so far. The first one discusses the special considerations you should be aware of when using data replication with tables that have duplicate rows. The second discusses the role of data replication when moving to a real time operational analytics system from a traditional batch oriented data warehouse.