Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I need to replicate data from a DB2 database into a SQL Server Database (along the lines of about 80 tables). I can create an SSIS package, to just flat copy the data from one to another pretty easy, but the amount of records is a couple million, so doing it once takes a while, but for the initial setup its fine. However, I need to do "almost" real-time updates, so this method won't be viable. What would be the best method, and how, to do just add change and delete replication from DB2 to SQL Server?

2 Answers
2

If you have adequate logging on the DB2 side, you should be able to easily identify records have been created, modified, or deleted since your last near-real time update. But if it was that easy, you wouldn't need Stack Exchange.

Failing the above-mentioned logging, your fastest option is probably a MERGE command. Across the link, this could be slow; if so, consider bulk-copying the DB2 database into a staging table on the SQL database and then running the MERGE between that and the permanent table locally on the SQL server. It will still take a while to populate the staging table, but the update of the live table will be about as quick as it's likely to get without schema changes.

If there's too much data to copy it all every five minutes (or whatever your near-real time schedule calls for), you may have no choice but to start logging events on the DB2 side, with triggers probably.

An alternative approach would be to examine a slice of records on each update. At 5:20 you copy from DB2 to SQL all records ending in zero (ID % 10 = 0), and merge them into the main table. At 5:25 you copy and merge all records with an ID ending in one. This obviously assumes that you have some kind of synthetic key which is fairly uniformly distributed. Your data will not all be current all the time, but it will be mostly current most of the time. For best currency, make the slices as thick as possible while staying within your time window; dividing by ten is just an example. You could expand on this approach by partitioning the table, if desired.

You could create a hidden timestamp column in DB2 on the replicated tables. When you run the replication job you can pull just the data that has changed since the last replication. This would lower the quantity of data sent.

For instance, run the SSIS package every minute pulling just the last minutes worth of data.

Or, instead of a timestamp field, you could use an auto incrementing int and keep a small table on the replicated instance that records the last int value replicated. The replication process would grab any new records, and update the small table's value to reflect the last value replicated.