5.7.5. Keeping CDC Information

The Redshift applier can keep the CDC (change data capture) data, that is,
the raw CSV change data that is recorded and replicated during the loading
process, rather than deleting the CDC files during cleanup. Keeping the CDC
data is useful if you want to monitor data changes over time.

The process works as follows:

1. The batch applier generates CSV files.

2. The batch applier loads the CSV data into the staging tables.

3. The batch applier loads the CSV data into the CDC tables.

4. The staging data is merged with the base table data.

5. The staging data is deleted.
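
The steps above can be sketched in a few lines of Python. This is only an
in-memory illustration of the order of operations, not the applier's actual
implementation: the four-column row layout (opcode, sequence number, key,
value), the I/D opcodes, and all function names are assumptions made for the
example.

```python
import csv
import io

def generate_csv(changes):
    """Step 1: serialize a batch of change rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(changes)
    return buf.getvalue()

def load_csv(csv_text, table):
    """Steps 2 and 3: append every CSV row to a table (a plain list here)."""
    for row in csv.reader(io.StringIO(csv_text)):
        table.append(row)

def merge(staging, base):
    """Step 4: apply staged changes to the base table in sequence order."""
    for opcode, seqno, key, value in sorted(staging, key=lambda r: int(r[1])):
        if opcode == "D":          # hypothetical delete opcode
            base.pop(key, None)
        else:                      # hypothetical insert opcode "I"
            base[key] = value

def apply_batch(changes, staging, cdc, base):
    csv_text = generate_csv(changes)   # 1. generate CSV
    load_csv(csv_text, staging)        # 2. load into staging
    load_csv(csv_text, cdc)            # 3. load the same data into CDC
    merge(staging, base)               # 4. merge staging into base
    staging.clear()                    # 5. delete staging data; CDC is kept

staging, cdc, base = [], [], {}
apply_batch([("I", 1, "1", "apple"), ("I", 2, "2", "pear")], staging, cdc, base)
apply_batch([("D", 3, "1", ""), ("I", 4, "3", "plum")], staging, cdc, base)
```

After both batches, the staging list is empty and the base table holds only
the current rows, while the CDC list still contains all four change rows.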

Unlike the staging and base table information, the data in the CDC tables
is kept indefinitely; processed rows are never removed. Using this data you
can report on change information over time for different data sets, or even
recreate a dataset as it existed at a specific time by replaying the change
information.
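
As a rough illustration of recreating a dataset at a specific time, the
following Python sketch replays change rows up to a cutoff timestamp. The
field names (opcode, seqno, row_id, commit_ts, id, data) and the I/D opcodes
are assumptions made for the example; check the column names in your own CDC
table definitions before adapting it.

```python
from datetime import datetime

def snapshot_at(cdc_rows, as_of):
    """Rebuild table state by replaying changes committed up to as_of."""
    state = {}
    eligible = [r for r in cdc_rows if r["commit_ts"] <= as_of]
    # Replay in commit order: sequence number first, then row order within it.
    for row in sorted(eligible, key=lambda r: (r["seqno"], r["row_id"])):
        if row["opcode"] == "D":       # hypothetical delete opcode
            state.pop(row["id"], None)
        else:                          # hypothetical insert opcode "I"
            state[row["id"]] = row["data"]
    return state

# Hypothetical CDC rows; the last two model an update as delete plus insert.
cdc_rows = [
    {"opcode": "I", "seqno": 1, "row_id": 1,
     "commit_ts": datetime(2024, 1, 1, 9, 0), "id": 1, "data": "apple"},
    {"opcode": "I", "seqno": 2, "row_id": 1,
     "commit_ts": datetime(2024, 1, 1, 10, 0), "id": 2, "data": "pear"},
    {"opcode": "D", "seqno": 3, "row_id": 1,
     "commit_ts": datetime(2024, 1, 2, 9, 0), "id": 1, "data": None},
    {"opcode": "I", "seqno": 3, "row_id": 2,
     "commit_ts": datetime(2024, 1, 2, 9, 0), "id": 1, "data": "apricot"},
]

day_one = snapshot_at(cdc_rows, datetime(2024, 1, 1, 12, 0))
latest = snapshot_at(cdc_rows, datetime(2024, 1, 3, 0, 0))
```

Here day_one reflects the table before the change on the second day, and
latest reflects it afterwards. In practice the same replay would normally be
expressed as a query against the CDC tables rather than in application code.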

To enable this feature:

1. When creating the DDL for the staging and base tables, also create the
   table definitions for the CDC data for each table. The format of the
   CDC tables is the same as that of the staging tables, and the
   definitions can be created using ddlscan.

2. In the configuration file for each service, s3-config-svc.json,
   specify the name of the table to be used when storing the CDC
   information using the storeCDCIn field. This should specify the table
   template to be used; the schema and table name are replaced
   automatically by the load script. The resulting names should match
   those used with ddlscan to define the CDC tables.