Deleted Record Handling

Depending on the Replication Method being used and how records are deleted in the source, deletes may not be captured during the replication process.

Deletion methods

There are two methods that can be used to delete a source record:

Soft deletes, which will leave a record in the source and use a flag to indicate deletion, such as is_deleted or deleted_on. If the delete event updates the record’s Replication Key value, Stitch will detect and replicate the changes.

Hard deletes, which completely remove records from the source. It’s as if the record never existed. If using Key-based Incremental Replication, this will remove the record’s Replication Key value, which Stitch uses to identify new and updated records. Without a Replication Key value to check, Stitch can’t identify the change and update the record in the destination.

Delete support overview

In the table below are each of Stitch’s Replication Methods and the level at which each deletion method is supported.

Click the Replication Method name to check out examples of how each deletion method works with that specific Replication Method.

Key-based Incremental Replication

In this example, we’ll demonstrate what happens when two records are soft-deleted in the source. This table uses the is_deleted column as a deletion flag, which will be true if the record is deleted.

Soft delete: Initial job

During the initial job:

In the source: A table contains three records: id: 1, id: 2, and id: 3.

During the job: Depending on the type of integration, the initial extraction will vary slightly:

Database integration: Stitch replicates the table in full.

SaaS integration: Stitch uses the Start Date defined in the Integration Settings as the initial Replication Key (updated_at) value. Records with an updated_at value greater than or equal to the Start Date are replicated.

Wait for Stitch to re-create the table during the next replication job

These steps must be done in this order, or you may cause issues with replication.

Hard delete: Initial job

During the initial job:

In the source: A table contains three records: id: 1, id: 2, and id: 3.

During the job: Depending on the type of integration, the initial extraction will vary slightly:

Database integration: Stitch replicates the table in full.

SaaS integration: Stitch uses the Start Date defined in the Integration Settings as the initial Replication Key value. Records with a Replication Key value greater than or equal to the Start Date are replicated.

Log-based Incremental Replication

In this example, we’ll demonstrate what happens when two recrds are soft-deleted in a source table using Log-based Incremental.

Soft delete: Initial job

During the initial job:

In the source: A table contains three records: id: 1, id: 2, and id: 3.

During the job: Stitch uses SELECT replication to replicate the table in full.

In the destination: Stitch loads the table in full.

Source

id

name

is_deleted

1

Finn

false

2

Jake

false

3

Bubblegum

false

Destination

id

name

is_deleted

1

Finn

false

2

Jake

false

3

Bubblegum

false

Soft delete: Second job

During the second job:

In the source: Records 1 and 3 are soft deleted (is_deleted: true) using an UPDATE event.

As UPDATE is a supported event type for Log-based Replication, log messages describing the modified is_deleted values for records 1 and 3 are written to the database’s binary log.

During the job: After the historical replication job, Stitch will read updates for the table from the database’s binary logs.

Stitch reads the log messages for records 1 and 3 and replicates the updated records.

In the destination: The is_deleted and updated_at values are updated for records records 1 and 3.

Source

id

name

is_deleted

1

Finn

true

2

Jake

false

3

Bubblegum

true

Destination

id

name

is_deleted

1

Finn

true

2

Jake

false

3

Bubblegum

true

Hard deletes with Log-based Incremental

In this example, we’ll demonstrate what happens when a record is hard deleted - or removed entirely - from a source table using Log-based Incremental. In this situation, Stitch will use a TIMESTAMP column named _sdc_deleted_at to indicate when records have been identified as deleted in the source.

Note: Identifying hard deleted records is dependent on how the record is deleted, as not all event types are recorded by databases in binary logs. For example: If a record is deleted using TRUNCATE in a MySQL database, Stitch will not detect the deletion.

The following event types will be detected by Log-based Incremental for MySQL- and PostgreSQL-backed database integrations:

INSERT

UPDATE

DELETE

Hard delete: Initial job

During the initial job:

In the source: A table contains three records: id: 1, id: 2, and id: 3.

During the job: Stitch uses SELECT replication to replicate the table in full.

In the destination: Stitch loads the table in full.

Source

id

name

1

Finn

2

Jake

3

Bubblegum

Destination

id

name

_sdc_deleted_at

1

Finn

2

Jake

3

Bubblegum

Hard delete: Second job

During the second job:

In the source: Record 2 is hard deleted using a DELETE event.

As DELETE is a supported event type for Log-based Replication, log messages describing the delete of record 2 are written to the database’s binary log.

During the job: After the historical replication job, Stitch will read updates for the table from the database’s binary logs.

Stitch reads the log message for record 2 and identifies the record as deleted.

In the destination: Record 2 will remain in the table, but have a timestamp in the _sdc_deleted_at column indicating when the deletion took place.