How to extract change data events from MySQL to Kafka using Debezium

Introduction

As previously explained, CDC (Change Data Capture) is one of the best ways to interconnect an OLTP database system with other systems, such as a data warehouse, caches, Spark, or Hadoop.

Debezium is an open-source project developed by Red Hat that simplifies this process by extracting changes from various database systems (e.g., MySQL, PostgreSQL, MongoDB) and pushing them to Apache Kafka.

In this article, we are going to see how you can extract events from MySQL binary logs using Debezium.

Debezium Architecture

First, you need a database-specific Debezium connector to be able to read the database transaction log: the redo log for Oracle, the binary log for MySQL, or the Write-Ahead Log (WAL) for PostgreSQL.

You also need to have Kafka running so that you can push the extracted log events and make them available to the other services in your enterprise system. Apache ZooKeeper is not required by Debezium itself, but by Kafka, which relies on it for consensus and linearizability guarantees.
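Once Kafka and Kafka Connect are up, the MySQL connector is registered by posting a JSON configuration to the Kafka Connect REST API. The sketch below shows a minimal configuration; hostnames, credentials, and the server id are illustrative assumptions and must match your own environment:

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

With this configuration, change events for the tables in the inventory database are published to Kafka topics prefixed with the logical server name (e.g., dbserver1.inventory.customers).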

The op attribute value is u, meaning we have an UPDATE log event. The before object shows the row state before the update, while the after object captures the current state of the updated customers table row.
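To make this concrete, here is a trimmed sketch of what the payload of such an UPDATE event looks like; the column values and the ts_ms timestamp are illustrative assumptions, not output captured from a real run:

```json
{
  "payload": {
    "before": {
      "id": 1005,
      "first_name": "John",
      "last_name": "Doe",
      "email": "john@example.com"
    },
    "after": {
      "id": 1005,
      "first_name": "John",
      "last_name": "Doe",
      "email": "john.doe@example.com"
    },
    "op": "u",
    "ts_ms": 1465581029523
  }
}
```

Consumers can diff the before and after objects to see exactly which columns changed, here only the email value.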

DELETE

When issuing a DELETE statement:

DELETE FROM `inventory`.`customers`
WHERE id = 1005;

The following event is recorded by the kafka-connect Docker container: