5.4. Deploying the Kafka Applier

Kafka is a highly scalable
messaging platform that provides a method for distributing information
through a series of messages organised by a specified topic. With
Tungsten Clustering, the incoming stream of data from the upstream replicator is
converted, on a row-by-row basis, into a JSON document that contains the row
information. A new message is created for each row, even within multi-row
transactions.
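The row-to-message mapping can be illustrated with a short sketch. This is a simplified model only; the field names and JSON layout here are hypothetical and may differ from what the applier actually emits:

```python
import json

def transaction_to_messages(schema, table, rows):
    """Produce one JSON message per row, even for a multi-row transaction."""
    messages = []
    for row in rows:
        # Each row change becomes its own Kafka message payload
        messages.append(json.dumps({"schema": schema, "table": table, "row": row}))
    return messages

# A 3-row transaction yields 3 separate messages
msgs = transaction_to_messages("shop", "orders", [
    {"id": 1, "total": 10.0},
    {"id": 2, "total": 12.5},
    {"id": 3, "total": 7.25},
])
```

The key point is that message count tracks row count, not transaction count.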

The deployment of Tungsten Clustering to a Kafka service is slightly different.
There are two parts to the process:

Service Alpha on the master extracts the information from the MySQL
binary log into THL.

Service Alpha on the slave reads the information from the remote
replicator as THL, and applies it to Kafka.

Figure 5.5. Topologies: Replicating to Kafka

With the Kafka applier, information is extracted from the source database
using the row format; column names and primary keys are identified,
translated into a JSON format, and then embedded into a larger Kafka message.
The topic used is either composed from the schema name or can be configured
to use an explicit topic type, and the generated information included in the
Kafka message can include the source schema, table, and commit time
information.
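A sketch of the topic selection and message metadata described above, assuming the topic is derived from a lower-cased schema name unless an explicit topic is configured. The field names (`schema`, `table`, `commit_time`, `data`) are illustrative assumptions, not the applier's actual output format:

```python
import json

def make_kafka_record(schema, table, row, commit_time, explicit_topic=None):
    """Choose a topic and build a message carrying source metadata.

    The topic is derived from the schema name unless an explicit
    topic has been configured (lower-casing is an assumption here).
    """
    topic = explicit_topic if explicit_topic else schema.lower()
    payload = {
        "schema": schema,             # source schema name
        "table": table,               # source table name
        "commit_time": commit_time,   # commit timestamp from the source
        "data": row,                  # the row change itself
    }
    return topic, json.dumps(payload)

topic, record = make_kafka_record("Shop", "orders", {"id": 1},
                                  "2023-01-01 00:00:00")
```

With no explicit topic configured, the schema name alone determines where each message lands.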

The transfer operates as follows:

Data is extracted from MySQL using the standard extractor, reading the
row change data from the binlog.

The ColumnName filter (Section 10.4.8, “ColumnName Filter”) is used to
extract column name information from the database. This enables the
row-change information to be tagged with the corresponding column
information. The data changes, and the corresponding column names, are
stored in the THL.
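The effect of tagging row changes with column names can be sketched as pairing the positional values from the binlog with the column names retrieved from the database. This is a simplified illustration of the idea, not the filter's actual implementation:

```python
def tag_with_column_names(column_names, row_values):
    """Pair positional row-change values with their column names."""
    return dict(zip(column_names, row_values))

# Positional binlog values become named fields ready for JSON encoding
tagged = tag_with_column_names(["id", "name", "total"],
                               [42, "widget", 9.99])
```

Without this step, the row data would carry only column positions, which could not be rendered into meaningful JSON keys.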

There are some additional considerations when applying to Kafka that should
be taken into account:

Because Kafka is a message queue and not a database, traditional
transactional semantics are not supported. This means that although the
data will be applied to Kafka as a message, there is no guarantee of
transactional consistency. By default the applier will ensure that the
message has been correctly received by the Kafka service; it is the
responsibility of the Kafka environment and configuration to ensure
delivery. The
replicator.applier.dbms.zookeeperString
property can be used to ensure acknowledgements are received from the
Kafka service.
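As an illustration only, the property might be set in the replicator configuration along these lines; the host and port shown are placeholders, and the correct values depend on your own Kafka/ZooKeeper deployment:

```
replicator.applier.dbms.zookeeperString=zookeeper-host:2181
```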

One message is sent for each row of source information in each
transaction. For example, if 20 rows have been inserted or updated in a
single transaction, then 20 separate Kafka messages will be generated.

A separate message is broadcast for each operation, and includes the
operation type. A single message is broadcast for each row of each
operation. So if 20 rows are deleted, 20 messages are generated, each
tagged with the operation type.
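The per-row, per-operation fan-out above can be sketched as follows; the `optype` field name and `DELETE` label are illustrative assumptions about how the operation type might be carried:

```python
import json

def rows_to_messages(optype, rows):
    """One message per row per operation, tagged with the operation type."""
    return [json.dumps({"optype": optype, "row": r}) for r in rows]

# Deleting 20 rows generates 20 messages, each carrying the DELETE type
deletes = rows_to_messages("DELETE", [{"id": i} for i in range(20)])
```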

If replication fails in the middle of a large transaction and the
replicator goes OFFLINE, then when the
replicator comes back online it may resend rows and messages.
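Because messages may be resent after a restart, consumers that need exactly-once behaviour can deduplicate on a stable key. A minimal consumer-side sketch, assuming each message carries a transaction sequence number and a row identifier (both hypothetical field names, not guaranteed by the applier):

```python
def deduplicate(messages, seen=None):
    """Drop messages whose (seqno, row_id) key was already processed."""
    if seen is None:
        seen = set()
    fresh = []
    for msg in messages:
        key = (msg["seqno"], msg["row_id"])
        if key not in seen:
            seen.add(key)
            fresh.append(msg)
    return fresh

batch = [
    {"seqno": 10, "row_id": 1, "op": "INSERT"},
    {"seqno": 10, "row_id": 1, "op": "INSERT"},  # resent duplicate
    {"seqno": 10, "row_id": 2, "op": "INSERT"},
]
unique = deduplicate(batch)
```

Persisting the `seen` set (or the highest processed sequence number) across consumer restarts is what makes this strategy durable.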