1.1. Tungsten Replicator

Tungsten Replicator is an open source, high-performance replication engine
that works with a number of different source and target databases to provide
improved replication functionality over the native solution. With MySQL
replication, for example, the enhanced functionality and information provided
by Tungsten Replicator allows for global transaction IDs, advanced topology
support such as multi-master, star, and fan-in, and enhanced latency
identification.

In addition to providing enhanced functionality, Tungsten Replicator is also
capable of heterogeneous replication by enabling the replicated information
to be transformed after it has been read from the data server to match the
functionality or structure in the target server. This functionality allows
for replication between MySQL, Oracle, and Vertica, among others.

Understanding how Tungsten Replicator works requires looking at the overall
replicator structure. The diagram below shows a top-level overview of the
structure of a replication service.

At this level, there are three major components in the system that provide
the core of the replication functionality:

Extractor

The extractor component reads data from a data server, such as MySQL or
Oracle, and writes that information into the Transaction History Log
(THL). The role of the extractor is to read the information from a
suitable source of change information and write it into the THL in the
native or defined format, either as SQL statements or row-based
information.

Information is always extracted from a source database and recorded
within the THL in the form of a complete transaction. The full
transaction information is recorded and logged against a single,
unique transaction ID used internally within the replicator to identify
the data.
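The relationship between complete transactions and incremental IDs can be
sketched as follows. This is an illustrative, in-memory model only; the names
(THLEvent, TransactionHistoryLog, seqno, fragments) are hypothetical and do
not reflect Tungsten's actual classes or on-disk format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class THLEvent:
    """One complete transaction recorded in the THL (hypothetical structure)."""
    seqno: int            # incremental, unique transaction ID
    fragments: List[str]  # SQL statements or serialized row changes

class TransactionHistoryLog:
    """Minimal in-memory stand-in for the on-disk THL."""
    def __init__(self):
        self.events: List[THLEvent] = []
        self._next_seqno = 0

    def append(self, fragments: List[str]) -> THLEvent:
        # Each complete transaction receives a single, unique, incremental ID.
        event = THLEvent(seqno=self._next_seqno, fragments=fragments)
        self._next_seqno += 1
        self.events.append(event)
        return event

thl = TransactionHistoryLog()
first = thl.append(["INSERT INTO t VALUES (1)", "UPDATE t SET x = 2"])
second = thl.append(["DELETE FROM t WHERE id = 1"])
```

Note that the two statements of the first transaction share one ID; the ID
identifies the transaction as a whole, not the individual statements.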

Applier

Appliers within Tungsten Replicator read the THL information, convert it,
and apply it to a destination data server.

The applier works with a number of different target databases, and is
responsible for writing the information to the database. Because the
transactional data in the THL is stored either as SQL statements or
row-based information, the applier has the flexibility to reformat the
information to match the target data server. Row-based data can be
reconstructed to match different database formats, for example,
converting row-based information into an Oracle-specific table row, or a
MongoDB document.
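As a rough illustration of that reformatting step, the sketch below converts a
generic row-based change into a MongoDB-style document. This is not Tungsten's
applier code; the function name, parameters, and the `_ns` namespace field are
all hypothetical.

```python
def row_to_document(schema, table, column_names, row_values):
    """Reformat a generic row-based change into a MongoDB-style document.

    Illustrative sketch only: the real applier works from its internal
    THL event representation, not bare lists of names and values.
    """
    # Pair each column name with its row value to build the document body.
    doc = dict(zip(column_names, row_values))
    doc["_ns"] = f"{schema}.{table}"  # hypothetical namespace field
    return doc

doc = row_to_document("test", "users", ["id", "name"], [1, "alice"])
# doc == {"id": 1, "name": "alice", "_ns": "test.users"}
```

The same row data could equally be rendered as an INSERT statement for an
Oracle table; the point is that the THL stores the change in a neutral form,
and each applier decides the target-specific shape.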

Transaction History Log (THL)

The THL contains the information extracted from a data server.
Information within the THL is divided up by transactions, either implied
or explicit, based on the data extracted from the data server. The THL
structure, format, and content provides a significant proportion of the
functionality and operational flexibility within Tungsten Replicator.

As the THL data is stored, additional information is recorded, such as the
metadata and options in place when the statement or row data was
extracted. Each transaction is also recorded with an incremental global
transaction ID. This ID enables individual transactions within the THL
to be identified, for example to retrieve their content, or to determine
whether different appliers within a replication topology have written a
specific transaction to a data server.
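Because the global transaction IDs are incremental, checking whether an
applier has written a specific transaction can reduce to comparing against a
single watermark. The sketch below shows this idea; the class and method names
are hypothetical, not part of Tungsten's API.

```python
class AppliedPosition:
    """Tracks the latest global transaction ID an applier has committed.

    Hypothetical sketch: since IDs are assigned incrementally, one
    watermark suffices to decide whether a given transaction has
    already been applied to the data server.
    """
    def __init__(self):
        self.last_applied = -1  # no transactions applied yet

    def mark_applied(self, seqno: int) -> None:
        self.last_applied = max(self.last_applied, seqno)

    def has_applied(self, seqno: int) -> bool:
        return seqno <= self.last_applied

pos = AppliedPosition()
pos.mark_applied(41)
pos.has_applied(41)  # True: this transaction has been written
pos.has_applied(42)  # False: not yet applied
```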

These components will be examined in more detail as different aspects of the
system are described with respect to the different systems, features, and
functionality that each system provides.

This basic overview and structure allows Tungsten Replicator to support a
number of different topologies and solutions that replicate information
between different services. Straightforward replication topologies, such as
master/slave, are easy to understand with the basic concepts described
above. More complex topologies use the same core components. For example,
multi-master topologies make use of the global transaction ID to prevent the
same statement or row data from being applied to a data server multiple
times. Fan-in topologies allow the data from multiple data servers to be
combined into one data server.
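The multi-master duplicate check can be sketched by tracking, per source
service, the highest transaction ID already applied. This is a simplified
model under the assumption that each source numbers its transactions
independently; the function and variable names are hypothetical.

```python
def should_apply(applied, source, seqno):
    """Decide whether to apply a transaction in a multi-master topology.

    Sketch only: `applied` maps source service -> highest global
    transaction ID already applied from that source, so a transaction
    that loops back around the topology is recognized and skipped.
    """
    if seqno <= applied.get(source, -1):
        return False  # already applied once; do not apply again
    applied[source] = seqno
    return True

applied = {}
should_apply(applied, "east", 10)  # True: first time this transaction is seen
should_apply(applied, "east", 10)  # False: duplicate from the same source
should_apply(applied, "west", 10)  # True: same seqno, but a different source
```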