Geo-replication in Apache Pulsar, part 1: concepts and features

September 27, 2017

Sijie Guo

The importance of a disaster recovery plan and, optimally, a disaster avoidance strategy, cannot be overstated. In any given week there are plenty of headlines that show this to be true. Regardless of the industry, when an unforeseen event takes place and brings day-to-day operations to a halt, an organization needs to recover as quickly as possible to continue to provide services to its clients. From data security breaches to natural disasters, there must be a plan in place for swiftly and deftly dealing with catastrophe. Failing to have a disaster recovery plan puts the organization at risk of high financial costs, reputation loss, and even greater risks for its clients and customers.

In multi-faceted enterprise software systems, a disaster avoidance strategy and recovery plan requires a multi-datacenter deployment in which datacenters are geographically dispersed. In such a multi-datacenter deployment, a geo-replication mechanism can be deployed to provide additional redundancy in case a data center fails or some other event makes the continuation of normal functioning impossible.

In this and subsequent blog posts, we will describe another enterprise-grade feature that Apache Pulsar offers out-of-the-box: geo-replication. Apache Pulsar, leveraging the scalable stream storage of Apache BookKeeper, is a messaging system that supports both synchronous geo-replication (via Apache BookKeeper) and asynchronous geo-replication (configured at the broker level) across multiple data centers. We will start with some simple concepts and features in this blog post and describe some deployment practices in the next post.

Concepts

Geo-replication is a typical mechanism used to provide disaster recovery. Many data systems claim to support geo-replication. However, these systems generally replicate to only two data centers and have severe limitations when replicating to more than two. This can leave users confused and forced to do unwieldy things with the system to get it to replicate across multiple data centers. Before talking about the geo-replication feature in Apache Pulsar, I’d like to spend some time on a few basic concepts in geo-replication.

The geo-replication mechanisms used in different data systems can be put into two categories, synchronous geo-replication and asynchronous geo-replication. Apache Pulsar supports both geo-replication strategies. Figure 1 below illustrates how synchronous geo-replication and asynchronous geo-replication differ from one another.

In this example, assume there are 3 data centers: us-west, us-central and us-east, while a client is issuing a write request to us-central. In the synchronous geo-replication case, when the client issues a write request to us-central, the data written to us-central will be replicated to the other two data centers, both us-west and us-east. The write request is typically only acknowledged to the client when the majority of the data centers have issued a confirmation that the write has been persisted. In this case, at least 2 data centers have to confirm that this write request has been persisted. This mechanism is called “sync geo-replication” because the data is synchronously replicated to multiple data centers and the client has to wait for an acknowledgement from the other data centers.

Figure 1 Synchronous Geo-Replication vs Asynchronous Geo-Replication

In contrast, with asynchronous geo-replication the client doesn’t have to wait for a response from the other data centers. The client receives a response immediately after us-central successfully persists the data. The data is then replicated from us-central to the other two data centers, us-west and us-east, in an asynchronous fashion.

Synchronous geo-replication provides the highest availability: all of your physically separate data centers form one global logical instance of your data system. Your applications can run at any data center and still be able to access the data. It also guarantees stronger data consistency between data centers, which your applications can rely on without any manual intervention when a data center failure occurs. However, your applications pay an extra cross-datacenter latency penalty on every write, typically tens of milliseconds between the US west coast and the US east coast.

Asynchronous geo-replication provides lower latency, because the client doesn’t have to wait for responses from the other data centers. However, it also provides weaker consistency guarantees. Because asynchronous replication always involves some replication lag, there is always some amount of data that hasn’t yet been replicated from source to destination. When a disaster strikes (be it a natural disaster such as flooding, fire, or earthquake, or simply power loss or external connectivity loss), an entire data center can be crippled, and the data that hasn’t been replicated can be lost. Because of this replication lag, applications usually need to be written or configured to tolerate such data loss when data center failures occur. Asynchronous geo-replication is typically used in use cases with relaxed consistency requirements and is often found in messaging and non-database storage systems.
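The difference in acknowledgement semantics can be sketched with a toy model. The data center names, RTTs, local persistence time, and quorum size below are illustrative assumptions for this example, not Pulsar defaults:

```python
# Toy model of write-acknowledgement latency under the two strategies.
# RTTs (in ms) from the client's local data center, us-central, to each
# data center; these numbers are illustrative assumptions.
RTT_MS = {"us-central": 0, "us-west": 60, "us-east": 40}

def sync_ack_latency_ms(quorum=2):
    """Sync geo-replication: the client is acknowledged only after a
    majority (here, 2 of 3) of data centers confirm the write, so
    latency is governed by the quorum-th fastest confirmation."""
    confirmations = sorted(RTT_MS.values())
    return confirmations[quorum - 1]

def async_ack_latency_ms(local_persist_ms=5):
    """Async geo-replication: the client is acknowledged as soon as the
    local cluster persists the write; remote replication proceeds in
    the background and lags by up to the cross-datacenter RTT."""
    return local_persist_ms

print(sync_ack_latency_ms())   # 40 -- gated on the 2nd-fastest data center
print(async_ack_latency_ms())  # 5  -- local persistence only
```

The model makes the tradeoff concrete: the synchronous write pays the cross-datacenter round trip on every acknowledgement, while the asynchronous write acknowledges at local-persistence speed and accepts a replication lag instead.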

Apache Pulsar, relying upon Apache BookKeeper for durable message storage, is able to support both geo-replication methodologies.

Before jumping into the details, let me spend 30 seconds to explain a typical Pulsar installation. This will help explain how Apache Pulsar supports both synchronous and asynchronous geo-replication.

Figure 2 illustrates a typical installation of Apache Pulsar. A Pulsar cluster is composed of two layers: a stateless serving layer, comprised of a set of brokers for serving pub/sub traffic; and a stateful persistence layer, comprised of a set of BookKeeper bookies for durably storing messages.

Figure 2. A typical installation of Apache Pulsar

This architecture pattern, separating storage from serving pub-sub traffic, has a lot of advantages. It makes brokers “stateless” which makes load balancing and traffic shifting inexpensive. This has proven to be the key to the success of multi-tenancy (read this blog post on multi-tenancy for more details). It is also the key to enabling Apache Pulsar to support both synchronous and asynchronous geo-replication.

A synchronous geo-replicated Pulsar installation is comprised of a global ZooKeeper installation (a single ZooKeeper ensemble running across multiple data centers), a cluster of bookies running in multiple data centers, and a cluster of brokers also running in multiple data centers. A BookKeeper region-aware placement policy is configured and used by Pulsar brokers to store data across multiple data centers and to guarantee availability constraints on writes (for example, writing to at least 2 data centers before acknowledging).
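As a rough sketch, this is driven by broker configuration. The setting names below come from Pulsar’s broker.conf, but the values are illustrative assumptions for a 3-data-center deployment, not recommendations:

```conf
# Use BookKeeper's region-aware placement policy so that each ledger's
# ensemble spans data centers rather than just racks (illustrative sketch)
bookkeeperClientRegionawarePolicyEnabled=true

# Write each entry to 3 bookies and wait for 2 acknowledgements, so that
# a write is confirmed by at least 2 data centers before the client is acked
managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2
```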

A synchronous geo-replicated Pulsar cluster can continue to function normally when a data center goes down, and the applications running on top of it will be largely unaffected. Adding a new data center or retiring an old one is transparent to the applications and can even be done at runtime with zero downtime. Such a setup is a good fit for mission-critical use cases that can tolerate slightly higher latency.

Asynchronous Geo-Replication in Pulsar

In contrast, an asynchronous geo-replicated Pulsar cluster is comprised of multiple physical clusters set up in different data centers. Pulsar brokers replicate data asynchronously between those clusters. Figure 5 illustrates an asynchronous geo-replicated Pulsar installation, in contrast to the synchronous geo-replicated cluster illustrated in Figure 3.

Figure 5. An asynchronous geo-replicated Pulsar installation

In asynchronous geo-replication, when messages are produced on a Pulsar topic, they are first persisted to the local cluster and then replicated asynchronously to the remote clusters. In normal cases, when there are no connectivity issues, messages are replicated immediately, at the same time as they are dispatched to local consumers. Typically, end-to-end delivery latency is defined by the network round-trip time (RTT) between the data centers. Applications can create producers and consumers in any of the clusters, even when the remote clusters are not reachable (for example, during a network partition).

Asynchronous geo-replication can be enabled on a per-property (per-tenant) basis in Pulsar: geo-replication can be enabled between clusters only when a property has been created with access to both clusters. Although geo-replication must be enabled at the property level for permission reasons, it is actually managed at the namespace level. For example, if a tenant has permissions to access data centers A, B, C, and D, it can create one namespace that replicates between A and B, another that replicates between C and D, and a third with full-mesh replication across A, B, C, and D.

Pulsar provides tenants a great degree of flexibility for customizing their replication strategy. An application can set up master-slave-style replication, active-active bidirectional replication, or full-mesh replication between multiple data centers (Figure 5 illustrates a full-mesh replication setup across 3 data centers). In addition, replication is performed automatically by Pulsar brokers and is transparent to applications. Unlike other pub-sub messaging systems that require additional, complicated processes to mirror messages between data centers, geo-replication in Pulsar can be enabled, disabled, or dynamically changed at runtime (e.g. from master-slave replication to active-active bidirectional replication) by simply issuing a single admin command. We will describe failover, failback, and best practices for using asynchronous geo-replication in Pulsar in the next blog post.
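As a sketch of what this looks like with the pulsar-admin tool: the property, role, namespace, and cluster names below are hypothetical, and the exact command forms may vary by Pulsar version:

```shell
# Create a property (tenant) that is allowed to use both clusters
bin/pulsar-admin properties create my-prop \
  --admin-roles my-role --allowed-clusters us-west,us-east

# Create a global namespace under that property
bin/pulsar-admin namespaces create my-prop/global/my-ns

# Enable replication of the namespace between the two clusters;
# re-issuing this command with a different cluster list changes the
# replication setup at runtime
bin/pulsar-admin namespaces set-clusters my-prop/global/my-ns \
  --clusters us-west,us-east
```

In this sketch, switching from the two-cluster setup above to a full mesh is just another `set-clusters` invocation with a longer cluster list; no per-topic mirroring processes need to be deployed.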

Multi-Datacenter Replication at Yahoo

Pulsar has been deployed globally in more than 10 data centers at Yahoo since 2015, with full-mesh asynchronous geo-replication. This geo-replication has been used for mission-critical services such as Mail, Finance, Gemini Ads, and Sherpa (Yahoo’s distributed key-value service). It replicates 100 billion messages per day over 1.4 million topics.

At that scale, it becomes critical to have the flexibility and the tools to manage replication effectively: adding or removing regions, changing the replication set of a namespace, and having monitoring that shows the complete picture of where data is being retained, how much data is pending, and why replication is happening slowly.

Finally, and most importantly, with geo-replication the chance of a network partition or degraded network performance between data centers is much higher than within a single data center. It is therefore critical for the messaging and storage components to be able to sustain building a backlog for an extended period of time, from hours to several days. Equally critical is the ability, once the network issue is resolved, to drain the backlog faster than new messages are being published, without impacting live traffic.

Conclusion

Apache Pulsar, leveraging the scalable stream storage of Apache BookKeeper, is a messaging system that supports both synchronous geo-replication (via Apache BookKeeper) and asynchronous geo-replication (configured at the broker level). In this blog post, we examined two common methodologies used in geo-replication for data systems and explained their differences and tradeoffs. Pulsar supports both geo-replication methodologies via different mechanisms. We hope that this provides you with a better understanding of Apache Pulsar and its geo-replication feature. In the next blog post, we’ll examine a few common patterns or practices using asynchronous geo-replication in Apache Pulsar.

If you’re interested in Pulsar, you may want to participate in the Pulsar community via:

*Apache Pulsar is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.