It’s been a while since my last post – things have been very busy! I’ve just moved across to the Coherence Development team and it’s definitely an exciting time to join. After I’ve settled in, I look forward to doing some more regular posts. On to the post…

Overview
Many customers I’ve worked with want to run a single cluster across multiple data centers to provide DR capabilities. Where the data centers are connected by relatively slow networks, it is much better to have two separate clusters and connect them via Coherence*Extend. There are patterns on the Incubator site, such as the Push Replication pattern, which show how to replicate data between sites in this configuration.

But many of these data centers are now connected via 10Gb or faster links, and customers are asking, “Why can’t we have a single cluster across both?” Doing this is possible, but it is not recommended unless latencies are extremely low, because a slow link can degrade the entire cluster. The detailed discussion on this is for another day, as there are many other factors to consider, but it is possible under the right conditions.

Distributed Cache Diversion
Before we get into more detail, a quick diversion to talk about how partitioning of data works in a distributed cache. (This will help us understand the end result we are trying to achieve.) In a distributed cache, the data is evenly spread across all the available members using a common hashing algorithm. Where possible, without affecting the overall balance of data, backup and primary copies of data reside on separate physical machines for reliability. When the service that contains the cache has all primary and backup copies on physically separate machines, it is known as machine-safe. In this state the service can survive the loss of an entire machine without data loss.
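To make the hashing idea concrete, here is a minimal sketch (not Coherence internals; the class and partition count below are illustrative, though 257 happens to be Coherence’s default) of how keys are spread evenly across a fixed set of partitions:

```java
import java.util.Arrays;

public class PartitionSketch {
    // Illustrative partition count; 257 (a prime) is Coherence's default.
    static final int PARTITIONS = 257;

    // Map a key to a partition by hashing, as a distributed cache would.
    static int partitionFor(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    public static void main(String[] args) {
        // Count how many keys land in each partition to see the spread.
        int[] counts = new int[PARTITIONS];
        for (int i = 0; i < 100_000; i++) {
            counts[partitionFor("key-" + i)]++;
        }
        System.out.println("keys per partition: min="
                + Arrays.stream(counts).min().getAsInt()
                + " max=" + Arrays.stream(counts).max().getAsInt());
    }
}
```

Because the mapping is deterministic, every member agrees on which partition owns a given key without any central lookup.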

Taking this further, what if you have multiple racks within a data center and a Coherence cluster spans them? What about multiple sites? How do we get rack-safe, or site-safe? Prior to 3.7.1, the site or rack a cache server resided on was not taken into account when making these backup decisions; only the machine was. In 3.7.1, with the new Simple Partition Assignment Strategy, Coherence takes the entire topology of the cluster into consideration, from machine to rack to site, when backing up data.
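The intuition behind topology-aware backup placement can be sketched as “prefer the most distant candidate”: a different site beats a different rack, which beats a different machine. This is a hypothetical illustration of that idea, not Coherence’s actual implementation; all names and the scoring are my own:

```java
import java.util.Comparator;
import java.util.List;

public class TopologySketch {
    // A cluster member's position in the topology (illustrative model).
    record Member(String site, String rack, String machine, String id) {}

    // Higher score = "safer" separation between primary and candidate backup.
    static int distance(Member primary, Member candidate) {
        if (!primary.site().equals(candidate.site()))       return 3; // site-safe
        if (!primary.rack().equals(candidate.rack()))       return 2; // rack-safe
        if (!primary.machine().equals(candidate.machine())) return 1; // machine-safe
        return 0;                                                     // node-safe only
    }

    // Choose the backup owner with the greatest topology distance from the primary.
    static Member chooseBackup(Member primary, List<Member> candidates) {
        return candidates.stream()
                .filter(m -> !m.equals(primary))
                .max(Comparator.comparingInt(m -> distance(primary, m)))
                .orElseThrow();
    }

    public static void main(String[] args) {
        Member primary = new Member("siteA", "rack1", "m1", "n1");
        List<Member> members = List.of(
                primary,
                new Member("siteA", "rack1", "m2", "n2"),   // different machine
                new Member("siteA", "rack2", "m3", "n3"),   // different rack
                new Member("siteB", "rack3", "m4", "n4"));  // different site
        // The siteB member is the safest choice for the backup copy.
        System.out.println(chooseBackup(primary, members));
    }
}
```

With a pre-3.7.1 cluster, only the machine comparison existed, which is why the manual machine-id trick described below was needed.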

So it is now possible to have clusters that are not only machine-safe, but rack-safe and site-safe as well!

Back to the example
Before 3.7.1, the only viable way to achieve a “site-safe” cluster was to set the machine-id manually, forcing Coherence to treat the two sites as two machines, i.e. to back up across these so-called “machines”. That is a reasonable approach, but if we lose a site the cluster becomes only node-safe, not machine-safe, because Coherence thinks it has only one machine left, so the loss of a physical machine could cause data loss.
In the diagram below, this would be achieved by setting the following:

-Dtangosol.coherence.machine=siteA (for all machines on Site A)
-Dtangosol.coherence.machine=siteB (for all machines on Site B)

Traditional “Site-Safe” Method

Using the Simple Partition Assignment Strategy
With the new Simple Partition Assignment Strategy in 3.7.1, and with the above cluster setup, as long as you set the site name using
-Dtangosol.coherence.site=siteA and siteB, or the appropriate override, you can achieve a site-safe configuration. That is, you could lose an entire site at once without losing data. Similarly, if you have multiple racks in your configuration, as long as you identify them via the -Dtangosol.coherence.rack setting, you can achieve a rack-safe configuration, where you could lose an entire rack without losing data.
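For illustration, launching storage-enabled cache servers with the site name set might look like the following; the classpath is a placeholder for your own installation:

```shell
# Site A cache servers (run on each machine in site A)
java -Dtangosol.coherence.site=siteA \
     -cp coherence.jar com.tangosol.net.DefaultCacheServer

# Site B cache servers (run on each machine in site B)
java -Dtangosol.coherence.site=siteB \
     -cp coherence.jar com.tangosol.net.DefaultCacheServer
```

The same pattern applies to -Dtangosol.coherence.rack and -Dtangosol.coherence.machine for rack-safe configurations.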

Demonstration
To demonstrate this I’m using part of the Coherence Incubator functionality, which allows you to easily start up and shut down multiple cache servers, either in-process or as separate processes. I’ve built a wrapper around this: a simple command-line utility that lets me dynamically specify a machine, rack and site before starting up cache servers.

I’ll provide the details of the code below, but in my cache configuration, all I have to do to enable this for a service is to set the partition-assignment-strategy.
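For example, in a 3.7.1 cache configuration the strategy can be specified on a distributed scheme by naming the strategy class. This fragment is a sketch rather than my exact configuration; the scheme and service names are placeholders, so check the documentation for your version:

```xml
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <service-name>DistributedCache</service-name>
  <partition-assignment-strategy>
    <instance>
      <class-name>com.tangosol.net.partition.SimpleAssignmentStrategy</class-name>
    </instance>
  </partition-assignment-strategy>
  <backing-map-scheme>
    <local-scheme/>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
```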

Rack-safe Configuration
Consider the example below: a single site with 2 racks of 2 servers each, for simplicity. Each server will have 2 cache servers running.

Rack-safe Configuration

Running my utility (which I’ll show below) and passing the IP address of my machine for the WKA configuration, I can create this setup using the set rack and set machine commands, which set the tangosol.coherence.rack and tangosol.coherence.machine system properties before starting the cache server(s).

Now we have a rack-safe configuration using the Simple Partition Assignment Strategy!

Closing thoughts
This is a great new feature which seems to have slipped in without too much fanfare, and definitely something that many people have been asking for.

As mentioned early on, running a Coherence cluster across multiple geographically dispersed sites is possible, but care should be taken when doing so. Speeds of 10Gb and extremely low latencies are a must; you must also test the link using a tool such as the datagram test, and assess the impact of cluster traffic on your other cross-site traffic. Other factors, outside the scope of this discussion, should also be considered.

One of my colleagues has also posted about this new partitioning strategy, and has some good advice down the bottom of his post. Worth a read too!

6 Responses to Making your cluster site or rack safe with Coherence 3.7.1

Do you happen to know the internal details of what happens on the joining node(s) when one or more of the nodes in the
Coherence cluster are restarted while the cluster is populated?
We are getting an OOM error in the JBoss server when we are joining the cluster. JBoss has local storage turned off.

Hi,
When a cache server that holds data is shut down, the primary and backup partitions it owned are distributed amongst the remaining storage-enabled cache servers. If it is a graceful shutdown, this is done before the cache server shuts down. If it’s not graceful, e.g. a failure, the data is recovered by the normal recovery process.

In terms of OOM, that can happen when there are not enough cache servers left to hold the data, e.g. you had 20 cache servers and only 5 are left. If you don’t have size limitations you can get OOM.
But if you are getting OOM in your storage-disabled clients it could be many things, and without error messages, config, etc., it is difficult to diagnose.
Probably worth posting a question at https://forums.oracle.com/forums/forum.jspa?forumID=480 or logging an SR with Oracle support.

Hi Tim,
I see you use WKA configurations.
We configured our Coherence cluster with multicast over our 2 low-latency connected datacenters. In the past, Ehcache clusters performed well over multicast across our datacenters.
Do you see advantages in using WKA? I guess with 2 datacenters you create different WKAs for each to avoid a single point of failure? Is the WKA uptime included in the site-safe status?
Thanks

I just use WKA in my examples so as not to inadvertently join other clusters people are using.

If you are able to use multicast on your network, I would use it, as there are some operations that perform better with it. E.g. when multicast is enabled and a message needs to be sent to more than 25% of the cluster, it is sent via multicast.

There is a great article from Jon Purdy that explains the reasons why multicast is the preferred option.

Having said that, WKA will work as well, but you need to be aware of some of the operations mentioned by Jon, especially in large clusters.
By setting WKA, you effectively disable multicast communications in the cluster altogether.


Copyright (c) 2017 Tim Middleton and other contributors. All Rights Reserved. The views expressed in this blog are our own and do not necessarily reflect the views of Oracle Corporation. All content is provided on an 'as is' basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose. You are solely responsible for determining the appropriateness of using or redistributing and assume any risks.