Oracle Blog

Coherence Clustering Principles

Overview

A Coherence environment consists of a number of components. Below I’ll describe how they relate to each other and what the terms mean. But just to give you a flavor, here they are displayed as a hierarchy.

Distributed cache services with the same name will cluster together to manage their cache data. So you can have multiple clustered services across Coherence nodes. This article will explain how these components work together and how applications interact with them.

What do we mean by a Coherence Cluster?

It’s a set of configuration parameters that control the operational and run-time settings for clustering, communication, and data management services. These are defined in the Coherence operational override file and include such things as:

Multicast or unicast addresses for locating cluster members

Cluster identity information

Management settings

Networking parameters

Security information

Logging information

Service configuration parameters

These settings are used by Coherence services to communicate with other nodes, to determine cluster membership, and for logging and other operational parameters. They are similar to the domain configuration used by WebLogic Server. They also apply to the entire Coherence cluster node, which usually means the whole JVM. Although a cluster node usually equates to a JVM, it is possible to have more than one node per JVM by loading the Coherence libraries multiple times in different isolated class loaders, e.g. a "child first" class loader. However, this is usually only done within the context of an application server or a test framework.

The coherence.jar file contains a default operational override file, tangosol-coherence-override.xml, that will be used if another is not detected in the CLASSPATH before the coherence.jar library is loaded. In fact, there are 3 versions of this file:

tangosol-coherence-override-eval.xml

tangosol-coherence-override-dev.xml

tangosol-coherence-override-prod.xml

Which one is selected depends on the mode that Coherence is running in (the default is developer mode). The mode can be set using a system property, for example -Dtangosol.coherence.mode=prod.

Note: It’s important to ensure that production mode is selected for production usage. In production mode certain communication timeouts, among other things, are extended so that Coherence will wait longer for services to recover.
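To make the naming convention concrete, here is a minimal sketch that maps a mode to one of the three override file names listed above. It only illustrates the convention; it is not how Coherence itself resolves the file. The class OverrideFileNames and its method are hypothetical names, and falling back to "dev" when no mode is given is an assumption based on the default mentioned above.

```java
import java.util.Locale;
import java.util.Set;

/** Illustrative only: maps a mode to the override file names listed in this article. */
final class OverrideFileNames {
    // The three modes implied by the file names above.
    private static final Set<String> MODES = Set.of("eval", "dev", "prod");

    /** Returns the override file name for a mode; null or empty falls back to "dev" (assumed default). */
    static String forMode(String mode) {
        String m = (mode == null || mode.trim().isEmpty())
                ? "dev"
                : mode.trim().toLowerCase(Locale.ROOT);
        if (!MODES.contains(m)) {
            throw new IllegalArgumentException("Unknown mode: " + mode);
        }
        return "tangosol-coherence-override-" + m + ".xml";
    }

    public static void main(String[] args) {
        // Reads the same system property mentioned in the article.
        System.out.println(forMode(System.getProperty("tangosol.coherence.mode")));
    }
}
```

Running it with -Dtangosol.coherence.mode=prod selects the production file name; with no property set, the sketch falls back to the developer one.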

What is a Coherence Service?

A Coherence service is a thread (and sometimes pool of worker threads) that has a specific function. This can be:

Connectivity Services

Clustering Service – manages cluster membership communications. There is exactly one of these services per JVM (or per class loader). This service tracks which other nodes are in the cluster, node failures etc.

This document will focus on Distributed Cache Services that manage distributed caches defined from a distributed scheme definition.

When a scheme definition (as shown below) is parsed, Coherence instantiates a service thread with the name specified. This service thread will manage the data from caches created using the scheme definition. So how does this all happen?

If an application using Coherence calls:

NamedCache tradeCache = CacheFactory.getCache("trade-cache");

A couple of things happen:

When the Coherence classes are loaded they will, by default, search the CLASSPATH for the coherence-cache-config.xml file, which is the name specified in the default operational override file. The first instance that is found will be used. However, if one is specified using the system property -Dtangosol.coherence.cacheconfig=<cache config file>, Coherence will use that cache configuration file instead. A cache configuration can also be explicitly loaded from the CLASSPATH by the application.

When the cache configuration file is parsed, distributed cache service threads are started for all the cache “schemes” that are defined, provided the autostart parameter is set to true (by default it’s false).

<caching-schemes>
  <distributed-scheme>
    <scheme-name>DefaultDistributedCacheScheme</scheme-name>
    <service-name>DefaultDistributedCacheService</service-name>
    <autostart>true</autostart>
  </distributed-scheme>
</caching-schemes>

Note: For data services the autostart flag is only observed for distributed caches. So a replicated cache service would automatically be started.

Each cache service started is given a name, or one is created if none is specified. This service thread then attempts to join with other services that have the same name in the Coherence cluster. If none are found, i.e. it’s the first to start, it will become the “senior member” of the service. To illustrate this, take a look at a sample log statement from when Coherence starts a service.

2012-02-02 11:00:04.666/16.462 Oracle Coherence GE 3.7.1.0 <D5> (thread=DistributedCache:DefaultDistributedCacheService, member=1): Service DefaultDistributedCacheService joined the cluster with senior service member 1

In this case a distributed cache service called DefaultDistributedCacheService has started up on Member 1 of the cluster (the first JVM). As it’s the first service with this name to start it becomes the senior member – which means it has a couple of extra responsibilities, like sending out heartbeats etc.
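The joining behaviour can be pictured with a toy model: services are grouped by name, and the first member to start a service of a given name is its senior. This is only a sketch of the idea described above. ServiceRegistry is a hypothetical class, not part of Coherence, and it does not model heartbeats, failures, or seniority hand-over.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model: the first member to start a named service is treated as its senior member. */
final class ServiceRegistry {
    // Service name -> member ids, in the order they joined.
    private final Map<String, List<Integer>> membersByService = new HashMap<>();

    /** Records that a member started the named service and returns the senior member id. */
    int join(String serviceName, int memberId) {
        List<Integer> members =
                membersByService.computeIfAbsent(serviceName, k -> new ArrayList<>());
        if (!members.contains(memberId)) {
            members.add(memberId);
        }
        // The earliest joiner stays senior in this simplified model.
        return members.get(0);
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        System.out.println(registry.join("DefaultDistributedCacheService", 1));
        System.out.println(registry.join("DefaultDistributedCacheService", 2));
    }
}
```

In this model a second member joining the same service name still sees member 1 as senior, while a service with a different name forms its own group with its own senior.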

Once the cache services have been started, Coherence will try to match the cache name that has been passed, in this case “trade-cache”, with the appropriate cache scheme (and service) that will manage this cache. It uses the cache scheme mappings part of the cache configuration file to do this, with wildcard matching (if necessary) to identify the right cache scheme.

Note: Regular expression parsing is not used and wildcards cannot be used at the start of the cache name.

<caching-scheme-mapping>
  <cache-mapping>
    <cache-name>trade-*</cache-name>
    <scheme-name>DefaultDistributedCacheScheme</scheme-name>
  </cache-mapping>
</caching-scheme-mapping>
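The lookup from cache name to scheme name can be sketched in plain Java. This is a simplified illustration, not Coherence's own resolver: SchemeMappings is a hypothetical class, only a single trailing "*" is supported, and the precedence rules (exact match first, then the longest matching prefix) are assumptions made for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch of cache-name to scheme-name lookup using trailing-wildcard patterns. */
final class SchemeMappings {
    // Pattern (for example "trade-*") -> scheme name.
    private final Map<String, String> mappings = new LinkedHashMap<>();

    SchemeMappings add(String cacheNamePattern, String schemeName) {
        int star = cacheNamePattern.indexOf('*');
        // This sketch only accepts one '*', and only as the last character.
        if (star >= 0 && star != cacheNamePattern.length() - 1) {
            throw new IllegalArgumentException("'*' is only allowed at the end: " + cacheNamePattern);
        }
        mappings.put(cacheNamePattern, schemeName);
        return this;
    }

    Optional<String> schemeFor(String cacheName) {
        // Assumption: an exact name match takes priority over any wildcard.
        String exact = mappings.get(cacheName);
        if (exact != null) {
            return Optional.of(exact);
        }
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : mappings.entrySet()) {
            String pattern = entry.getKey();
            if (!pattern.endsWith("*")) {
                continue;
            }
            String prefix = pattern.substring(0, pattern.length() - 1);
            // Assumption: the longest matching prefix wins. No regular expressions involved.
            if (cacheName.startsWith(prefix) && prefix.length() > bestLength) {
                best = entry.getValue();
                bestLength = prefix.length();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        SchemeMappings mappings = new SchemeMappings()
                .add("trade-*", "DefaultDistributedCacheScheme");
        System.out.println(mappings.schemeFor("trade-cache"));
    }
}
```

With the mapping shown above, "trade-cache" resolves to DefaultDistributedCacheScheme, while a name such as "order-cache" finds no scheme at all.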

Once the correct cache scheme has been matched, a reference to an existing cache managed by the cache service for this scheme is returned, or a new cache is created using the parameters of the cache scheme.

Cache services in the same cluster should scope their names to prevent name clashes. This is so that multiple applications sharing the same cluster don’t inadvertently use the same service name and unintentionally share data.

Each application that wishes to cache data in isolation from others can load its own cache configuration file (as described above) and specify a unique scope name. This will then be used as a prefix to cluster service names to prevent service name collisions. Below is an example scope definition in a cache configuration file:

<?xml version="1.0"?>
<cache-config>
  <scope-name>com.oracle.custom</scope-name>
  ...
</cache-config>

And in JConsole you can see the scope prefix being used in conjunction with service names – circled in red.

Note: The scope name is not used as a prefix to cache names.

By scoping service names, multiple services can co-exist in the same Coherence cluster without interacting, thus allowing individual applications to create and manage their own caches in isolation.
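The effect of a scope name can be shown with a short sketch: two applications that configure the same service name end up with different clustered service names once each prefixes its own scope. ScopedNames is a hypothetical helper, and the ":" separator between scope and service name is an assumption made for illustration, so check your own JConsole or log output for the exact form.

```java
/** Sketch: how a scope prefix keeps identically named services apart. */
final class ScopedNames {
    /** Prefixes the service name with the scope, if one is set. The ':' separator is an assumption. */
    static String serviceName(String scopeName, String serviceName) {
        if (scopeName == null || scopeName.trim().isEmpty()) {
            return serviceName; // No scope configured: the name is used as-is.
        }
        return scopeName.trim() + ":" + serviceName;
    }

    public static void main(String[] args) {
        // Two applications using the same service name but different (hypothetical) scopes.
        System.out.println(serviceName("com.oracle.custom", "DefaultDistributedCacheService"));
        System.out.println(serviceName("com.example.other", "DefaultDistributedCacheService"));
    }
}
```

Because services only cluster with services of the same name, the two scoped names above would form two separate services and would not share data.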

Below we have 3 nodes (JVMs), each running 2 partitioned cache services. Each cache service manages 2 caches, and the data (key/value pairs) stored in these caches is shared evenly between the nodes.
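One way to picture how data ends up shared evenly is a simplified partitioning sketch: each key hashes to a partition, and partitions are spread across the nodes. This is an illustration under stated assumptions, not Coherence's actual algorithm. PartitionSketch is a hypothetical class, the partition count of 257 is assumed, and the round-robin assignment stands in for whatever distribution strategy the service really uses.

```java
/** Simplified sketch of spreading keys over partitions and partitions over nodes. */
final class PartitionSketch {
    /** Maps a key to a partition; floorMod keeps the result non-negative. */
    static int partitionFor(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Assumption: partitions are handed out round-robin to nodes 0..nodeCount-1. */
    static int ownerOf(int partition, int nodeCount) {
        return partition % nodeCount;
    }

    /** Counts how many partitions each node owns under the round-robin assumption. */
    static int[] partitionsPerNode(int partitionCount, int nodeCount) {
        int[] counts = new int[nodeCount];
        for (int p = 0; p < partitionCount; p++) {
            counts[ownerOf(p, nodeCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int partitionCount = 257; // assumed partition count
        int nodeCount = 3;        // the 3 nodes from the example above
        int partition = partitionFor("trade-1", partitionCount);
        System.out.println("owner node index: " + ownerOf(partition, nodeCount));
        for (int count : partitionsPerNode(partitionCount, nodeCount)) {
            System.out.println(count);
        }
    }
}
```

With 257 partitions and 3 nodes, the sketch gives the nodes 86, 86 and 85 partitions respectively, which is as even as the numbers allow.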