We (Jive Software) have been a long-time Coherence customer and still find it to be the top clustered-cache solution out there. The feature we're most excited about in the 2.1 release is the invocation service. Essentially, it provides a way to send arbitrary commands to some or all cluster members. For us, this is the other piece of the clustering equation that we've always been looking for (beyond the clustered cache functionality that Coherence has always provided).

Ghanshyam: Where can I find info about JCache? It is listed as a JSR but seems to have been based on prior work.

Unfortunately, the way that the JCP works is that you see the original submission that the JSR was created from, and until there is a vote on the submission from the expert group you don't see anything else.

In the case of JCache, it is even worse because the current state of the JSR is totally different from the original submission. The good news is that various members of the JCache expert group (including me) are pushing to get something into public view ASAP. More good news: JCache is now based on a much simpler proposition (i.e. the Java collections API) than the originally submitted spec.

> Unfortunately, the way that the JCP works is that you see the original submission that the JSR was created from, and until there is a vote on the submission from the expert group you don't see anything else.

Which means that if you are not a member of the JSR, you are at a terrible disadvantage in time to market compared to the companies or individuals that are. But I guess that's OK with you Cameron, because you are, indeed, in. =)

> In the case of JCache, it is even worse because the current state of the JSR is totally different from the original submission. The good news is that various members of the JCache expert group (including me) are pushing to get something into public view ASAP. More good news: JCache is now based on a much simpler proposition (i.e. the Java collections API) than the originally submitted spec.

Cameron: Unfortunately, the way that the JCP works is that you see the original submission that the JSR was created from, and until there is a vote on the submission from the expert group you don't see anything else.

Johan: Which means that if you are not a member of the JSR, you are at a terrible disadvantage in time to market compared to the companies or individuals that are. But I guess that's OK with you Cameron, because you are, indeed, in. =)

Actually, it is something that I dislike greatly, and I have been vocal and public about it. And my complaints have been echoed by at least one other member of the expert group (Strachan).

Cameron: Actually, it is something that I dislike greatly, and I have been vocal and public about it. And my complaints have been echoed by at least one other member of the expert group (Strachan).

Nice to hear =). Let's hope they listen; with both you and James Strachan on the case, they might consider it. Anyway, it's not a problem located in the JCache JSR, but rather in the whole JCP. It gives an unfair advantage to the big players, who can always get their top dogs into the interesting JSRs, while the smaller players have to wait for a public draft before they can do anything.

Ricardo: Tangosol looks like it's a great product and I would love to use it here, but can't afford it. :(

Thank you very much for the kind comment.

We do have volume discounts and we also sell using OEM contracts for companies that need to include our software as part of their own products. I don't know your specific business requirements, but we have made special arrangements with specific customers in the past that, for whatever reason, could not afford list pricing. Drop me an email (cpurdy at tangosol dot com) if you'd like to explore it further.

I use Coherence as the foundation for a cluster of machines that perform data-parallel operations. Large amounts of data are loaded and dispersed throughout the cluster (via a distributed cache) and placed in an object repository; this dispersal is how load balancing is achieved in my system. The object repository (on each member) supports multiple levels of cache stages that can be plugged in or removed on the fly (RAM cache, shared RAM cache, NIO heap shared cache, disk-based, etc.). A blackboard architecture (a replicated cache) handles command and control among the cluster members and is shared, so command and control has no single point of failure. Commands are placed on the blackboard and members act on these commands. Results are sent point-to-point back to the user via a distributed cache that resides on one member (the calling member).
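For readers unfamiliar with the pattern: a blackboard is just a shared, observable map that members post commands to and react to. A minimal single-JVM sketch (plain Java with invented names, not the actual Coherence API; in the real system the map would be a replicated cache and the listeners would be Coherence MapListeners, which is what removes the single point of failure):

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Toy blackboard: commands are posted to a shared map and every registered
// member reacts to them. In the clustered version the map is replicated,
// so any member can post and no single member's failure loses the board.
public class Blackboard {
    private final ConcurrentHashMap<String, String> commands = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> members = new CopyOnWriteArrayList<>();

    /** A cluster member registers interest in posted commands. */
    public void join(BiConsumer<String, String> member) {
        members.add(member);
    }

    /** Post a command; every member sees it. */
    public void post(String id, String command) {
        commands.put(id, command);
        for (BiConsumer<String, String> m : members) {
            m.accept(id, command);
        }
    }

    public static void main(String[] args) {
        Blackboard bb = new Blackboard();
        bb.join((id, cmd) -> System.out.println("member-1 handling " + id + ": " + cmd));
        bb.join((id, cmd) -> System.out.println("member-2 handling " + id + ": " + cmd));
        bb.post("job-42", "recalculate-partition-7");
    }
}
```

In the description above, the results then flow back point-to-point rather than over the blackboard, which keeps the replicated command map small.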

This description is very cursory... but I'd like to point out that Tangosol has been beside me at
every point offering suggestions and adding new ideas/features to their product.

The Invocation API of 2.1 allows these data-parallel calls to be made without the use of my blackboard.
Tangosol graciously added this for grid customers like me.
Recently they added the ability for me to specify my own backup map: failover for
my distributed data on each member, so that each member's data is maintained on another member.
Last fall, they added NIO support, among other things. You won't go wrong with this product;
it just keeps getting better.

Yes, it does support JCache which is very nice.
Yes, there are some limited features added for grid computing - also nice.

But from their white paper (http://www.tangosol.com/coherence-featureguide.pdf) under "Distributed Cache Services" it is obvious that every "GET" operation may involve a network trip to a neighboring cluster node.
This network trip means serializing data on one cluster node and de-serializing data on another cluster node.

This whole architecture is extremely questionable.

Isn't the whole purpose of caching to minimize the amount of network trips for "GET" operations? Wouldn't storing data on-demand within the same JVM with effective LRU policy be more efficient? Wouldn't this make GET operations instantaneous without any network trips? Also imagine how much would be saved on network load in read-mostly applications.
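The kind of in-JVM, LRU-evicting cache being argued for here takes only a few lines with the JDK alone (a sketch of the concept, not anything from Coherence):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-bounded LRU map: accessOrder=true makes get() refresh recency,
// and removeEldestEntry() evicts the least-recently-used entry once the
// capacity is exceeded. A "GET" against this map is a plain in-process
// lookup with no network trip at all.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order, i.e. LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

The catch, which the rest of this thread turns on, is coherency: once another node updates an entry, a purely local cache like this serves stale data until something invalidates or refreshes it.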

Also, there is nothing in the white paper talking about non-replicable cache.

A non-replicable cache may significantly reduce network load in applications where updates are big, because instead of propagating a huge load of changes across all nodes in the cluster, a small invalidation message is sent.

<Dmitriy>
Isn't the whole purpose of caching to minimize the amount of network trips for "GET" operations? Wouldn't storing data on-demand within the same JVM with effective LRU policy be more efficient? Wouldn't this make GET operations instantaneous without any network trips? Also imagine how much would be saved on network load in read-mostly applications.
</Dmitriy>

Take a look at another of their whitepapers (http://tangosol.com/coherence-versionedcache.pdf) and you will find one solution to the issue you are describing. We are currently working with a similar architecture and it is proving to be extremely scalable. We are able to partition data based on customers and the web server they reside on so all access to their data is in memory after the first hit, essentially eliminating any need for network trips. By utilizing their versioned caches, we can ensure that data is always consistent and never stale.

Look through some of their documentation and download a trial and you will begin to see what the big deal is ;-)

I did take a look at this white paper and yes, you are right, in this scheme data is stored locally on every node, even though the algorithm used to achieve it seems somewhat overcomplex.

But what about writes?
Writes happen asynchronously from cache updates.

If I understand correctly, this may create an illusion that data is stored within data store when in reality it isn't.

What if nodes crashed?
What if I have non-Java applications reading from database?
What if I have triggers within the database that need to be executed whenever an update happens?

This list may go on and on.

Write-behind caching is useful for some scenarios, but for many others it isn't.
In fact, more scenarios require the functionality of the distributed cache, but again I will say that the architecture of the distributed cache is questionable because it may involve a network trip for GET operations.

<Rick>
Look through some of their documentation and download a trial and you will begin to see what the big deal is.
</Rick>

I did study their documentation thoroughly and I am sure that the APIs are great and usable, but in my opinion, for many scenarios there are much more scalable caching solutions that involve less overhead and network load.
Regards,
--Dmitriy.

... under "Distributed Cache Services" it is obvious that every "GET" operation may involve a network trip to a neighboring cluster node.

Coherence is very modular in nature. One of the basic building blocks is our distributed (partitioned) cache, which as you mentioned would require a network trip for (n-1)/n operations in an n-node cluster. In fact, many of our customers use our distributed cache for total out-of-process caching, by setting their application server processes to be "local cache disabled" for specific caches, and running separate JVMs whose only job is to manage the cache. It also allows them to use our Java NIO features even if they are running on JDK 1.2 or 1.3.

However, if you don't like making network hops to get your data, you just drop a near cache in front of the distributed cache and (voila) you have the answer to your question: Wouldn't storing data on-demand within the same JVM with effective LRU policy be more efficient? Except we offer a balanced LRU/LFU algorithm, since LRU will often thrash the cache altogether, causing 0% hit rates under heavy iteration usage. (We also offer pure LRU and pure LFU and of course lossless caches.)

Like I said, the architecture is very modular. The "near" cache is a combination of a local cache and a distributed cache. You can have a "dumb" near cache that just caches based on an expiration or eviction policy, or a "seppuku" (invalidation-based) near cache that caches until it receives an async notification of data invalidation, or optimally a versioned near cache that guarantees that its data is synchronously up to date and uses data versioning to accomplish it.
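Conceptually, the near cache described here is a small local map in front of the authoritative (partitioned) map, plus an invalidation channel. A toy single-JVM sketch (invented names, not the Coherence API; in the real thing the back map is a network hop away and invalidations arrive as async cluster events):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy near cache: a small local "front" map in front of an authoritative
// "back" map (standing in for the partitioned cache). get() is purely
// local on a hit; a miss reads through and populates the front map.
// invalidate() models the async "seppuku" notification from another node.
public class NearCache<K, V> {
    private final Map<K, V> front = new ConcurrentHashMap<>();
    private final Map<K, V> back; // stand-in for the distributed cache

    public NearCache(Map<K, V> back) {
        this.back = back;
    }

    public V get(K key) {
        V v = front.get(key);
        if (v == null) {              // miss: one "network" read-through
            v = back.get(key);
            if (v != null) front.put(key, v);
        }
        return v;
    }

    public void put(K key, V value) { // write through to the back map
        back.put(key, value);
        front.put(key, value);
    }

    public void invalidate(K key) {   // delivered when another node changes the key
        front.remove(key);
    }
}
```

Note that between a remote update and the arrival of the invalidation, the front map serves the old value; that window is exactly what the "dumb", "seppuku", and versioned variants trade off differently.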

This network trip means serializing data on one cluster node and de-serializing data on another cluster node.

Actually, even when a network hop is required, we only have to deserialize the data. Strangely enough, the deserialization is actually the most expensive part of the distributed cache, since the network access (even on a relatively slow 100Mb network) is extremely fast with Coherence (no overhead from JMS or TCP/IP, etc.) And in our 2.1 release (this latest one) we introduced more optimizations for deserialization, speeding it up by as much as 10x in some tests. In the upcoming 2.2 we'll have automatic optimizations for XML and XML beans as well.

Also, there is nothing in the white paper talking about non-replicable cache.

The "distributed cache service" provides partitioned caches, not replicated caches. (The "replicated cache service" provides replicated caches.) We also have local caches, which are local caches, and near caches, which are local caches of remote caches.

In all fairness, I think Tangosol is hugely overrated.

OK. If you're still with Fitech Labs, I'd welcome the opportunity to discuss how well Coherence could snap into your architecture. I'd love to have you guys as a customer. I'd rather have you helping us get hugely over-rated ;-)

I do agree with you that there may be some practical applications for keeping the actual data in a separate JVM, but I would like to stress again that this will require a network trip for (as you mentioned) (n-1)/n operations. This means that 9 out of 10 operations will hop over the network to get the data. That may be less expensive than a database trip, but it is much more expensive than local JVM access.

About using Distributed Cache with Local Cache (Near Cache), the 1st thing that strikes me is that there is double memory consumption because, as it is mentioned in the white paper, Local Cache stores objects directly and Distributed Cache stores objects in their serialized form.

Also, white paper states:
"A near cache maintains a cache of data locally; if the data is subject to changes in the distributed cache, local cache will either become out of sync, or the near cache must utilize means to maintain out-of-date information."

Which "means" are those? User must guarantee cache data integrity himself somehow? I don't think this solution is acceptable at all. I don't think user should ever worry about data being out-of-sync when accessing data from cache.

In fact the only combination that sounds useful out of all Coherence caches is Versioned Near Cache. It sounds like it will take care of cache coherency automatically. But even in this case there is still double memory consumption and requirements to use additional hash maps.

Again, I would like to say that I am sure that the Tangosol cache is useful in many cases, but in many other cases its use is overrated and there are other, better caching solutions that may involve less overhead and less network load.

About using Tangosol in our system: thanks for the offer.
I would love to use it once I am convinced that it is the best caching solution for us. So far, I am not.

About using Distributed Cache with Local Cache (Near Cache), the 1st thing that strikes me is that there is double memory consumption because, as it is mentioned in the white paper, Local Cache stores objects directly and Distributed Cache stores objects in their serialized form.

Usually, the distributed cache is much larger than the near cache. For example, the distributed cache might be 10GB and the near cache might be 100MB. It depends on access patterns, which is to say, how close to a 100% hit rate can you get with the smallest possible near cache.
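The sizing argument is really about hit rate: with hit rate h on the near cache, an average get costs roughly h·t_local + (1−h)·t_remote. A quick illustration (the local latency is an assumption for the sake of arithmetic; the 2ms network figure is the one used elsewhere in this thread):

```java
// Expected get cost for a near cache with a given hit rate.
// Illustrative latencies: 0.001 ms for a local in-JVM hit, 2 ms for a
// round trip to the partitioned cache (the thread's own figure).
public class NearCacheCost {
    static double expectedMillis(double hitRate, double localMs, double remoteMs) {
        return hitRate * localMs + (1 - hitRate) * remoteMs;
    }

    public static void main(String[] args) {
        // Even a 95% hit rate cuts the average get from 2 ms to ~0.1 ms.
        System.out.printf("h=0.95: %.3f ms%n", expectedMillis(0.95, 0.001, 2.0));
        System.out.printf("h=0.50: %.3f ms%n", expectedMillis(0.50, 0.001, 2.0));
    }
}
```

This is why the near cache can be so much smaller than the distributed cache behind it: in read-mostly workloads, even a modest front cache absorbs nearly all of the network trips.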

In fact the only combination that sounds useful out of all Coherence caches is Versioned Near Cache. It sounds like it will take care of cache coherency automatically. But even in this case there is still double memory consumption and requirements to use additional hash maps.

The versioned near cache is definitely a great way to handle cache coherency, if it matches your needs. The only additional information that gets managed is the version information, and if it is backed by a database, then there is a second set of information managed: the persistent version data. That way, the cache can compare the current cache version ("transient") with the database version of the data ("persistent").
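A toy illustration of that version comparison (invented names; the real versioned near cache is a Coherence feature, and the "transient" version map here stands in for its version cache):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy versioned cache: a get is served locally only if the locally cached
// version matches the cluster-wide "transient" version for the key;
// otherwise the stale copy is refreshed from the authoritative map.
public class VersionedCache<K, V> {
    record Entry<T>(T value, long version) {}

    private final Map<K, Entry<V>> local = new ConcurrentHashMap<>();
    private final Map<K, Long> versions;   // stand-in for the transient-version cache
    private final Map<K, V> authoritative; // stand-in for the distributed cache

    public VersionedCache(Map<K, Long> versions, Map<K, V> authoritative) {
        this.versions = versions;
        this.authoritative = authoritative;
    }

    public V get(K key) {
        long current = versions.getOrDefault(key, 0L);
        Entry<V> e = local.get(key);
        if (e != null && e.version() == current) {
            return e.value(); // versions match: serve locally, zero network hops
        }
        V fresh = authoritative.get(key); // stale or absent: refresh
        local.put(key, new Entry<>(fresh, current));
        return fresh;
    }
}
```

The extra bookkeeping Dmitriy objects to is visible here: the version map and the per-entry version field are the price of knowing, without a network round trip, that the local copy is current.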

We also support distributed invalidations (Dimitri Rakitine's Seppuku pattern) and near caches that simply invalidate their data after a certain period of time (if cache coherency is not required, e.g. for read-only use.)

Could you explain your protocol a bit more, or post a link to a white paper? Is it a standard protocol you use?

We use UDP to communicate data between/among members of the cluster. At a low level, UDP has significantly lower overhead than TCP, both in terms of computer resources and network bandwidth usage. Very importantly for us, UDP allows a member to communicate with another member (unicast) without having to connect to that member, and even communicate with large sets of members (multicast) with a single packet. The net result is that each member uses the same amount of "network sockets" in a cluster of 2 members and a cluster of 100 members, yet all communication between/among members can be direct (no intermediary). The downside of UDP is that packets can be lost, so a reliable delivery layer must be provided to ensure delivery of data. We use both positive and negative acknowledgement, coupled with configurable burst mode and asynchronous transfer mode to ensure the highest scalability. Without going into a great deal of technical detail, I can say that the communication capabilities were designed and optimized for 2 to 4 CPU servers running many threads of execution, each utilizing the clustered services in parallel. The net result is that with a cluster of dozens of servers, each running many threads that are utilizing the clustered services, the scalability of the throughput is simply amazing.

The clustered services each have their own "protocol" in terms of what the members running a particular service can ask or tell each other. For example, a member running a replicated cache service may tell all other members about a change to some data. We refer to that as a "clustered service protocol", which is tailored to the specific problem domain. All of these higher level protocols run over the clustered UDP-based protocol.

Lastly, it is worth pointing out that the resource utilization is amazingly low; for example, Coherence has no problem managing a 50MB cache of replicated data on a 64MB heap, including full failover and failback processing for a cluster of 16 servers. Being able to responsibly and predictably manage resources in a server environment is absolutely essential to being part of a non-stop processing system, and Coherence is repeatedly chosen for mission-critical enterprise applications because of its reliability and fault tolerant features. I figure you know the requirements of the market pretty well, since you founded SwiftMQ.

Just wanted to raise a couple of concerns about using UDP:
obviously multicast/unicast is inherently unreliable (and in fact gets more unreliable with load on both the client and the network).
After adding the ability to fragment large messages into smaller datagrams, and adding a Java-based reliable protocol, I think you will find that TCP/IP actually performs a lot better. I've also found that multicast scaling reliably in Java-only programs is a bit of a myth.

Just wanted to raise a couple of concerns about using UDP: obviously multicast/unicast is inherently unreliable (and in fact gets more unreliable with load on both the client and the network). After adding the ability to fragment large messages into smaller datagrams, and adding a Java-based reliable protocol, I think you will find that TCP/IP actually performs a lot better.

In our lab, we've definitely found some places where that is true, for example when a large portion of the TCP/IP stack is accelerated by hardware (as opposed to the OS drivers.) I would guess that the way that your product works is much better suited for TCP/IP (under JMS, right?) since there is a broadcast server that other servers listen to, from what I've read. (If I got anything wrong, it's just my own ignorance of your product, not a smear campaign ;-)

In our case, we have a cluster of some number n of servers talking directly to each other, sometimes to one server and sometimes to two servers and sometimes to the whole rest of the cluster (n-1 servers.) In this case, for a variety of reasons (including the threads and sockets necessary for communication) it is just not a practical solution to open a TCP socket to each and every server in the cluster. It probably would be acceptable for 2 or 3 servers, though.

I've also found that multicast scaling reliably in Java-only programs is a bit of a myth.

I've certainly seen implementations that scaled / performed very poorly, some of which are in your market (JMS).

OTOH, our implementation is pure Java and we do use multicasting when appropriate, and it has treated us very well. We do counsel our customers on reasonable limitations for fully replicated caches and things like that, since that type of cluster workload will rely more on multicasting, and thus degrade as the number of servers and the number / size of updates per server increases. For example:

1. Replicated cache of 100MB on 2 JVMs will use a total of 200MB, which is not very much. On 100 JVMs, it will use a total of 10GB of memory, which is starting to sound expensive ($).

2. Replicated cache on 2 JVMs will not use any multicast. Replicated cache on 3 or more JVMs will probably use multicast (multiple other JVMs to update.)

3. Replicated cache with 100 updates per second on each JVM is very manageable for 2 JVMs. Estimate CPU utilization at 5-10% and 100Mb network utilization at 15%. If the size of the updates doubles, the network utilization will roughly double! Furthermore, with 3 JVMs (but the same number of updates per second from each JVM), the CPU utilization will be 7-12% and the network utilization will be 22%. That's because each JVM has to process the updates coming from all other JVMs. With 100 JVMs, the network utilization will be 100% (saturated) and the CPU utilization will be network-limited (it will only be less than 100% because the network is saturated, so it is somewhat idle waiting on the network.) Once the network approaches saturation, you end up in a very sluggish state, just like with TCP/IP, because packets are getting lost and requiring a re-send. There is a term specifically associated with this occurring on multicast networks: "multicast flooding."
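The scaling in example 3 can be sanity-checked with a back-of-envelope model: with multicast, each update is sent once but must be received and processed by every other JVM, so wire utilization grows roughly linearly with the number of publishing JVMs until the network saturates. A sketch calibrated to the figures above (the constant is illustrative, derived from the 2-JVM/15% data point, not a measurement):

```java
// Back-of-envelope model of the replicated-cache network figures above:
// total wire traffic grows with the number of JVMs publishing updates,
// capped at saturation. Calibrated so 2 JVMs at 100 updates/s ~= 15% of
// a 100Mb network, matching the thread's example.
public class ReplicatedLoad {
    static final double PCT_PER_JVM = 7.5; // 15% / 2 JVMs, from the example above

    static double networkUtilization(int jvms) {
        return Math.min(100.0, PCT_PER_JVM * jvms);
    }

    public static void main(String[] args) {
        System.out.println("2 JVMs:   " + networkUtilization(2) + "%");   // matches the 15% figure
        System.out.println("3 JVMs:   " + networkUtilization(3) + "%");   // ~22%, as quoted
        System.out.println("100 JVMs: " + networkUtilization(100) + "%"); // saturated
    }
}
```

The model also makes the qualitative point: doubling update size doubles the constant, and nothing about adding JVMs reduces per-JVM receive work, which is why replicated caches suit small, read-mostly data sets.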

In other words, for very high-scale applications, you have to choose your tools more carefully. Replicated caches are wonderful, particularly for small data sets that are very read intensive, because the "cache clients" (the servers in the cluster) will have instant access to the data, because a copy is local (it is replicated to each server running that replicated service.) Even in a cluster of 100 servers, fully replicated caches have their place, but it is (percentage-wise) a much smaller place than in a 4-server cluster, for example.

For data that is changing quite a bit, and for extremely large caches, we suggest the use of our distributed (aka partitioned) cache service. It scales beautifully (almost linearly) up to the extent of your fabric on your switched backbone. For a data center, that is often 48 or 72 servers. With IB and fiber interconnects between switches, it can go much higher than that, but you will start to see a little less than the linear scalability. Having the size of the cache grow linearly with the number of servers is also a very handy capability, allowing some of our customers to manage many gigabytes of data while maintaining very reasonable Java heap sizes (less than half a gigabyte.)

I confess I do work for a competitor though ...

Of course -- I know you guys pretty well by now ;-). BTW congratulations on your 2.0!

I would like to thank Cameron for his prompt answers to all my questions posted previously.

I still have a question regarding loading data with near caches.

Assume the following scenario:
1) Object O is "locally" cached on node A and "distributely" cached on node B.
2) Now object O is updated (e.g. references of O become stale on node A and B).
3) Get for O is issued on node A.
If I understand correctly, the Get in step 3 will involve querying node B, then going to the data store from node B, then passing O back to node A.

My questions are:
Is a trip over to the next node cheaper than a trip to the database in all cases?

Wouldn't it be more efficient if node A went to the data store directly? I think in this case you would save an extra network trip. Does Tangosol provide a distributed cache that exhibits this behavior? And, if so, why would one choose to use a Local cache with a Distributed cache, a.k.a. a Near cache, assuming that the goal is to have data cached on the same JVM?

Similar kind of question would apply when using Put operation as well.

I'm going to take a shot at this, but if I get the explanation wrong I'll have someone post the corrected answer tomorrow.

1) Object O is "locally" cached on node A and "distributely" cached on node B.

I assume that what this means is that O is in the cache, and server B is responsible for managing O, and a process on server A asked for O, so A "getted" O from B and stored it locally.

Now I'm going to add some more details that may or may not be 100% correct because the developer has a lot of choices on how they configure the system. I'm going to assume that O represents some persistent data, and to keep it simple, let's say that it represents JDBC data from a single database. Further, I'm going to assume that the Read-Through/Write-Through (or Write-Behind) functionality is being utilized. I'm also assuming a versioned near cache implementation.

The implication is that a process on server A asked for object O, it wasn't in the near cache, so it went to server B, it wasn't in the distributed cache, so server B loaded it from the database, stored it in the distributed cache (in this case, O is managed by server B) and returned O to server A, which stored it in its near cache and then utilized it in the process that requested it.

Compared to getting O directly from the database, this entire process would have added less than 2ms to the entire operation on a 100Mb network with typical server hardware.

2) Now object O is updated (e.g. references of O become stale on node A and B).

I don't grok this entirely. If O is updated, on any node in the cluster, it simply needs to be "putted" back into the cache. We support both locking and transactions against the cache to ensure data integrity.

With the write-through cache, the processing is similar to #1 above (i.e. add less than 2ms to manage the distributed cache.) The data ends up in the database immediately (before the client finishes the "put".)

With the write-behind cache, the entire write process takes less than 2ms, because the write is removed from the synchronous portion of the process (basically, it is deferred to be processed later on a different thread.)

3) Get for O is issued on node A. If I understand correctly, the Get in step 3 will involve querying node B, then going to the data store from node B, then passing O back to node A.

No, it will just get it from node A. Node A has the current up-to-date copy of O on it. It knows it has the up-to-date copy because the transient version of the data that is managed by the versioned near cache matches the version of O that is in A's near cache. In other words, the get operation now takes 0ms.

Compare this to invalidation-based schemes, in which the changing of an object forces removal of that object from the cache, thus throwing away the objects that the system is actually using. (I call it an anti-cache, but unfortunately it is the most common caching pattern in use today.)

As far as predicting performance etc., there is no such thing as "typical", but with the versioned near cache and read-through / write-behind, we have seen 25-40x improvements in performance and -- much more importantly -- incredible scalability that could not be cost-effectively achieved with a database. Furthermore, the application can survive application server failure without impacting user experience, and (if the app is architected for it) the application can even survive database failure without impacting end-user experience. One of the applications mentioned in our 2.1 press release has proven to do just this, and they weren't exactly planning on having their database fail ;-)

1)
The implication is that a process on server A asked for object O, it wasn't in the near cache, so it went to server B, it wasn't in the distributed cache, so server B loaded it from the database, stored it in the distributed cache (in this case, O is managed by server B) and returned O to server A, which stored it in its near cache and then utilized it in the process that requested it.

Compared to getting O directly from the database, this entire process would have added less than 2ms to the entire operation on a 100Mb network with typical server hardware.

I don't think you can say 2ms for all cases. For small objects, yes, the overhead is minimal, but what if I am caching a query result that is a collection of 100s or maybe 1000s of objects? Are you saying that with Tangosol the overhead will be 2ms for objects of all sizes?

2)
I don't grok this entirely. If O is updated, on any node in the cluster, it simply needs to be "putted" back into the cache. We support both locking and transactions against the cache to ensure data integrity.

I have 2 questions about it.
a) When O is updated on any node in the cluster, storing it in the database will require 1 network trip to database. If I understand Tangosol Distributed cache correctly, putting object into cache may also require a database trip in 9 out of 10 cases. Therefore, (correct me if I am wrong) there is still an extra network trip.

b)
Sometimes, you don't know in advance all the objects that will be updated by the query (e.g. update ... table A where timestamp < yesterday), moreover, single update may change multiple objects in cache. For example, what if you are caching 10 collections (query results) and updating a single object changes all 10 query results in unknown manner (now this object may have to be added to query results or removed from them).

In these cases, issuing a cache Put operation for all changed objects could become very time-consuming or impossible (queries may be big, and re-querying the database plus multiple puts into the cache, with multiple network trips, may be required). In this case, the invalidation-based pattern (the "anti-cache", as you called it) may prove to be much more efficient. How does Tangosol support this case? Again, I would ask you to take into account the extra network trip that is involved with the Distributed cache to get the data from a neighboring node.

Also, I would like to add that most invalidation-based caches will still put the object into the cache on the updating node and invalidate all other nodes. Subsequent "Get" operations on other nodes may require a single trip to the database (just once) to get the invalidated data, which may arguably be shorter or longer in time than getting the data from the neighboring node, since most modern databases cache query results anyway.

Correction about my previous posting:
a) When O is updated on any node in the cluster, storing it in the database will require 1 network trip to database. If I understand Tangosol Distributed cache correctly, putting object into cache may also require a network trip in 9 out of 10 cases. Therefore, (correct me if I am wrong) there is still an extra network trip.

Please ignore this one. No matter what the cache implementation is, the cluster still has to be updated, so an extra network trip is required in one form or another in any implementation.

I don't think you can say 2ms for all cases. For small objects, yes, the overhead is minimal, but what if I am caching a query result that is a collection of 100s or maybe 1000s of objects? Are you saying that with Tangosol the overhead will be 2ms for objects of all sizes?

You caught me doing marketing. Yes, you are correct, the size of the object will affect the network time (and obviously the utilization of network throughput) for moving the object. With the replicated cache architecture, the objects are only "moved" on the network when they are "putted" into the cache. For the distributed cache architecture, both a "get" and a "put" will move the object on the network (assuming that the "get" is a miss in the near cache, of course, which in practice is relatively rare.)

In our tests here, we test with objects in the cache up to 100MB each, with up to 48 nodes in the cluster. The 100MB objects are significantly more expensive to move on a 100Mb network (and to manage in memory for that matter) than (for example) 1KB objects ;-)

On a 100Mb network going from a "client" through a hub to a switch to a distributed cache "server", for 1KB objects, with a 4KB UDP packet size, we see round-trip times of <1ms, but we use the 2ms figure to be safe. By the time you get up to 10 threads in a tight loop putting 10KB objects, you are network bound of course, so we see about 30ms per put (note that a put() returns the previous value, so it includes the functionality of a get()). That works out to about 6.7MB/s throughput (not our raw throughput, but actual "your data" throughput) on the wire, with commodity hardware, and going through both a hub and a switch. Neither the client nor the server side will peg the CPU on these tests, even with many threads, because they are network limited; you can see CPUs (and the I/O bus) getting pegged on the gig-E load tests of course.
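Those throughput numbers are internally consistent, which is easy to verify: 10 threads at one put every 30ms is about 333 puts/s, and each put moves roughly 20KB on the wire (the new value in, the previous value back out):

```java
// Sanity-check of the quoted figure: 10 threads, 10KB objects, 30 ms per
// put, and each put() carries data both ways (the new value in, the
// previous value back out). Uses 1 MB = 1000 KB, which matches the
// quoted ~6.7 MB/s.
public class PutThroughput {
    static double megabytesPerSecond(int threads, double msPerPut, int objectKb) {
        double putsPerSecond = threads * (1000.0 / msPerPut); // ~333 puts/s
        double kbPerPut = 2.0 * objectKb;                     // value in + previous value out
        return putsPerSecond * kbPerPut / 1000.0;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f MB/s%n", megabytesPerSecond(10, 30.0, 10)); // ~6.7 MB/s
    }
}
```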

Sometimes, you don't know in advance all the objects that will be updated by the query (e.g. update ... table A where timestamp <

First, I think you realize that a cache is a tool, and it isn't always the right tool, so I'm not going to try to convince you that it is right for every application and for every situation.

What we try to do is increase the percentage of situations in which our software can help applications work faster, more scalably and more reliably. We started with a relatively simple coherent replicated cache (1.0, December 2001) and in the year since then we've added quite a few options and capabilities.

Starting with 2.0, for example, you can actually run the queries against the cached data in a cluster, so you don't even touch the database for data selection. This implies that the data fully resides in the cache, or at least the "index" of the data that you are looking for is in the cache. Again, this is not a "general purpose solution" that most applications will use, but it is used by search engines, cartographic systems, etc. because it scales very close to linearly with the number of servers in the cluster.
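The idea of running a query against cached data rather than the database can be sketched in plain Java; note that Coherence exposes this through its own filter-based query API, while this self-contained sketch (the `Trade` class and its fields are hypothetical) just shows a predicate being evaluated against cached entries:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: selecting data that already lives in the cache, instead
// of going back to the database. In a cluster, each node would evaluate the
// filter against its own partition of the data, which is why this scales
// close to linearly with the number of servers.
public class CacheQuerySketch {
    record Trade(String symbol, double price) {}

    interface Filter<V> { boolean evaluate(V value); }

    static <K, V> Map<K, V> query(Map<K, V> cache, Filter<V> filter) {
        Map<K, V> result = new HashMap<>();
        for (Map.Entry<K, V> e : cache.entrySet()) {
            if (filter.evaluate(e.getValue())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Trade> cache = new HashMap<>();
        cache.put("t1", new Trade("ORCL", 12.5));
        cache.put("t2", new Trade("SUNW", 3.4));

        // Select without touching the database for data selection
        Map<String, Trade> hits = query(cache, t -> t.price() > 10.0);
        System.out.println(hits.size()); // 1
    }
}
```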

In many cases, though, we see that customers send all queries to the database, and pull back key sets, which are then managed (gets and puts) through the cache. You are absolutely correct about some of the trade-offs, and that you can have very bad "worst case scenarios" with a cache system that thrashes because (1) the access pattern exceeds the cache size and (2) the access iterates over the access pattern to force the thrash. Such a situation needs to be avoided from an engineering point of view, either by an architecture that avoids it, or by making the cache large enough to be effective. One of our customers, for example, has a cache in their grid that is much larger than the amount of memory that they have, so they spool large parts of it to disk in the grid using B-tree file structures and memory-mapped files (their custom implementation). Because of the nature of their application, they cannot utilize a database efficiently to assemble the data that the application needs, so this makes good sense in their particular case.
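The thrash scenario above is easiest to reason about with a size-bounded LRU cache; a minimal sketch using the standard `LinkedHashMap` eviction hook (the capacity is illustrative, not a Coherence API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-bounded LRU cache: when the working set exceeds maxEntries, the
// least-recently-used entry is evicted. If the application repeatedly
// iterates over more entries than fit, every access becomes a miss -- the
// thrash scenario described above -- so the cache must be sized to the
// working set, or the access pattern changed.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true);   // true => access-order, not insertion-order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```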

In these cases, issuing a cache put operation for every changed object could become very time-consuming or impossible (queries may be big, and re-querying from the database plus multiple puts into the cache, with multiple network trips, may be required). In this case, the invalidation-based pattern, or "anti-cache" as you called it, may prove to be much more efficient. How does Tangosol support this case? Again, I would like you to take into account the extra network trip that is involved with the distributed cache to get the data from a neighboring node.

Each application is different. For some, distributing the work across the cluster (for example, using the clustered Invocation Service) will allow the data operations to be localized, such that with 100 servers, each server will only process 1% of the data.
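The "each server processes 1% of the data" idea amounts to key-based partitioning of the work; a plain-Java sketch (the member count and key list are illustrative, and in practice Coherence's clustered Invocation Service handles the dispatch to the members):

```java
import java.util.ArrayList;
import java.util.List;

// Each of N cluster members claims the keys that hash into its partition,
// so the work of iterating over the data set is split N ways and each
// member's data operations stay local to it.
public class WorkPartitioner {
    static List<String> keysForMember(List<String> allKeys, int memberIndex, int memberCount) {
        List<String> mine = new ArrayList<>();
        for (String key : allKeys) {
            // Math.floorMod guards against negative hashCode() values
            if (Math.floorMod(key.hashCode(), memberCount) == memberIndex) {
                mine.add(key);
            }
        }
        return mine;
    }
}
```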

With the classic model (one server processing 100% of the data locally), if it selects a large amount of data, iterates over it, pulling the data from (or through) the cache, modifying it, and putting it back out through the cache, there are a number of variables:

1. How is the data being selected? (Database or Cluster.) If Cluster, then the selection of the data scales very well and does not bottleneck on the database; further, the data is already in the "object format" that the application is expecting, so no O/R work has to be performed.

2. How is the data being retrieved? If the keys were selected from a database, then the values will be "getted" from the distributed cache; this can hit the worst case scenario that we discussed above. OTOH, if the data is in the cluster, as per #1 immediately above, then the result of the selection is an entry set that has all the data in it, in the "object format" already, and the major cost of this retrieval from the cluster is the deserialization of the objects off of the wire if they aren't already near-cached.

3. How is the data being put back? If a large number of objects are being modified and put back into the database, the write-behind functionality will make a staggering difference. For example, in one Oracle database application that uses Coherence, the database reads (single row) were taking over 200ms and writes (single row) over 800ms (large chunks of data), so the I/O per operation was over 1000ms; with read-through/write-behind caching, that dropped to under 40ms (over 25x improvement.) The database load obviously dropped significantly because (with their access pattern) the same data was basically never read more than once or twice a day from the database, and the writes were coalesced, dropping the database load by well over 90% in our tests.
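The coalescing effect of write-behind can be sketched with a pending-write map keyed by primary key: repeated updates to the same row before a flush collapse into one database write. This is a self-contained illustration, not the Coherence implementation; the `store` callback stands in for a (hypothetical) JDBC write:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Write-behind sketch: puts are applied to the cache immediately and queued
// for the database; because the queue is keyed by the cache key, N updates
// to the same key before a flush coalesce into a single database write.
public class WriteBehindSketch<K, V> {
    private final Map<K, V> cache = new LinkedHashMap<>();
    private final Map<K, V> pendingWrites = new LinkedHashMap<>();
    private final BiConsumer<K, V> store;   // e.g. a JDBC UPDATE (hypothetical)

    public WriteBehindSketch(BiConsumer<K, V> store) { this.store = store; }

    public void put(K key, V value) {
        cache.put(key, value);              // caller sees ~0ms, no DB I/O
        pendingWrites.put(key, value);      // overwrites any queued write for key
    }

    /** In a real implementation this runs periodically on a background thread. */
    public int flush() {
        int writes = pendingWrites.size();
        pendingWrites.forEach(store);
        pendingWrites.clear();
        return writes;
    }
}
```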

Also, I would like to add that most invalidation-based caches will still put the object into the cache on the updating node and invalidate all other nodes. Subsequent "get" operations on other nodes may require a single trip to the database (just once) to get the invalidated data, which may arguably be shorter or longer in time than getting the data from a neighboring node, since most modern databases cache query results anyway.
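The invalidation pattern described above can be sketched with node-local maps: the updater keeps the new value while its peers merely drop theirs, re-reading from the database on their next get (the "database" here is a stand-in map; none of this is Coherence API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Invalidation-based caching sketch: an update is applied locally and the
// other nodes' copies are invalidated rather than replicated; each peer's
// next get falls through to the (stand-in) database exactly once.
public class InvalidationSketch {
    static class Node {
        final Map<String, String> local = new HashMap<>();
        final Map<String, String> database;   // shared stand-in for the DB
        Node(Map<String, String> database) { this.database = database; }

        void update(String key, String value, List<Node> cluster) {
            database.put(key, value);
            local.put(key, value);                 // updater keeps the new value
            for (Node peer : cluster) {
                if (peer != this) {
                    peer.local.remove(key);        // invalidate, don't replicate
                }
            }
        }

        String get(String key) {
            // Miss => one trip to the database, then cache locally
            return local.computeIfAbsent(key, database::get);
        }
    }
}
```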

As I mentioned, caching is just a tool in the application developer's toolbox, and there are good examples of when not to use it. Using either the Coherence replicated or distributed+near cache features, most applications will see 0ms access to their data. In a high-scale application, the database is almost always (1) the bottleneck and (2) the most expensive part of the infrastructure to scale up, so offloading the database is a Very Good Thing for both performance and cost.

We don't solve every problem with caching, and in the caching field there are features that we don't provide yet, but we are working to make sure that our software handles as many possible scenarios as is reasonably possible, and does so very reliably and efficiently. If you have suggestions for the product, or specific needs that Coherence does not currently handle, please drop me an email (cpurdy at tangosol dot com). If I have glossed over some of the questions you were trying to get answered here, let me know and I will get an architect to respond.

I have not had a chance to look at Tangosol Coherence 2.1 yet, but I must say that I am greatly impressed by Cameron's posts and the professionalism he has shown. If only all the forum topics on TheServerSide were carried out in such an illuminating fashion...
