Friday, August 17, 2012

Hazelcast offers many features to developers working with systems of all shapes and sizes. Hazelcast's implementations of distributed Queue and Map offer possibilities of running Web applications while operational systems are down for maintenance or unavailable. Here is an approach to solving such a problem with Hazelcast.

Let's start with our business model. We sell widgets online and through our call center. We also service order and customer related issues in our call center. Every evening, our widget product database is inaccessible because of some maintenance tasks. Here is one place where Hazelcast fits into the picture.

Before our maintenance window, we preload the widget product data into a Hazelcast distributed Map named "widgets". Each map entry has a key, which is our unique widget id, and a value, consisting of a Widget object. Now, our Web applications can access product data from our Hazelcast distributed Map "widgets", rather than relying on our product database.

When an order is created, Hazelcast's distributed Queue comes into play. The Web applications can submit an order and it will be passed into our "orders" distributed Queue. Here we can keep our orders ready and waiting until our database comes back online. We configure our Queue to be backed by a Map so our "orders" elements, consisting of Order objects, can be persisted in our intermediary storage before being processed when our target datastore comes off of maintenance mode.

A big part of this solution is to have a great services or middleware layer that can take on the burden of working with our distributed Maps and Queues. This will create a good abstraction of our backend datastore and distributed data from our Web applications. As a service oriented architecture is a common approach to abstracting a backend from Web applications, this solution does introduce some complexity into the services or middleware architecture.

Here is a high level diagram of the setup:

With a mature services layer, our call center and online applications can create orders, search through our widgets that exist in our "widgets" distributed Map and even verify and validate orders that are currently only in our distributed Queue "orders" or our Map that is backing our Queue. This type of solution does take some up front planning, but 24/7 business comes at a cost.

Tuesday, December 07, 2010

I have been using Hazelcast on a few small projects and I have just recently started using the Hazelcast Monitoring Tool. The Hazelcast Monitoring Tool is simply awesome. Just deploy the hazelcast-monitor-x.x.war file on Tomcat and add your Hazelcast cluster to monitor. Now you have a simple, Web based view of your running Hazelcast cluster.

Thursday, June 10, 2010

When you decide to incorporate a distributed data grid as part of your application architecture, a product's scalability, reliability, cost and performance are key considerations that will help you make your decision. Another key consideration will be the accessibility of the data. One nice feature of Hazelcast that I have been working with lately is distributed queries. In simple terms, distributed queries provide an API and syntax that allow a developer to query for entries that exist in a Hazelcast distributed map. Let's look at a very simple example.

In the demo project (link at the bottom) I have one object, a test case and the Hazelcast 1.8.4 jar file as a project dependency. Below is the class that will be put into a distributed map, ReportData. Once we have a distributed map that is full of ReportData entries, we can use Hazelcast's distributed query to find our ReportData entries.

Nothing too complex in the code above. It is just an object that implements Serializable and that contains a few different types (String, Boolean and Date) of attributes. This class will work nicely to help demonstrate Hazelcast's distributed query API and syntax. I omitted the getters and setters for brevity.

In the test code, I created ~50,000 ReportData objects using a for loop and put them into the "ReportData" distributed map. I used the index, 0..50,000, for the ReportData's id and the reportName is set to "Report " + index. I did a few other things, so we could have a few different dates represented in our map's entries. Check out the demo project for more detail.

Below, we have a case where we are building the predicate programmatically using the EntryObject to fetch all ReportData where the id is greater than 49900 and the endDate attribute of ReportData is between two dates, startDate and endDate. I included the code below to show how I am creating a few dates to use in the predicate that eventually gets passed into the map.values(predicate) method.

Getting data from your Hazelcast distributed map using the distributed query API and query syntax is pretty straight forward. Most of these queries ran for about 500 milliseconds to 2 seconds in my IDE. The power and performance comes from the ability to query objects or map entries that are in memory rather than always relying on a round trip to your RDBMS. Distributed queries are an important feature that make Hazelcast a great tool that can help offset the workload of your RDBMS. With Hazelcast and a good knowledge of your enterprise data, you can implement a simple and effective solution that will easily scale to as many Hazelcast nodes your hardware can support. The demo project can be downloaded here. For more information, check out Hazelcast's website or visit the project's home at Google Code.

Monday, March 29, 2010

I have been working with Java and related technologies at multiple companies since 2004. Most of the major business problems that I have encountered revolve around working with relatively small data objects and relatively small data stores (less than 50GB). One commonality in the development environment at each of these companies, other than Java, has been some form of legacy data store. In most cases, the legacy data store was not originally designed to support all of the various applications that are now dependent on the legacy system. In some cases, performance issues would arise that were most likely due to over utilization.

One approach to help alleviate utilization issues on legacy resources is with data caching. With data caching we can utilize available memory to keep our data objects closer to our running application. We can take advantage of technologies like Hazelcast, a data distribution platform for Java, to provide support for distributed data caching. In particular, this example focuses on Hazelcast's distributed Map to manage our in-memory caching. Because Hazelcast is easily integrated with most Web applications, include the hazelcast jar and xml file, the overhead is minimal. When we take advantage of Aspect Oriented Programming(AOP), with the help of Spring and AspectJ, we can leave our current implemented code in place and implement our distributed caching strategy with minimal code changes.

Let's look at a simple example where we are loading and saving objects in a simple Data Access Object (DAO). Below, PersistentObject, is the persistent data object we are going to use in this example. Note that this object implements Serializable because it is required if we want to put this object into Hazelcast's distributed Map (it is also a good idea for applications that utilize session replication).

Here is the implementation of the DataAccesObject interface. Again, really simple and in fact, I left out the meat of the implementation for brevity, but it will work for this example. Each of these methods would usually have some JDBC code or ORM related code. The key here is that our DAO code will not change when we re-factor the code to utilize the distributed data cache because it will be implemented with AspectJ and Spring.

public class DataAccessObjectImpl implements DataAccessObject {
private static Log log = LogFactory.getLog(DataAccessObjectImpl.class);
@Override
public PersistentObject fetch(Long id) {
log.info("***** Fetch from data store!");
// do some work to get a PersistentObject from data store
return new PersistentObject();
}
@Override
public PersistentObject save(PersistentObject persistentObject) {
log.info("***** Save to the data store!");
// do some work to save a PersistentObject to the data store
return persistentObject;
}
}

The method below, "getFromHazelcast", exists in the DataAccessAspect class. It is an "Around" aspect that gets executed when any method "fetch" is called. The purpose of this aspect and pointcut is to allow us to intercept the call to the "fetch" method in the DAO and possibly reduce "read" calls to our data store. In this method, we can get the Long "id" argument from the called "fetch" method, get our distributed Map from Hazelcast and try to return a PersistentObject from the Hazelcast distributed Map, "persistentObjects". If the object is not found in the distributed Map, we will let the "fetch" method handle the work as originally designed.

The method below, "putIntoHazelcast", also exists in the DataAccessAspect class. It is an "AfterReturning" aspect that gets executed when any method "save" returns. As each PersistentObject is persisted to the data store in the "save" method, the "putIntoHazelcast" method will insert or update the PersistentObject in the distributed Map "persistentObjects". This way we have our most recent PersistentObject versions available in the distributed Map. If we just keep inserting/updating all PersistentObject's into the distributed Map, we would have to eventually look into our distributed Map's eviction policy to keep more relavent application data in our cache, unless, we have excess or abundant memory.

This is a simple example of how Spring, AspectJ and Hazelcast can work together to help reduce "read" calls to a data store. Imagine reducing one application's "read" executions against a legacy data store while improving read performance metrics. This example doesn't really answer all questions and concerns that will arise when implementing and utilizing a Hazelcast distributed data cache with Spring and AspectJ, but I think it shows that these technologies can help lower resource utilization and increase performance. Here is a link to the demo project.

Thursday, December 10, 2009

Data distribution is a pretty cool topic. Recently, I have been working with Hazelcast, which is an open source clustering and data distribution platform for Java. Well, I really like what I have seen so far and I figured why not have some fun with Hazelcast and Groovy.

I started by adding the Hazelcast 1.7.1 jar to $GROOVY_HOME/lib. Hazelcast, at an introductory level, provides distributed implementations of java.util { Queue, List, Set, Map }. I can run a Groovy script on multiple JVM's and I can share a Map of customers on each instance. For example:

def customersMap = Hazelcast.getMap("customers")

Now, I have an instance of Map and I can add values using Hazelcast's distributed id generator:

Now, the customers Map has a few customers in it and our Groovy scripts are still running. Let's add an com.hazelcast.core.EntryListener to the customers Map so we can detect a com.hazelcast.core.EntryEvent. Here is HazelcastGroovyness.groovy:

In the above code, we define listener which implements com.hazelcast.core.EntryListener. I now start up HazelcastGroovyness.groovy at a new command prompt(s):

> groovy HazelcastGroovyness.groovy

We can go back to our original HazelcastGroovynessAdd.groovy script and open (re-open) a few more command prompts and run the script that adds customers to the Map. Now in each running instance of HazelcastGroovyness.groovy we see something like:

Hazelcast is very cool, easy to use technology that provides distributed data with a few lines of code, especially with Groovy. More information can be found at Hazelcast's website and at the project site at Google Code.