Thursday, June 10, 2010

When you decide to incorporate a distributed data grid as part of your application architecture, a product's scalability, reliability, cost and performance are key considerations that will help you make your decision. Another key consideration will be the accessibility of the data. One nice feature of Hazelcast that I have been working with lately is distributed queries. In simple terms, distributed queries provide an API and syntax that allow a developer to query for entries that exist in a Hazelcast distributed map. Let's look at a very simple example.

In the demo project (link at the bottom) I have one object, a test case and the Hazelcast 1.8.4 jar file as a project dependency. Below is the class that will be put into a distributed map, ReportData. Once we have a distributed map that is full of ReportData entries, we can use Hazelcast's distributed query to find our ReportData entries.

Nothing too complex in the code above. It is just an object that implements Serializable and that contains a few different types (String, Boolean and Date) of attributes. This class will work nicely to help demonstrate Hazelcast's distributed query API and syntax. I omitted the getters and setters for brevity.

In the test code, I created ~50,000 ReportData objects using a for loop and put them into the "ReportData" distributed map. I used the index, 0..50,000, for the ReportData's id and the reportName is set to "Report " + index. I did a few other things, so we could have a few different dates represented in our map's entries. Check out the demo project for more detail.

Below, we have a case where we are building the predicate programmatically using the EntryObject to fetch all ReportData where the id is greater than 49900 and the endDate attribute of ReportData is between two dates, startDate and endDate. I included the code below to show how I am creating a few dates to use in the predicate that eventually gets passed into the map.values(predicate) method.

Getting data from your Hazelcast distributed map using the distributed query API and query syntax is pretty straight forward. Most of these queries ran for about 500 milliseconds to 2 seconds in my IDE. The power and performance comes from the ability to query objects or map entries that are in memory rather than always relying on a round trip to your RDBMS. Distributed queries are an important feature that make Hazelcast a great tool that can help offset the workload of your RDBMS. With Hazelcast and a good knowledge of your enterprise data, you can implement a simple and effective solution that will easily scale to as many Hazelcast nodes your hardware can support. The demo project can be downloaded here. For more information, check out Hazelcast's website or visit the project's home at Google Code.