Introduction
Coherence is very efficient in running queries in parallel across your cluster. For a “Partitioned” or “Distributed” cache, the data is stored evenly across all storage-enabled members. When you run a query, the request is sent to all nodes that hold data, executed in parallel on those nodes, and the results returned and aggregated. For example the following statement will get the average balance for all customers in the “South” region:

Data is stored serialized in the storage-enabled members, so any non-key based access that require predicates or access to data means the data must be de-serlialized. In the example above the “getRegion” for the predicate as well as the “getBalance” for the DoubleAverage will cause de-serialization to occur on the storage-enabled members and increase query time.

Indexes are useful in this case. We can create and index on a “getter” of an object, such as “getRegion”, which will tell Coherence to store de-serliazed values in preparation for use by a query. This does however use more heap space and requires more time to create/update the index on insertion, update or deletion of cache entries. Its really a balancing act ensuring you don’t add too many indexes, versus getting query times down.

(Note: By using Portable Objects (see this tutorial for a good explanation and demonstration.), you can limit the indexes you may require by using PofExtractors for querying. This is outside the scope of this post, but see here for more information.)

Understanding the MBeans

So how do we identify queries that are taking a long time in our cluster and may be candidates for indexes? Within the StorageManagerMBean (see here), there are a number of attributes that help us identify this. In particular the following are of interest:

MaxQueryDescription – A description of the query with the longest duration that exceeds the MaxQueryThresholdMillis attribute since the statistics were last reset.

MaxQueryDurationMillis – The number of milliseconds of the longest running query since the statistics were last reset. ie. For the above Query.

MaxQueryThresholdMillis – A threshold, in milliseconds, for recording queries. The longest query that executes longer than this threshold is reported by MaxQueryDescription attribute. The default value is 30 ms.

There are also NonOptimized* and Optimized* metrics for average, count and total time for queries.

Running the Example

To test this we have a Customer POJO with the following attributes. It also implements Serializable of course and with appropriate setters and getters.

Next we startup jconsole and attach to one of the running DefaultCacheServer processes. Go to MBeans tab, and expand the Coherence MBean and drill down so that you see the attributes of one of the storage-enabled members.

Get total balance...
Balance is 1498224640.0, time = 2359 ms
Get average balance for South region...
Average Balance is 4994.2, time = 1249 ms

Now these query times are still pretty good but we can improve them greatly.

When we go back to jconsole and refresh the statistics, we should see something similar to the following:

Non Optimized Query

This shows that a query using the filter EqualsFilter(.getRegion(), South)) has taken 1161ms to run. This is the execution time on this particular storage-enabled member. A similar statistic will be available on other members.

The reason that this query is taking this long is that every Customer object is deserialized so that the region can be checked and the balance then retrieved if the region is “South”. This is a lot of extra work that the cluster has to do.

Adding Indexes

Now that we have identified the longest running query, lets add some indexes to help us here:

Ensure that you go back to jconsole and reset the statistics by going to the Operations node and clicking on “resetStatistics”. This ensures we can pick up the new values. See below:

Reset Statistics

Once we re-run the queries, we get output similar to the following:

Get total balance...
Balance is 1498224640.0, time = 182 ms
Get average balance for South region...
Average Balance is 4994.2, time = 72 ms

We can see that the query time for the second query has now gone down to 72 ms from 1249 ms. The time for the first query has also reduced to 182 ms from 2359 ms. We can see the dramatic effect of adding the index on the query time.

If we go back to jconsole and refresh the statistics we will so that the first query is now the longest running query. This is ok as we have still reduced the time to get the
total balance from the Customers quite considerably.

Optimised Query

Conclusion
As mentioned above, POF helps us reduce the number of indexes required and you may not need to add indexes, but indexes may still be required depending upon the queries and their execution times. Again its a balancing act between having a sensible number of indexes and getting the query response times your are after.

It is also worth while looking at the “Query Explain Plan” feature, which allows you to understand the path a query took. See this Youtube Video for information on this feature.

All of these JMX statistics can be retrieved programatically, via JMX APIs and via Oracle Enterprise Manager, so this approach can be used on an automated basis to help identify potential query improvements.

Blogroll

Copyright (c) 2013 Tim Middleton and other contributors. All Rights Reserved. The views expressed in this blog are our own and do not necessarily reflect the views of Oracle Corporation. All content is provided on an 'as is' basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose. You are solely responsible for determining the appropriateness of using or redistributing and assume any risks.