Tuesday, May 29, 2012

RHQ provides a rich feature set in terms of its monitoring capabilities. In addition to collecting and storing metric data, RHQ automatically generates baselines, allows you to view graphs of data points over different time intervals, and gives you the ability to alert on metric data. RHQ stores all of its data in a single database: everything from the inventory, to plugin metadata, to metric data. This presents an architectural challenge for the measurement subsystem, particularly in terms of scale. As the number of managed resources grows and the volume of metrics collected increases, database performance starts to degrade. Despite various optimizations that have been made, the database remains a performance bottleneck. The reality is that a relational database is simply not the best tool for write-intensive workloads like time-series data.

This architectural challenge has in large part motivated me to start learning about Cassandra. There are plenty of other non-relational database systems that I think could address the performance problems with our measurement subsystem, but a couple of things about Cassandra intrigued me enough to invest the time to learn it.

The first point of intrigue is that Cassandra is a distributed, peer-to-peer system with no single point of failure. Any node in the cluster can serve read and write requests. Nodes can be added to and removed from the cluster at any point, making it easier to meet demands around scalability. This design is largely inspired by Amazon's Dynamo.

The second point of intrigue for me is that running a node involves running only a single Java process. For the purposes of RHQ and JBoss Operations Network (JON), this is much more important to me than the first point about single points of failure. The fewer the moving parts, the better. It simplifies management, which goes a long way towards the goal of having a self-contained solution.

Cassandra could be a great fit for RHQ, and the time I have spent thus far learning it is definitely time well spent. There are some hurdles to overcome, though. I find the project documentation to be lacking. For example, it took some time to wrap my head around super columns. Only after I understood super columns well enough to start thinking about how to leverage them in RHQ's data model did I discover that composite columns should be favored over them. Apparently composite columns do not have the performance and memory overhead inherent to super columns, and they allow for an arbitrary level of nesting, whereas super columns do not. Fortunately DataStax's docs help fill in a lot of the gaps.
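To make the composite column idea concrete, here is a sketch in plain Java of how a two-component column name sorts. The scheduleId component is a hypothetical example, and the compareTo method merely stands in for Cassandra's comparator; this models the ordering on the client side, not how Cassandra stores it.

```java
import java.util.TreeMap;

public class CompositeName implements Comparable<CompositeName> {
    final int scheduleId;  // hypothetical first component
    final long timestamp;  // second component

    CompositeName(int scheduleId, long timestamp) {
        this.scheduleId = scheduleId;
        this.timestamp = timestamp;
    }

    // Composite names sort component by component, so all columns for one
    // schedule cluster together, ordered by time within that schedule.
    public int compareTo(CompositeName other) {
        int c = Integer.compare(scheduleId, other.scheduleId);
        return c != 0 ? c : Long.compare(timestamp, other.timestamp);
    }

    public static void main(String[] args) {
        TreeMap<CompositeName, Double> columns = new TreeMap<>();
        columns.put(new CompositeName(2, 100L), 3.0);
        columns.put(new CompositeName(1, 200L), 2.0);
        columns.put(new CompositeName(1, 100L), 1.0);
        // The first key is schedule 1 at timestamp 100
        System.out.println(columns.firstKey().timestamp);
    }
}
```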

One thing that was somewhat counter-intuitive initially is how the sorting works. With a relational database, you first define the schema, and queries are defined later on. Sorting is done on column values and is specified at query time. With Cassandra, sorting is based on column names and is specified at the time of schema creation. This might seem really strange if you are thinking in terms of a relational database, but Cassandra is a distributed key-value store. If you think about it more along the lines of, say, java.util.TreeMap, then it makes a lot more sense. With a TreeMap, sorting is done on keys. When I want to use a TreeMap or another ordered collection, I have to decide in advance how the elements of the collection should be ordered. This aspect of Cassandra is a good thing. It contributes to the high-performance reads and writes for which Cassandra is known. It also lends itself very well to working with time-series data.
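The TreeMap analogy can be made concrete. Here is a minimal sketch (the timestamps are made up) of a metric row modeled as a TreeMap, where the column names (timestamps) are the keys and a time-range slice falls straight out of the key ordering:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class MetricRow {
    public static void main(String[] args) {
        // A row of time-series columns, pictured as a TreeMap: the map
        // keeps the keys sorted no matter what order they arrive in.
        NavigableMap<Long, Double> row = new TreeMap<>();
        row.put(1338300000000L, 42.0);
        row.put(1338296400000L, 40.5);
        row.put(1338303600000L, 43.2);

        // Because sorting happens on the keys, a time-range query is just
        // a subMap call -- analogous to a Cassandra slice query.
        NavigableMap<Long, Double> slice =
            row.subMap(1338296400000L, true, 1338300000000L, true);
        System.out.println(slice.firstKey()); // oldest timestamp in range
    }
}
```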

DataStax posted a great blog post the other day about how they use Cassandra as a metrics database. The algorithm described sounds similar to what we do in RHQ; however, there are a few differences (aside from the obvious one of using different database systems). One difference is in bucket sizes. They use bucket sizes of one minute, five minutes, two hours, and twenty-four hours. RHQ uses bucket sizes of one hour, six hours, and twenty-four hours. I will briefly explain what this means. RHQ writes raw data points into a set of round-robin tables. Every hour a job runs to perform aggregation. The latest hour of data points is aggregated into the one hour table, or bucket. RHQ calculates the max, min, and average for each metric collection schedule. When the one hour table has six hours' worth of data, it is aggregated and written into the six hour table.
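To illustrate the roll-up step, here is a minimal sketch of the min/max/avg computation the hourly job performs for a single collection schedule. This is just the arithmetic, not RHQ's actual implementation:

```java
import java.util.Arrays;
import java.util.List;

public class HourlyAggregate {
    // Roll a list of raw data points up into {min, max, avg}. The same
    // computation then cascades: one hour -> six hour -> twenty-four hour.
    static double[] aggregate(List<Double> rawValues) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE, sum = 0;
        for (double v : rawValues) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new double[] { min, max, sum / rawValues.size() };
    }

    public static void main(String[] args) {
        double[] result = aggregate(Arrays.asList(2.0, 4.0, 6.0));
        // prints min=2.0 max=6.0 avg=4.0
        System.out.printf("min=%.1f max=%.1f avg=%.1f%n",
            result[0], result[1], result[2]);
    }
}
```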

Disk space is cheap, but it is not infinite. There needs to be a purge mechanism in place to prevent unbounded growth. For RHQ, the hourly job that does the aggregation also handles the purging. Data in the six hour bucket, for instance, is kept for 31 days. With Cassandra, DataStax simply relies on Cassandra's built-in TTL (time to live) feature. When data is written into a column, the TTL is set on it so that the column expires after the specified duration.
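A rough way to picture what a column TTL buys you, sketched in plain Java. This is only an illustration of the behavior, not how Cassandra implements expiration (Cassandra filters expired columns at read time and reclaims them during compaction):

```java
import java.util.HashMap;
import java.util.Map;

public class TtlSketch {
    static class Column {
        final double value;
        final long expiresAt; // millis since epoch
        Column(double value, long ttlMillis, long now) {
            this.value = value;
            this.expiresAt = now + ttlMillis;
        }
    }

    // An expired column behaves as if it had never been written; no
    // separate purge job is needed.
    static Double read(Map<Long, Column> row, long ts, long now) {
        Column c = row.get(ts);
        return (c == null || now >= c.expiresAt) ? null : c.value;
    }

    public static void main(String[] args) {
        Map<Long, Column> row = new HashMap<>();
        long thirtyOneDays = 31L * 24 * 60 * 60 * 1000;
        row.put(100L, new Column(42.0, thirtyOneDays, 0L));
        System.out.println(read(row, 100L, 0L));                 // 42.0
        System.out.println(read(row, 100L, thirtyOneDays));      // null
    }
}
```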

So far it has been a good learning experience. Cassandra is clearly an excellent fit for storing RHQ's metric data, and I am starting to see how it could be a good fit for other parts of the data model as well.


3 comments:

There is something that I have to ask. I had a similar problem to solve, and the team was considering using Infinispan to manage that data. I ended up not using it, but is there any reason for not choosing Infinispan as a solution?

@Ricardo, That is a really good question. I should point out that this work, at least for now, is a passion project of mine. No decision has been made about what solution we will use for a metrics database. I can tell you, though, that we have been thinking about Infinispan and will continue to do so. It definitely ticks a lot of boxes; however, I do have some reservations. Its best-supported persistent store is a relational database, and the docs go so far as to discourage using the file system persistent store in production. That means more moving parts that RHQ has to manage.