The Five Minute Interview – WSO2

This article is one in a series of quick-hit interviews with companies using Apache Cassandra and/or DataStax Enterprise for key parts of their business. For this interview, we spoke with Paul Fremantle, CTO of WSO2 and VP of Apache Synapse.

DataStax: Paul, thanks for taking the time to chat with us today. Tell us quickly what WSO2 is all about.

Paul: WSO2 has been around for over seven years now, and we’re focused on providing global enterprise middleware that helps our customers build out their large-scale enterprise systems both on-premises and in the cloud.

DataStax: Can you give us some background on how you came to use Cassandra and what you’re using it for today?

Paul: Our WSO2 Business Activity Monitor product was using a traditional SQL model at the backend, and we were experiencing significant scalability issues as well as concerns about the performance of writes.

This was a major pain point because the product takes in a large volume of data, which needs to be written very quickly so that it can be turned around fast for analysis. Much of the data is time series in nature and is concerned with logging and analyzing all kinds of actions and activities that occur in large cloud-based clusters of servers. Just writing the data as fast as we needed to was a big problem.

We took notice of Cassandra some time back and saw that it was a very visible and active project at Apache, so we decided to see if it could do better than the relational databases we were using.

DataStax: What kind of performance gains have you seen since switching to Cassandra?

Paul: From a transactions-per-second perspective, our benchmarks are showing a full 10x increase in write performance by moving to a combination of Cassandra and Apache Thrift for network communications.

DataStax: What else motivated you to use Cassandra?

Paul: Outside of pure performance, the other things that we saw as major benefits were the elastic scalability and simplicity that Cassandra offers. Once you have Cassandra up and running, it takes just minutes to increase your scale by adding new machines to a cluster. You just can’t do the same thing with MySQL or other databases; setting up new machines, getting the replication working right, and other tasks take much longer and are more error-prone.

Another factor was storing unstructured data. One of our capabilities is to allow users to extend our product to store their own unstructured or semi-structured data. Cassandra handles that well with no hassle at all, whereas MySQL is simply a mismatch on that front.

DataStax: Did you look at other options before deciding on Cassandra?

Paul: Yes, we did a fair amount of research and looked at other solutions. One deciding factor for us was that we are Java-based and so is Cassandra, so it and our products work well together in that respect. We’ve completely integrated Cassandra into our OSGi-based runtime, WSO2 Carbon.

The peer-to-peer scalability Cassandra offers played a large part in our decision as well.

DataStax: What about high availability and multiple data center functionality – are those things that factor into how your products use Cassandra?

Paul: Absolutely. In fact, we found Cassandra to be the easiest database we’ve used when it comes to migrating data between different data centers.

DataStax: You mentioned earlier the need for analyzing data that you capture – how is that done?

Paul: It’s a combination of Cassandra and the Hadoop/Hive integration that Cassandra supports.

DataStax: What advice would you give people who are just starting to use Cassandra?

Paul: There’s not so much of a specific Cassandra learning curve that you have to deal with. Instead, there’s the mental shift that a person needs to make from SQL to NoSQL in general. It is worth investing time up front to make sure you really understand how to model data in a NoSQL way and how that differs from traditional SQL. Don’t just jump in and start coding as if it were SQL!