Five Minute Interview – Arise Virtual Solutions

This article is one in a series of quick-hit interviews with companies using Apache Cassandra and DataStax Enterprise for key parts of their business. For this interview, we spoke with Robert McFrazier, director of development at Arise Virtual Solutions.

DataStax: What service does Arise Virtual Solutions provide its customers?

Robert: Arise Virtual Solutions is a provider of a virtual workforce. We not only supply the framework for someone to be able to work from home, but we also help conduct the sourcing. We find the agents and train or certify them for our clients. Primarily it’s more of a call center, but we’re a call center providing augmentation to our clients. We are branching out into other areas and other industries that allow for a virtualized workforce and currently have a workforce of 20,000.

DataStax: What kind of technology in general makes up your infrastructure?

Robert: Primarily it is the connectivity from the portal, or our website, and our CSPs (that’s what we call our workforce agents) to the client’s phone switch. Those are the two major points: the infrastructure on the portal and the infrastructure around the telephony.

Right now we do have our own private cloud and we are primarily a .NET shop. We have a lot of the Microsoft infrastructure in place. We’re adding more open source solutions into our stack as well. A lot of our Linux and Java applications and PHP applications fall into that open source stack model.

DataStax: So, a private cloud then?

Robert: Yes, we do a lot in the private cloud. A lot of it has to do with security and selling the aspect of security; that’s why we chose the private cloud. Some aspects of our infrastructure are on the Amazon cloud, but not the part that deals with the CSPs or the telephony that they access.

DataStax: Do you run across multiple data centers or just one data center for your private cloud?

Robert: We do have multiple data centers and are actively bringing them into sync so we can get as close to active-active as possible.

DataStax: What were some of the business or technical challenges that caused you to seek out NoSQL technology versus what you used in the past? And what drove you to Cassandra and DataStax Enterprise?

Robert: Previously, we utilized Microsoft SQL Server. We were using that as the only tool in our toolbox and discovered that there were some items we just didn’t need to store relationally. To alleviate the database load in our relational store and add another tool to our toolbox, we were thinking about how to engineer new projects and new features using new technology.

With Cassandra we now have the ability to think, “Okay, given what we need to persist, does it actually need to stay in SQL Server or does this need to be in a NoSQL store?” It gave us that choice which allowed us to build features using the right tool, not just having a relational hammer where then everything is a relational nail.

The second driver was the ability to off-load data from our SQL stores. That was big. We wanted to be able to cache data from long-running or expensive queries and put it into Cassandra, where our applications can pull from Cassandra and not from our relational store, thus alleviating the load. We can run many jobs and have them hit the database once an hour, once a day, or once every five minutes, and store the results in Cassandra. Then our applications point to Cassandra to pull that out. That was a major piece.

The final driver involved being able to capture information that we didn’t possess before: capturing events coming off the website, or even multiple events per click. You wouldn’t even think about storing that in a relational store. We can easily do that in Cassandra. We can store a lot more information, which gives us access to a lot more information for tools and for reports. Just having the throughput to be able to store information was really big as well.
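
The off-loading pattern Robert describes as the second driver, where a scheduled job runs the expensive query and applications read only the stored result, can be sketched roughly as follows. This is an illustrative Python sketch, not Arise's code: an in-memory dictionary stands in for a Cassandra table, and the names ResultCache, refresh_job, and read_report are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ResultCache:
    """In-memory stand-in (assumption) for a Cassandra table keyed by report name."""

    def __init__(self) -> None:
        # key -> (time the row was written, cached query result)
        self._rows: Dict[str, Tuple[float, Any]] = {}

    def write(self, key: str, value: Any, written_at: float) -> None:
        self._rows[key] = (written_at, value)

    def read(self, key: str) -> Optional[Tuple[float, Any]]:
        return self._rows.get(key)


def refresh_job(cache: ResultCache, key: str,
                expensive_query: Callable[[], Any],
                now: Optional[float] = None) -> None:
    """Meant to be run by a scheduler (hourly, daily, every five minutes).

    This is the only code path that calls the relational store.
    """
    written_at = time.time() if now is None else now
    cache.write(key, expensive_query(), written_at)


def read_report(cache: ResultCache, key: str, max_age_seconds: float,
                now: Optional[float] = None) -> Optional[Any]:
    """Application read path: never calls the relational store.

    Returns the cached result, or None if it is missing or too old.
    """
    row = cache.read(key)
    if row is None:
        return None
    written_at, value = row
    current = time.time() if now is None else now
    if current - written_at > max_age_seconds:
        return None
    return value
```

With this split, the relational store sees one query per scheduled run, however many times the application reads the report in between.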

DataStax: When switching from SQL Server, did you evaluate any other NoSQL vendors, like MongoDB or HBase?

Robert: Our final two options were MongoDB and Cassandra. We had already knocked Redis off the list. We ultimately chose Cassandra because it implemented a pretty big set of the Hadoop sub-projects, specifically Hive and Pig and Mahout. Those three were really big because although we’re not using them right now, we are planning to use them once we’re able to analyze some of the data we’re capturing, or even some of the data that we had never captured before. Those three were big.

The second one was the ease of administration. That was big. We were not going to add head count specifically to administer the Cassandra cluster, so having an easy-to-administer solution was very important. We were very attracted to not having to worry about the master/slave setup we would have had with MongoDB; with Cassandra, the nodes are all peers of each other. In fact, the update from DSE 3.0 to 3.1 was significant because that peer-to-peer design let us implement the update with zero downtime.

DataStax: Are you using any of the new security features in DataStax Enterprise?

Robert: We are working on implementing those. We really value the audit features and the ability to see what’s going on and react to it. We’re able to capture and process a lot more information and conduct security audit checks to make sure that what our workforce agents are doing falls in line with our security policies.

We recently conducted a small project where we put a month’s worth of log files into a Cassandra column family, and then used Pig for the ETL and Hive for the analysis part to perform some pretty significant analysis. In fact, from that little project we delivered relevant information to our security department.

DataStax: From a customer-facing standpoint, how does Cassandra and/or DataStax Enterprise touch your customer? Are there external things that they see via portals or UIs or things like that, or is it more backend stuff?

Robert: It’s more backend stuff. Basically, we’re in the process of updating our portal. The portal is writing to and reading from Cassandra directly, so we’re now able to log a lot more information about what our CSPs are doing. That goes into Cassandra. Also, a lot of the information that we’re displaying would take an enormous SQL query to run on demand. Having a job push a lot of that information into Cassandra keeps our relational store from being overloaded and improves the response time of our website.

DataStax: What do your clusters look like?

Robert: Right now we have a mixture of Cassandra nodes along with Hadoop nodes, where Hive and Pig all run.

Right now the huge benefit is being able to capture information and not have to worry about overloading the store that you’re capturing it in. We’re able to capture and keep a lot of information right now without having to worry about it. We have another product coming online within the next 30 days, which is our chat platform. It uses the same technology as Google or Facebook chat, the XMPP protocol. We’re going to be storing the chat messages inside Cassandra so that we can do analysis on that as well.

DataStax: Have you found DataStax Enterprise to be cost effective? Have you saved money over the things you were using prior?

Robert: I believe it is. For the features that we have and for the ease of administration, which do come into play in the cost, I believe that we’re well ahead of the curve compared to our relational stores when it comes to those two things. The cluster itself has been extremely simple to administer, which means less time for our operations group dealing with the Cassandra cluster, which in turn equals savings.

DataStax: Let’s say that somebody is brand new to either NoSQL or Cassandra, and they come to you and they’re just looking for some advice in terms of how best to begin, what to avoid, best practices, anything like that. What kind of advice would you give them?

Robert: First, realize that you have a new tool in your toolbox and give yourself the ability to evaluate the best use. I think a lot of people come in and think it’s a wholesale cutover, just get rid of that and use this, but I don’t think they should look at it like that. Evaluate what is the best tool for the job, and then use it.

The second tip is, don’t allow previously learned behaviors to affect how you use your NoSQL solution. A good example of that is your typical database developer, or DBA. When you’re doing things inside Cassandra, sometimes how you operate in a NoSQL store is literally the exact opposite of what you would do in a relational store. Understand that there are some hard rules that people have dealt with for a long time, and it might take them a little while to get used to operating without them. Nothing difficult by any means, but just different. It’s worth noting that you’ll be doing some things differently.

The third tip is, capture data. Often this is unfamiliar for folks who haven’t previously operated a store that is built for high-capacity data capture like Cassandra. So feel free to build into your applications the ability to log and grab information that you just might not have grabbed before. Get used to the idea that you can capture as much information as you need with Cassandra.