Big Data Gets A Closer Look From Riak

Riak 1.0, a NoSQL system with Cassandra characteristics but a lighter, slimmer profile, was announced Tuesday. It can take huge slices of unstructured data and reduce them down into much more manageable, bite-sized chunks.

Cassandra has proven adept at handling really heavyweight jobs, such as serving as the datastore for Facebook users. Riak fits lighter-weight but highly interactive roles. It's in use at Comcast, Yammer, ClipBoard.com, and Denmark's health system.

Riak has also gained favor, even as a latecomer in the field of NoSQL systems, in part because it consumes fewer resources in getting its finer-grained results. Its queries can have more of the specificity of SQL queries without also picking up the performance drawbacks of relational databases, said Tony Falco, COO of Basho, the firm that produces Riak.

"It's very efficient at capturing the data of a user session. Comcast uses Riak for managing content streaming for users of its Xfinity TV service," said Falco in an interview.

Mozilla considered HBase, Cassandra, and Riak, and ended up selecting Riak for its Test Labs Pilot project, which analyzes user data obtained through use of its browser. Riak was selected to capture sessions of 10 million users over a two-day period, amounting to 1.2 TB of data, said Daniel Einspanjer, lead developer for the Mozilla metrics team, in a May 10 blog post. Riak required less manpower as a well-tested, REST-based system, he concluded, and it was "much lighter on memory requirements."

Like HBase and Cassandra, Riak is a key-value store system that can collect unstructured data and store it as objects in rows that can then be queried. It's also highly scalable, able to distribute itself over a server cluster and add new servers as needed, while maintaining its own high availability.

The 1.0 version includes a new feature, secondary indices, which allow a Riak user to retrieve data through the use of compound criteria. For example, customers between the ages of 17 and 22 in certain states or regions of the country can be identified from the system, instead of just all the customers of a given state or all customers within the age range.
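To make the idea of compound criteria concrete, here is a minimal sketch in plain Python of how two secondary indexes can be combined. It is a toy in-memory model, not Riak's API: the `ToyStore` class, its method names, the sample customers, and the assumption of whole-number ages are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class ToyStore:
    """Toy key-value store with two hand-rolled secondary indexes (not Riak's API)."""

    def __init__(self):
        self.objects = {}                  # primary key -> stored object
        self.by_state = defaultdict(set)   # state -> set of primary keys
        self.by_age = []                   # sorted list of (age, primary key)

    def put(self, key, customer):
        # Assumes each key is written once; updates would need index cleanup.
        self.objects[key] = customer
        self.by_state[customer["state"]].add(key)
        insort(self.by_age, (customer["age"], key))

    def keys_in_age_range(self, low, high):
        # (low,) sorts before every (low, key) pair, so this finds the first match.
        start = bisect_left(self.by_age, (low,))
        # Assumes integer ages: everything below (high + 1,) is within range.
        end = bisect_left(self.by_age, (high + 1,))
        return {key for _, key in self.by_age[start:end]}

    def query(self, states, low, high):
        # Union the per-state key sets, then intersect with the age-range keys.
        in_states = set()
        for state in states:
            in_states |= self.by_state.get(state, set())
        return sorted(in_states & self.keys_in_age_range(low, high))


store = ToyStore()
store.put("c1", {"name": "Ana", "age": 19, "state": "CA"})
store.put("c2", {"name": "Ben", "age": 30, "state": "CA"})
store.put("c3", {"name": "Cy", "age": 21, "state": "NY"})
store.put("c4", {"name": "Di", "age": 18, "state": "TX"})

# Customers aged 17 to 22 who live in CA or NY.
matches = store.query({"CA", "NY"}, 17, 22)
print(matches)
```

The point of the sketch is that neither index alone answers the question: the state index returns every customer in a state, the age index returns every customer in the range, and only their intersection gives the compound result.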

Another Riak feature is Riak Pipe, an implementation of the MapReduce function that distributes a task onto cluster nodes in a way that is most efficient for handling the relevant data. Riak Pipe, in effect, sends a Riak query to a node close to the data for its most efficient execution on the cluster.
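The principle of running work next to the data can be sketched in a few lines of plain Python. This is a simplified illustration of the general MapReduce pattern, not Riak Pipe's actual interface: the `cluster` dictionary, the node names, and the function names are hypothetical, and each "node" is just a list held in one process.

```python
from collections import Counter


def map_phase(local_records):
    # Runs once per node, touching only the records that node holds.
    return Counter(record["state"] for record in local_records)


def reduce_phase(partial_counts):
    # Merges the small per-node summaries into one result.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def run_job(cluster):
    # Only the per-node summaries are collected; raw records are never
    # gathered in one place.
    partials = [map_phase(records) for records in cluster.values()]
    return reduce_phase(partials)


# Hypothetical cluster: node name -> records stored on that node.
cluster = {
    "node-a": [{"state": "CA"}, {"state": "NY"}],
    "node-b": [{"state": "CA"}],
    "node-c": [],
}

totals = run_job(cluster)
print(dict(totals))
```

In a real cluster the map step would execute on each machine and only the compact partial counts would cross the network, which is where the efficiency described above comes from.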

Falco said NoSQL systems are good for collecting masses of data and then making chunks of that data available for hundreds or thousands of users at a time, often those visiting a website. Even so, business users often want to be able to submit queries and retrieve data that is more specific than a named key value--an identifier for a particular class of data, such as "customers"--allows. Much valuable information is buried in captured website user sessions, for example. "Once all that data is in there, you want to be able to get it out," Falco noted.

Riak is written in Erlang, a language that gives a system built-in support for distribution across a server cluster, fault tolerance, and an ability to absorb new hardware being added to the cluster without disrupting operations.

Riak is available under an Apache 2 license as open source code or in Basho's commercially supported version. Average deal size for small and midsize businesses runs about $35,000, Falco said. Fortune 1000 firms pay $3,995 a node, he added. A startup firm version is available for $20,000.

Basho is a little-known NoSQL firm that got a high-profile CEO, Donald Rippert, the former CTO of Accenture, in June. Basho was formed in San Francisco in 2008 with a senior management team, including Falco, of veterans from Akamai Technologies, the content distribution network. Falco is the former VP of product management at Akamai, and prior to that, Akamai VP of technical services.

The firm is backed by the venture capital firms Georgetown Partners and Trifork, which provided a second round of $7.5 million in February.
