One To Watch: Continuuity, big data for developers

We speak to the ex-Yahoo! and Facebook employees aiming to bring the power of Hadoop to the masses.

Hadoop’s evolution from a humble framework to (in the
words of creator Doug Cutting) “the kernel of the mainstream
data processing system” continues apace. The latest development is
the emergence of so-called ‘Big Data-as-a-Service’ companies
promising to take the hassle out of Hadoop, such as plucky startup
Continuuity.

Founded around a year ago, the company launched at Strata + Hadoop
World in October on a wave of slightly apprehensive media
hype. A conversation with CEO Todd Papaioannou and CTO Jonathan Gray,
however, makes it clear that Continuuity is more than just hot
air.

For a start, Papaioannou and Gray are already influential figures
in the data space. Papaioannou was previously VP and chief cloud
architect at Yahoo!, before leaving to spend three months at Battery
Ventures (“It was fun to pretend to be a VC,” he says of the
stint). Gray worked with HBase at Facebook – a company which, it’s
safe to say, has fairly big data problems.

However, they both left these comfortable jobs to found a new
company aiming at a gap in the burgeoning ‘big data’ market.
“Really, the first part of this year was heads-down, build an MVP
and get some customers on it,” says Papaioannou. The next step was
raising $10m from the likes of Andreessen Horowitz and
Papaioannou’s ex-employers Battery Ventures to help expand the
business (and perhaps obtain legitimacy in the eyes of the
TechCrunch crowd).

The ambitious business plan is to take advantage of the existing
Hadoop ecosystem to produce a stable platform for application
development, which they call AppFabric. “There’s a ton of
infrastructure software out there to do HBase, MapReduce, YARN,
Hive,” says Papaioannou. “All of that stuff is low-level, and we
think of it as infrastructure, as a kind of data kernel.”

With AppFabric, this complex infrastructure is abstracted away.
“The regular Java developer who has been writing J2EE for the past
decade doesn’t want to have to deal with kernel-level APIs, so they
don’t have to worry about, ‘how do I store data?’ and worry about
method calls that are like ‘byte[] bytes’ and things like that,” he
asserts. “They want higher-level APIs.”
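
To make the contrast concrete, here is a minimal sketch of what wrapping a byte-oriented store in a higher-level API can look like. It is not Continuuity's code: `RawStore`, `InMemoryRawStore` and `CounterTable` are hypothetical names invented for illustration, and the in-memory map simply stands in for a real storage layer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical low-level interface: keys and values are raw bytes.
interface RawStore {
    void put(byte[] key, byte[] value);
    byte[] get(byte[] key);
}

// In-memory stand-in for a real store, for illustration only.
class InMemoryRawStore implements RawStore {
    private final Map<String, byte[]> data = new HashMap<>();

    public void put(byte[] key, byte[] value) {
        // byte[] has identity-based equals, so use a string form as the map key.
        data.put(Arrays.toString(key), value.clone());
    }

    public byte[] get(byte[] key) {
        byte[] value = data.get(Arrays.toString(key));
        return value == null ? null : value.clone();
    }
}

// Hypothetical higher-level wrapper: callers see names and longs, never bytes.
class CounterTable {
    private final RawStore store;

    CounterTable(RawStore store) {
        this.store = store;
    }

    long get(String name) {
        byte[] raw = store.get(name.getBytes(StandardCharsets.UTF_8));
        return raw == null ? 0L : ByteBuffer.wrap(raw).getLong();
    }

    long increment(String name, long delta) {
        long updated = get(name) + delta;
        byte[] encoded = ByteBuffer.allocate(Long.BYTES).putLong(updated).array();
        store.put(name.getBytes(StandardCharsets.UTF_8), encoded);
        return updated;
    }
}

class CounterDemo {
    public static void main(String[] args) {
        CounterTable counters = new CounterTable(new InMemoryRawStore());
        counters.increment("clicks", 2);
        System.out.println(counters.increment("clicks", 3));
    }
}
```

The encoding and decoding still happen, but only in one place; application code works with `increment("clicks", 3)` rather than byte arrays.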

Papaioannou rejects the idea that understanding the underlying
infrastructure is important, drawing parallels with higher-level
programming languages. “If you write computer programs, do you
really understand the message bus and the CPU architecture that’s
hidden inside your motherboard? Or do you really think about
objects and how they interrelate and use abstracts to abstract away
some of the structure?”

Similarly, he adds, cloud providers like Heroku and AWS abstract
away server infrastructure. “When you go to Amazon you don’t know
how many disks you have, or what the networking topology is in the
SNAP,” he says.

A screenshot from the Continuuity dashboard.

These custom APIs might worry those allergic to vendor lock-in,
but Papaioannou insists that the company is “very sensitive” to the
issue, and will be releasing the API source code as well as some of
the technology powering Continuuity. In the near future,
Papaioannou says, the company will be releasing Weave, a management
layer on top of YARN, and a runtime container called BigFlow which
is “similar to Storm or S4”.

The company has already released a beta SDK with an accompanying
Eclipse plugin. Applications can be tested on a local single-node
instance of AppFabric before being deployed to Continuuity-hosted
or on-premises private clouds. This will later be followed by a
self-service public cloud, though pricing details are still thin on
the ground (“we know what we’re going to charge, we’re just not
telling anybody!” laughs Papaioannou).

As Hadoop itself is written in Java, that is the first
language AppFabric supports, but more languages are likely to
follow. “We think the first wave of developers are really the
existing Java developers,” says Gray, but “the first very natural
step for us is supporting additional JVM-based languages”.

“All of the APIs to the platform, whether that’s data ingestion,
whether that’s querying, are all exposed via REST,” Papaioannou
adds. “So you can actually integrate your application with pretty
much anything that talks HTTP. So right now the deployable code is
Java, just like it was in WebLogic and WebSphere, but you can
integrate with any language out there that can make a REST
call.”
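
As a rough illustration of what “anything that talks HTTP” means in practice, the sketch below builds a plain HTTP POST using the JDK's own `java.net.http` client (Java 11 or later). The base URL, the `/streams/` path and the JSON payload are all assumptions made up for this example; the article does not describe Continuuity's actual endpoints.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

class RestIngestSketch {
    // Hypothetical address of a local single-node instance; not a documented endpoint.
    static final String BASE_URL = "http://localhost:8080";

    // Builds (but does not send) a POST carrying one JSON event to a named stream.
    static HttpRequest buildIngestRequest(String stream, String jsonEvent) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/streams/" + stream))
                .timeout(Duration.ofSeconds(5))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonEvent))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildIngestRequest("clicks", "{\"user\":\"alice\"}");
        System.out.println(request.method() + " " + request.uri());
        // Sending would be a single extra call against a running server:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Nothing here is specific to Java: the same request could come from curl, Python or a browser, which is the point Papaioannou is making.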

The pair admit that while ‘big data’ is a useful marketing term,
it’s thrown about with abandon. “The dirty secret of big data is,”
says Gray, “99% of the use cases out there are really medium data.
They’re not terabytes – people have gigabytes.”

“I’ve been doing this for over a decade now,” adds Papaioannou,
“and actually I think the fundamental change is we’re moving from a
data industry that requires schema at write time to a data industry
that requires schema at read time.

“And you can abstract away, take away all of the marketing
hyperbole, BS and all the rest of it – that’s really the
fundamental shift that’s going on here.”
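
For readers unfamiliar with the distinction, a minimal sketch of schema-at-read-time might look like the following. It is an illustration only, not anything the company has shown: the class name, the comma-separated event format and the two parser methods are all invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class SchemaOnReadSketch {
    // Events are stored exactly as they arrive; nothing is validated at write time.
    private final List<String> rawEvents = new ArrayList<>();

    void write(String rawLine) {
        rawEvents.add(rawLine);
    }

    // The reader supplies the schema, here simply a function from raw text to a typed value.
    <T> List<T> read(Function<String, T> schema) {
        List<T> parsed = new ArrayList<>();
        for (String line : rawEvents) {
            parsed.add(schema.apply(line));
        }
        return parsed;
    }

    // Two hypothetical read-time views over the same raw data (format: user,page,bytes).
    static String userOf(String line) {
        return line.split(",")[0].trim();
    }

    static int bytesOf(String line) {
        return Integer.parseInt(line.split(",")[2].trim());
    }

    public static void main(String[] args) {
        SchemaOnReadSketch store = new SchemaOnReadSketch();
        store.write("alice,/home,512");
        store.write("bob,/cart,2048");

        List<String> users = store.read(SchemaOnReadSketch::userOf);
        List<Integer> sizes = store.read(SchemaOnReadSketch::bytesOf);
        System.out.println(users + " " + sizes);
    }
}
```

In a schema-at-write-time system, the structure would have to be declared before the first `write` call; here, a new question about old data only needs a new parser.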