JAX Magazine: Hi Billy, thanks for taking the time
to speak to us. First, could you tell us a little bit about
DataStax?

Billy Bosworth: So, DataStax is the commercial company
behind the open source database Apache Cassandra. And the sweet
spot for Cassandra is powering real-time applications that generate
transactions at hyper velocity. And we deliver that power with an
architecture that is simple, flexible and most importantly
extremely performant once you hit big data scale.

The company was started back in April 2010, and co-founded by
the then and current Apache Cassandra chairman, Jonathan Ellis.

So the links run deep between Cassandra and
DataStax?

Very. And we are very committed to the success
of the community. In fact, I was just looking the other day at,
from a budget standpoint, people standpoint, how much we spend on
direct community activity, and it’s quite a lot, and that’s very
important to us. So we want to see a very thriving and growing
Cassandra community independent of whether or not those customers
ultimately become DataStax customers. We do a lot for the
community, for the community’s sake.

DataStax Enterprise and the free
DataStax Community are both separate to Cassandra. What’s superior
about them?

So DataStax Community is a way you can quickly
consume and get started with pure Apache Cassandra. You can think
of that analogous to Fedora from Red Hat. So, you have the
installers, you have a free version of our management tool,
OpsCenter, you have sample applications, all within a very nice,
ready-made bundle, ready for you to download and install. So
basically that is just our way of helping people in the community
who may not want to chase down all the individual bits from the
various Apache projects, and GitHub, and the connectors, and the
libraries and all that sort of thing. We want to make it just a
very easy way to digest and get started with the open-source
version.

Now, DataStax Enterprise is a little different. DataStax
Enterprise is definitely the choice when you are ready to take your
Apache Cassandra needs into a production environment. And
everything we do in DataStax Enterprise is geared towards helping
that application team. Everything they need, including the
confidence, and security, and reliability of a company behind you,
but also to have all the functionality you need in a single
platform so that you can work with your data in the context of that
application without having to ETL or move it around.

The big feature of
DataStax Enterprise 3.0 is security. Were Cassandra and DataStax insecure
before?

Uhh, yeah, pretty much all of these systems are
actually! [Laughs] These NoSQL databases are problematic in that
respect. It’s been definitely something that has been known, and
worked around, but yes – to give you a very short answer, the
ability to handle the type of security that you are accustomed to
in the relational world simply has not existed yet in the NoSQL
world. And so they can do it in different ways. Application guys
are smart, they try as much as they can to use the other techniques
and they’ll use security level at the application layer, which does
add some complexity to the process. So this will now give them that
same kind of trusted feel that they had with working with security
in their relational databases, they’ll start to now have those
capabilities directly inside of Cassandra.

Are those security improvements being
pushed upstream to the Cassandra source?

They are. We’ve done a couple of things. For the
Cassandra community, we have released three pretty major security
features, which is basically what everybody’s familiar with – I
wanna create a user, I’m gonna give that user a password, and then
they’re gonna have to log in to the database with that user ID and
password. Very well understood authentication model – it’s been
around forever in the relational world.

And that feature didn’t exist in Cassandra
and DataStax up until now?

That is correct.

Is it implemented in other NoSQL
databases? MongoDB or CouchDB…?

I don’t want to speak on their behalf, I’m not positive of how
they do their implementation. I do not think so – I know that
nobody has, as we go through this list, nobody’s gonna have the
comprehensive solution that we have, but I’m actually not sure to
the details and I don’t want to speak out of school.

That seems like quite an obvious
oversight.

Well, it wasn’t so much an oversight as it was a
design challenge. When these systems were built, you have to
remember – the interesting thing about this NoSQL market is [in the
early days] we had people running Cassandra in production
environments, 0.4, 0.5 releases. That’s insane! The
traditional enterprise applications, you would just never
think about running a zero dot anything in production, right? Think
about it from an application development standpoint.

But the need was so great, the technology challenge was so
monumental, that they simply had to find a way to solve the
problems. And so I would say it’s not so much an oversight as what
you’re seeing now is a maturity. Now these things are finally
coming to fruition – we’ve always known we’ve needed it. We did an
article back in April of 2012, titled “Why
NoSQL Equals No Security”. And he said, almost the same way you
did – the intro was “it seems security is an afterthought at best
in the big data ecosystem”.

It really hasn’t been an afterthought, it’s just been, as I
said, a maturity thing. And now we’re at that stage where we are
ready to introduce that maturity both into the open source line,
and with some enterprise features into DataStax Enterprise.

So the second thing – going back to what we’re giving to the
community, the second thing is what’s called ‘internal object
permissions’. This means that when you create an object inside of
Cassandra, now you have the ability to take that internal user
authentication that you created in step one, and you can say “now I
want to give Elliot read permissions on this object”, or “I want to
give Billy read and write permissions on this object”. That’s now
fully available inside of Apache Cassandra.
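
The two steps described so far (create a user with a password, then grant that user permissions on an object) can be sketched roughly as follows. This is a toy in-memory model written for illustration only, not Cassandra's implementation or API; `ToyDatabase`, `AccessDenied`, the passwords and the `employee` object name are all hypothetical.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a login attempt uses the wrong credentials."""


@dataclass
class ToyDatabase:
    # Hypothetical stand-in for a database; nothing here is Cassandra's API.
    users: dict = field(default_factory=dict)   # user name -> password
    grants: dict = field(default_factory=dict)  # (user, object) -> permissions

    def create_user(self, name: str, password: str) -> None:
        # Step one: internal authentication. A real system would store a
        # salted hash, never the plain password as this toy does.
        self.users[name] = password

    def login(self, name: str, password: str) -> str:
        if self.users.get(name) != password:
            raise AccessDenied(f"bad credentials for {name}")
        return name

    def grant(self, name: str, obj: str, *permissions: str) -> None:
        # Step two: object permissions, tied to a user created in step one.
        if name not in self.users:
            raise KeyError(name)
        self.grants.setdefault((name, obj), set()).update(permissions)

    def check(self, name: str, obj: str, permission: str) -> bool:
        return permission in self.grants.get((name, obj), set())


db = ToyDatabase()
db.create_user("elliot", "s3cret")
db.create_user("billy", "hunter2")
db.grant("elliot", "employee", "read")
db.grant("billy", "employee", "read", "write")

session = db.login("elliot", "s3cret")
print(db.check(session, "employee", "read"))   # True
print(db.check(session, "employee", "write"))  # False
```

The point of the sketch is only the ordering: permissions hang off an authenticated identity, so the grant step is meaningless until the user exists.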

And then the third one is also very important, and that’s
client-to-node encryption. The ability to encrypt that data on the
fly as it moves between the Cassandra node and the end application
point.

How would you describe the state of the database
market?

It’s been very exciting, I can tell you, coming from the
relational market for the last 20 years, this has been fascinating
and fun to watch this transition. I very much liken it – being an
old guy – to when I first came out of college in ’92, and I was
watching the revolution of the whole what we used to call “open
systems databases”, which was Oracle and DB2, and Sybase, and later
SQL Server. It’s really fun, it’s like watching that happen all
over again.

I think what’s happening now, in 2013 and I’d say the end of
2012, the biggest shift that I’m starting to see – that is a very
good thing – is people are realising that when they say
‘big data’, that is not a one-size catch-all bucket. There are
definitely different characteristics that different technologies
solve very well, and people are starting to understand those
nuances a little bit better, which is great. So, Hadoop’s been
around for quite a long time now actually, if you think about
Google releasing their white papers back in 2003. And I would say
now, the NoSQL movement is catching up to the mainstream mindset of
people, as they think about big data. And they’re starting to
rightfully now ask: “OK, wait a minute. Are you talking about big
data analytics, which would be the Hadoop data warehouse
world, or are you talking about big data transactions, which would
be like the more classic Oracle type of world.”

And that is an important distinction that’s finally starting to
catch hold, and a lot of us have been out there trying to educate
the market on that. So just understanding that nuance is a good
thing. It’s also a very crowded market, and getting more crowded,
and what I tell people is that as they look at those graphs and
charts that try and capture all the different players, one of the
things I’d say is, if you really want to understand how they’re
doing, go get ten documented use cases. I mean really documented, I
don’t mean top X this or top leading that, go find ten use cases
with companies and customers willing to talk in depth
about what they’re doing with that technology.

My personal take is, if you can’t find ten for a given
technology, skip it for now. Because I don’t know if it’s gonna
make it or not – there’s just too much noise out there in the
marketing side. What you really need to do is get under the covers
and figure out who is using this stuff.

And that’s why I’m so proud of us. We can show you dozens of
DataStax in-depth customers with names that everyone knows and
understands, people like Netflix and Adobe and eBay and healthcare
companies. And then we have hundreds more on the Cassandra side, if
you go over to our community site called Planet Cassandra. So
people right now, they’re getting so glassy-eyed over the
marketing, and that is a great way to cut through the marketing.
Get to the use cases.

[Laughs] Let’s talk about use cases and
customers and see how people are using this in production. That’s
how I’d respond. I find that stuff… interesting.

It was a bold claim.

What does it cost you to make a claim? Nothing. What does it
cost you to get customers lined up to talk about how they’re using
your technology on an enterprise scale? A lot. You better be real,
you better be doing it. By the way, I think they’re going to be
great, I really do. I think they’re going to be a great company.
But the use cases are different. I think about, why do we have such
diversity in the real world, right? Why was MySQL popular after
Oracle, and why do we have SQL Server and Oracle, and why – I’ve
been around this business way too long to give serious credence to
claims like that. I want to see use cases, I want to see things
done in real life, I want to talk facts. I’d rather talk about
what’s happening rather than what’s going to happen. Talking about
what’s going to happen is fun and easy. Talking about what’s
happening is hard and real.

So, relational databases – they’re still going to
be around in 20, 30 years’ time, right?

I completely believe they will. I absolutely believe they will.
These things have such long tails, my goodness. I can remember,
again going back to ’92, developing this stuff – the claims that
would happen, it seemed like every month, about the mainframe was
gonna be dead and there was going to be no mainframe by the year
1995. And then there was going to be no mainframe after Y2K,
because it was all going to be rewritten, and there’s going to be
no mainframe – I dunno, I haven’t checked in a while, but the last
time I looked, mainframe sales were flat for like 20 years? This
stuff has a long tail. It is not easy to just say that a
market that size just goes away. That is a pretty unrealistic way
to think about things, number one; number two, there are still very
good use cases for it. Very good use cases. And this is
all about helping people find the right use cases for the right
problems.

That’s why we exist. We want to be credible and trustworthy.
What we say, we want people to be able to go and verify, and get
help and understanding and deliver a solution that they can put
into production. And what we’re seeing is, when people do that,
often relational technology sits right alongside these other
[modern] technologies. And I know my friend Mike Olson
over at Cloudera, he says the same thing about Hadoop. He says, in
the majority of cases I see, these technologies are living in an
ecosystem with these traditional technologies, and I echo that, I
see that exact same thing.

So, no, maybe I’ve just been around too long, and maybe I’ve
been around the relational world for too long, but I just know
these markets have very, very long tails.

Finally, when can we drop this NoSQL label and
just talk about databases?

[Laughs] Man, I wish I could do that tomorrow.
I’ve never liked it, I just – philosophically, I don’t like
describing something by what it’s not. I think that’s a very poor
way to describe anything in life, number one; number two, it’s
actually inaccurate. There’s no reason that you can’t use SQL-like
language, at least for parts of what you do, and in fact that’s
precisely what we do with the language we have called CQL, which is
the Cassandra Query Language. And the Cassandra Query Language is a
subset of SQL. If you know SQL, you’re gonna look at CQL
and go “Oh yeah, of course I get that. SELECT name FROM employee
WHERE…”

So, there was a time when it was very hard to categorise this
sort of stuff. None of us knew what to call them, and it just
stuck, and I guess it’s going to stick with us for some time to
come because we are really just now starting to cross the chasm, and
when you get into that more mainstream mindset, people do look
for nice ways to easily classify something and easily name
something, and this name I think is going to be around for some
time, actually. I would have much preferred – I loved the term
‘flexible schema’ actually. That, to me, makes the most sense, but
that’s coming from an old, relational mind who loves the idea of
not being beholden to that schema once I create it. And I love the
idea of having a schema that can change from entry to entry. But,
we’re stuck with it. My guess is it’s going to be with us for a
while.

It’s a blessing and a curse, right?
Because it’s a very useful marketing buzzword.

It is, and y’know it does help people understand categorically
what you’re talking about when you say it. And then you have to
start breaking down the nuances even further. The biggest thing I
think is going to happen with all these databases is, it really is
currently – and I think will remain – a world that is heterogeneous
on the backend of these data stores. And what I mean by that is,
your [JAX Magazine’s] audience, your crowd, is getting very, very
good at building services layers that are flexible, and that route
the right workloads to the right databases.

So here’s the challenge that you have, and that a lot of people
in the marketplace have: you guys may read the use case, you may go
read our eBay use case, and then you’ll turn around and read an
eBay use case about Teradata. And then you’ll turn around and read
an eBay use case about, I dunno, PostgreSQL. And then you’ll start
reading these same logos, and you’ll start seeing all this other
technology. And you’re like, “wait a minute, I thought they just ran
this? I thought they ran that?” And it’s no longer an either/or
world.

Back in the day we had an application, that application talked
to a database, and you picked a database. And when you wanted to
move that data somewhere else you ETL’d it. That’s just not what’s
happening now. Even in my smaller customers, I see services layers
being built – very sophisticated services layers being built – that
will route the data to the proper technology. So it’s not uncommon
at all, in the same application, to see a workload going to a
relational database, a workload going to DataStax, a workload going
to Mongo[DB], a workload going to MySQL – in the same application.
And so, back to the [10gen] comment about market share, how do you
claim that? Does that mean “we are that database”?
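
A services layer of the kind described here might look, in very reduced form, like the sketch below. It is an illustration under assumptions, not any customer's architecture: `WorkloadRouter` and the workload names `billing` and `clickstream` are hypothetical, and plain Python lists stand in for the real databases.

```python
from typing import Callable, Dict, List

# A "store" is any callable that accepts one record. In a real services
# layer this would wrap a database client; here it is just list.append.
Store = Callable[[dict], None]


class WorkloadRouter:
    """Hypothetical services layer: sends each workload to its own store."""

    def __init__(self) -> None:
        self._routes: Dict[str, Store] = {}

    def register(self, workload: str, store: Store) -> None:
        self._routes[workload] = store

    def write(self, workload: str, record: dict) -> None:
        try:
            store = self._routes[workload]
        except KeyError:
            raise ValueError(f"no store registered for {workload!r}") from None
        store(record)


# In-memory stand-ins for two different backends.
relational_rows: List[dict] = []
cassandra_rows: List[dict] = []

router = WorkloadRouter()
router.register("billing", relational_rows.append)
router.register("clickstream", cassandra_rows.append)

router.write("billing", {"invoice": 1, "amount": 42})
router.write("clickstream", {"user": "elliot", "page": "/home"})
router.write("clickstream", {"user": "billy", "page": "/pricing"})

print(len(relational_rows), len(cassandra_rows))  # 1 2
```

The application only names the workload; which database sits behind it is a routing decision, which is why a single application can end up on several vendors' customer lists at once.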

It’s just going to be a much more complex world in that sense,
but I think it’s going to be a better world for the application
architects. Because now they do have this true ability to go find
the right technology for the right piece of the application stack.
And I think that’s wonderful, I would have loved to have
that flexibility, back when I was developing in the 90s, that would
have been fantastic. So I think it’s a very exciting time right now
for that reason.

Thanks for talking to us, Billy!

This interview appeared in JAX Magazine: Pulling Together.

Billy is responsible for the strategy, explosive growth, and day-to-day operations of DataStax. He has 20 years of experience in the database industry in roles ranging from DBA to senior executive. Prior to DataStax, Billy spent 6 years at Quest Software where his most recent role was VP and GM of the database business unit.