Reactive, Streaming Stardog Kernel

by Mike Grove,
17 January 2017 · 7 minute read

The Stardog kernel API is morphing into a system based on asynchronous, reactive
streams; in this post we discuss motivations and design goals.

Background

If you’ve programmed against Stardog, you’ve had to use at least a bit of the
Stardog Native API for the RDF Language (SNARL API). Probably the most oft-used
bit is the Connection class. Patterned after JDBC connections, Connection is
the means of interacting with a Stardog database within an application. It’s
very stable: only 28 non-javadoc changes to the interface since it was
introduced before Stardog 1.0; none in the last 18 months.

But SNARL is showing its age. The pleasant world of lambdas, try-with-resources,
or just using String in a switch statement is now the actual world. We’ve been
thinking about where Stardog is going in 2017; you might have read about it
recently as we recapped an exciting 2016. We unify all the data: structured,
unstructured, and everything in between. As we reconsider our design, and plan
for future growth, I thought it was time to take a new look at SNARL.

Why?

Let’s set aside the issues of changing a public API first. It’s not something we
decided easily; we take compatibility seriously. But Stardog is undergoing
fundamental changes, and some of those changes will be directly reflected in
APIs like SNARL.

At the very least, there’s a lot to be gained from leveraging Stream and
lambda. We weakly embraced this the last time we changed the API. But a lot has
changed in the Stardog world too. When the Connection class and the rest of
the SNARL API were originally created, we were months from our first public
release.

SNARL was created very early on, before stuff like disk-based databases, the
reasoner, CLI, security, even transactions, existed. So there it was, the public
face of Stardog for years to come, built when the system was not much more than
a query engine and memory-based indexes that were simple, sorted arrays. And
we’ve hardly changed it since.

Going Reactive

I first heard about reactive programming not from the manifesto but from some
binge watching/listening of Erik Meijer videos. If you haven’t watched any of
Erik Meijer’s talks, you should. You’ll learn something, and you’ll definitely
laugh. This talk about reactive a few years ago was my introduction.

I’ve seen it referred to as the “Observer pattern done right”, which is fair. It
is basically the Observer pattern. The primary difference is that it adds error
and completion semantics. You can also just think about it as Iterable. The
only difference is that it’s push rather than pull, and instead of the barebones
Iterable API, you get Stream plus a whole lot more.

But for me, it was easier to just think about it as a way to work with streams
of data. Consider a click stream from a mouse. These streams are a simple sequence
of events: Value, Error, and Complete. Error and Complete are actually
mutually exclusive. Things worked, or they didn’t, but not both. It’s not rocket
science. You can get zero or more Values before the termination event (i.e.
Error or Complete). This is a great explanation of reactive principles in terms
of a click stream.

Basic usage of a reactive API would look very similar to the Java Stream API:

Apply an offset and limit, turn the Values into Strings and print them out;
subscribe in this case is akin to forEach. Add scheduling and backpressure, and
it’s easy to see how this approach can provide the building blocks for a
powerful and flexible API.
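
As a rough illustration of the pipeline just described, here is a minimal,
hand-rolled sketch in plain Java. Every name in it (ReactiveSketch, Source,
Subscriber, Forwarding, from, skip, take) is hypothetical; this is not the
SNARL API and not RxJava, and it leaves out cancellation, scheduling, and
backpressure entirely. It only shows values being pushed to a subscriber,
followed by at most one Error or Complete event.

```java
import java.util.Arrays;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical names throughout; not the SNARL or RxJava API.
final class ReactiveSketch {

    /** Receives zero or more values, then at most one of onError or onComplete. */
    interface Subscriber<T> {
        void onNext(T value);
        void onError(Throwable error);
        void onComplete();
    }

    /** Passes terminal events downstream once; subclasses decide what to do with values. */
    abstract static class Forwarding<T, R> implements Subscriber<T> {
        final Subscriber<R> down;
        boolean done = false;
        Forwarding(Subscriber<R> down) { this.down = down; }
        public void onError(Throwable error) { if (!done) { done = true; down.onError(error); } }
        public void onComplete() { if (!done) { done = true; down.onComplete(); } }
    }

    /** A push-based source: nothing happens until someone subscribes. */
    interface Source<T> {
        void subscribe(Subscriber<T> subscriber);

        static <T> Source<T> from(Iterable<T> values) {
            return subscriber -> {
                try {
                    for (T value : values) subscriber.onNext(value);
                } catch (RuntimeException e) {
                    subscriber.onError(e); // failures become an event, not a thrown exception
                    return;
                }
                subscriber.onComplete();
            };
        }

        /** Offset: drop the first n values. */
        default Source<T> skip(long n) {
            return subscriber -> subscribe(new Forwarding<T, T>(subscriber) {
                long seen = 0;
                public void onNext(T value) { if (seen++ >= n) down.onNext(value); }
            });
        }

        /** Limit: pass at most n values, then complete and ignore the rest. */
        default Source<T> take(long n) {
            return subscriber -> subscribe(new Forwarding<T, T>(subscriber) {
                long emitted = 0;
                public void onNext(T value) {
                    if (done) return;
                    if (emitted < n) { emitted++; down.onNext(value); }
                    if (emitted >= n) onComplete();
                }
            });
        }

        default <R> Source<R> map(Function<T, R> fn) {
            return subscriber -> subscribe(new Forwarding<T, R>(subscriber) {
                public void onNext(T value) { down.onNext(fn.apply(value)); }
            });
        }

        /** Convenience overload that only handles values, akin to forEach. */
        default void subscribe(Consumer<T> onValue) {
            subscribe(new Subscriber<T>() {
                public void onNext(T value) { onValue.accept(value); }
                public void onError(Throwable error) { System.err.println("error: " + error); }
                public void onComplete() { }
            });
        }
    }

    /** Offset 2, limit 3, then turn each value into a String. */
    static Source<String> pipeline() {
        return Source.from(Arrays.asList(1, 2, 3, 4, 5, 6))
            .skip(2)
            .take(3)
            .map(value -> "value " + value);
    }

    public static void main(String[] args) {
        // Prints "value 3", "value 4", "value 5", one per line.
        pipeline().subscribe(System.out::println);
    }
}
```

Because this sketch has no way to cancel the upstream, take simply ignores
anything that arrives after the limit is reached; a real library would
unsubscribe instead.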

Here are a few other links that I found useful when trying to grok this stuff:

Why Reactive?

Why not Future?

One downside to going with the reactive pattern is that it’s not as widely known
as, say, Future. That won’t help the learning curve. And we have always tried
to maximize developer happiness and productivity because after all, we’re
developers too, and we look after our peeps!

Java 8 improves Future. Before we switched from Java 6 to Java 8, I would have
told you that Future is not easily composable and that doing error handling is
cumbersome. The changes introduced in Java 8 go a long way to improving this.
Future also gives us some of the lazy/asynchronous joy we can get out of
reactive, especially if we’re just waiting on a single result. For example, did
dropping the database work or not? Future<Void> is perfect.
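
To make the single-result case concrete, here is a small sketch using Java 8's
CompletableFuture. The dropDatabase method is a hypothetical stand-in (it does
not call Stardog at all); the point is only that the result can be transformed
and its error handled without blocking the caller.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example; dropDatabase is a stand-in, not a Stardog API call.
final class DropDatabaseSketch {

    /** Pretends to drop a database asynchronously; fails for an empty name. */
    static CompletableFuture<Void> dropDatabase(String name) {
        return CompletableFuture.runAsync(() -> {
            if (name.isEmpty()) {
                throw new IllegalArgumentException("database name is empty");
            }
            // A real implementation would talk to the server here.
        });
    }

    /** Composes a status message from the outcome, success or failure. */
    static CompletableFuture<String> dropAndReport(String name) {
        return dropDatabase(name)
            .thenApply(ignored -> "dropped " + name)
            .exceptionally(error -> "failed: " + rootCause(error).getMessage());
    }

    /** Async stages may wrap failures in CompletionException; unwrap one level if so. */
    private static Throwable rootCause(Throwable error) {
        return error.getCause() != null ? error.getCause() : error;
    }

    public static void main(String[] args) {
        // join() blocks, so it is only called here at the very end of the program.
        System.out.println(dropAndReport("myDb").join());
        System.out.println(dropAndReport("").join());
    }
}
```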

However, it’s easy to kill some of that joy by calling Future#get too early.
If you do that, you’re stuck waiting. You can use callbacks to help; Guava has
some nice support for this, but callbacks can get out of control quickly.

There’s also the problem that Future is single-valued. So do you use
Future<Collection<T>> or Collection<Future<T>>? Probably the latter, but if
the first one in the collection we call Future#get on isn’t done, we’re
blocked from getting results from anything else in the collection.
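
The blocking problem can be reproduced deterministically with a short sketch.
The class and method names are made up for illustration, and the first future
is deliberately never completed, so a timeout stands in for "still waiting".

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of head-of-line blocking with a collection of futures.
final class HeadOfLineSketch {

    /** Returns true if the first future is still pending while the second is already done. */
    static boolean readyResultStuckBehindPendingOne() {
        CompletableFuture<String> slow = new CompletableFuture<>(); // never completed here
        CompletableFuture<String> fast = CompletableFuture.completedFuture("ready");
        List<Future<String>> results = Arrays.<Future<String>>asList(slow, fast);

        try {
            // Iterating in order means waiting on the first element first.
            results.get(0).get(50, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException stillWaiting) {
            // The second result was available the whole time we were waiting.
            return results.get(1).isDone();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readyResultStuckBehindPendingOne());
    }
}
```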

Scheduling is pretty straightforward, but there’s no notion of backpressure:
you’ve got to roll your own. And what about enhancing the Future, like
attaching a progress listener to the asynchronous task? You’ll need to bake that
into anything you want to keep tabs on.

My first attempts to sketch out some ideas based on Future just felt
cumbersome. For what I had in mind, a reactive design just felt more intuitive
and natural to use.

Why use it for SNARL?

Distributed systems are hard. One of the hardest things to internalize, and then
design for, is failure. Instead of running one node, you run three so there’s no
single point of failure, but now there are a million more things that can go
wrong. Fault tolerance, including finding ways to be partially available, must
be a fundamental part of the system. And you need to keep a close eye on things,
so when failures do occur, and they will, you can manually intervene if need be:
replace failing nodes, add more capacity, etc.

This is a big change from a single node system. We’ve been building in, around,
and on top of our existing API as the cluster has matured and as we’ve begun the
shift from a vertically scaling system to a horizontally scalable one. One thing
I felt was clear was that a lot of these concerns needed to be built into the
system, rather than on top of it.

We’ve been using Apache Curator for the cluster, which is something that grew
out of the Netflix Open Source treasure trove. So I knew about Hystrix, and more
generally the Circuit Breaker pattern.
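
For readers who haven't met the Circuit Breaker pattern, here is a deliberately
tiny sketch of the idea. It is not Hystrix and does not use its API; the class
name, the threshold, and the timing parameters are all assumptions made for
illustration. After a run of consecutive failures the breaker "opens" and
callers get the fallback immediately, without touching the failing dependency,
until a cool-down period has passed.

```java
import java.util.function.Supplier;

// Hypothetical, minimal circuit breaker; not Hystrix and not part of Stardog.
final class CircuitBreakerSketch {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    CircuitBreakerSketch(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the action unless the circuit is open; falls back on failure. */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        long now = System.currentTimeMillis();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback.get(); // open: fail fast, skip the dependency
        }
        try {
            T result = action.get(); // closed, or cool-down elapsed: try the dependency
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now; // too many failures in a row: open the circuit
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(2, 60_000);
        Supplier<String> failing = () -> { throw new IllegalStateException("node down"); };
        for (int i = 0; i < 3; i++) {
            System.out.println(breaker.call(failing, () -> "fallback"));
        }
    }
}
```

Holding the lock while the action runs keeps the sketch short but serializes
callers; a production implementation would track state without doing that.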

According to the Hystrix wiki, Hystrix is designed to do the following:

“Give protection from and control over latency and failure from dependencies
accessed (typically over the network) via third-party client libraries.”

Do we want any of that in Stardog? Check, check, and check. That’s a good place
to start. It’s even got steps toward monitoring. This can be part of the glue
between the different parts of Stardog, but it’s almost an implementation detail
rather than something that’s part of the API. And it plays very nicely with
RxJava. So now a couple of different ideas all seem to be coming together to
provide the building blocks for the next generation of the SNARL API.

The switch in design isn’t without drawbacks. I’ve already touched on the
familiarity of Future versus the relative unfamiliarity of reactive APIs.
Other areas of concern are that extending Flowable is cumbersome, but doable,
while Observable is not. The distinction between hot and cold observables is
rarely obvious; you really have to pick one style and stick with it, because,
let’s face it, developers aren’t going to read the @implNote in the docs to
find out which is which until after there’s a bug. And debugging can be an
exercise in interpreting more convoluted stack traces. Finally, understanding
what exceptions a method “throws” is less clear. There’s no throws declaration
to rely on as exceptions are reported to subscribers directly. You have to read
the javadocs to know what kinds of exceptions can be raised.

But overall, it’s a very positive step forward for Stardog and something
I’m personally very excited about.

Coming Up Next

This is the first in a two-part series discussing the upcoming changes to the
SNARL API. In the final installment we’ll be taking a closer look at the API
itself.