SP2B Benchmark Results

SP2Bench (hereafter referred to as SP2B) is a challenging and comprehensive
SPARQL performance benchmark suite, designed to cover the most important
SPARQL constructs and operator constellations across a broad range of RDF
data access patterns. While other benchmark suites for RDF data access
exist (BSBM and LUBM are two other well-known ones), we have found SP2B to
be, overall, the most helpful metric by which to track and evaluate
performance gains in Dydra’s query processing. Not coincidentally, SP2B’s
main author also wrote one of the definitive works on SPARQL query
optimization, exactly the kind of light reading we like to enjoy in our
copious free time…

Results

We obtained the following results (note the log scale) on a standalone
deployment of Dydra’s proprietary query engine, affectionately known as
SPOCQ, running on a machine with hardware specifications roughly comparable
to those used for previously-published SP2B results:

We benchmarked four input dataset sizes ranging from 10,000 to 1,000,000
triples (indicated as 10K, 50K, 250K, and 1M). Following the methodology of
the comprehensive SP2B analysis published by Revelytix last year, we
executed each query up to 25 times for a given dataset size and averaged
the response times for each query/dataset combination after discarding
possible outliers (the several lowest and highest values) from the sample.
The query timeout was set at 1,800 seconds.

The hardware used in these benchmarks was a dedicated Linux server with a
dual-core AMD Athlon 64 X2 6000+ processor, 8 GB of DDR2-667 RAM, and
750 GB of SATA disk storage in a software RAID-1 configuration. This was a
relatively underpowered server by present-day standards; my MacBook Pro
laptop outperforms it on most tasks, including these benchmarks. It does,
however, have the benefit of being roughly comparable both to the Amazon
EC2 instance size used in the Revelytix analysis and to the hardware used
in the original SP2B papers.

Comments

Several of the SP2B queries, in particular Q4, Q5a, Q6, and Q7, are tough
on any SPARQL engine, and in published results it has been typical to see
many implementations fail some of them at dataset sizes as small as 50,000
to 250,000 triples. So far as we know, SPOCQ is the only native SPARQL
implementation that correctly completes all SP2B queries on the
250,000-triple dataset within the specified timeout (1,800 seconds),
without returning bad data or experiencing other failures. Likewise, we
correctly complete everything but Q5a (see the special note on Q5a below)
on the 1,000,000-triple dataset as well.

Q1, Q3b, Q3c, Q10, and Q12c

As depicted above, SPOCQ’s execution time was effectively constant on a
number of queries, specifically Q1, Q3b, Q3c, Q10, and Q12c, regardless of
dataset size. The execution times for these queries all measured in the
20-40 millisecond range, depending on the exact query. Of the SP2B queries,
these are the most similar to the kinds of day-to-day queries we actually
observe being executed on the Dydra platform, and they showcase the very
efficient indexing in SPOCQ’s storage substrate.
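
Q1 illustrates why these queries are so index-friendly: every triple
pattern is anchored by a constant, so a well-indexed store can answer it
with a handful of index probes regardless of dataset size. Schematically,
the query has this shape (paraphrased from memory of the published suite,
not a verbatim copy; the bench prefix and title string follow SP2B’s
conventions as we recall them):

```sparql
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bench:   <http://localhost/vocabulary/bench/>

# Every pattern is bounded by a constant subject, predicate, or object,
# so evaluation reduces to a short chain of index lookups yielding at
# most one journal and its publication year.
SELECT ?yr
WHERE {
  ?journal rdf:type bench:Journal .
  ?journal dc:title "Journal 1 (1940)"^^<http://www.w3.org/2001/XMLSchema#string> .
  ?journal dcterms:issued ?yr .
}
```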

A Detailed Look at Q7

The SP2B query Q7 is designed to test a SPARQL engine’s ability to handle
nested closed-world negation (CWN). Previously-published benchmark results
indicate that, along with Q4, this query has proved the most difficult for
the majority of SPARQL implementations, with very few managing to complete
it on datasets larger than 10,000 to 50,000 triples. We’re happy to report
that our SPARQL implementation is among those select few:

The above chart combines our results with Revelytix’s comprehensive
SP2B benchmark results.
The depicted 1,800+ second bars here indicate either a timeout or a failure
to return correct results (see pp. 38-39 of the Revelytix analysis for more
details).

None of the implementations Revelytix benchmarked were able to complete SP2B
Q7 on 1,000,000 triples within a one-hour timeout. SPOCQ completes the task
in 80 seconds.
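
For readers unfamiliar with the idiom: SPARQL 1.0 predates NOT EXISTS and
MINUS, so closed-world negation is encoded with an OPTIONAL pattern plus a
FILTER(!bound(…)) test, and Q7 nests one such negation inside another. A
schematic of the nesting (illustrative only, not the verbatim Q7) looks
like this:

```sparql
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>

# Keep a document only if no other document references it, where the
# referencing document must itself survive an inner negation of the
# same OPTIONAL + !bound shape.
SELECT DISTINCT ?title
WHERE {
  ?doc dc:title ?title .
  OPTIONAL {
    ?doc2 dcterms:references ?doc .
    OPTIONAL {
      ?doc3 dcterms:references ?doc2 .
    }
    FILTER (!bound(?doc3))       # inner negation: ?doc2 is unreferenced
  }
  FILTER (!bound(?doc2))         # outer negation: no such ?doc2 exists
}
```

Evaluating this naively forces the engine to re-run the inner negation for
every candidate binding of the outer one, which is why Q7 punishes engines
without a good strategy for nested OPTIONALs.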

While we benchmarked on more or less comparable hardware and with comparable
methodology, we do not claim that the comparison in the preceding chart is
valid as such; take it with a grain of salt. It is indicative, however,
of the amount of work we have put, and are putting, into preparing Dydra’s
query engine for the demands we expect it to face as we exit our beta stage.

A Detailed Look at Q4

The SP2B query Q4 deals with long graph chains and produces a very large
solution sequence, quadratic in the size of the input dataset. It is
probably the most difficult of the SP2B queries, with few SPARQL
implementations managing to finish it on input datasets larger than 50,000
triples.

As with Q7, this chart draws on data from the aforementioned Revelytix
analysis (see pp. 32-33 of their report for details), and the same caveats
certainly apply to this comparison. Nonetheless, of the implementations
Revelytix benchmarked, only Oracle completed SP2B Q4 on 1,000,000 triples
within a one-hour timeout. They reported a time of 522 seconds for Oracle.
SPOCQ completes the task in 134 seconds.
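
To illustrate the shape of the problem, Q4 joins two symmetric
article/creator chains that meet at a shared journal and emits every
distinct pair of author names, so the result set grows quadratically with
the number of authors. A sketch of that shape (paraphrasing the published
query; the property names follow SP2B’s DBLP-style vocabulary, and the
exact filters of the real Q4 are omitted):

```sparql
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX swrc: <http://swrc.ontoware.org/ontology#>

# Two mirrored chains meet at ?journal; the "<" comparison halves the
# cross-product of name pairs but leaves it quadratic in size.
SELECT DISTINCT ?name1 ?name2
WHERE {
  ?article1 swrc:journal ?journal .
  ?article1 dc:creator   ?author1 .
  ?author1  foaf:name    ?name1 .
  ?article2 swrc:journal ?journal .
  ?article2 dc:creator   ?author2 .
  ?author2  foaf:name    ?name2 .
  FILTER (?name1 < ?name2)
}
```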

A Special Note on Q5a

No existing SPARQL implementation does well on Q5a for larger datasets. We
believe this is due to an oversight in the SP2B specification: Q5a is
defined as using a plain equality comparison in its FILTER condition, yet
the specification suggests that this constitutes an implicit join which can
be identified and optimized for. However, since joins in SPARQL are in fact
defined in terms of a sameTerm comparison rather than value equality, such
an optimization cannot be safely performed in the general case.

We have therefore also included results for an amended version of Q5a,
named Q5a′:

Q5a′ simply substitutes sameTerm(?name, ?name2) in place of
the ?name = ?name2 comparison, allowing the join to be optimized for.
Q5a′ runs in comparable time to Q5b, as intended by the authors of
SP2B. We suggest that others benchmarking SP2B note execution times for
Q5a′ as well.
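
For reference, the entire amendment is confined to the FILTER expression;
the rest of the query is untouched. The two forms are not equivalent in
general, which is exactly why the engine cannot perform the rewrite on its
own:

```sparql
# Q5a, as specified: value equality. Not safely rewritable as a join,
# since e.g. "01"^^xsd:integer = "1"^^xsd:integer is true even though
# the two are distinct RDF terms and therefore would not join.
FILTER (?name = ?name2)

# Q5a', as amended: term identity, which matches SPARQL's join
# semantics, so the engine may evaluate it as an implicit join.
FILTER (sameTerm(?name, ?name2))
```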

Caveats

While our production cluster has considerably more aggregate horsepower
than the benchmark machine used for the above, benchmarking it wouldn’t
make for a very meaningful comparison, given that all previously-published
SP2B results have come from single-machine deployments. So, the figures
given here should be considered first and foremost a baseline demonstration
of the capabilities of the technology that the Dydra platform is built on.

Further, during our ongoing beta period we are enforcing a maximum query
execution time of 30 seconds, which of course would tend to preclude
executing long-running analytic queries of the SP2B kind. If you have
special evaluation needs you’d like to discuss with us, please contact
Ben Lavender (ben@dydra.com).

Credits

In closing, we would like to express our thanks to the Freiburg University
Database Group, the authors of the
SP2B performance benchmark suite. SP2B has provided us with an invaluable
yardstick by which to mark our weekly improvements to Dydra’s query
processing throughput.
Anyone developing a SPARQL engine today without exposing it to the
non-trivial and tough queries of SP2B is doing themselves a serious
disservice, as attested to by the difficulty most SPARQL implementations
have with the more strenuous SP2B queries. SP2B is truly the gold standard
of SPARQL benchmarks, ignored at one’s own peril.