Getting industry input on benchmark development is a high priority for the LDBC: benchmarks that are not interesting to industry are generally not very interesting at all. To address this, we engage with industry through bi-annual Technical User Community (TUC) meetings, where experts from both industry and academia are invited to present their data management use cases and to participate in the LDBC benchmark development process.

This past April, the second TUC meeting was hosted in Munich by the Technical University of Munich, an academic partner of the LDBC project.

One highlight was the talk by Klaus Großmann (CTO of Dshini) entitled "Neo4j at Dshini". Dshini is a German social network that aims to generate purchasing power through activity alone: members earn virtual currency, save it up, and redeem it to fulfill their wishes.

In his presentation, Klaus shared his experience of using Neo4j as the main data storage technology at Dshini and offered many insights into graph data modeling in the real world. It was a great talk and very useful input to our benchmark design process, and a perfect illustration of the value gained by involving industry in the LDBC!

A natural byproduct of Neo’s participation in the LDBC is a generally increased presence in academic circles. In the coming months, Neo will be present at and participating in a number of exciting events, including (but not limited to) the GRADES and GraphLab workshops.

GraphLab workshop (1st of July in San Francisco): also co-sponsored by the LDBC, this event will focus on large-scale machine learning on sparse graphs. Here too, Neo is a member of the program committee, and we will have a number of representatives at the event.

My colleague Philip Rathle and I will both be at the event to represent Neo and the LDBC project.

And not to mention GraphConnect (@GraphConnect): a series of five conferences across the USA and England, held between June and November of this year!

Recent benchmark efforts, their relevance, and what we’re busy building

Lately, a number of graph database micro-benchmarking efforts have been published; these are of obvious interest to Neo, both in general and in the context of the LDBC. Of the growing number of such efforts, one recent example that stands out is LinkBench from Facebook, and more specifically the data generator embedded in it.

The general ‘problem’ with data generators is that they produce synthetic data: the data is not real, and its characteristics may not be representative of the real world. LinkBench is unusual in that it was developed at Facebook, and few organizations have access to a real social network dataset as immense or as rich as Facebook’s. This puts Facebook researchers in the rare position of being able to verify the “realisticness” (I just made it a word…) of the data generators they develop. And now Facebook have made LinkBench public, along with details of its data generator!

How does this relate to the LDBC?

It helps us develop more meaningful benchmarks.

We (in particular Vrije University and the Polytechnic University of Catalonia) are developing the LDBC data generator, a continuation of the work Vrije University performed on the SIB social network generator. We have now evaluated LinkBench (as well as a number of real datasets) and are modifying the LDBC data generator, applying the lessons learned to improve its “realisticness”.

In parallel, we have also started developing a benchmark driver for future LDBC benchmarks to use. More on that in a later post!

The first versions of both the LDBC benchmark driver and the LDBC data generator will be published on our public GitHub account as soon as we have something to share!