Fast and Concurrent Query Processing on Big (Linked) Data

News

Overview

Many knowledge bases like Google and Facebook’s knowledge/social graphs are represented and stored as RDF graphs, where users can issue structured queries on such graphs using SPARQL. With massive queries over large and constantly growing RDF data, it is imperative that an RDF graph store should provide low latency and high throughput for concurrent query processing. However, prior systems still experience high per-query latency over large datasets and most prior designs have poor resource utilization such that each query is processed in sequence.

We propose Wukong, a distributed in-memory RDF store that leverages RDMA-based graph exploration to support fast and concurrent RDF queries. Wukong significantly outperforms state-of-the-art systems and can process a mixture of small and large queries at 185,000 queries/second on a 6-node cluster.

Approaches

Wukong extends existing graph based store with builtin index vertices and leverages differentiated graph partitioning to distribute vertices and indexes. Wukong's design is centered around the use of low-latency,
high-throughput one-sided RDMA operations, including a predicate-based RDMA-friendly distributed hashtable, RDMA cost-aware adaption among migration code and data, RDMA-aware full-history pruning. To support highly concurrent queries, Wukong further leverages a worker-obliger work stealing design that minimizes the impact from lengthy queries.

Support Streaming

We further propose Wukong+S (S for stream) that adopts C-SPARQL as streaming model and extends Wukong to support concurrent queries on multiple varied-scale streams as well as the background data. Wukong+S can process a mixture of simple and complex C-SPARQL queries at 56,000 queries/second on a 6-node cluster.

(We will merge the new feature to Wukong by the end of 2017. Coming soon!)

Acknowledgements

The project is supported in part by China National Natural Science Foundation (61402284, 61772335, 61572314, 61525204), the National Key Research & Development Program (No. 2016YFB1000500), the Program for New Century Excellent Talents in University of Ministry of Education of China (No.ZXZY037003), a foundation for the Author of National Excellent Doctoral Dissertation of PR China(No. TS0220103006), Doctoral Fund of Ministry of Education of China (No. 20130073120040), the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant (No. 2014A05), and Singapore CREATE E2S2.