Monday, 19 October 2015

PAGE: A Partition Aware Engine for Parallel Graph Computation

Abstract

Graph
partition quality affects the overall performance of parallel graph computation
systems. The quality of a graph partition is measured by the balance factor and
edge cut ratio. A balanced graph partition with small edge cut ratio is
generally preferred since it reduces the expensive network communication cost.
However, according to an empirical study on Giraph, the performance over well
partitioned graph might be even two times worse than simple random partitions.
This is because these systems only optimize for the simple partition strategies
and cannot efficiently handle the increasing workload of local message
processing when a high quality graph partition is used. In this paper, we
propose a novel partition aware graph computation engine named PAGE, which
equips a new message processor and a dynamic concurrency control model. The new
message processor concurrently processes local and remote messages in a unified
way. The dynamic model adaptively adjusts the concurrency of the processor
based on the online statistics. The experimental evaluation demonstrates the
superiority of PAGE over the graph partitions with various qualities.

Aim

The main aim is to generate
PAGE for an efficient and general parallel graph computation engine.

Scope

A partition aware graph
computation engine named PAGE that monitors three high-level key running
metrics and dynamically adjusts the system configurations.

Existing System

Graph
computation systems. Parallel graph computation is a popular technique to
process and analyze large scale graphs. Different from traditional big data
analysis frameworks (e.g., Map Reduce), most of graph computation systems store
graph data in memory and cooperate with other computing nodes via message
passing interface. Besides, these systems adopt the vertex-centric programming
model and release users from the tedious communication protocol. Such systems
can also provide fault-tolerant and high scalability compared to the traditional
graph processing libraries, such as Parallel BGL and CGMgraph. There exist
several excellent systems, like Pregel, Giraph, GPS, Trinity.

Graph
partitioning algorithms. To evaluate the performance of distributed graph
algorithm, Ma et al. introduced three measures, which are visit times, makespan
and data shipment. As efficiency (makespan) remains the dominant factor, they
suggested to sacrifice visit times and data shipment for makespan, which
advocates a well-balanced graph partition strategy when designing distributed
algorithms. Actually, various graph partitioning algorithms focused on this
object as well. METIS is an off-line graph partitioning package which can bring
off high quality graph partitioning subject to a variety of requirements. But it
is expensive to use METIS partitioning large graphs.

Disadvantages

·The existing
graph computation systems may suffer from the high-quality graph partition at a
certain point.

In
this project, we present a novel graph computation engine, partition aware
graph computation engine (PAGE). To efficiently support computation tasks with
different partitioning qualities, we develop some unique components in this new
framework. First, in PAGE’s worker, communication module is extended with a new
dual concurrent message processor. The message processor concurrently handles
both local and remote incoming messages in a unified way, thus accelerating the
message processing. Furthermore, the concurrency of the message processor is
tunable according to the online statistics of the system. Second, a partition
aware module is added in each worker to monitor the partition related
characters and adjust the concurrency of the message processor adaptively to
fit the online workload.

The
work of dynamic graph repartition and PAGE are orthogonal. To balance the
workload, the proposed strategies repartition the graph according to the online
workload. Thus the quality of underlying graph partition changes along with
repartitioning. The existing graph computation systems may suffer from the
high-quality graph partition at a certain point, but PAGE can mitigate this
drawback and improve the performance by decreasing the cost of a worker
further.

Advantages

·PAGE can
mitigate the drawback and improve the performance by decreasing the cost of a
worker further.

·To efficiently
support computation tasks with different partitioning qualities, we develop
some unique components in this new framework.