Partitioning models for scaling distributed graph computations

Abstract

The focus of this thesis is intelligent partitioning models and methods for
scaling the performance of parallel graph computations on distributed-memory
systems. Distributed databases utilize graph partitioning to provide servers with
data locality and workload balance. Some queries performed on a database may
form cascades because queries can trigger one another. Current partitioning
methods consider the graph structure and logs of the query workload. We introduce
the cascade-aware graph partitioning problem with the objective of minimizing
the overall cost of communication operations between servers during cascade processes.
We propose a randomized algorithm that integrates the graph structure
and cascade processes into the input used for large-scale partitioning. Experiments
on graphs representing real social networks demonstrate the effectiveness of the
proposed solution in terms of the partitioning objectives.
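
As a rough illustration of the cascade-aware objective, the sketch below counts how many cascade steps cross a server boundary under a given partition. It is one plausible reading of "communication cost", not the thesis's formulation: the function name `cascade_cut_cost`, the representation of a cascade as a list of traversed edges, and the toy data are all assumptions made for this example.

```python
def cascade_cut_cost(cascades, part_of):
    """Count cascade edge traversals whose endpoints sit on different servers.

    cascades: list of cascades, each a list of (u, v) edges it traverses
              (hypothetical representation, assumed for this sketch).
    part_of:  dict mapping each vertex to the server (part) that stores it.
    """
    cost = 0
    for cascade in cascades:
        for u, v in cascade:
            # A traversal between two servers needs one communication operation.
            if part_of[u] != part_of[v]:
                cost += 1
    return cost


# Hypothetical toy instance: two cascades over four vertices, two servers.
cascades = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("b", "d")],
]
part_of = {"a": 0, "b": 0, "c": 1, "d": 1}

print(cascade_cut_cost(cascades, part_of))  # prints 2
```

A partitioner targeting this objective would prefer assignments that keep frequently co-traversed vertices on the same server, even when the plain edge cut of the graph is similar.
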
Sparse general matrix-matrix multiplication (SpGEMM) is a key computational kernel
used in scientific computing and high-performance graph computations. We
propose an SpGEMM algorithm for the Accumulo database, which enables high-performance
distributed parallelism through its iterator framework. The proposed
algorithm provides write-locality and avoids scanning input matrices multiple
times by utilizing Accumulo's batch scanning capability and node-level parallelism
structures. We also propose a matrix partitioning scheme that reduces
the total communication volume and balances the workload among servers.
Extensive experiments performed on both real-world and synthetic sparse matrices
show that the proposed algorithm and matrix partitioning scheme provide
significant performance improvements.
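
For readers unfamiliar with the kernel itself, the following is a minimal single-machine sketch of row-by-row SpGEMM (the classic Gustavson-style formulation). It is not the Accumulo-based algorithm described above; the dictionary-of-dictionaries storage and the name `spgemm_rowwise` are assumptions chosen only to keep the example short.

```python
def spgemm_rowwise(A, B):
    """Compute C = A * B for sparse matrices stored as {row: {col: value}}.

    Each output row C[i] is accumulated from the rows of B selected by the
    nonzeros of A[i], so every input row is read only where it is needed.
    """
    C = {}
    for i, a_row in A.items():
        acc = {}  # sparse accumulator for output row i
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C


# Hypothetical toy matrices.
A = {0: {0: 1, 2: 2}, 1: {1: 3}}
B = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}

C = spgemm_rowwise(A, B)
print(C)  # prints {0: {1: 16}, 1: {0: 15}}
```

In a distributed setting, the rows of B that a server must fetch for its assigned rows of A are what drive communication volume, which is why the partitioning of the matrices matters.
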
The scalability of parallel SpGEMM algorithms is heavily communication-bound.
Multidimensional partitioning of SpGEMM's workload is essential to achieve
higher scalability. We propose hypergraph models that utilize the arrangement
of processors and attain a multidimensional partitioning of SpGEMM's workload.
Thorough experimentation performed on both realistic and synthetically
generated SpGEMM instances demonstrates the effectiveness of the
proposed partitioning models.