SIAM PP08 Workshop for HPC on Large Graphs

The Rennaissance Atlanta Hotel Downtown
Atlanta, GA

12-14 March 2008

Scope and Goals:

Graphs are a fundamental representation of information that spans the widest possible range of computing
applications. They are particularly important to computational biology, web search, and knowledge
discovery. As the sizes of the graphs increase, the need to apply High Performance Computing (HPC) to
solve these problems is growing dramatically. This workshop represents a unique opportunity for the
community of researchers in this field and focus on the technical issues associated with applying HPC to
large graph problems. The workshop will feature several invited speakers drawn from the leaders in this
new and rapidly growing field.

Location:

This workshop is co-located with SIAM PP08, held 12-14 March 2008, at the Rennaissance Atlanta Hotel Downtown in Atlanta, GA. Registration information for PP08 can be found at here.

Program:

An analytical theory of power law graphs is presented based on the
Kronecker graph generation technique. The analysis uses Kronecker
exponentials of complete bipartite graphs to formulate the
sub-structure of such graphs. This allows various high level quantities
(e.g. degree distribution, betweenness centrality, diameter,
eigenvalues, and isoparametric ratio) to be computed directly from the
model parameters. The implications of this work on ``clustering'' and
``dendragram'' heuristics are also discussed.

Jonathan Berry, Sandia National Laboratories
The MultiThreaded Graph Library (MTGL) is a set of open-source codes for
graph algorithms that target emerging massively multithreaded
architectures. This library is based upon a kernel of the Boost Graph
Library, though it does not require Boost.
MTGL codes run on serial workstations, and can run on the Cray MTA/XMT
supercomputers. Much of the detail inherent in the programming model of
the latter machines is abstracted away from the application programmer.

Danny Dunlavy, Tammy Kolda, and Philip Kegelmeyer, Sandia National Laboratories
Link analysis of data represented by a graph typically focuses on a
single type of connection. However, often we want to analyze data that
has multiple linkages between objects. The goal of this talk is to show
that tensors and their decompositions provide a set of tools for
multi-link analysis. We provide examples of how tensors can be used to
understand the structure of document spaces and define similarities
based on multiple linkages.

Aydin Buluc and John R. Gilbert, University of California, Santa Barbara
Large combinatorial graphs appear in many applications of
high-performance computing, including computational biology,
informatics, analytics, web search, dynamical systems, and sparse matrix
methods. High-performance combinatorial computing is much less well
understood than high-performance numerical computing, where there are
standard algorithmic primitives and a deep understanding of effective
mappings of problems to computer architectures. Here we describe the
usage and implementation of sparse generalized matrix-matrix
multiplication as a primitive operation for high-performance computing
on large graphs.

Jure Leskovec, Carnegie Mellon University
Given a large, real graph, how can we generate a synthetic graph that is
similar, i.e., it has similar degree distribution, diameter, spectrum,
etc.? I will introduce "Kronecker graphs" model, which naturally
generates graphs with above properties, and present a fast and scalable
algorithm for fitting Kronecker model to real networks. Experiments on
large real networks show that Kronecker mimics well the patterns found
in target graphs. Once fitted, the model can be used for anonymization,
extrapolations, and graph summarization.

Eric Robinson, Northeastern University
The betweenness centrality metric measures the importance of a node in a
graph by examining the number of shortest paths through it.
A natural translation of a sequential BC algorithm using linear
algebraic
notation is shown. A batching parameter shifts this algorithm betweenO(E) and O(V^2) space, possibly providing constant speedup for small
batch values. When implemented in pMatlab, both sequential and parallel
algorithms
are shown to have reasonable performance compared to their C
counterparts.