G-Store puts the proposed desiderata into
action. Following desideratum (1), G-Store stores the labels and edges of each
vertex together. G-Store uses an efficient encoding system, separating edges to
vertices in the same block from edges to vertices in a different block.
Desiderata (2), (3), and (4) are realized through a multilevel algorithm. The
objective of the algorithm is to find an approximate solution to the following
problem:

$\min_{\pi}\; \alpha \cdot c_1(\pi) + \beta \cdot c_2(\pi) + \gamma \cdot c_3(\pi)$

where $\alpha$, $\beta$, and $\gamma$ are parameters. Multilevel algorithms
have previously been applied to the graph partitioning problem and the minimum
linear arrangement problem. G-Store’s storage algorithm may well be viewed as
an attempt to solve a combination of difficult versions of these problems.

Figure 8 illustrates the algorithm. The
input graph is defined in a plain text file. Together with the schema
definition, the text file is used to create a main memory representation of the
input graph. This graph is coarsened until each of its connected components
consists of a single vertex. Finding a cost-minimizing
partitioning for the coarsest graph is a simple task. Iteratively, the
coarsening steps are undone. In each iteration, the partitioning for the
coarser graph is projected to the next finer graph, and refined. The
partitioning for the finest graph is used to derive a placement of the input
graph into consecutive blocks on the disk.

Throughout this section, we use $G = (V, E)$ to denote the input graph defined in
the text file. We use $G_i = (V_i, E_i)$ to denote the graph at
coarsening level $i$, with $G_0$ the finest graph. Each
graph $G_i$ is undirected and both vertex-weighted
and edge-weighted (vertex weights are not shown in Figure 8). We use $w(v)$ and $w(e)$ to
denote weights. We let $w(v)$ for an arbitrary $v \in V_i$ be the expected number of bytes that
the vertices from $G$ that
are represented in $v$ will
use on the disk. We let $w(e)$ for an arbitrary edge $e = (u, v) \in E_i$ be the number of edges from $G$ that connect a vertex
represented in $u$ with a
vertex represented in $v$.

We use $\pi_i$ to
denote a surjective function that maps the vertices in the graph at level $i$ to the partitions $\{0, \ldots, k_i - 1\}$. The number of partitions $k_i$ is not known in advance. We use $P_i^j$ to denote the set $\{v \in V_i : \pi_i(v) = j\}$, and $w(P_i^j)$ to
denote the sum $\sum_{v \in P_i^j} w(v)$.

The multilevel storage algorithm can be
broken down into smaller algorithms. The coarsening algorithm (Section 6.3)
first derives $G_i$ for all levels. The
turn-around algorithm (Section 6.4) then derives $\pi_i$ for
the coarsest graph. The uncoarsening algorithm (Section 6.5) derives $\pi_i$ for the remaining levels. All
algorithms have been designed to reduce the cost at
each level. There is no strict constraint on the weight of a partition.
However, the algorithms implement various heuristics to push $w(P_i^j)$ for all $j$ at
each level below a
bound that changes proportional to the coarsening level. The finalization algorithm (Section 6.6)
ensures that $w(P^j)$ does not violate the block-size bound $B$ for any $j$.

The multilevel storage algorithm uses the
compact storage format (see Section 3.1.2) to represent the graphs at the
various coarsening levels in main memory. Each graph stores a pointer to the
next finer and next coarser graph, much like a doubly-linked list. During
coarsening and uncoarsening, the algorithm works on at most two graphs
concurrently. If the algorithm runs out of memory at any time, it attempts to
move a graph that is not currently used to a temporary location on the disk. It
is read into memory again when it is needed.

The memory bottleneck is the coarsening
from $G_0$ to $G_1$.
These are the two largest graphs, and if the algorithm runs out of memory at this stage, there
is no graph that could be moved to the disk. In the next version of G-Store, it
will be possible to run the storage algorithm on a part of the input graph if
the entire graph cannot be represented in memory.

$G_0$ is
created directly from $G$.
The schema definition instructs the algorithm how to parse the text file that
defines $G$. Loops and
parallel edges in $G$ are
ignored. Directed edges in $G$ are converted to undirected edges in $E_0$. $w(e)$ for
each $e \in E_0$ is either one or two: it is one if the
corresponding edge in $G$ is
directed, two if it is undirected.

$w(v)$ for
each $v \in V_0$ is the expected number of bytes that
the encoding of the corresponding vertex in $G$ will use on the disk. A variable-length
character string in a vertex label, for instance, is accounted for with one
byte per character, plus two bytes for a block-internal pointer. A vertex’s edges are accounted for with four
bytes per edge (based on $G$ and
ignoring loops and parallel edges). The actual size of each edge is not known
until a partitioning has been derived. More details are given in Sections 6.7
and 7.2. In general, $w(v)$ is at least as large as the
actual number of bytes.
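The byte estimate described above can be sketched as a small helper. This is an illustration only, not G-Store's code; `estimate_vertex_bytes` and its signature are our own:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the per-vertex size estimate: fixed-size label bytes, plus
// one byte per character and a two-byte block-internal pointer for each
// variable-length label, plus four bytes per edge (an upper bound; the
// actual edge size is only known once a partitioning exists).
std::size_t estimate_vertex_bytes(std::size_t fixed_label_bytes,
                                  const std::vector<std::string>& varchar_labels,
                                  std::size_t num_edges) {
    std::size_t bytes = fixed_label_bytes;
    for (const std::string& s : varchar_labels)
        bytes += s.size() + 2;      // payload + block-internal pointer
    bytes += 4 * num_edges;         // upper bound on edge encoding size
    return bytes;
}
```

Because the edge term is an upper bound, the estimate never undershoots the actual on-disk size, in line with the guarantee stated above.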

G-Store’s coarsening algorithm is a variant
of heavy edge matching (HEM), a greedy heuristic introduced in [13]. HEM
is used in many multilevel graph partitioning algorithms and is implemented in
both Metis version 4.0 (default coarsening method), and Chaco version 2.0
(optional, set via parameter MATCH_TYPE). HEM creates $G_{i+1}$ from $G_i$ as
follows:

• Set all vertices in $V_i$ as
‘unmatched’. The vertices in $V_i$ are visited in
random order. Let $v$ be
the next vertex.

• If $v$ is matched, continue. Else, find an edge $(u, v) \in E_i$ such that $u$ is unmatched and $w((u, v))$ is
as high as possible. If no edge is found, continue. Else, set $u$ and $v$ as ‘matched’ and continue.

• After all vertices have been visited,
each unmatched vertex and each pair of matched vertices is mapped to one vertex
in $V_{i+1}$. Unmatched vertices keep their weight;
matched vertices add up their weights. $E_{i+1}$ is
created from $E_i$ by converting each edge to match the
mapping to vertices in $V_{i+1}$. Edges that would be loops
in $E_{i+1}$ are discarded. A set of edges that
would be parallel in $E_{i+1}$ is combined into a single
edge that carries the weight of the set.
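The matching steps above can be sketched as follows. This is a simplified, self-contained version assuming a plain adjacency-list graph; `hem_pass`, `Edge`, and the coarse-vertex numbering are our own, not G-Store's implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

struct Edge { int to; int weight; };

// One heavy edge matching pass: visit vertices in random order, match
// each unmatched vertex with its heaviest unmatched neighbor, then map
// each unmatched vertex / matched pair to one coarse vertex. Returns,
// for each fine vertex, the id of the coarse vertex it maps to.
std::vector<int> hem_pass(const std::vector<std::vector<Edge>>& adj,
                          std::mt19937& rng) {
    int n = static_cast<int>(adj.size());
    std::vector<int> match(n, -1);              // matched partner, -1 = unmatched
    std::vector<int> order(n);
    for (int v = 0; v < n; ++v) order[v] = v;
    std::shuffle(order.begin(), order.end(), rng);  // random visiting order

    for (int v : order) {
        if (match[v] != -1) continue;
        int best = -1, best_w = -1;
        for (const Edge& e : adj[v])            // heaviest unmatched neighbor
            if (e.to != v && match[e.to] == -1 && e.weight > best_w) {
                best = e.to; best_w = e.weight;
            }
        if (best != -1) { match[v] = best; match[best] = v; }
    }

    std::vector<int> coarse(n, -1);             // fine vertex -> coarse vertex id
    int next = 0;
    for (int v = 0; v < n; ++v)
        if (coarse[v] == -1) {
            coarse[v] = next;
            if (match[v] != -1) coarse[match[v]] = next;
            ++next;
        }
    return coarse;
}
```

Building $E_{i+1}$ (relabeling edges, discarding loops, merging parallel edges into one weighted edge) follows directly from the returned mapping and is omitted here.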

Since HEM prefers edges with a large weight
during matching, and since loops in $E_{i+1}$ are discarded, HEM
tends to yield graphs with a comparably low total edge weight. This increases
the probability that a partitioning for $G_{i+1}$ with
a lower cost $c_1$ can later be found [12], maybe also
with a lower cost $c_2$.

In multilevel partitioning algorithms,
coarsening is usually stopped as soon as $|V_{i+1}|$ falls
below a given value. In Metis, this value is hard-coded.
This choice is a further indication that Metis has not been designed for
problems where the number of partitions, $k$, is large.

G-Store’s algorithm keeps coarsening until coarsening makes no further progress, i.e., until $|V_{i+1}| = |V_i|$. When coarsening is stopped, there
will be one vertex in $V_{i+1}$ for each connected component
in $G$.

Let us define the reduction factor $r_i = 1 - |V_{i+1}| / |V_i|$.
In traditional HEM, $r_i$ tends to decrease with
increasing $i$. This is due
to the emergence of “hub and spoke vertices” after repeated coarsening. Hub
vertices are characterized by a very high degree and a large number of spoke
vertices in their neighborhood. Spoke vertices are characterized by a very low
degree and a hub vertex in their neighborhood. In traditional HEM, each hub
vertex can be matched with only one other vertex in each iteration, leaving a
large number of spoke vertices unmatched.

G-Store modifies HEM as follows: Two
additional parameters are introduced, $\mu$ and $w_{\max}$. $\mu$ is the number of vertices that a vertex in $V_i$ can be matched with in each iteration.
$w_{\max}$ is the maximum weight
of a vertex in $V_{i+1}$. $\mu$ is initialized to two.
Both parameters are adjusted after each iteration.

Each iteration of the uncoarsening
algorithm takes $G_i$, $G_{i+1}$,
and $\pi_{i+1}$ as input and returns $\pi_i$. Each iteration can be broken down
into three smaller algorithms: projection, reordering, and refinement.
Projection derives a first attempt of .
Reordering and refinement modify this function; reordering by swapping
partitions, refinement by reassigning individual vertices to other partitions
and by clearing out entire partitions.

Before we describe the algorithms, we need
to define additional notation: In each iteration, let the weight threshold $T_i$ be derived from the block-size bound and $\bar{r}$, where $\bar{r}$ is
the average of all reduction factors $r_i$
that have been observed during coarsening.

We define the tension for an
arbitrary vertex $v \in V_i$ to be the sum $\sum_{(u,v) \in E_i} w((u,v)) \cdot (\pi_i(u) - \pi_i(v))$. The tension for a partition is the
sum of the tensions of its vertices.

We define the modified tension for $v \in V_i$ to be the sum $\sum_{(u,v) \in E_i} w((u,v)) \cdot (\pi_{i+1}(u') - \pi_{i+1}(v'))$, where $u'$ and $v'$ are
the vertices that $u$ and $v$ have been mapped to during
coarsening. $\pi_i$ is not needed to calculate the
modified tension for a vertex in $V_i$.

Finally, let $fine(\cdot)$ be
a function that returns for a set of vertices from $V_{i+1}$ the
set of vertices from $V_i$ that have been mapped to any
vertex in the set during
coarsening.
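Under the reconstruction of the tension definition given above, the computation for a single vertex can be sketched as follows; the adjacency representation and names are our own, not G-Store's:

```cpp
#include <cassert>
#include <vector>

struct Edge { int to; int weight; };

// Tension of vertex v: sum over incident edges (u, v) of
// w((u, v)) * (pi(u) - pi(v)). A negative result is a force
// pulling v toward lower partition numbers (to the "left").
int tension(const std::vector<std::vector<Edge>>& adj,
            const std::vector<int>& part, int v) {
    int t = 0;
    for (const Edge& e : adj[v])
        t += e.weight * (part[e.to] - part[v]);
    return t;
}
```

With four unit-weight neighbors in partition 5 and the vertex itself in partition 10, this yields a tension of -20, matching the worked example given later in this section.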

The projection algorithm derives a
first attempt of $\pi_i$. After the algorithm
returns, $G_{i+1}$ and $\pi_{i+1}$ are
no longer needed and are deleted from memory.

The algorithm steps through the individual
sets $fine(P_{i+1}^j)$ one by one, starting at $j = 0$. Let integer variable $p$ be initialized to 0.

• If $|P_{i+1}^j| = 1$ or if $w(fine(P_{i+1}^j)) \le T_i$, set $\pi_i(v) = p$ for
all $v \in fine(P_{i+1}^j)$, increment $p$ and $j$, and
continue.

• Else, find and store the modified
tension for all vertices in $fine(P_{i+1}^j)$. Set vertex $v_l$ ($v_r$) to be the vertex with the lowest (highest)
modified tension. Set all vertices in $fine(P_{i+1}^j)$ to
‘unassigned’, except for $v_l$ and
$v_r$. Let integer variables $p_l$ and $p_r$ be
initialized to $p$ and $p + q - 1$, respectively, where $q = \lceil w(fine(P_{i+1}^j)) / T_i \rceil$.

• $v_l$ and $v_r$ are the roots of two trees. Iteratively,
either the left tree (root $v_l$) or the right tree (root $v_r$) is grown, depending on which has
the lower total vertex weight. A vertex can be added to a tree if it is yet
unassigned, is connected to the tree, and has the lowest modified tension (left
tree) or the highest modified tension (right tree) among the vertices that are
connected to the tree.

• Suppose the left (right) tree is
grown, and suppose $v$ is
chosen to grow the tree. The next steps are:

(a) Set $\pi_i(v) = p_l$ ($\pi_i(v) = p_r$). Set $v$ as ‘assigned’.

(b) If $w(P_i^{p_l}) \ge T_i$ ($w(P_i^{p_r}) \ge T_i$), increment $p_l$ (decrement $p_r$).

(c) For each edge $(u, v) \in E_i$ with $u$ unassigned,
subtract (add) $w((u, v))$ from (to) the stored
modified tension of $u$.

• After all vertices in $fine(P_{i+1}^j)$ have been assigned to a partition in
this way, the gap between $p_l$ and $p_r$ is closed: for each $v$ assigned through the right tree, set $\pi_i(v) = \pi_i(v) - (p_r - p_l - x)$,
where parameter $x$ is 2, 1, or 0, depending on how many of the border
partitions $P_i^{p_l}$ and $P_i^{p_r}$ received vertices. Notice that one
partition always contains vertices found through both the left and the right
tree.

Finally, set $p$ to the first unused partition number, increment $j$, and
continue.

While running, the projection algorithm
marks each partition of with a boolean
flag. A partition created under the first bullet point is marked true. A partition created under the tree growing algorithm is marked false if it is created through the left tree, and true if it is created through the right tree. The middle partition is
marked false.

Figure 9 illustrates the states of the
flags in an example with 7 partitions for $G_{i+1}$ and
14 partitions for $G_i$. A false flag
is shown as 0, a true flag as 1. The first row
shows the partitioning for $G_{i+1}$. The second row
shows a possible partitioning for $G_i$ after projection.
The third row shows a possible partitioning for $G_i$ after reordering (see below).

The reordering algorithm attempts to
reduce $c_2$ through the swapping of partitions. $c_1$ and the partition weights do
not change during reordering.

Notice in Figure 9 how $\pi_i$ after projection is bound to the
partition borders in the coarser graph. Put differently, if for any $u, v \in V_{i+1}$, $\pi_{i+1}(u) < \pi_{i+1}(v)$,
then $\pi_i(u') < \pi_i(v')$ for all $u' \in fine(\{u\})$ and $v' \in fine(\{v\})$. The reordering algorithm breaks the
borders.

The projection algorithm already tried to
derive $\pi_i$ in a way that reduces $c_2$. It used modified tension, however,
which is less expressive than tension. Now that a first attempt of $\pi_i$ is available, the tension measure can
be used to further refine it.

We illustrate the significance of tension
in a simple example. Suppose that an arbitrary
vertex $v$ has four neighbors $u_1, \ldots, u_4$, connected by unit-weight edges, and suppose $\pi(u_1) = \pi(u_2) = \pi(u_3) = \pi(u_4) = 5$ and $\pi(v) = 10$. In
this setup, the cost $c_2$ incurred by vertex $v$ is 20, and the tension on
vertex $v$ is $-20$. Figuratively, there is a
force pulling on $v$ from
the left. It is easy to see that cost $c_2$ can
be reduced by moving $v$ to
a partition in the range [5..10). For instance, if $\pi(v)$ is
set to 9, $c_2$ decreases to 16. An improvement of 4.

The example can be written for partitions
as well: $P^{10}$ would be the set $\{v\}$, and the neighbors $u_1, \ldots, u_4$ would lie in partitions to its left. The tension for partition 10 would
be $-20$, and so on. Moving a partition is more
difficult, as setting $\pi(u) = 9$ for all $u \in P^{10}$ would create a gap in the partition
numbering. There are two solutions: One is to shift all partitions with a
number greater than 10 left by one, the other is to shift partitions 8 and 9
right by one. The disadvantage of the former is that partition 8 might become
very large. The disadvantage of the latter is that moving partitions 8 and 9
adds complexity as the effect of their move has to be taken into account when
determining if moving partition 10 is beneficial.

The reordering algorithm identifies
opportunities in the partitioning structure, where swapping two adjacent
partitions decreases $c_2$. Repeated swapping can move
a partition by more than one position.

The algorithm steps through groups of
partitions based on the boolean flags that were set in the projection
algorithm. A group is defined as a sequence of true-flagged
partitions followed by a sequence of false-flagged partitions.
Partitions may only be swapped within a group. As illustrated in Figure 9, each
group contains partitions that originated from at
least two different partitions in the coarser graph. This makes a certain
degree of mobility between the partitions possible.

The algorithm uses an updatable,
array-based priority queue to hold the swap alternatives in the order of their
impact on $c_2$. Repeatedly, the most beneficial swap
is executed and the priority values of the remaining swap alternatives are updated.
Through repeated swaps, one partition can move from one end of the group to the
other. The algorithm continues with the next group as soon as the best swap
alternative in the current group does not improve $c_2$.
The implementation of the priority queue can be found in structs.h.
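As an illustration only, the swap loop can be sketched as follows. G-Store's actual implementation uses the updatable array-based priority queue in structs.h; here the best adjacent swap is simply recomputed each round, and the distance-weighted cost measure is our reconstruction:

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

struct WEdge { int u, v, w; };

// c2 under our reconstruction: sum over edges of w(e) * |pos(u) - pos(v)|,
// where pos is the on-disk position of a vertex's partition. 'order'
// lists partition numbers in their on-disk sequence.
long long cost_c2(const std::vector<WEdge>& edges,
                  const std::vector<int>& part,
                  const std::vector<int>& order) {
    std::vector<int> pos(order.size());
    for (int i = 0; i < (int)order.size(); ++i) pos[order[i]] = i;
    long long c = 0;
    for (const WEdge& e : edges)
        c += (long long)e.w * std::abs(pos[part[e.u]] - pos[part[e.v]]);
    return c;
}

// Repeatedly apply the adjacent swap (within positions [lo, hi]) that
// most reduces c2, until no swap helps.
void reorder_group(const std::vector<WEdge>& edges,
                   const std::vector<int>& part,
                   std::vector<int>& order, int lo, int hi) {
    for (;;) {
        long long base = cost_c2(edges, part, order);
        long long best_gain = 0;
        int best = -1;
        for (int j = lo; j < hi; ++j) {
            std::swap(order[j], order[j + 1]);   // try the swap
            long long gain = base - cost_c2(edges, part, order);
            std::swap(order[j], order[j + 1]);   // undo
            if (gain > best_gain) { best_gain = gain; best = j; }
        }
        if (best == -1) break;                   // no swap improves c2
        std::swap(order[best], order[best + 1]);
    }
}
```

Recomputing the cost per candidate is quadratic and only serves to show the greedy structure; the updatable queue avoids this by adjusting only the priorities affected by each executed swap.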

The refinement algorithm tries to
reduce the costs by reassigning individual vertices to
other partitions. The algorithm is one of the most complex in G-Store, and can
be fine-tuned in several ways. Some parameters can only be modified in the
code, others through G-Store’s parameter interface (see Chapter 7).

Five parameters can be set through the
parameter interface: alpha, beta, gamma, runs_a, and runs_b. The former three correspond to $\alpha$, $\beta$, and $\gamma$
in the scoring function used during refinement.
Their default values are 0.125, 1, and 8, respectively. The parameters control
the relative importance of the cost functions during refinement. In general,
the parameters should be centered around 1.

runs_a sets the number of iterations of the refinement algorithm for the 8
finest levels ($i \le 7$). runs_b sets the number of iterations for the remaining levels. Their
default values are 3 and 1, respectively. Lower values reduce computation time,
higher values can yield a better partitioning.

Let $t$ count the number of iterations, starting at
0. In each iteration, the refinement algorithm randomly steps through the sets $P_i^j$ for all $j$. Let
$P_i^j$ be the next set.

• The algorithm creates a two-dimensional matrix $M$, with one row per vertex in $P_i^j$ and one column per partition.
Every entry $M[v][p]$ is a 3-tuple that
contains the negative of the change in costs $c_1$, $c_2$, and $c_3$ if
vertex $v$ is moved to the partition with number $p$.

• After the matrix has been created and
filled, the algorithm calculates a score for each entry:

$score(M[v][p]) = \alpha \cdot s_1 + \beta \cdot s_2 + \gamma \cdot s_3$, where $(s_1, s_2, s_3)$ is the entry’s stored tuple.

Repeatedly, the
entry with the highest score is found. Suppose $M[v][p]$ was
that entry. Then, if the score exceeds the current threshold to move,
vertex $v$ is moved
to partition $p$, and all
entries in $M$ that are affected by the move are
updated. Otherwise, the algorithm continues with the next set.

Notice how scoring differentiates between $t = 0$ and $t > 0$, and between the finest and the coarser levels.
For $t = 0$, an entry can only
have a non-negative score if all $s_1$, $s_2$, and $s_3$ are
non-negative. Compared with $t > 0$, this yields a more selective group of
vertices that are moved.

A reward is
used for moving a vertex to a partition whose weight plus the
weight of that vertex is not larger than the weight threshold. If
the weight of partition $j$ is itself not larger than the threshold, the reward is canceled out by a
higher threshold to move.

If the weight of partition $j$ drops to or below a small bound, the algorithm changes the threshold
to move to the negative of the reward to facilitate
moving the remaining vertices. When partition $j$ is
cleared of vertices, all partitions with a number greater than $j$ are shifted left by one to close the
gap in the numbering.

A penalty is
used for moves that increase the weight of the target partition
beyond the weight threshold. The penalty depends on the size of
the target partition and grows with the amount by which the threshold is
exceeded.

The finalization algorithm projects $\pi_0$ to the input graph $G$ and
takes care of any partition whose weight exceeds the block size. Recall from Section 6.2 that $w(v)$ for each $v \in V_0$ is
an approximation of the actual number of bytes that the representation of the
corresponding vertex in $G$ will
use on the disk. The finalization algorithm works with the actual number of
bytes (see next section).

The treatment of partitions that are too
large to fit in a block is based on the change in costs $c_1$ and
$c_2$, with priority being given to the
latter. The algorithm is similar in complexity to the refinement algorithm and will
not be discussed in detail here. We refer the interested reader to finalize.cpp.

Every block consists of a fixed size area,
followed by a data area, followed by a header area. Free space accumulates
between the data area and the header area. The fixed size area stores
information about the contents of a block and occupies a fixed number of bytes.

Each vertex has an associated header that
occupies $2 \cdot (3 + n)$ bytes (where $n$ is the number of VARCHAR labels in the schema definition).
Each header in a block has a unique slot number. The header adjacent to the end
of the block has slot number 0. The header adjacent to the header in slot 0 has
slot number 1, and so on. The slot numbers determine the order of vertices in
the data area. The data for the vertex whose header is in slot 0 is stored
adjacent to the fixed size area. If the block is completely full, the header
with the highest slot number is stored adjacent to its vertex data.

Vertex data is stored in the following
order: fixed size labels, VARCHAR labels, edges to vertices
in the same block (“internal edges”), edges to vertices in a different block
(“external edges”). A header consists of two-byte pointers into the data area.
Without a VARCHAR label, the first pointer marks the beginning of the fixed size
labels, the second marks the beginning of the external edges list, and the
third marks the end of the external edges list. The internal edges list begins
right after the fixed length labels and ends right before the beginning of the
external edges list.
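The pointer layout for the no-VARCHAR case can be sketched as follows; the struct, the function, and the offset arithmetic are our own illustration of the description above:

```cpp
#include <cassert>

// Byte offsets of a vertex's data regions within a block, recovered
// from its three two-byte header pointers (no VARCHAR labels):
// p0 = start of fixed-size labels, p1 = start of the external edges
// list, p2 = end of the external edges list. The internal edges list
// fills the space between the fixed-size labels and p1.
struct Regions {
    int labels_begin;
    int internal_begin, internal_end;
    int external_begin, external_end;
};

Regions decode(int p0, int p1, int p2, int fixed_label_bytes) {
    Regions r;
    r.labels_begin   = p0;
    r.internal_begin = p0 + fixed_label_bytes; // internal edges follow the fixed labels
    r.internal_end   = p1;                     // ...and end where external edges begin
    r.external_begin = p1;
    r.external_end   = p2;
    return r;
}
```

The fixed-label size is known from the schema, which is why the internal edges list needs no pointer of its own.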

Internal edges are encoded as header slot
numbers and use either one or two bytes of storage. Internal edges have the
same size in all blocks. The size is decided in the finalization algorithm. If
fewer than five in a thousand blocks contain more than 256 vertices, one byte is
used; otherwise two bytes are used. With one-byte internal edges, every block
can store up to 256 vertices. With two-byte internal edges, every block can
store up to 65,536 vertices. The finalization algorithm moves vertices out of
blocks that contain too many vertices. The finalization algorithm also sets a
number of function pointers to avoid having to repeatedly test for the size of
internal edges.

External edges are encoded as global vertex
identifiers (GIDs). A GID is an unsigned four-byte integer that encodes
a block number and a header slot number. G-Store does not need an index. A GID
is sufficient to find a vertex on the disk in constant time.

Each vertex has a unique GID: Let $m$ be the largest number of
vertices stored in any block. Let $b$ be $\lceil \log_2 m \rceil$, and let $s$ be $2^b - 1$. The lowest $b$ bits of any GID are used to
encode the header slot, the remaining bits are used to encode the block number.
Thanks to C++’s bitwise operators (&, >>), both numbers can be extracted efficiently.

For instance, if $m = 512$, then $b = 9$ and
$s = 511$. In this setup, the vertex with GID
648,731 is stored in block 1267, header slot 27: $648{,}731 \gg 9 = 1267$ and $648{,}731 \,\&\, 511 = 27$.

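The extraction can be sketched directly with the bitwise operators mentioned above, using the example's $b = 9$; the constant and function names are our own:

```cpp
#include <cassert>
#include <cstdint>

// GID decoding: the low kSlotBits bits hold the header slot, the
// remaining bits hold the block number (b = 9 as in the running example).
const unsigned kSlotBits = 9;
const std::uint32_t kSlotMask = (1u << kSlotBits) - 1;  // 511

std::uint32_t gid_block(std::uint32_t gid) { return gid >> kSlotBits; }
std::uint32_t gid_slot(std::uint32_t gid)  { return gid & kSlotMask; }
```

Both operations compile to a single shift or mask instruction, which is why a GID lookup needs no index.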
Both the internal edges lists and the
external edges lists are sorted in increasing order. G-Store’s query engine
exploits this in various ways; for instance, cycle detection is implemented
with binary search.
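For instance, a membership test on a sorted external edges list can be sketched as follows; the helper is our own, not G-Store's query engine:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Because the external edges list is sorted by GID, testing whether a
// vertex links to a given target (e.g., an already-visited vertex
// during cycle detection) takes O(log n) instead of a linear scan.
bool has_edge(const std::vector<std::uint32_t>& sorted_gids,
              std::uint32_t target) {
    return std::binary_search(sorted_gids.begin(), sorted_gids.end(), target);
}
```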

G-Store creates three files in its working
directory that might be of interest. These files can be deleted.

• _gidmap.g stores the mapping from $V$ to GIDs. Line $n$ contains the GID for vertex $n$.

• _parts.g stores the partitioning function. Line $n$ contains the partition number for
vertex $n$. Based on this
file, G-Store can recreate the disk representation for the input graph without
having to rerun the storage algorithm (see Section 7.2).

• stats.g stores various statistics of the placement, both per block and on an
aggregated level.