Hive plots — for the impatient

The hive plot is a rational visualization method for drawing networks. Nodes are mapped to and positioned on radially distributed linear axes — this mapping is based on network structural properties. Edges are drawn as curved links. Simple and interpretable.

The purpose of the hive plot is to establish a new baseline for visualization of large networks — a method that is both general and tunable and useful as a starting point in visually exploring network structure.

Hive plots give the reader a passing chance to quantitatively understand important aspects of a network's structure. Unlike hairballs (Paweł Widera's "Network visualizations: how to tame the complexity" describes many layout options), hive plots are excellent at managing the visual complexity arising from a large number of edges and at exposing both trends and outlier patterns in network structure.


Hive plots — a longer introduction

Network visualizations are notoriously difficult to
interpret. Their canonical representation in a visual form has earned
the moniker hairball, and you can probably guess why. If you are unfamiliar with the hairball, or doubt its prevalence in the biological sciences, explore what is always a good source of network hairballs: the study of yeast and systems biology.

You can already guess that nothing with the name hairball can truly be useful. In general, they are not. These views are at best accidentally informative, and cannot be relied upon to consistently reveal meaningful patterns.

Interpreting hairballs is made difficult by several significant shortcomings:

their form is determined by layout algorithms, which typically cannot be adjusted to address a user's specific questions.

many layout algorithms are stochastic and can produce many different layouts of the same network

layouts of the same network created by different algorithms cannot be easily compared

the layout is brittle — it can be disproportionately affected by very small changes in a network

layouts of different networks created by the same algorithm cannot be easily compared
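The stochastic behaviour listed above can be illustrated with a toy force-directed layout. This is only a minimal sketch and not the algorithm of any particular tool: the function name `toy_force_layout`, the force constants and the iteration count are all assumptions chosen for illustration.

```python
import random

def toy_force_layout(nodes, edges, seed, iterations=50):
    """Toy force-directed layout (illustrative only, not a real tool's algorithm)."""
    rng = random.Random(seed)
    # Random starting positions: this is where the stochasticity enters.
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels.
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9  # avoid division by zero
                force[a][0] += 0.01 * dx / d2
                force[a][1] += 0.01 * dy / d2
        # Connected nodes attract.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += 0.1 * dx
            force[a][1] += 0.1 * dy
            force[b][0] -= 0.1 * dx
            force[b][1] -= 0.1 * dy
        for n in nodes:
            pos[n][0] += force[n][0]
            pos[n][1] += force[n][1]
    return {n: tuple(p) for n, p in pos.items()}

# The same four-node ring, laid out twice with different random seeds.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
layout_1 = toy_force_layout(nodes, edges, seed=1)
layout_2 = toy_force_layout(nodes, edges, seed=2)
```

The network is identical in both calls, yet the node coordinates differ because they depend on the random starting positions rather than on the data alone.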

To rationally visualize networks, we introduce the hive
plot. The hive plot is based on meaningful network
properties, which can be selected to address a specific question.

Nodes are assigned to one of three (or more) axes, which
may be divided into segments. Nodes are ordered on a segment based on
properties such as connectivity, density, centrality or quantitative
annotation (e.g. gene expression). The user is free to choose whatever
rules fit their data and visualization requirements. Edges are drawn
as Bezier curves, which can be annotated with color, thickness or
label to communicate additional information.
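To make the recipe concrete, here is a minimal sketch of the geometry, assuming three evenly spaced axes and a quadratic Bezier curve per edge. The names `hive_layout`, `bezier_edge`, `axis_of`, `value_of`, `inner`, `outer` and `pull` are hypothetical, as is the `kind` annotation in the example. None of them come from an existing hive plot tool.

```python
import math

def hive_layout(nodes, axis_of, value_of, n_axes=3, inner=0.1, outer=1.0):
    """Place each node on one of n_axes radial axes.

    axis_of(node)  -> axis index in range(n_axes)       (user-chosen rule)
    value_of(node) -> number that orders nodes on axis  (user-chosen rule)
    Returns {node: (x, y)}.
    """
    values = [value_of(n) for n in nodes]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    coords = {}
    for n in nodes:
        # Axis 0 points straight up; the others are spread evenly around the circle.
        angle = math.pi / 2 + 2 * math.pi * axis_of(n) / n_axes
        t = (value_of(n) - lo) / span  # 0 at the lowest value, 1 at the highest
        r = inner + t * (outer - inner)
        coords[n] = (r * math.cos(angle), r * math.sin(angle))
    return coords

def bezier_edge(p0, p1, steps=20, pull=0.5):
    """Points along a quadratic Bezier curve from p0 to p1.

    The control point is the chord midpoint scaled toward the origin
    by the (assumed) factor `pull`.
    """
    cx = (p0[0] + p1[0]) / 2 * pull
    cy = (p0[1] + p1[1]) / 2 * pull
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * cx + t ** 2 * p1[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * cy + t ** 2 * p1[1]
        points.append((x, y))
    return points

# Example: axis chosen by a made-up node annotation, position by degree.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
nodes = sorted({n for e in edges for n in e})
degree = {n: sum(n in e for e in edges) for n in nodes}
kind = {"a": 0, "b": 1, "c": 2, "d": 0}  # hypothetical annotation
coords = hive_layout(nodes, axis_of=kind.get, value_of=degree.get)
curve = bezier_edge(coords["a"], coords["c"])
```

Both rules are passed in as functions, so changing the question you ask of the network means swapping a rule, not the layout code.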

Hive plots make it possible to assess network structure because they
are founded on network properties, not on aesthetic
layout. Visualizations of two networks are directly comparable. Importantly, hive plots are perceptually uniform — differences in hive plots are proportional to differences in underlying networks. This makes it possible to use hive plots to assess network similarity.

If connections are drawn as ribbons, the hive plot can demonstrate
ratios between elements of normalized quantities (e.g. comparison of
sizes of annotation categories in different genomes).

the problem

Conventional network visualization is unsuitable for visual
analytics of large networks. So-called hairballs earn their
moniker by becoming impenetrably complex as your network grows. They
are least effective when visualization is most needed — for
large networks.

To understand networks visually, we need to see their structure
directly, not by proxy of a layout algorithm based on aesthetics.

Hairballs turn complex data into visualizations that are just as
complex, or even more so. Hairballs can even seduce us into believing that they carry a high
information value. But just because they look complex does not mean
that they can communicate complex information. Hairballs are
the junk food of network visualization — they have very low
nutritional value, leaving the user hungry.

In a hairball, data is subordinate to layout: node and edge
positions and lengths depend as much on the layout algorithm (of
which there are many) as on the data. The effect of layout rules is difficult
to predict, making direct comparisons of these visualizations
impossible. For example, imagine trying to compare two scatter plots
in which the order of values along the scales differed
(e.g. x = 1, 2, 3, ... in one and x = 3, 1, 2, ... in the other).

As a result, a great deal of detail about the structure of a
network is irretrievably lost in a hairball and any emergent patterns
may be either real (reflected in the data) or accidental (artefact of
the layout). If you doubt that such artefacts can appear in the literature, consider the figure below from Rual JF, Venkatesan K, Hao T, et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005;437(7062):1173–8. As indicated in the figure's legend, all notable features in the network visualization are artefacts of the layout algorithm.

The central drawback of hairball-based visualization is that it cannot
be tuned to address a user's specific questions. Implicit in the
hairball approach is the assumption that all questions that the user
wishes to answer are addressable by the layout algorithm. When this assumption
is wrong (as it usually is), the user is left to construct another
hairball, based on another layout algorithm, to attempt to answer the
unanswered questions. Unfortunately, the set of questions answerable
by a hairball is very difficult to determine — no
such list exists because of the complex interplay of data and layout.

The preparation of the kind of visualizations shown above is an
effort of both labour and love. Specific layouts work for one
network, but are not effective in general. There are exceptions,
however. Some network families are ideally suited for a layout
algorithm (e.g. y.layout.router in first panel above).

Before describing the hive plot method in detail, and just to assure
you that I love network art, I've taken the hairballs of
a variety of network communities and generated a "spatter profile".

Informative? Somewhat. Juicy? Absolutely.

a solution — the hive plot

The hive plot attempts to address the shortcomings of the conventional
hairball layout. Because hive plots can be tuned, they can identify meaningful structural components of a network.
Hive plots are ideal for detecting emerging patterns in your network's structure: the method shows you the entire network and your brain's pattern matching facilities do the rest.

The hive plot is itself founded on a layout algorithm. However, its output is based not on aesthetics but on network structure. In this sense, the layout is rational: it depends on network features that you care about (e.g. connectivity).

In a hive plot, nodes are constrained to linear axes and edges are drawn as curves. Node-to-axis assignment and node-on-axis position are determined solely by network structure, node and edge annotation, or any other meaningful properties of the network. In other words, layout rules are defined by you based on properties that are meaningful to you. These rules form a mapping between structure and layout that can be as simple or complex as you wish.

Importantly, there is no aesthetic magic sauce added to the layout. If the layout shows a pattern, you can be sure it is due to structure in the underlying data and not to the layout algorithm's interpretation of how the data should be shown.

The axis and node mapping is arbitrary, and this may sound very abstract at this point. To make things concrete, there are certain simple recipes that are extremely useful in most cases (see Krzywinski et al.).

Mapping to axis (A, in figure below), position (B) and color (C) can be a function of sink/source status (for tripartite networks, this axis categorization is natural), node degree, neighbour degree, centralization, density, heterogeneity, topological overlap (there are numerous properties to choose from), or node/edge annotation (e.g. a node could be associated with a classification, or an edge may have a weight).
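One such recipe for a directed network can be sketched as follows. This is a minimal sketch under stated assumptions: sources are nodes with only outgoing edges, sinks are nodes with only incoming edges, and everything else goes on a third axis labelled "manager" here. The function names and the per-axis degree scaling are illustrative choices, not a fixed part of the method.

```python
from collections import Counter

def classify_nodes(edges):
    """Map each node of a directed edge list to 'source', 'sink' or 'manager'."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    roles = {}
    for node in set(out_deg) | set(in_deg):
        if in_deg[node] == 0:
            roles[node] = "source"   # only outgoing edges
        elif out_deg[node] == 0:
            roles[node] = "sink"     # only incoming edges
        else:
            roles[node] = "manager"  # both incoming and outgoing edges
    return roles

def axis_positions(edges):
    """Return {node: (axis, position)} with position = degree scaled to (0, 1] per axis."""
    roles = classify_nodes(edges)
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    positions = {}
    for role in ("source", "manager", "sink"):
        members = [n for n in roles if roles[n] == role]
        top = max((degree[n] for n in members), default=1)
        for n in members:
            positions[n] = (role, degree[n] / top)
    return positions

# Small made-up directed network.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
roles = classify_nodes(edges)
positions = axis_positions(edges)
```

Color (C) could be driven the same way, by any node or edge property you care about.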

Interpretation of the linear visualization is easy (once you get the hang of it). Direct visual comparison of hive plots is possible — a valuable and distinguishing feature of hive plots. For example, consider the eight hairballs below — they are layouts of the same network. It is not possible to tell that this is in fact the same network.

If this causes you no concern, consider that simply rotating and/or flipping the same hairball can appear indistinguishable from changing the underlying data.

communicating hive plot rules to your audience

Consider a typical hairball. Now think of how you'd describe to
someone the method used to create it. Chances are, even you don't know
the full details of the layout algorithm. And even if you did, you
could not necessarily relate how specific network structures would
translate into output.

Even if you did describe how the hairball was created (you'd
probably name the layout algorithm), it would be very likely that the
description would not contain any phrases that relate to the structure
of the network (which is, after all, what your audience is keen on).

On the other hand, it is easy to describe how a hive plot was created,
and likewise easy for your audience to understand, because you can use
terms relevant to the questions your visualization is designed to
address. Instead of saying "I used a force-directed approach to
place the nodes.", which does not help your audience relate to the
network's structure, you can say "I put all the sink nodes on this
axis and ordered them by absolute connectivity.", which is
immediately meaningful.

hive plots for undirected networks

Hive plots work equally well on both directed and undirected
networks. In undirected networks, edges don't have a
direction and therefore there is no distinction between sinks (nodes
with only incoming edges) and sources (nodes with only outgoing edges). In the example
below, the node degree (number of edges) is used to map nodes to axes.
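A degree-based axis rule for an undirected network might look like the sketch below. The cut-offs `low_max` and `mid_max` are hypothetical, as is the function name `degree_axes`. Pick thresholds that suit your own data.

```python
from collections import Counter

def degree_axes(edges, low_max=1, mid_max=3):
    """Assign nodes of an undirected network to axes 0, 1 or 2 by degree.

    low_max and mid_max are hypothetical cut-offs, not prescribed values.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    axes = {}
    for node, deg in degree.items():
        if deg <= low_max:
            axes[node] = 0   # low-degree nodes
        elif deg <= mid_max:
            axes[node] = 1   # medium-degree nodes
        else:
            axes[node] = 2   # high-degree nodes (hubs)
    return axes

# Made-up example: a hub "h" with four neighbours, two of which are also linked.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
axes = degree_axes(edges)
```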

example

In a recent PNAS paper [1], Yan et al. compared the E. coli gene regulatory network with the network of function calls in the Linux kernel. As you can see, the hairballs of these networks reveal no structural information: other than showing that the Linux network is larger, they offer nothing.

original visualization

Nodes on the axes were not ordered. Network edges between the top and bottom layer cross the middle layer axis and complicate the view. For example, it is not immediately obvious that there is almost no communication in the first two layers of the E. coli network.

hive plots

The linear layouts clearly demonstrate differences between these networks. For details about the linear layouts of these two networks, refer to slides in the general introduction.


applications

hive panels

Networks are complex data structures and it is rare that they can
be effectively presented as a single image. The hive plot concept can
be extended to hive panels: multiple and independent
visual signatures of a network, each based on a different combination
of structural properties to interrogate different aspects of network structure. A hive panel is a matrix of hive plots which independently communicate different structural properties of a network.

Hairballs cannot be used for this purpose because they are not
sensitive to patterns in structural attributes, cannot be directly
compared, and scale poorly.

Single hive plots (deg vs cc) from the hive panels of four organisms and a random network are shown below the human panel to demonstrate differences in connectivity and clustering coefficient. Also shown are organic layouts of the locale of the most connected node, formed by its neighbors and next-nearest neighbors, which is the region of the network highlighted in the hive plots. Though it is not possible to confidently conclude anything from the organic layouts, the hive plots clearly communicate differences in a quantitative manner. For example, the most connected node in the human set (A) is more cliquey (large cc) than E. coli (C) and yeast (D) and is connected to nodes which themselves are uniformly cliquey (B). These and other patterns can be quickly identified within the panel.

multi-axis network comparison

Hive plots can be used to compare multiple networks. In this application, the nodes of each network are assigned to different axes and links connect nodes that are shared between the networks (or that match by some other node similarity criterion).
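The linking rule can be sketched in a few lines, assuming each network is reduced to its set of node names and that "shared" means an identical name. The function name `shared_node_links` and the example data are made up for illustration.

```python
from itertools import combinations

def shared_node_links(networks):
    """Link nodes that appear in two networks, one network per axis.

    networks: list of node-name sets, in axis order.
    Returns a list of (axis_i, axis_j, node) tuples.
    """
    links = []
    for (i, nodes_i), (j, nodes_j) in combinations(enumerate(networks), 2):
        for node in sorted(nodes_i & nodes_j):  # nodes present in both networks
            links.append((i, j, node))
    return links

# Three made-up networks, one per axis.
networks = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]
links = shared_node_links(networks)
```

Swapping the set intersection for another similarity test would give the "other criterion" variant.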

Comparing four networks requires 6 axes, if the plot area is to be fully used.

layered network

Consider a network which contains multiple and independent layers of connections. How do the layers of connectivity relate?

By creating a hive plot in which the axis/position mapping is done using one layer, with edges of another layer drawn, correlation can be assessed visually.
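As a rough sketch of this idea, the code below orders nodes by their degree in one layer and then expresses the edges of a second layer in that ordering. It is a simplification: only the position mapping is shown, on a single ordering, and the name `layered_hive` and the example layers are hypothetical.

```python
from collections import Counter

def layered_hive(layer_a, layer_b):
    """Order nodes by degree in layer_a, then express layer_b edges in that order.

    Returns (rank, drawn): rank maps node -> position index,
    drawn lists layer_b edges as pairs of those positions.
    """
    degree = Counter()
    for u, v in layer_a:
        degree[u] += 1
        degree[v] += 1
    nodes = {n for edge in layer_a + layer_b for n in edge}
    # Sort by layer_a degree, breaking ties by name so the order is reproducible.
    ordered = sorted(nodes, key=lambda n: (degree[n], n))
    rank = {n: i for i, n in enumerate(ordered)}
    drawn = [(rank[u], rank[v]) for u, v in layer_b]
    return rank, drawn

# Made-up layers over the same four nodes.
layer_a = [("a", "b"), ("a", "c"), ("a", "d")]
layer_b = [("a", "b"), ("c", "d")]
rank, drawn = layered_hive(layer_a, layer_b)
```

If the second layer's edges cluster at similar positions, the two layers are correlated with respect to the chosen property.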

only for networks?

No.

hive plots for alignments

Hive plots can be applied to data structures other than networks. The method requires that your data be mappable onto a set of pairwise relationships. For networks, this pairwise relationship is the edge between two nodes. In other circumstances, it can relate two spatial positions (where the axis corresponds to an object with a physical length scale) or two intervals (two axis segments are related, thereby creating a ratio comparison).

Circos is a common method to show genome differences, synteny and alignments. For example, below are shown three comparisons of the ancestral genome of Arabidopsis thaliana with each of three modern genomes of the plant (SN, SL and BA) (Figure 3 from Mandakova T, Joly S, Krzywinski M, Mummenhoff K, Lysak MA (2010). Fast diploidization in close mesopolyploid relatives of Arabidopsis. The Plant Cell 22: 2277-2290).

Hive plots make an excellent tool for showing three-way alignments. Below is a hive plot of all three alignments shown above. In this representation, positions on modern genomes that align to the same ancestral genome segment are connected.

hive plots for visualizing ratios — evaluating assembly quality

One variation of the hive plot is a circularly composited stacked bar plot, as shown below. In this example, each of the three axes supports two bar plots (one on either side). Ribbons connect two intervals of the same category. For another example, see our VIZBI 2011 poster.

This hive plot provides a visual recipe for assessing the quality of a genomic assembly. An assembly is composed of reads (bottom axis), which are assembled into contigs (right axis). Independently, a reference assembly (left axis) may exist and act as a comparator. Among others, this hive plot answers the following questions:

what fraction of reads are unassembled? 20%

what fraction of reads are unaligned to reference? 30%

what fraction of reference has no read coverage? 2%

what fraction of reference has no contig coverage? 15%

what fraction of reference is constructed by contigs < 200kb? 60%

are there contigs > 200kb? no.

what fraction of contigs are unaligned to the reference? 20%

what fraction of the overall assembly is derived from k=27 assembly? 80%

The benefit of this stacked bar plot layout is that the circular layout is both periodic and has visual weight. This approach is similar to a parallel coordinate plot, except here the plot wraps around.
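The stacked-bar and ribbon geometry can be sketched as below. This is a minimal sketch that assumes each axis is described by category fractions summing to 1. The function names and the example values are made up and are not the numbers from the assembly figure.

```python
def stacked_intervals(fractions):
    """Turn {category: fraction} into {category: (start, end)} along an axis.

    Categories are stacked in insertion order; fractions are assumed to sum to 1.
    """
    intervals = {}
    start = 0.0
    for category, fraction in fractions.items():
        intervals[category] = (start, start + fraction)
        start += fraction
    return intervals

def ribbons(axis_a, axis_b):
    """Pair up the intervals of every category present on both axes."""
    a = stacked_intervals(axis_a)
    b = stacked_intervals(axis_b)
    return {c: (a[c], b[c]) for c in a if c in b}

# Made-up category fractions on two neighbouring axes.
axis_a = {"x": 0.5, "y": 0.25, "z": 0.25}
axis_b = {"x": 0.25, "y": 0.75}
pairs = ribbons(axis_a, axis_b)
```

Each entry in `pairs` gives the two intervals a ribbon would join, so the change in ribbon width from one axis to the other reads directly as a ratio.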

Multiple panels can be combined to display a very large number of ratios. Below are shown 3 x 3 x 3 (27) comparisons, each with 3 x 8 ratios, for a total of 648 ratios. By categorizing each ratio using a spectral color scheme, patterns can be quickly spotted and interpreted. The image below was created for the EMBO Journal 2011 cover contest.