Inf@Vis!

The digital magazine of InfoVis.net

Web Structure Visualisation

by Juan C. Dürsteler

[message nº 173]

The web, by its very nature as hypertext, shows a structure that can be described by means of graphs. The visualisation of this structure, of crucial importance to understanding the results of web mining, thus reduces to graph drawing. This is, nevertheless, a vast field with many possibilities, some of which are presented in this issue.

Map of the Internet. Part of the OPTE project, representing the nodes and main links of the whole web on a macroscopic level as of 11 January 2005. Source: Image as can be seen on the website of the OPTE project.

Any web, and most especially the world wide web as a whole, is composed of pages (which we'll call nodes) and links between them (which we'll call arcs). This means that, from a mathematical standpoint, we can consider the structure of any website as a graph. Consequently, in order to represent the structure of a website we simply need to draw a graph.
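As a minimal sketch of this idea, a small site can be held in Python as a mapping from each page (node) to the pages it links to (arcs). The page names and the two helper functions are invented for illustration; they are not taken from any of the tools discussed in this issue.

```python
# Hypothetical five-page site: each key is a page (node) and each value
# lists the pages it links to (arcs). All names are illustrative only.
site = {
    "home": ["products", "about"],
    "products": ["widget", "home"],
    "about": ["home", "contact"],
    "widget": ["products"],
    "contact": [],
}


def count_arcs(graph):
    """Return the total number of links (arcs) in the graph."""
    return sum(len(targets) for targets in graph.values())


def in_degree(graph):
    """Count, for every page, how many links point to it."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


incoming = in_degree(site)
```

Note that the links "products" to "home" and "widget" to "products" point back up the site, so this graph contains circuits and is not a tree.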

Simple? I fear that it's not so simple. Graph representation turns out to be such a rich topic that it has its own yearly international gathering, the International Symposium on Graph Drawing. We already discussed graphs and how to draw them in number 137, entitled "Graphs" in a struggle for originality.

But we aren't interested here in the different ways of drawing graphs in general, but in how their visualisation can help us to understand the structure of a website in order to then take appropriate decisions and implement them.

Consequently, we'll see some of the visualisations that have appeared with this goal. Most of them represent the structure as a hierarchy, leaving out the links that go back and forth between pages and create circuits. That way the structure can be depicted as a tree (an acyclic connected graph).
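One possible way to obtain such a hierarchy is sketched below: a breadth-first traversal from the home page keeps only the first link that reaches each page and drops every other one. The choice of breadth-first search, the function name and the sample site are all assumptions made for illustration; the visualisations described here may build their trees differently.

```python
from collections import deque


def spanning_tree(graph, root):
    """Keep only the first link that reaches each page, breadth-first.

    Returns (children, depth): children maps each page to its children in
    the resulting tree, and depth maps each page to its distance in clicks
    from the root.
    """
    children = {root: []}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depth:  # first time this page is reached
                depth[target] = depth[page] + 1
                children[target] = []
                children[page].append(target)
                queue.append(target)
            # any other link would close a circuit or cross the tree: drop it
    return children, depth


# Hypothetical site with links that point back towards the home page.
site = {
    "home": ["products", "about"],
    "products": ["widget", "home"],
    "about": ["home", "contact"],
    "widget": ["products"],
    "contact": [],
}
children, depth = spanning_tree(site, "home")
```

With breadth-first order, the depth of a page is the smallest number of clicks needed to reach it from the root.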

Many of them fall into the category of focus+context representations since they allow the user to see the whole content of the web, while establishing an attention focus (for example in a certain page) which can be accessed in greater detail.

Here are some of the most promising representations:

ConeTrees:

Conetrees, in their most practical form, came out of the prolific Xerox PARC through the efforts of Robertson, Card and Mackinlay. They are tree representations, typically in 3D, in which the children of a node (the nodes linked to it in the next level) are placed on the base of a cone that has the parent as its apex. Each child node becomes in turn a parent node whose children are represented in the same way. The algorithm is therefore recursive, with all nodes getting the same treatment.
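The recursion can be sketched as follows. This is only an illustration of the idea, not the layout published by Robertson, Card and Mackinlay: the function name, the fixed cone height and the halving of the radius at each level are assumptions chosen to keep the example short.

```python
import math


def cone_tree_layout(children, node, apex=(0.0, 0.0, 0.0),
                     radius=4.0, height=2.0, shrink=0.5, positions=None):
    """Place `node` at `apex` and spread its children evenly on the circular
    base of a cone hanging below it, then recurse into each child.

    The base radius is multiplied by `shrink` at every level (an assumption).
    Returns a dict mapping each node to its (x, y, z) position.
    """
    if positions is None:
        positions = {}
    positions[node] = apex
    kids = children.get(node, [])
    x, y, z = apex
    for i, kid in enumerate(kids):
        angle = 2 * math.pi * i / len(kids)
        kid_apex = (x + radius * math.cos(angle),
                    y - height,
                    z + radius * math.sin(angle))
        cone_tree_layout(children, kid, kid_apex,
                         radius * shrink, height, shrink, positions)
    return positions


# Hypothetical four-node tree.
tree = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
positions = cone_tree_layout(tree, "root")
```

Every node is handled by the same call, which is what makes the layout recursive: a child is placed on its parent's cone and immediately becomes the apex of its own, smaller cone.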

Vertical ConeTree of 10,000 PARC nodes. Source: As can be seen at "Design concepts for learning spatial relationships" by Benelli, Caporali, Rizzo & Rubegni, Università di Siena.

Conetrees are suitable to represent the structure of a web site reasonably well. But there are, at least, two caveats:

Scaling up: when the number of nodes is very large the structure is difficult to perceive, since the deeper the level represented, the less space is available for the nodes. This gets complicated when there are a lot of pages at those levels.

Occlusion: Some nodes hide other ones.

In order to solve the second problem, the cones can rotate like a carousel, so that the elements hidden by others can be brought into view. As you can see in the shadows of the images, a 2D representation would make the cones interpenetrate, which is an undesirable feature since it can create confusion.

Another way to address these problems is the disc tree, where the cone is replaced by a disc.

Hyperbolic trees

In this case the structure of the web is represented in 2D or 3D by means of a hyperbolic space. Unlike in Euclidean geometry, in hyperbolic geometry more than one line parallel to a given line can be drawn through any point not on that line. Without entering into more detail, it's worth noting that hyperbolic space can be represented in 2D as a circle whose periphery depicts infinity. In this geometry, the closer to the periphery (infinity), the smaller the size of what we are representing.

Hyperbolic geometry: 2D representation of the directory structure of a PC. Source: Screenshot of Inxight's Magnifind demo software.

Hyperbolic geometry: Spectacular view of a complex tree of directories in 3D hyperbolic geometry. Source: As can be seen on the Walrus page.

These facts enable us to represent graphs in hyperbolic geometry while maintaining the properties of focus + context. Things that are closer to the centre appear magnified in relation to what lies near the periphery. Moving something towards the periphery diminishes its size without making it disappear. The 3D version of the hyperbolic plane is a hyperbolic sphere, whose outer surface represents infinity.
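A small numerical sketch may make this focus + context behaviour concrete. It assumes the Poincaré disc model, one common way of drawing the hyperbolic plane inside a circle; the tools shown in the figures may rely on a different model, and the function names are invented here. In that model a point at hyperbolic distance d from the centre is drawn at radius tanh(d/2) in the unit disc, and a Möbius transformation of the disc brings any chosen point to the centre.

```python
import math


def disc_radius(d):
    """Radius in the unit disc of a point at hyperbolic distance d from
    the centre (Poincaré disc model, curvature -1)."""
    return math.tanh(d / 2)


def refocus(z, focus):
    """Möbius transformation of the unit disc that moves `focus` to the
    centre. Both arguments are complex numbers with modulus below 1."""
    return (z - focus) / (1 - focus.conjugate() * z)


# Equal hyperbolic steps away from the centre take up less and less
# room on screen as they approach the rim of the disc.
radii = [disc_radius(d) for d in range(6)]
gaps = [b - a for a, b in zip(radii, radii[1:])]

# Bring a node drawn near the rim into focus; its neighbour stays in the disc.
node = complex(0.9, 0.0)
neighbour = complex(0.95, 0.0)
moved_node = refocus(node, node)
moved_neighbour = refocus(neighbour, node)
```

The radii never reach 1, which corresponds to the periphery depicting infinity, and changing the focus only rearranges the nodes inside the disc: none of them disappears.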

Tamara Munzner developed this type of representation extensively in her PhD thesis, which is worth consulting if you are interested in this topic. The advantage of 3D in hyperbolic geometry is that it's easier to understand and to interact with this kind of representation.

Walrus (which we discussed in number 43) is probably the tool that has gone furthest in the representation of hierarchies using hyperbolic geometry (it is now free, and you can download the source code). It also has some occlusion problems, but its elegant approach allows you to work interactively with some thousands of nodes with relative ease.

Radial trees (or bulls-eye trees)

A way to highlight the connectivity levels (the number of clicks to reach a particular page) is to lay out the nodes of the graph in concentric circles. The outer circles are those with higher depth in the hierarchy i.e. with higher number of clicks to reach them from the root.

Radial tree: Example available at the InfoVis Cyberinfrastructure website. Interactivity plays a crucial role here in making an accurate analysis of the structure.
Source: Image as can be seen at InfoVis Cyberinfrastructure website

Radial tree: One of the stages of the animation that allows the user to study the tree from different standpoints.
Source: Image as can be seen at BAILANDO website

This type of diagram is called radial, circular or bulls-eye diagram. Here the nodes are placed in the circle corresponding to their level, with the root node placed in the centre. Their children occupy the next level and so on.

One of their main difficulties lies in distributing the nodes on each circle. If there are many nodes on an outer circle and very few on the inner ones, it's very difficult to avoid overlaps among the outer nodes while also dividing the circle into balanced sectors that allow the user to distinguish the genealogical lines. For large amounts of data the use of space is highly inefficient, since there's nothing between the circles.
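One simple heuristic for this distribution, sketched below, gives each subtree an angular sector proportional to its number of leaves and places each node in the middle of its sector, on the circle that corresponds to its depth. This is an assumption made for illustration, not necessarily what the tools in the figures do, and the function names and sample tree are invented.

```python
import math


def leaf_count(children, node):
    """Number of leaves in the subtree rooted at `node`."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(leaf_count(children, kid) for kid in kids)


def radial_layout(children, root, ring_gap=1.0):
    """Return a dict mapping each node to (x, y), with the root at the
    centre and each level on its own circle of radius depth * ring_gap."""
    positions = {}

    def place(node, depth, start, end):
        # The node sits in the middle of the sector assigned to its subtree.
        angle = (start + end) / 2
        r = depth * ring_gap
        positions[node] = (r * math.cos(angle), r * math.sin(angle))
        total = leaf_count(children, node)
        cursor = start
        for kid in children.get(node, []):
            # Each child gets a share of the sector proportional to its leaves.
            span = (end - start) * leaf_count(children, kid) / total
            place(kid, depth + 1, cursor, cursor + span)
            cursor += span

    place(root, 0, 0.0, 2 * math.pi)
    return positions


# Hypothetical five-page hierarchy.
tree = {
    "home": ["products", "about"],
    "products": ["widget"],
    "about": ["contact"],
    "widget": [],
    "contact": [],
}
layout = radial_layout(tree, "home")
```

Because sectors are shared out by leaf count, a branch with many descendants gets a wide slice of every circle, which keeps the genealogical lines apart but does nothing about the empty space between the circles.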

Conclusion

Visualising the structure of the web is a fundamental step when representing its content and its usage, since those features can be shown as coloured layers on top of the structural information. The different types of representation show very powerful aspects, but also drawbacks such as

scaling problems when the number of nodes rises above some thousands, depending on the type of representation

difficulties of interpretation for users, who are not always used to these types of representation.

Nonetheless, visualising the structure of the web is one of the pillars of the analysis that every webmaster should perform, especially with large websites, in order to take informed decisions when modifying, or just understanding, the way the living being that is a website works.