HiVE: Hierarchical Visualization Expression language

HiVE is a conceptual description of an information graphic. It describes how data variables map onto visual variables to form a data visualization design, without prescribing the precise appearance or aesthetics of the graphic.

HiVE is a compact human- and computer-readable language that has potential applications in visual exploratory data analysis, asynchronous collaboration and logging visualisation interactions. Since HiVE does not describe the precise appearance and aesthetics of the graphics, multiple realisations from the same HiVE expression are possible. This allows various HiVE-compliant clients to be designed for different users, tasks and devices.

HiDE can interpret HiVE expressions and generate information graphics from them. It also offers a GUI that lets graphics be built and modified interactively. These can then be exported as HiVE through the clipboard or via Twitter. HiDE currently supports a subset of HiVE and is still under active development.

How it works

Figure 1

HiVE describes hierarchical visualisations in which variable values are used to condition the data below them in the hierarchy. This approach suits trellis plots, scatterplot matrices and small multiples. It is achieved by constructing a hierarchy of conditioning variables. In Figure 1, these are $type and $year, variables that refer to a dataset of housing sales in London. The tree structure in Figure 1 illustrates that below the root are four branches, one for each of the four housing types (Det, Flat, Semi and Ter). These then condition the data at the next level of the hierarchy: in the branch containing "Flat", all yearly data refer to sales of flats. A graphic representation of this is shown above the tree representation, in which the space is split by the four housing type values, which condition the data contained within them.
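The conditioning idea can be illustrated outside HiVE with a minimal Python sketch. It is not part of HiVE or HiDE. The records, the field names type, year and id, and the function condition are all hypothetical and serve only to show how each level partitions the data passed down from the level above.

```python
from collections import defaultdict

# Hypothetical sample records; the field names mirror $type and $year.
sales = [
    {"id": 1, "type": "Flat", "year": 2002},
    {"id": 2, "type": "Flat", "year": 2003},
    {"id": 3, "type": "Semi", "year": 2002},
    {"id": 4, "type": "Det", "year": 2003},
]

def condition(records, variables):
    """Recursively partition records by each conditioning variable in turn."""
    if not variables:
        return records  # leaf: records that satisfy every condition above
    first, rest = variables[0], variables[1:]
    groups = defaultdict(list)
    for record in records:
        groups[record[first]].append(record)
    # Each branch conditions the data handed to the next level down.
    return {value: condition(group, rest) for value, group in groups.items()}

tree = condition(sales, ["type", "year"])
```

In this sketch, tree["Flat"] holds only flat sales, and tree["Flat"][2002] holds only flat sales from 2002, which mirrors the branch structure described above.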

Variables or constants that control order (position), size, colour, shape and layout can be specified for any hierarchical level.

Syntax

HiVE contains two types of expression:

State (preceded with 's'): describes the state of a graphic.

Operator (preceded with 'o'): describes a change in the state of a graphic.

Operators and states contain parameters, usually (but not always) of the form:

For states: (path,var1,var2,var3...)

For operators: (path,level,var1)

where:

path: the subtree to which the expression applies (/ for the whole tree)

level: the numeric level at which the operator applies (operators only). Numbering starts at 1, relative to the path.

var: variable or constant, in the order of the hierarchy, relative to the path

Where multiple variables and/or constants are required for a single level, these are grouped using square brackets, e.g. (path,var1,[var2,var3],var4) which would refer to three hierarchical levels, the second of which is associated with two variables.
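As an illustration of the parameter form, the following Python sketch splits a single expression into its kind (s or o), its name, its path and its remaining parameters, keeping square-bracket groups together. It is a minimal sketch rather than the HiDE parser: the function name parse_expression is hypothetical, nested brackets are not handled, and the variables $area and $price in the usage note are invented for the example.

```python
import re

def parse_expression(text):
    """Split e.g. 'sHier(/,$type,$year);' into (kind, name, path, params)."""
    match = re.fullmatch(r"\s*([so])([A-Za-z]+)\((.*)\)\s*;?\s*", text)
    if match is None:
        raise ValueError(f"not a HiVE-style expression: {text!r}")
    kind, name, body = match.groups()
    params, group, token = [], None, ""
    for char in body + ",":  # trailing comma flushes the last token
        if char == "[":
            group = []  # start collecting one multi-variable level
        elif char == "]":
            if token.strip():
                group.append(token.strip())
            params.append(group)
            group, token = None, ""
        elif char == ",":
            if token.strip():
                (params if group is None else group).append(token.strip())
            token = ""
        else:
            token += char
    return kind, name, params[0], params[1:]
```

For example, parse_expression("sHier(/,$type,[$area,$price],$year);") yields the kind "s", the name "Hier", the path "/" and three levels, the second of which is the list ["$area", "$price"].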

Figure 1 summarises a dataset of property transactions in London. The size of rectangles indicates the number of sales and the colour of rectangles indicates the average price. Results are shown by year for each type of property.

Variables: Variables must only contain alpha-numeric characters and cannot contain spaces. By convention, they start with a lowercase letter and employ intercapping. They are preceded with $. Examples:

$year

$yearsSince1990
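The naming rule can be checked mechanically. The Python sketch below is an assumption-laden illustration, not part of HiVE: the names VARIABLE, is_valid_variable and follows_convention are hypothetical, and "alpha-numeric" is taken to mean ASCII letters and digits.

```python
import re

# '$' followed by one or more ASCII letters or digits (assumed reading
# of "alpha-numeric"); no spaces or other characters are allowed.
VARIABLE = re.compile(r"\$[A-Za-z0-9]+")

def is_valid_variable(name):
    """True if name satisfies the hard rule for variable names."""
    return VARIABLE.fullmatch(name) is not None

def follows_convention(name):
    """True if name is valid and also starts with a lowercase letter."""
    return is_valid_variable(name) and name[1].islower()
```

Both $year and $yearsSince1990 pass the two checks, whereas a name containing a space fails the first one.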

Hierarchy: The sHier state expression describes the hierarchy of data variables. For the graphic above, these are type of property then year of sale. Variables are preceded with a dollar sign in HiVE, so the expression is:

sHier(/,$type,$year);

Paths: The values of each variable form a tree as illustrated above. A path uniquely identifies an element in the tree by taking all the variable values from the tree's root, separated by slashes. Asterisk wildcards can be used. Paths identify subtrees (the tree from the element to the leaves) to which a HiVE expression applies. To apply a HiVE expression to the whole tree, the path simply consists of a slash:

/: The entire tree (the tree starting from the root).

/Semi/: The subtree starting from the Semi branch (i.e. just the data associated with semi-detached housing)

/Semi/2003/: Just the data associated with semi-detached housing in 2003 (since this is a leaf node).

/*/2002/: All the 2002 data regardless of housing types (all of which are highlighted in the data graphic).
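The path examples above can be read as a simple prefix match with wildcards. The following Python sketch shows one possible interpretation. The function name matches is hypothetical, and treating a path as matching every node at or below the element it names is an assumption based on the description of subtrees.

```python
def matches(pattern, node_path):
    """True if node_path lies in the subtree that pattern identifies."""
    wanted = [part for part in pattern.split("/") if part]
    actual = [part for part in node_path.split("/") if part]
    if len(actual) < len(wanted):
        return False  # the node sits above the subtree's base element
    # '*' accepts any value at that level; zip stops at the pattern's end,
    # so deeper nodes inside the subtree also match.
    return all(w == "*" or w == a for w, a in zip(wanted, actual))
```

Under this reading, / matches every node, /Semi/ matches /Semi/2003/, and /*/2002/ matches /Flat/2002/ but not /Flat/2003/.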

Expressions are separated by semicolons (;).

Reference

sHier/oHier

sHier(path,var1,var2,...)

oHier(path,level,var)

Assigns conditioning variables to the conditioning hierarchy in path's subtree. If a variable already exists at a level, it is replaced.

SF, CA, TM and PA accept one conditioning variable.

oInsert

oInsert(path,level,var)

Operator only. Inserts a variable into the conditioning hierarchy in path's subtree at level (relative to the subtree base, numbered from 1).

oCut

oCut(path,level)

Operator only. Removes a variable from the conditioning hierarchy in path's subtree at level (relative to the subtree base, numbered from 1).

oSwap

oSwap(path,level1,level2)

Operator only. Swaps the two variables from the conditioning hierarchy in path's subtree at levels level1 and level2 (relative to the subtree base, numbered from 1).
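The effect of the three structural operators on the list of conditioning variables can be sketched in Python as follows. This is an illustration under stated assumptions, not the HiDE implementation: the function names are hypothetical, the path argument is omitted so the operators act on the whole tree, inserting at a level is assumed to push the existing variables one level down, and $borough is an invented variable.

```python
def o_insert(hierarchy, level, var):
    """Return a copy with var inserted at the 1-based level."""
    return hierarchy[:level - 1] + [var] + hierarchy[level - 1:]

def o_cut(hierarchy, level):
    """Return a copy with the variable at the 1-based level removed."""
    return hierarchy[:level - 1] + hierarchy[level:]

def o_swap(hierarchy, level1, level2):
    """Return a copy with the variables at the two 1-based levels exchanged."""
    result = list(hierarchy)
    i, j = level1 - 1, level2 - 1
    result[i], result[j] = result[j], result[i]
    return result

# Start from the state described by sHier(/,$type,$year);
hierarchy = ["$type", "$year"]
inserted = o_insert(hierarchy, 2, "$borough")  # like oInsert(/,2,$borough);
cut = o_cut(inserted, 1)                       # like oCut(/,1);
swapped = o_swap(hierarchy, 1, 2)              # like oSwap(/,1,2);
```

Each function returns a new list and leaves its input untouched, which makes it straightforward to log a sequence of operators alongside the states they produce.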

sOrder/oOrder

sOrder(path,var1,var2,...)

oOrder(path,level,var1)

Assigns variables used to order or position elements (depending on layout) to hierarchical levels in the path's subtree. Different layouts will interpret these in different ways.

One variable with CA layout: 1D position.

Two variables with CA layout: 2D position (x,y).

One variable with SF layout: 1D order.

Two variables with SF layout: 2D order.

One variable with AN layout: 1D order (in time).

Any number of variables with PA layout.

sSize/oSize

sSize(path,var1,var2,...)

oSize(path,level,var1)

Assigns variables used to size elements to hierarchical levels in the path's subtree. Different layouts will interpret these in different ways.

One variable with CA layout: area.

Two variables with CA layout: width and height (w,h).

One variable with SF layout: proportion of available area.

Two variables with SF layout: width and height (w,h). Unlike SF layouts with one size variable (area), these layouts are unlikely to be space-filling. Can be used to produce bar charts.

Has no effect for AN layouts.

sShape/oShape

sShape(path,shp1,shp2,...)

oShape(path,level,shp)

Assigns shape variables or constants for elements in hierarchical levels in the path's subtree.

A variable: choropleth maps can be produced by using a variable corresponding to an area geometry.

sColor/oColor

sColor(path,var1,var2,...)

oColor(path,level,var)

Assigns variables to control the colour of elements at the hierarchical levels in the path's subtree. NULL values are given to levels that do not show a colour.

sLayout/oLayout

sLayout(path,lay1,lay2,...)

oLayout(path,level,lay)

Assigns layout constants to the hierarchical levels in the path's subtree. Each layout controls how the order, size and shape values are interpreted.

We use the following layout constants:

SF: Cartesian space-filling.
Space-filling (and non-overlapping) layout which fills the space depending on the number of the appearance variables/constants for a particular level. Treemaps, mosaic plots and space-filling cartograms may result.

CA: Cartesian.
Uses absolute positioning provided by the order variables and is used for scatter plots, bar plots and maps. The rank order number is used for categorical variables.

AN: animation; laid out in time.
At any instant in time, only data corresponding to one variable value is shown. oFocus with the SL (select) constant is used to scroll to a specific variable value, e.g. oFocus(/May/,1,SL);

PA: parallel plot.
PA lays out multiple variables on parallel axes. Points are typically joined to form parallel coordinates plots.

sFocus/oFocus

sFocus(path,const)

oFocus(path,level,const)

Takes the path of an element/elements to focus on. The constant const controls the type of focusing.

ZM (zoom)

Geometrical focus by reprojecting the representation so that the focussed objects occupy the screen space. 'Zooming' of an imaginary camera on the focussed data is likely to be the most common form of this type of focus.

HL (highlight)

Some form of symbolic highlighting of the focussed objects such as emboldening, or hue change.

SL (select)

Selection of the focussed objects, and by implication removal from view of the non-selected items. This allows typical filtering operations to be described. For animated sequences, the select focus can be used to move to a particular frame in a sequence.