In English, we use many different words to describe the same basic objects. In one survey, researchers Dieth and Orton explored which words were used for the place where a farmer might keep his cow, depending on where the speaker resided in England. The results include words like byre, shippon, mistall, cow-stable, cow-house, cow-shed, neat-house or beast-house. We see the same situation in visualization, where a two-dimensional chart with data displayed as a collection of points, using one variable for the horizontal axis and one for the vertical, is variously called a scatterplot, a scatter diagram, a scatter graph, a 2D dotplot or even a star field.

There have been a number of attempts to form taxonomies, or categorizations, of visualizations. Most software packages for creating graphics, such as Microsoft Excel, focus on the type of graphical element used to display the data and then sub-classify from that. This has an immediate problem: plots with multiple elements are hard to classify (should we classify a chart with bars and points as a bar chart with points added, or as a point chart with bars added?). Other authors have started with the dimensionality of the data (one-dimensional, two-dimensional, etc.) and used that as a basic classification criterion, but that has similar problems.

Visualizations are too numerous, too diverse and too exciting to fit well into a taxonomy that divides and subdivides. In contrast to the evolution of animals and plants, which occurred essentially in a tree-like manner, with branches splitting and sub-splitting, information visualization techniques have been invented more by composition. We take a polar coordinate system, combine it with bars, and get a Rose diagram. We put a network in 3D. We add texture, shape and size mappings to all of the above. We split it into panels. This is why a traditional taxonomy of information visualization is doomed to be unsatisfying: it is based on a false analogy with biology and denies the basic process by which visualizations have been created, namely composition.

Within SPSS we have adopted a different approach: looking at charts and visualizations as a language in which we compose “parts of speech” into sentences. This approach was pioneered by Leland Wilkinson in his book The Grammar of Graphics. Consider natural language grammars. A sentence is defined by a number of elements which are connected together using simple rules. A well-formed sentence has a certain structure, but within that structure, you are free to use a wide variety of nouns, verbs, adjectives and the like. In the same way, a visualization can be defined by a collection of “parts of graphical speech”, so a well-formed visualization will have a structure, but within that structure you are free to substitute a variety of different items for each part of speech. In a language, we can make nonsensical sentences that are well-formed. In the same way, under the graphical grammar, we can define visualizations that are well-formed but nonsensical. One reason not to ban such seeming nonsense is that you never know how language is going to change to make something meaningful. A chart that a designer might see no use for today becomes valuable in a unique situation, or for some particular data. “The tasty aged phone whistles a pink” might be meaningless, but “the sweet young thing sings the blues” is a useful statement, and grammatically similar. In our grammar-based approach, we have a set of different “parts of speech” that we compose:

data – the variables that are to be used.

coordinates – the basic system in which the data will be displayed, together with any transformations of the coordinate system, like polarization, reflection, etc.

styles – decorations for the graphic that do not affect its basic structure but modify the final appearance: fonts, default colors, padding and margins, …
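To make the compositional idea concrete, here is a small Python sketch. It is purely illustrative: the names and structures are my own invention, not SPSS's VizML implementation. A chart is a composition of interchangeable parts, and swapping one part yields a different, equally well-formed chart.

```python
# Toy sketch of a grammar-based chart specification (hypothetical names,
# not the real VizML API). A chart is a composition of "parts of speech".

def chart(data, element, coordinates="rectangular", styles=None):
    """Compose parts of graphical speech into a chart specification."""
    return {
        "data": data,                # the variables to be used
        "element": element,          # bar, point, polygon, ...
        "coordinates": coordinates,  # rectangular, polar, ...
        "styles": styles or {},      # fonts, default colors, ...
    }

bar_chart = chart(data=["month", "rainfall"], element="bar")

# Swap the coordinate system and the same composition becomes a rose diagram.
rose_diagram = {**bar_chart, "coordinates": "polar"}

# Swap the element and it becomes a point (scatter-style) chart instead.
point_chart = {**bar_chart, "element": "point"}
```

The point of the sketch is that nothing in the structure changes when one part is substituted for another of the same kind, which is exactly the substitution property the grammar gives us.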

The core concept behind our approach is that you should be able to take a chart, modify the language to replace one part with a similar part, and get a well-defined and potentially useful result. The result is a system where the limits of what you can display depend neither on how well you can do graphical programming nor on how well the computer program you use has implemented a feature, but simply on combining well-known parts into novel systems.

For the VizML visualization system used in SPSS products, maps are simply another element that can be used within the grammatical formulation.

Although most people consider a map a very different entity from a bar chart, all that really differs between a bar chart and a map of areas like the one included here is that instead of representing a row of data by a bar, we use a polygon (or set of polygons) on a map. Otherwise their properties ought to be the same -- we can apply color, patterns, labels and transparency. When there are multiple values for each polygon, we can set a summary statistic to reflect the min, max, mean, median, range, or any of the usual set of statistics. We can flip, transpose and panel the charts. Essentially, from the grammatical point of view, if you can do it to a bar chart, you can do it to a map. The only limitation is that whereas the sizes of bars can be set or determined by data, the sizes of map polygons cannot, so setting sizes on map polygons has no effect.

The above chart can be created within SPSS Statistics and Clementine, using the Graphboard Template feature. The template was created in Viz Designer, but you don't need that tool simply to use a template. Following is a chart showing cell-phone ownership on a per-capita basis for countries throughout the world.

At the end of this post is a zip file containing the templates used to build these two charts. Unzip it on your local machine and then, within the Graphboard Template Chooser dialog, click the "Manage" button to import them. Do that once from any application, and they will be installed and ready to use in all your graphboard-enabled SPSS applications. The templates need only a variable containing the names (Illinois, Alaska, etc., or Germany, France, etc.) -- color, labels and transparency can be attached using the optional dialog.

If you have a copy of Viz Designer, you can modify and enhance these templates in many ways. If you don't and you're an XML whiz, you could even try opening the templates in your favorite XML editor and modifying them directly. The worst that'll happen is a useless template that'll fail to load or draw strange results. Let us know how you get on if you try that method out ...

Cell Phone Ownership Per Person

And here are the templates, ready to download and install ... Map Templates

In my last post I wrote about my new extension command, SPSSINC TRANS. That command makes it very easy to apply Python functions to the case data by handling all the data passing, variable creation, etc., so you just have to write one line of Python code to call the function.

I have now posted a substantial rework of the initial beta version. As the saying goes, plan to throw one away: you will anyway. The difficult part of designing and implementing this command was getting the Python function expression through the SPSS Universal Parser, which doesn't speak Python, and then taking it apart and setting up the requisite connections with the data.

My first version used regular expressions to extract the parameters and PASW Statistics variable names. That worked well enough for what I originally had in mind, although the regexes were a bit complicated. But as I explored the sorts of functions that would be useful with this facility, the problem got more complicated.

I wanted to support functions that did not have named parameters for everything. The original implementation required function parameters to be specified in the style parm=variable. But many of the built-in Python functions only accept positional arguments.

I wanted to support lists as parameters so that a bunch of variables could be passed in as a single parameter.

I wanted to support other more complicated expressions as parameters.

As I thought this over, I realized that instead of trying to parse Python code myself, I should let Python do it. Python has a compile function that can compile an expression such as a function call, and the result can then be evaluated using the eval function. Just what I needed. So I ripped out all the original code that sort-of parsed the function call expression and used compile to set it up.
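A minimal sketch of that approach (my own illustration, with an invented expression and variable names; this is not the actual SPSSINC TRANS source): compile the function-call expression once, then eval it against a namespace built from each case's data.

```python
# Compile a user-supplied function-call expression once.
# "salary" here is a hypothetical variable name, not from any real dataset.
expr = "round(salary / 12.0, 2)"
code = compile(expr, "<trans>", "eval")  # mode "eval": a single expression

# Then evaluate it once per case, binding variable names to case values.
cases = [{"salary": 60000}, {"salary": 45000}]
results = [eval(code, {}, case) for case in cases]
# results == [5000.0, 3750.0]
```

Compiling once and evaluating per case avoids re-parsing the expression for every row, which matters when the transformation runs over a large case file.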

It took me a little while to get the hang of how to use compile and what it produces - not the best documentation you might find. The issues were knowing what to import to make the function call valid and figuring out which parameter values needed to be satisfied by Statistics variables. I also added a little error-handling code to help the user when something isn't right.
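For the second of those issues, the compiled code object itself can help: its co_names attribute lists every global name the expression references, and those can be split into built-ins and names that must come from the data. A hedged sketch, again with an invented expression:

```python
import builtins

# co_names holds every global name the compiled expression loads.
code = compile("min(x, y) + offset", "<trans>", "eval")
names = set(code.co_names)

# Split them: anything that is a built-in is callable as-is; the rest
# are candidates to be satisfied from Statistics variables.
builtin_names = {n for n in names if hasattr(builtins, n)}
data_names = names - builtin_names
# names == {"min", "x", "y", "offset"}; data_names == {"x", "y", "offset"}
```

This is only the easy half of the problem, of course: names that refer to imported modules or user-defined functions still have to be distinguished from data names by other means.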

Got all that worked out, so now the command is much more general, and the implementation code is shorter and more robust. Should have thought of this the first time. And because function parameters can now be more general expressions, I axed the ASINTEGER subcommand in favor of just using int(x) in the parameter expression if that capability was needed, which would be infrequent anyway.

Because the code has to pass through the Universal Parser, there are still some expressions that will not work, but you can quote the entire function call expression and be protected from that if needed.

The new version, still considered a beta, is now posted to Developer Central.

So, once again, my hat is off to the Python designers: just about everything in the language is open to use in ordinary program code. Just a bit more work on the documentation, please.

In my last blog post I shared some templates that added functionality -- specifically, maps. That is one use of templates: allowing custom features, or new and relatively 'untested' features, to be used without needing a lot of new user controls, syntax or whatever. A second use is more prosaic, but can be a real time-saver: custom styles.

Styled Mosaic-like Plot

The figure above (click to show the full-size version) is an example of how a template can contain not only structural information but also style choices. The range of style options available through templates is wider than that available through the standard GUI choices; the figure shows gradient details, as well as a legend rearranged into the middle of the chart, with room made for it there.

If you download the template from the zip file linked at the bottom of this post, you can use the following SPSS 17 syntax to use it: