What the interactive components let you do is download a dataset from a Google spreadsheet and then dynamically filter the data within the page.

So for example, over on the F1Datajunkie blog I’ve been posting links to spreadsheets containing timing data from recent Formula One races. What I can now do is run a query on one of the spreadsheets to pull down particular data elements into the web page, and then filter the results within the page using a dynamic control. An example should make that clear (unfortunately, I can’t embed a live demo in this hosted WordPress blog page:-(

Here’s the key code snippet – the ControlWrapper populates the control using the unique data elements found in a specified column (by label) within the downloaded dataset, and is then bound to a chart type which updates when the control is changed:

As well a drop down lists, there is a number range slider control which can be used to set minimum and maximum values of numerical filter, and a string filter that lets you filter data within a column using a particular term (it doesn’t seem to support Boolean search operators though…) Read more about the controls here: Google visualisation API chart controls

Group, which lets you group rows in a table and present and aggregated view of them:

That is, if you have data loaded into a datatable in a web page, you can locally produce summary reports based on that data using the supported group operation?

There’s also a join operation that allows you to merge data from two datatables where there is a commmon column (or at least, common entries in a given column) between the two tables:

What the join command means is that you can merge data from separate queries onto one or more Google spreadsheets within the page.

With all these programming components in place, it means that Google visulisation API support is now comprehensive to do all sorts of interactive visualisations within the page (I’m not sure of any other libraries that offer quite so many tools for wrangling data in the page? (The YUI datatable supports sorting and filtering, but I think that’s about it for data manipulation?)

I guess it also means that you can start to treat a web page as a database containing one or more datatables within it, along with tool support/function calls that allow you to work that database and display the results in a variety of visual ways?! And more than that, you can use interactive graphical components to construct dynamic queries onto the data in a visual way?!

Wouldn’t it be handy if we could treat all the public spreadsheets uploaded to Google docs as queryable tables in a database? Well, it appears that you can do so, at least at an individual spreadsheet level: Introducing the Google Visualization API.

Over the weekend, I started exploring the Google Visualisation API Query Language, which is reminiscent of SQL (if that means anything to you!). This language provides a way of interrogating a data source such as a public online Google spreadsheet and pulling back the results of the query as JSON, CSV, or an HTML table.

Got that? I’ll say it again: the Google Visualisation API Query Language lets you use a Google spreadsheet like a database (in certain respects, at least).

Google query languages are defined on a spreadsheet in the following way:

Although defined, by default, to return JSON data from a query, wrapped in a pre-defined (and fixed?) callback function (google.visualization.Query.setResponse()), it is also possible to display the results of a query as an HTML table (which is “useful”, as the documentation says, “for debugging”). The trick here is to add another argument to the URL: tqx=out:html, so for example a query would now be defined along the lines of:http://spreadsheets.google.com/tq?tqx=out:html&tq=QUERY&key=SPREADSHEET_ID

Using the Guardian datastore’s MPs expenses spreadsheet 2007-8 as an example, we can write quite a wide variety of queries, which I’ll show below in their ‘HTML preview’ form.

(In a ‘real’ situation, you are more likely to retrieve the data as JSON and then process it as an object. Or, as I will also demonstrate, take the results of the query as CSV output (tqx=out:csv rather then tqx=out:html) and pull it directly into a service such as Many Eyes WIkified.)

In the examples, I will just show the unencoded select statement, but the link will be the complete, well-formed link.

So here we go:

show everything – fetch the whole table: select * (in a lot of computer languages, ‘*’ often refers to ‘everything and anything’);

just show some particular columns, but again for everyone: fetch just columns B (surname), C (first name) and I (total additional costs allowance): select B,C,I

only show the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: select B,C,I where I=23083 (column I is the additional costs allowance column);

How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: select count(I) where I=23083

Many Eyes WIkified is no more…One other trick is to grab a CSV output, rather than an HTML output, and pull it into Many Eyes Wikified, and then visualise it within that environment – so we grab the data (in this case, using select D,sum(I) where I>=0 group byD, i.e. the total amount of additional costs allowance claims by party):

to give this:

and then visualise it in an appropriate way:

So to recap this final case, then, we are running a query on the original spreadsheet that calculates the total additional costs allowance claims per party, and emits the results as CSV. These results are imported into Many Eyes Wikified, and displayed therein.

Now I’m pretty sure that Many Eyes Wikified will continue (how often?) to synch data from a potentially changing data source, which means we should be able to use a similar approach to plot a running total of claims from the Shadow Cabinet Expenses spreadsheet…

…but, at the time of writing at least, it seems as if the publication/privacy settings on that spreadsheet are set such that access via th query language is denied…:-(

Anyway – that was a quick intro to the Google Visualisation API Query Language – so go play… ;-)

PS so what other spreadsheets might make for some interesting queries?

PPS @adrianshort has made a valuable point about how easy it is for a publisher to change the order of rows in a spreadsheet, and hence make a nonsense of your query. (Also, I think the approach I’m taking sort of assumes a simple, regular spreadsheet where row 1 is for headers, then the data, and ideally no other text e.g. in cells below the table describing the data in the table.) So always check… ;-)

PPPS If the first row in the table defines column headings, then there are intervening lines (maybe spaces) before the data starts, putting offset N (where N is a number) will skip that many rows before displaying the data.

Something else I noticed on the order by setting, this can be of the form order by COL asc (to sort in ascending order, which is the default) or order by COL desc ( to sort in descending order).