About Me

Web person at the Imperial War Museum; just completed a PhD on digital sustainability in museums (this blog was originally my research diary).
Posting occasionally, usually on museum tech stuff but prone to stray. I welcome comments if you want to take anything further.
These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too).
Twitter: @jottevanger

Tuesday, August 14, 2007

Moaning about stats, coz I hate them, basically

We have been having a very stimulating debate on the whole "what's the point of (automated) web stats?" question (by which I mean to exclude data gathered through surveys, focus groups etc.). I guess I should acknowledge from the off that my problem is that I'm lazy, so since doing anything useful with web stats involves some work and I'd rather be doing something else, I don't do enough with them. Consequently I subconsciously look for a reason not to use them that doesn't require acknowledging my lazy-arseness, and such reasons are abundant. However, in truth I know that there is utility to be found in server logs or client-based methods or whatever you choose to employ, as long as you look deeply enough. The main issue I have is with using simplistic figures as KPIs that are meant somehow to make results comparable across projects and institutions. I still think that's stupid.
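"Looking deeply enough" is doing a lot of work in that last paragraph, so here's a tiny illustration of the point. This is a sketch only, with made-up Apache-style log lines (the IPs, paths and user agents are all hypothetical): the same three-line log produces three quite different "headline" numbers depending on whether you count raw hits, page views with assets stripped out, or bot-filtered visitors.

```python
import re
from collections import Counter

# Hypothetical Apache "combined"-format log lines -- illustrative only.
LOG_LINES = [
    '1.2.3.4 - - [14/Aug/2007:10:00:01 +0100] "GET /collections/item/42 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [14/Aug/2007:10:00:02 +0100] "GET /css/site.css HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [14/Aug/2007:10:01:15 +0100] "GET /collections/item/42 HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]

LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]+" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

def summarise(lines):
    hits = 0          # every successful request, assets and all
    pages = 0         # HTML-ish requests only
    visitors = set()  # crude proxy: distinct IPs, obvious bots excluded
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("status") != "200":
            continue
        hits += 1
        path, agent = m.group("path"), m.group("agent")
        if path.endswith((".css", ".js", ".gif", ".jpg", ".png")):
            continue  # stylesheet/image requests inflate "hits" but aren't page views
        pages += 1
        if "bot" not in agent.lower():
            visitors.add(m.group("ip"))
    return hits, pages, len(visitors)

print(summarise(LOG_LINES))  # (3, 2, 1): three different "headline" numbers
```

Same log, three defensible figures; which one goes in the KPI return is a choice, not a measurement.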

Of course, if I get what I wish for, there's going to be a whole load more work in the evaluation process than churning out some stats, which isn't ideal for a lazy sod, but it's not really the effort that bothers me, it's the pointlessness. Worse than that, it's the potential to distract you from what really matters, taking you in the direction of what doesn't, simply because that's what's measured. Happily we're in a sector where at least people don't die if you do this sort of idiot target-chasing thing, unlike health, but still, it's not good.

So below are e-mails relating to a forthcoming project to evaluate web stats methods in the London Hub: their use at present and recommendations for future practice. I wrote a response to the whole idea; the project is being shepherded by the estimable Dylan Edgar (formerly of the Scottish Museums Council, and now ICT co-ordinator of the London Hub). Dylan wrote back with a comprehensive response, for which I'm most grateful.

[I wrote:]

If you stand in the sea in a pair of wellies waiting to see how quickly they fill up, in the hope that this will tell you how quickly sea levels are changing in response to global warming, you might learn something. However it will probably be more to do with what the tide is doing, how big the waves are, whether any large boats have passed by or whether you're generally unlucky and keep falling over. If your friend is doing the same, you can compare your results better if your wellies are the same height, but perhaps they're standing on a rock or by chance got hit by a large piece of chop.

Web stats tell you a similar story about how successful your web offering is in achieving your mission-relevant goals. If we have the same software as our peers, it's like having the same wellies - a small improvement, but still measuring a proxy that is quite some way off representing what we're really interested in. I know we all recognise this, and I know too that we all have to, firstly, give our funders what they ask for in terms of KPIs, and secondly, be able to lean on some kind of indicator to see how well we're doing. But I think it would be much more interesting and much more useful if we were to explore means other than site stats as indicators of our impact and success relative to mission. Of course it would be good in a pedantic, mechanical way if we used a common platform to record and "analyse" these obligatory KPIs, but it's like polishing turnips (sic). I'd love to see us making an argument to MLA (and above that to the DCMS) to develop KPIs that actually mean something - and these may be different from institution to institution and project to project. They may be amenable. You'll know the debates going on about cultural/social value at the moment (Tessa Jowell, John Holden, Peter Cannon-Brookes, Jim Collins and others), so perhaps this is a cue to start prodding the funders about finding better ways of evaluating success - ones that might not resolve into easy figures but that keep us focussing on what matters.

Having said all that, this is clearly an area that is thought by the Hub and its partners to be important and I certainly don't know all the ways in which people use our stats and may find them useful. Right now, though, what we are collecting is (as far as I can tell) chiefly used for reporting. We in MST look at them sometimes to give us a clue about what parts of the site are doing well, what search terms seem to drive traffic to us, who's linking to us etc., but we look through sceptical lenses and interpret everything we see with a lot of knowledge that simply isn't embodied in those stats, and isn't available to funders. In that sense they can be a useful tool for making improvements, but not really for demonstrating success except in the most clumsy, questionable way. Similarly, if the cows are lying down you might want to think about taking a mac with you on a walk, but if you were making a weather report you would only say "it rained today" if you saw it raining.
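That sort of sceptical, internal reading of the stats needs very little machinery. As a sketch (the referrer URLs below are made up, and a real script would handle many more search engines than the two query parameters shown), this is roughly how "what search terms drive traffic to us" and "who's linking to us" fall out of the Referer field of a server log:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical referrer URLs as they might appear in a log's Referer
# field -- illustrative only.
REFERRERS = [
    "http://www.google.co.uk/search?q=museum+of+london+roman+gallery",
    "http://search.yahoo.com/search?p=blitz+photographs",
    "http://someblog.example.org/2007/08/museums-online.html",
]

def search_terms(referrers):
    """Pull query strings out of search-engine referrers; anything
    without one is treated as an ordinary inbound link."""
    terms, links = [], []
    for ref in referrers:
        parsed = urlparse(ref)
        qs = parse_qs(parsed.query)
        # Google uses q=, Yahoo p= -- cover more engines in real use.
        query = qs.get("q", qs.get("p", []))
        if query:
            terms.append(query[0])
        else:
            links.append(parsed.netloc)
    return terms, links

terms, links = search_terms(REFERRERS)
print(terms)  # ['museum of london roman gallery', 'blitz photographs']
print(links)  # ['someblog.example.org']
```

Useful for steering the site, but exactly the kind of context-heavy reading that never makes it into a headline KPI figure.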

I should just expand on why stats are becoming ever more deficient PIs. Our digital resources aren't only used by our website visitors. Our data are taken from feeds and reused on other websites (for which we cannot access the stats), our images are taken and put into personal sites and blogs, we are bookmarked in del.icio.us and events taken from our pages and added to personal Google Calendars. As the semanticised web inches forward, more and more of our stuff moves into places where we aren't measuring it (and can't).

The OAI data handed over to the PNDS is part of a Hub (and MoL) job of work, but if they were to decide to open up their repositories for other parties to build applications with, how do we measure the impact of our data? Not through measuring browser-based visits to our websites. We have started pilot work with Simulacra that encourages teachers to build interactives from our content, but whilst we will get stats from Simulacra we can't assess how good these interactives are, how much of their content is "ours" (they can combine all sorts of material), how they're used in the classroom once they're downloaded, and so on. And if we decide to put some curatorial time into creating great content on Wikipedia, how do we know if it's "successful" in the sense of contributing to our strategic goals? Not from web stats.

But if we can't measure success like that, should we accept that the whole idea is wrong? If I make a KML file and upload it to Google, or even host it on our server, how do I know how much use someone has had from it by downloading it once and viewing it perhaps many times in their local installation of Google Earth? If we can't know, or can't prove it with hits and visits data, should we not do it? I'd say we should do it, but we need some new ways of guesstimating how successful it is. I want to be able to do things that can't be demonstrated to have had a great impact through the current KPIs (or perhaps through any means). I think this means moving beyond crude web stats, as well as thinking more about what those stats mean.

I don't think the project is wrong, but I do think it's perhaps putting a lot of effort into improving something that is never going to be that useful, whereas it could be about making the case for useful KPIs and developing new techniques for assessing them. I note that there are some references to other tools in the brief (surveys etc.), which is good. Maybe we can slant it more in this direction, but keeping in mind the reason: not to come up with new ways of measuring usage, but assessing impact relative to mission.

[Dylan replied:]

Thanks for getting back to me on this. Of course you're right - web stats are always going to be a rather blunt instrument, and I'm not suggesting that they provide a complete means of establishing the impact or otherwise of our web sites. Throw in Web 2, 3 and the rest of it and it gets even messier of course! I've always been an advocate of taking a more rounded approach to evaluating the impact of online delivery - in fact I commissioned some research back in 2004 which resulted in guidance supporting museums to start doing exactly the kind of thing you're suggesting. This was subsequently embedded into government funding streams for ICT (although not as a KPI): http://www.scottishmuseums.org.uk/areas_of_work/ICT/digitisation.asp

One of the most important recommendations from this work was that museums shouldn't rely on web stats in isolation as a measure of impact. However, they can make a useful contribution to establishing impact if used in conjunction with other methodologies.

Some other work has been done in this area. The EU ran a conference last year which brought experts together from around Europe and set out to look at ways in which cultural heritage bodies can evaluate the impact that their digitised resources are having on people. I was invited along to talk on the subject, and there was some useful discussion on the various different approaches that could be taken as well as looking at definitions of 'impact': http://www.nba.fi/nrg

...however, we didn't get anywhere near to coming up with a consistent measure that could be applied across the board. Incidentally, MLA and DCMS were involved in this conference, so don't be too hard on them - they are well aware of the issues here!

The Hub partners are already doing a lot of this of course, but it's still not formalised as a KPI. I agree that in an ideal world we'd be working towards a more meaningful indicator for establishing the impact of web delivery. However, in reality this is still a very very long way off.

A recent review of a selection of Renaissance KPIs by Morris Hargreaves McIntyre recommended that the one measure relating to web use (visits) remain as is for the foreseeable future, but MLA should "assess need for additional web performance indicators". There's a very good reason for taking such a cautious approach when looking at new indicators, and that is museums' capacity to deliver. We have already found that the current web KPI is collected inconsistently across the country, and even within the regional Hubs themselves. This is mainly because of loose definitions and the range of different systems that museums are using to collect the data.

If we tried to go straight from this situation to working up a more complex set of National indicators that really do set out to measure impact in the round, I don't think that museums would be able to provide meaningful data consistently throughout the sector (at least without a great deal of investment in systems, training, staff etc.). This would be counterproductive because the resulting KPI would probably fail, and confidence in the whole idea of measuring the value of web delivery would be lost, both within government and the wider museum community. This is why I think we need to take a more gradual, staged approach.

Despite their limitations, I do believe that web stats are important. Firstly, they are important politically. For years we have been arguing that the web is an essential delivery route for museums. This is only now being acknowledged by government, who are (quite rightly in my opinion) asking us to justify their investment and tell them just how important it is through this KPI, admittedly in a rather primitive way. It's naive to think that we can continue to sermonise about the importance of the web if we aren't then prepared to stand up and be counted on it when we are asked to. So it's an important opportunity for us to engage and show willing, rather than dismiss out of hand.

Reporting web stats in this way is also important because it shows that it can be done! As you rightly point out, establishing the real impact of online delivery is complex and is by no means an exact science. However by getting this relatively simple measure right, we can demonstrate how museums can deliver consistent data on web use that can be built on towards more meaningful indicators for the future. If we can't or won't do it for a simple KPI, how can we expect to do the more fancy stuff?

As you say though, the stats are limited in what they can tell us in isolation. However analysis is becoming more sophisticated, and if used correctly stats can provide us with a usable indication of how much the different aspects of our sites are being used, and even something of how they are being used and who is using them. So I don't think this is worthless information by any means, as long as we accept that there are limits to what it can tell us and there's always going to be a certain level of inaccuracy involved (that everyone is going to be subject to of course, which helps). But then what data of this kind is ever 100% accurate? I would say that web stats are on a par with gallery visitor numbers, which are often collated automatically and don't tell us anything about visitor experience or the museum's ultimate impact on the people coming through the door. However, as a sector we've been happy to allow this to become accepted currency and have supplied the data for years now in return for government funding. So why not web stats as well? If anything, they tell us more about our users than automated gallery counters.

In this project we want to look at how the London Hub collects data for the existing KPI. We also want to build on this by establishing a more detailed set of quantitative measures that will provide more detail on the extent to which the sites are being used, and that a small group of four museums can collect in a consistent and meaningful way. This will be a challenge in itself. More importantly in my view, we also want to look at how the Hub can be using this information, and ultimately why we're collecting it. As you say much of the KPI data goes into a black hole, which doesn't help because museums don't necessarily see the relevance of the data that they have collected. So by producing practical recommendations on how the Hub can be actively sharing and using this information in the future, we make it more relevant and it becomes a more helpful resource.

This project isn't going on in isolation though. It's being supplemented by more qualitative work this financial year, profiling the Hub's online audiences, understanding their requirements and expectations, and establishing the extent to which these are currently being met. Importantly, this work will also provide a shared methodology for the Hub to continue this process in the coming years. This combination of quantitative and qualitative work will, I believe, gradually build up a better overview of the impact that the Hub's collective online product is having on users.

My own view is that we need to take it slowly, and better methodologies for establishing impact will emerge as we become more comfortable with the idea and as we start implementing this kind of initiative. The whole concept is still in its infancy, and we need to be careful to bring museums with us rather than imposing something that will be unworkable in the short-term and jeopardise the work that's already happened.

[There wasn't much need for further debate in a sense, since we both had pretty clear positions on the utility of stats, and Dylan's case for pursuing them in the context of this project is strong, but my final thoughts were:]

Thanks for the comprehensive response. I know I was very negative about web stats when they're not in fact the devil's spawn, and used wisely can tell us useful things. You've made a lot of very valid points (not least about visitor figures), and you clearly have a strategy for moving towards a situation where we have to rely on stats less whilst working with the political reality. Unfortunately the sophisticated analysis that can make these stats less blunt and more insightful isn't performed before they are given to the funders that use them as KPIs. Nor could it be, because such analyses are case-specific and context-dependent, so we keep on working to deliver targets that at worst might seriously divert us from doing what is really important, simply because we aren't allowed to demonstrate excellence in any other way. So whilst there's more that can be done with stats, we aren't allowed to do it where it counts, so they remain most useful internally (where we can try to squeeze out all the knowledge they hold) rather than externally, where they cherry-pick the most desiccated fruit.

If MLA, DCMS et al really are interested in working towards better measures, and I'm sure you're right that they are, it would be nice to think that this project could in some small way start to explore that shift and demonstrate to them that we as museums are also keen to look at what really matters. Having said that, I accept what you say, that to move to a whole new national system of KPIs would be a huge step and not one to take in one go. But I wish we could, at least, start discussing with funders the possibility of experimenting with more interesting measures on a project-by-project basis.

[and then Dylan responded with more clarifications and some thoughts on the direction of the London Hub. Some of it is not really for public consumption but here's a bit]

Funding organisations like HLF, MLA etc have always expected museums to have clear plans for how they will evaluate the success of their projects. It's the job of the applicant to make the case for how they will do this - it's part of the application process and a good project needs to have good evaluation built in from the outset. Having said that though, funding organisations do have a role to play in providing guidance to applicants on the kind of thing they are expecting.

The way we approached this in Scotland was to come up with high-level guidelines for museums on how to start evaluating the impact of web resources, beyond using stats alone (see previous email). The guidance was quite generic, but designed to get people thinking about how they can establish the overall success or otherwise of their projects. This broke down into two areas - process and outcomes (i.e. impact).

One of the main reasons for doing this was to integrate it into our funding streams, which was starting to pay dividends by the time I moved down here. The way this worked in practice was if a museum came to us with an application for ICT funding they were expected to show us how they would integrate the recommendations in practice into their project, and they would be held to account if they didn't end up doing what they said they would...

The challenge is making the link clear to museums who don't necessarily see the value of evaluation, and just view it as another hoop to jump through.

Demonstrating the impact of the web site = providing a clear case to funders = more funding further down the line...

Many thanks to Dylan for the debate and for permission to publish that which isn't libellous! :-)