Open data – experience needed

The open data movement is proceeding apace, with more and more development data being made available publicly and better tools to manipulate and visualize it. The World Bank now makes all its development data available for free and allows datasets to be easily accessed through its API. More and more development agencies are joining IATI, making their aid spending – and soon their project documents – available in a standard format. The Center for Global Development now publishes the datasets and methods used for all the papers it produces, so that the results can be independently verified. And many NGOs, social enterprises and groups are crowdsourcing data directly from communities and individuals and making it publicly available, whether through Ushahidi deployments or community mapping. Soon we will have volumes and types of data not previously seen, available to anyone to use and analyze.
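As a small illustration of how accessible this has become, here is a minimal sketch of consuming the kind of JSON the World Bank API returns for an indicator query. The sample payload below is a trimmed, hypothetical response (the figures are illustrative, not real data), embedded locally so the sketch runs without a network call; the two-element shape – paging metadata followed by a list of observation records – mirrors the API's documented format.

```python
import json

# Trimmed sample payload in the shape the World Bank API returns for a
# request like /v2/country/ke/indicator/SP.POP.TOTL?format=json :
# a two-element array of [paging metadata, observation records].
# The values here are illustrative placeholders, not real statistics.
SAMPLE = json.loads("""
[
  {"page": 1, "pages": 1, "per_page": 50, "total": 2},
  [
    {"country": {"id": "KE", "value": "Kenya"}, "date": "2010", "value": 40000000},
    {"country": {"id": "KE", "value": "Kenya"}, "date": "2009", "value": null}
  ]
]
""")

def observations(payload):
    """Extract (year, value) pairs from a response payload,
    skipping records where the value is missing (null)."""
    meta, records = payload
    return [(r["date"], r["value"]) for r in records if r["value"] is not None]

print(observations(SAMPLE))
# [('2010', 40000000)]
```

Note how even this tiny example has to decide what to do with missing values – a first taste of the interpretive judgement discussed below.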

This is a very good thing.

But here is an interesting aspect: while you might be tempted to conclude that data is now a more valuable resource than individual knowledge, I think it is actually the reverse.

Although anyone can download a dataset, manipulate it and create visualizations from it, not everyone has the skills to do so properly. Analyzing data, making sense of it and knowing how to use it to inform decision making is a specialized skill – and not one that everyone masters. As I mentioned in my previous blog, some kinds of analysis require modelling and other techniques which, while they can be automated, need to be properly understood to be used properly. Similarly, knowledge of the data sources, their reliability and the context is important for interpreting the data correctly.

As more data becomes available, this specialized skill will be in increasing demand, and the work of the individuals and organizations who can do this will be at a premium. At present there are just not enough people with these skills around, and it will take time for more to be trained.

Similarly, as data becomes more readily available, a premium is also placed on the types of knowledge that cannot easily be boiled down into data points – areas such as experience, social networks and interpersonal skills (or, as Einstein put it, “Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted.”). Even interpreting data and turning it into politically feasible policy recommendations requires not only technical knowledge but also experience and judgement.

In a way, the benefit of open data is that it frees up the time and effort spent just trying to collect or get access to data, and allows us to spend more time analyzing, interpreting, thinking and ultimately doing – and the people and organizations best equipped for these tasks will be the ones that prosper.

One potential negative side effect of opening up data to all is that there will be a boom in poorly done, misleading secondary analyses and attractive but inaccurate data visualizations, and conclusions will be drawn and decisions taken on the basis of faulty analysis. On the other hand, these analyses can be reproduced, checked, and then corrected or countered by others. In the short term, instead of getting no data and informed analysis on a topic, we might instead get multiple analyses with competing conclusions. But the benefits of this debate and of these self-correction mechanisms mean that, in the long run, those who analyze data will be more accountable for what they do, and analysts will build reputations that help the better analyses rise above the poorer ones.

And it’s important to remember that even experts make mistakes – but these can now be corrected by “the crowd”. In this case the crowd isn’t the general public, but rather those who have the required experience and technical skills yet are not sitting in the organization that collected or produced the data in the first place. This way, more expert eyes on a dataset can both produce new analyses and validate those that have already been produced.

One point I want to push further is where you mention that a benefit of open data is that it requires little effort to obtain. Unfortunately you cannot always get on with the analysis straight away, as formats and standards vary. The preparation phase can take a considerable amount of effort before you can even get around to the analysis. For example, it is difficult to match “Main Street” in one source with “Main St.” in another automatically. Using Linked Open Data is one way of getting around these issues and permitting an almost instant start on analysis, but unfortunately only a fraction of open data is currently being published as Linked Data.
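The “Main Street” versus “Main St.” problem above can be sketched as a tiny normalization step – the kind of preparation work that has to happen before records from two sources can be joined. The abbreviation table here is hypothetical and deliberately minimal; a real gazetteer would be far larger and context-sensitive.

```python
import re

# Hypothetical abbreviation table -- purely illustrative. A real matching
# exercise would need a much larger, context-aware lookup.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}

def normalise(name):
    """Lower-case, strip punctuation, and expand common abbreviations
    so that 'Main Street' and 'Main St.' compare equal."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

assert normalise("Main Street") == normalise("Main St.")
assert normalise("Main Street") != normalise("Main Road")
print(normalise("Main St."))
# main street
```

Even this toy version shows why the preparation phase eats effort: every pair of sources brings its own variants, and rules tuned for one dataset can silently misfire on another.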

I do look forward, though, to seeing what processes spring up around data analysis to help log and validate the steps used to obtain any given result set. As people find ways to express their methods of working, let’s hope this will guide others to use similarly suitable methods – in the long run this will help educate the non-statisticians among us in how to make the best sense of the data deluge.
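One way such a logging-and-validation process might look is a provenance trail: record each transformation step along with a hash of its intermediate result, so anyone re-running the published steps can check that they reproduce the same result set. This is a hypothetical sketch of the idea, not a description of any existing tool.

```python
import hashlib
import json

class AnalysisLog:
    """Minimal provenance sketch (hypothetical): each recorded step
    stores a description plus a SHA-256 digest of the intermediate
    result, so a re-run can be checked against the published trail."""

    def __init__(self):
        self.steps = []

    def record(self, description, data):
        # Serialize deterministically so the same data always hashes alike.
        digest = hashlib.sha256(
            json.dumps(data, sort_keys=True).encode()
        ).hexdigest()
        self.steps.append({"step": description, "sha256": digest})
        return data

log = AnalysisLog()
raw = log.record("load raw values", [3, 1, 2, 2])
deduped = log.record("drop duplicates", sorted(set(raw)))
total = log.record("sum values", sum(deduped))

for entry in log.steps:
    print(entry["step"], entry["sha256"][:8])
print(total)
# 6
```

If a second analyst re-runs the steps and any digest differs, they know exactly which step diverged – the kind of self-correction mechanism discussed above.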

Ever more data is available, but it is very difficult to access for those who might benefit from understanding it. I notice all sorts of ‘knowledge brokers’ springing up to seize such opportunities for meaningful work. Here the need would be for analysts who can broker between the data and user groups.

I am wondering if there are brokering initiatives like this emerging in the M&E profession? Many development organisations still have their own M&E department, but with dwindling funds for aid this will soon become a luxury. Pooling resources and outsourcing then become interesting options. Clever analysts might want to seize the opportunity now! And clever investors/donors might want to try incentivising the emergence of ‘free analysts’.

Convergence of knowledge brokers is beginning, for instance in the climate change community, where there are just too many portals doing too much of the same for too few users. It is fascinating to follow such processes, which always seem to have just a few driven individuals at their heart.

@lucia – thanks for your comment. For the M&E convergence you are talking about, one concrete step would be pooled or shared rosters – but getting the various big players to agree to this would probably be quite challenging. Let’s hope they can get together and see the value of doing it. Professional evaluation associations could possibly play a strong role here.

I think the issue of too many portals on a specific topic will worsen in the short term as a result of open data, since it will be relatively easy for people to build a portal using other people’s data. In the longer term, better portals (more comprehensive, higher credibility, more timely, better designed) will become more popular, driving out others that can’t do the job as well. It will be interesting to see whether this happens through competition, coordination or both. It will also be interesting to see whether new brokers emerge or whether the major existing information providers continue to dominate.