Back in 2006, as we rolled out the first public draft of the Talis Community Licence, the world of data licensing seemed a simple place. Today, the Open Knowledge Foundation’s Data Hub contains 3,888 data sets, many of which are explicitly licensed with respect to the Open Definition. But many are still not explicitly licensed. Over at the UK Government, there are 8,619 data sets today, and an assertion that “in general, the data is licensed under the Open Government License.” Too much still isn’t, of course, but they’re getting there. And then there are the many, many more data sets out on the web, not registered with repositories like the Data Hub or data.gov.uk at all.

It simply sets out to assess the relative proportions of data that are not openly licensed, implicitly open, explicitly open under some home-grown statement, or explicitly open under a recognised data license such as CC0 or one of the Open Data Commons licenses.
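To make that four-way split concrete, here is a minimal sketch in Python of how such a tally might work. Everything in it is an assumption made for illustration: the field names (`license_id`, `custom_open_statement`, `appears_open`), the short list of license identifiers, and the sample records are hypothetical, and do not reflect the actual schema or contents of the Data Hub or data.gov.uk.

```python
from collections import Counter

# The four categories described above, in a fixed reporting order.
CATEGORIES = (
    "not_open",
    "implicitly_open",
    "explicit_home_grown",
    "explicit_recognised",
)

# Hypothetical identifiers for recognised open data licenses.
# This is an illustrative assumption, not an official list.
RECOGNISED_OPEN = {"cc-zero", "odc-pddl", "odc-by", "odc-odbl"}


def classify(record):
    """Place one data set record (a dict) into one of the four categories."""
    license_id = (record.get("license_id") or "").strip().lower()
    if license_id in RECOGNISED_OPEN:
        return "explicit_recognised"
    if record.get("custom_open_statement"):   # hypothetical field
        return "explicit_home_grown"
    if record.get("appears_open"):            # hypothetical field
        return "implicitly_open"
    return "not_open"


def proportions(records):
    """Return the share of records in each category (empty dict if no records)."""
    counts = Counter(classify(r) for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total for name in CATEGORIES}


# Made-up sample records, one per category.
sample = [
    {"title": "a", "license_id": "CC-Zero"},
    {"title": "b", "custom_open_statement": True},
    {"title": "c", "appears_open": True},
    {"title": "d"},
]

if __name__ == "__main__":
    for name, share in proportions(sample).items():
        print(f"{name}: {share:.0%}")
```

The hard part in practice is not the counting but deciding which bucket a given data set belongs in, particularly the "implicitly open" one, which is why those decisions are reduced to simple flags here.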

We’ve seen a welcome burst of enthusiasm for ‘open’ release of data. This has been driven most visibly by government transparency agendas here and overseas. But libraries, the scholarly publishing community and others have also been enthusiastic adopters in recent years. Less welcome has been the sometimes rampant license proliferation. Everyone, it seems, finds something not quite right about one of the licenses on the table. Everyone, it sometimes appears, has a burning desire to create their own license that is just a little bit different, just a little bit closer to their world view. Everyone, perhaps, has a lawyer who sees the opportunity to write themselves a blank cheque alongside a new, ‘better’ license. Every local tweak to a common license, however well-meaning, is a barrier to interoperability. Every new license, however laudable the aims behind its creation, is a further complication to an already complicated picture; another excuse to wait rather than do. Although the meaning and the intent may be the same in all of these licenses, every different set of legalese requires careful, repeated study as everyone else tries to work out whether or not some incompatibility or impediment has (unintentionally, we hope!) been introduced. Unconstrained license proliferation is, simply, bad.

So… I’ll be taking a look at figures from the Data Hub, data.gov.uk and elsewhere, to get some solid numbers on license proliferation, and on the geographies, domains and volumes in which each license is used. I’ll track all of that and more here, when it happens.