Quick Core Dump of Idle Thoughts on the Public Data Corporation (PDC) Consultation

“Please provide evidence to support your answer where possible.”
I read this as: “We haven’t really provided any evidence in this consultation, but if you don’t, we can ignore what you say on the grounds it’s anecdotal at best, or more likely, completely unjustified…”

***”1. How do you think Government should best balance its objectives around increasing access to data and providing more freely available data for re-use year on year within the constraints of affordability?”

[Paras 1.12, 1.17, 1.18] Presumably, the first implication is that the PDC will incorporate public bodies involved with the production of “core reference data”/those organisations “whose primary purpose is collecting, managing and disseminating data and providing value-added services based on that data” and the policy framework for deciding who’s in and who’s out must in the first instance rule HM Land Registry, Met Office and Ordnance Survey in? Based on the criteria used to rule these organisations in, is the gut feeling that organisations such as Companies House (who mint unique corporate identifiers), the General Register Office, Office for National Statistics, DVLA (eg Vehicle Checking or Driver Validation Service), Highways Agency (eg http://www.trafficengland.com/index.aspx?ct=true ), academic data repositories such as http://www.data-archive.ac.uk/ or http://www.census.ac.uk/ , The National Archives, or the data models and data assets being developed as part of the BBC Digital Public Space project or the JISC UK Discovery initiative, would be ruled in too? With publicly funded research increasingly being required to disseminate findings through open access publications (eg http://www.epsrc.ac.uk/about/standards/researchdata/Pages/default.aspx ), to what extent might (or should?) the deposit and/or release of research data be covered: a) by open data principles, b) via the PDC or a research council equivalent, bearing in mind that access to publicly funded research data may be subject to FOI requests (eg http://www.jisc.ac.uk/publications/programmerelated/2010/foiresearchdata.aspx )?

[“1.22. The way that Government has sought to cover those high fixed costs and to ensure sustainable investment in data infrastructure has been to encourage public sector bodies to licence their core reference data to third parties”]

But costs introduce friction downstream and may result in one part of government (the data publisher) recouping more than cost from other public bodies? In addition, is it possible that central data collection bodies may as part of their remit collect data from other public bodies that is then resold back to those bodies in an alternative (albeit potentially enriched) form?

PDC Objective 3 states: “create a vehicle that can attract private investment.” So there is presumably a requirement that money flows in from the private sector to the PDC and then out again in spades (because investors will want a return)?

[4.12] If there is a large up-front cost in producing/releasing data, and limited marginal cost, another model would be a one-off fee to offset the production/release cost, rather than a metered/ongoing usage fee offset against production/release cost + marginal cost? That the public body would carry the marginal cost is just a consequence of its data being worked/used, which is partly the point of releasing it in the first place? (ie presumably some benefit accrues elsewhere in the system as a result of the data being worked?)

[4.22] A single fee may provide a barrier to entry to personal users, researchers, and SMEs engaged in invention and innovation where there is no established market for as yet undeveloped products or services. Might fee waivers be a possibility, and if so, how would they be awarded? Might there be an equivalent of a public lending library service (a service that traditionally has provided universal access to information, including information from resources that may have a significant acquisition cost associated with them) that will provide “personal research” access to a public task dataset?

[4.23] Is the work of producing datasets part of the public task of the Office for National Statistics, and if so, will we have to pay for access to those statistics?

[4.24] As a corollary to the case, for example, of local councils licensing out the management and operation of civic car parks, would the public bodies be allowed to do the same by contracting out the management and publication of, and charging for, third-party usage of their public data? If so, how will limits be set on the pricing, bearing in mind any commercial operator would expect to make a financial return on the operation of that service, and would it impact on the way the public body collects, quality checks and operates its own data processes?

If public bodies are to develop “commercial products to serve commercial markets”, how does this sit with para. 4.18 (profit maximisation model) where an “incentivised PDC [would] fully commercialise all its products and services. While aligned with a strategy focussed purely on maximising value for the taxpayer such a model is unlikely to be consistent with Managing Public Money guidance and delivering on a commitment for free data”? Presumably the “commercial products to serve commercial markets” would be expected to be profit maximising, or not? Cost recovering (as in 4.23)? But what cost (eg would that include the cost of advertising, marketing, and other activities associated with commercial services)?

[4.25, 4.26, 4.28] Freemium does not necessarily imply “try out”. Many freemium services provide an access quota that allows an on-user to use the service as part of their own service, for free, up to certain usage limits. If the usage is heavy, then the commercial plan kicks in. But the small player can run a small service, for free, until they hit usage limits. In some cases, a condition of using the freemium service may be that the user cannot cache the data; ie they must faithfully draw the data down as they use it, rather than building up a local copy. In other cases, they may be encouraged to cache the data so as to prevent repeated service calls for the same data, in which case the usage quota is based on unique data accesses rather than repeated data accesses.
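To make the difference between the two quota styles concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `FreemiumQuota` class, the free limit of 3 and the record ids are my own assumptions, not anything proposed in the consultation or taken from a real service.

```python
from dataclasses import dataclass, field


@dataclass
class FreemiumQuota:
    """Hypothetical usage meter for a freemium data service."""
    free_limit: int                   # accesses allowed before the paid plan kicks in
    count_unique_only: bool = False   # True: caching encouraged, repeats are not metered
    calls: int = 0                    # every access, used by the no-cache policy
    seen: set = field(default_factory=set)  # distinct record ids, used by the cache policy

    def record(self, record_id: str) -> None:
        """Log one access to the record identified by record_id."""
        self.calls += 1
        self.seen.add(record_id)

    @property
    def used(self) -> int:
        """Metered usage under whichever policy applies."""
        return len(self.seen) if self.count_unique_only else self.calls

    @property
    def within_free_tier(self) -> bool:
        return self.used <= self.free_limit


# The same five requests, metered under each policy.
requests = ["a", "b", "a", "a", "c"]
per_call = FreemiumQuota(free_limit=3)
per_unique = FreemiumQuota(free_limit=3, count_unique_only=True)
for record_id in requests:
    per_call.record(record_id)
    per_unique.record(record_id)

print(per_call.used, per_call.within_free_tier)      # 5 False
print(per_unique.used, per_unique.within_free_tier)  # 3 True
```

The same pattern of use tips the small player into the commercial plan under the no-cache condition but keeps them in the free tier when only unique accesses count, which is why the choice of metering rule matters as much as the headline limit.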

***”2. Are there particular datasets or information that you believe would create particular economic or social benefits if they were available free for use and re-use? Who would these benefit and how? Please provide evidence to support your answer where possible.”

***”3. What do you think the impacts of the three options would be for you and/or other groups outlined above?”

[4.39 Government as user of PDC data] If the fees go up, and public bodies are charged universal commercial rates, they will have to pay more, which will introduce further friction into the process and reduce opportunities for effective data (re)use.

How I read the “options”:
[“4.40. Under all options, charges for some units of PDC information are likely to change, with more data being provided free at the point of use.”] So there are no additional benefits from Option 1… so rule this one out?
[“4.41. Under Option 2, it is possible that some efficiency savings could be delivered through having a single price, although there will be some upfront investment and resource required to implement a change.”] Savings possible, but it will cost in the short term? Rule this one out too?
[“4.42. Under Option 3, it is likely that in the short term income would decrease, but if the freemium model was successful income might then increase over time.”] Presumably we’re expected to read this as: “It won’t cost anything, and profits may go down in the short term; but then we might get a viable business out of it, and moreover a business capable of growth, using a sexy sounding techie-inspired business model… Cool… let’s have that then”? The truth being, of course, that costs are generally associated with any change, and that this is a status quo offering, where public bodies charge other public bodies and private enterprises for data collected as a matter of course (although admittedly at some expense) as part of the operating environment for government.

***”4. A further variation of any of the options could be to encourage PDC and its constituent parts to make better use of the flexibility to develop commercial data products and services outside of their public task. What do you think the impacts of this might be?”

[“4.30 There is the potential for providing a PDC and its constituent parts with greater encouragement to make better use of the existing flexibility to develop commercial products to serve commercial markets.”] Does this include the ability to develop commercial services based around expertise and support? (Expertise that may not be available widely, for example, particularly to SMEs? In which case, the service would also help support knowledge transfer from the public sphere and into the private sphere?)

Rather than produce data and make it available to other public bodies as well as developing commercial products, would it be possible to give the data away under a truly open license and task the PDC with developing data products and services that save the other public bodies money, working with them to reduce costs that can then be considered as in-kind direct returns on investment in the data services and products?

***”5. Are there any alternative options that might balance Government’s objectives which are not covered here? Please provide details and evidence to support your response where possible”

[4.10] The assumption here appears to be that payment for data should be on the basis that commercial users purchase a license to make use of data from the PDC and pay the PDC directly in financial terms. However, might a commercial user not offer an in-kind payment, such as a guarantee to resell services /at a discount or reduced margin/ to other public services, or make value-added versions of the data produced by the commercial user available for free to specified public bodies? This compares with 4.17 where data is provided free to users who then resell added-value data back to the public body. What is important is that if PDC bodies are producing value-added data, this should be provided free of rights encumbrance to other public bodies, and ideally free of cost; the issue then remains of how the value-added data may be passed on to non-public bodies. The intent of these users might also be worth considering: for example, personal or academic research, commercial research/innovation by SMEs, or as part of a service offering by an established larger company. Differential license/charging agreements of course need to be fair, but might this not be handled through offset grants, for example public data access grants awarded to SMEs via the TSB?

[4.36] Defining the future PDC on the basis of supporting incumbent business models predicated on current processes and ‘the old way of doing things’ is a dangerous step to take. If the open data policy framework is intended to foster innovation, it would be foolish to constrain innovation and limit the future possible use of open and public data to legacy models and processes that represent the current status quo. True innovation may well be disruptive, and upset the current status quo. Such is life.

A set of models that do not appear to have been considered are business models that develop around open source software. In the same way that data can be expensive to produce, may be protected by ownership and licensing rights, and may be used as the basis of other commercially viable services, so too can software. A summary of business and sustainability models appropriate for the open source software domain can be found here: http://www.oss-watch.ac.uk/resources/businessandsustainability.xml It may be worth doing a simple mapping of these models, based as they are around open source software, onto an open data (rather than software) resource context. There is the tension that whilst cost recovery by selling on data may be deemed to be an acceptable mode of operation for a public body, in part because it supports cost recovery through getting a return on sale of goods/services for minimal marginal cost of making those goods/services available, the sale of high value consultancy, for example, requires large additional cost and activity not aligned directly with the provision of public service (in effect, the use of public service to provide private commercial services outside the public sphere, not just internally on a cost recovery basis).

If it is the case that better access to information – and data – helps us make better decisions (and I’m not convinced that what we want is to make decisions: most people have no real choice and just want effective local public services), then the reward to the public body is not so much a direct financial return as a minimisation of costs incurred elsewhere because a bad decision was made.

Recent years have seen a return to prize fund/Grand Challenge based funding models in which prizes are awarded to technology solutions to particular technical challenges. These funding models replace the research funding support model with a reward based model. To what extent might the PDC act as a prize fund awarding body that can reward innovation around the use of public data, and sponsor parties engaged in such competition with “data permits” or “data credits” that provide them with data access in return for them submitting responses to data related Grand Challenges?

Could the TSB, in association with the PDC, even operate as an angel fund, supporting companies wishing to develop services or products based on public data, in exchange for a share in the companies involved, harking back to ideas behind the foundation of 3i, for example?

***”6. To what extent do you agree that there should be greater consistency, clarity and simplicity in the licensing regime adopted by a PDC?”

Experience of using Creative Commons licences and open source software licenses suggests that even within an open licensing framework, if different license types are combined it rapidly becomes difficult to work out how license conditions surrounding differently licensed components interact. Having a single license militates against creating confusion through complex, and possibly inconsistent, combinations of license conditions arising from the novel combination of differently licensed resources.

The confusion as to what is allowable may act as a significant barrier to developing services that combine resources licensed in different ways. Since much innovation is likely to arise from combination of resources, the multi-license approach is not really viable. Regulating on how datasets may be used/what license terms apply for different use cases may place arbitrary conditions on the innovation of new models that fall outside or across models that are assumed to be possible when the model license conditions are framed.

Furthermore, in a truly open licensing regime, the scope of reuse would not be artificially bounded and the user would be free to reuse the resources in any way they wanted.

***”7. To what extent do you think each of the options set out would address those issues (or any others)? Please provide evidence to support your comments where possible.”

Options 1 or 2 may lead to situations where complex and even pathological combinations of different license types make it impossible for a user to work out whether or not they are allowed to combine a set of resources in a particular way, or develop business models that operate across different license condition regimes.

***”8. What do you think the advantages and disadvantages of each of the options would be? Please provide evidence to support your comments”

Option 1 “[5.10] … each organisation within a PDC would have its own portfolio of standard licences, terms and conditions appropriate to the nature of their business.”

Complications from ill-specified consequences arising from the combination of differently licensed resources arise here just as they do in the case of option 1.

Option 3 “While a single licence would offer greater consistency of standard terms and conditions it is likely that there would be a wide range of other terms, clauses and schedules required to cover the various types and uses of PDC information. It is therefore likely to be lengthy and will contain clauses and schedules that will not be relevant to all users”

Does this mean that there will essentially be different licenses according to the status of the user (personal use, academic, commercial, etc) rather than the situation in options 1 and 2 where there are essentially different licenses relating to the use to which resources will be put?

***”9. Will the benefits of changing the models from those in use across Government outweigh the impacts of taking out new or replacement licences?”

I don’t know.

***”10. To what extent is the current regulatory environment appropriate to deliver the vision for a PDC?”

“[6.8] … it is envisaged that all organisations within a PDC will be advised to develop and agree with the regulator the statement of their public task.”

So the management of the current operating funds will be expected to work together to produce their own regulatory framework, at least insofar as the definition of their public task goes? This is likely to be backward looking and protective of current operating models rather than being open to new models and potentially even new ways of defining public tasks that do not respect current organisational boundaries, processes and modes of operation. The PDC as thus described is a way of bringing together current organisations and their associated business models and allowing them to work together to protect those interests, interests that were defined to support a data environment that may no longer exist.

“6.10. In the freemium model there may be a role for the regulator, as indicated earlier, in advising PDC bodies how they can best go about making practical arrangements to make more data free for re-use while ensuring a sustainable business model.”

Requiring that any innovations also protect the current operating model suggests that the establishment of the PDC is actually a rearguard action to protect against a radical change in the ways in which data is produced, managed and exploited within government, as well as for the wider public good through development of third sector and even private services.

***”11. Are there any additional oversight activities needed to deliver the vision for a PDC and if so what are they?”

The vision being the preservation of the status quo through the creation of a conglomerate of current data selling public bodies? And through oversight, you presumably do not mean the creation of a body that can force through changes to the way the board of the PDC decide it will operate, but rather will limit its role to seeing that the board does what it says it will do?

The current proposal for a PDC seems to favour the creation of a conglomerate charged with exploiting public data for financial return wherever it can, rather than acting as a regulator, ombudsman, or advocate tasked with getting the most value out of public data through making effective use of it, and maximising the possibility of making effective use of it?

“[6.1] Given the confines of this consultation, and its remit to focus only on the data policy options for a PDC itself, it would not be appropriate to consult on the whole policy and legislative framework”

Which is to say, you are not soliciting ideas about how to set up a governance regime that will require a nascent PDC to develop structures and processes that seek to innovate in the way public data is collected, processed and exploited, or helps realise a vision where free flowing open public data revitalises the way in which public bodies operate?

***”12. What would be an appropriate timescale for reviewing a PDC or its constituent parts public task(s)?”

It seems you have a done deal already, and the PDC will be set up in a way that means it will be difficult to dismantle or restructure significantly, and that any regulatory scheme that is established will have to be defined so as to regulate an entity that has itself defined how it wants to be regulated?

As with the quick comments on the Making Data Public consultation, I probably need to spend a bit of time reviewing these immediate impressions, but as before, time is short… If you want to harangue me on any obvious howlers, or call me out on any obvious inconsistencies (it might well be the case that comments appear to come from contradictory positions!), feel free to post a comment:-)