Canada launches data.gc.ca – what works and what is broken

Those on Twitter will already know that this morning I had the privilege of conducting a press conference with Minister Day about the launch of data.gc.ca – the Federal Government’s Open Data portal. For those wanting to learn more about open data, I suggest this and this blog post, and this article – they outline some of the reasons why open data matters.

In this post I want to review what works, and doesn’t work, about data.gc.ca.

What works

Probably the most important thing about data.gc.ca is that it exists. It means that public servants across the Government of Canada who have data they would like to share can now point to a website that is part of government policy. It is an enormous signal from a central agency, one that gives people who want to share data the permission, the process and the vehicle to do so. That, in and of itself, is significant.

Indeed, I was informed that already a number of ministries and individuals are starting to approach those operating the portal asking to share their data. This is exactly the type of outcome we as citizens should want.

Moreover, I’ve been told that the government wants to double the number of data sets, and the number of ministries, involved in the site. So the other part that “works” on this site is the commitment to make it bigger. This is also important, as some open data portals have launched with great fanfare, only to languish because no new data sets are added and the existing ones are not updated and so fall out of date.

What’s a work in progress

The number of “high value” datasets is, relatively speaking, fairly limited. I’m always cautious about this as, I feel, what constitutes high value varies from user to user. That said, there are clearly data sets that will have greater impact on Canadians: budget data, line item spend data by department (as the UK does), food inspection data, product recall data, pretty much everything on the StatsCan website, Service Canada locations, postal code data, mailbox location data, business license data, and Canada Revenue data on charities and publicly traded companies are a few that quickly come to mind. Clearly I can imagine many, many more…

I think the transparency, tech, innovation, mobile and online services communities will be watching data.gc.ca closely to see what data sets get added. What is great is that the government is asking people what data sets they’d like to see added. I strongly encourage people to let the government know what they’d like to see, especially when it involves data the government is already sharing, but in unhelpful formats.

What doesn’t work

In a word: the license.

The license on data.gc.ca is deeply, deeply flawed. Some might go so far as to say that the license does not make the data open at all – a critique that I think is fair. I would say this: presently the open data license on data.gc.ca effectively kills any possible business innovation, and severely limits use in non-profit realms.

The first, and most problematic, is this line:

“You shall not use the data made available through the GC Open Data Portal in any way which, in the opinion of Canada, may bring disrepute to or prejudice the reputation of Canada.”

What does this mean? Does it mean that any journalist who writes a story, using data from the portal, that is critical of the government, is in violation of the terms of use? It would appear to be the case. From an accountability and transparency perspective, this is a fatal problem.

But it is also problematic from a business perspective. If you wanted to use a data set to help guide citizens around where they might be well, and poorly, served by their government, would you be in violation? The problem here is that the clause is both sufficiently stifling and sufficiently negative that many businesses will see the risk of using this data as simply too great.

UPDATE: Thursday March 17th, 3:30pm, the minister called me to inform me that they would be striking this clause from the contract. This is excellent news and Treasury Board deserves credit for moving quickly. It’s also great recognition that this is a pilot (i.e. beta) project and so, hopefully, the other problems mentioned here and in the comments below will also be addressed.

It is worth noting that no other open data portal in the world has this clause.

The second challenging line is:

“you shall not disassemble, decompile except for the specific purpose of recompiling for software compatibility, or in any way attempt to reverse engineer the data made available through the GC Open Data Portal or any part thereof, and you shall not merge or link the data made available through the GC Open Data Portal with any product or database for the purpose of identifying an individual, family or household or in such a fashion that gives the appearance that you may have received or had access to, information held by Canada about any identifiable individual, family or household or about an organization or business.”

While I understand the intent of this line, it is deeply problematic for several reasons. First, many business models rely on identifying individuals. Indeed, individuals frequently ask businesses to do this. Google, for example, knows who I am and offers custom services to me based on the data they have about me. It would appear that the terms of use would prevent Google from using Government of Canada data to improve its service even if I have given them permission. Moreover, the future of the digital economy is around providing customized services. While this data has been digitized, it effectively cannot be used as part of the digital economy.

More disconcerting is that these terms apply not only to individuals, but also to organizations and businesses. This means that you cannot use the data to “identify” a business. Well, over at Emitter.ca we use data from Environment Canada to show citizens facilities that pollute near them. Since we identify both the facilities and the companies that use them (not to mention the politicians whose ridings these facilities sit in), are we not in violation of the terms of use? In a similar vein, I’ve talked about how government data could have prevented $3B of tax fraud. Sadly, data from this portal would not have changed that since, in order to have found the fraud, you’d have to have identified the charitable organizations involved. Consequently, this requirement manifestly destroys any accountability the data might create.

It is again worth noting that no other open data portal in the world has this clause.

And finally:

4.1 You shall include and maintain on all reproductions of the data made available through the GC Open Data Portal, produced pursuant to section 3 above, the following notice:

Reproduced and distributed with the permission of the Government of Canada.

4.2 Where any of the data made available through the GC Open Data Portal is contained within a Value-Added Product, you shall include in a prominent location on such Value-Added Product the following notice:

This product has been produced by or for (your name – or corporate name, if applicable) and includes data provided by the Government of Canada.

The incorporation of data sourced from the Government of Canada within this product shall not be construed as constituting an endorsement by the Government of Canada of our product.

or any other notice approved in writing by Canada.

The problem here is that this creates what we call the “Nascar effect.” As you use more and more government data, these “prominent” displays of attribution begin to pile up. If you’re using data from three different governments, each of which requires attribution, pretty soon all you’re going to see are the attribution statements, and not the map or other information that you are looking for! I outlined this problem in more detail here. The UK Government has handled this issue much, much more gracefully.

Indeed, speaking of the UK Open Government License, I really wish our government had just copied it wholesale. We have similar systems of government and law, so I see no reason why it would not easily translate to Canada. It is radically better than what is offered on data.gc.ca and, by adopting it, we might begin to move towards a single government license within Commonwealth countries, which would be a real win. Of course, I’d love it if we adopted the PDDL, but the UK Open Government License would be okay too.

In Summary

The launch of data.gc.ca is an important first step. It gives those of us interested in open data and open government a vehicle by which to get more data open and improve accountability and transparency as well as business and social innovation. That said, there is much work still to be done: getting more data up and, more importantly, addressing the significant concerns around the license. I have spoken to Treasury Board President Stockwell Day about these concerns and he is very interested in and engaged by them. My hope is that with more Canadians expressing their concerns, and with better understanding by ministerial and political staff, we can land on the right license and help find ways to improve the website and program. That’s why we do beta launches in the tech world; hopefully it is something the government will be able to do here too.

Apologies for any typos, trying to get this out quickly, please let me know if you find any.

54 thoughts on “Canada launches data.gc.ca – what works and what is broken”

Agreed that the existence of the portal is the big news and the good news. Good news indeed.

Sad to hear about the licence.

I wonder if the gov perhaps wants to have it both ways. They want to experiment with data release, and actually hope that people will use it in exciting ways that will justify their having moved in this direction, perhaps even in ways which violate the letter or spirit of the licence. But at the same time they want to have the option of enforcement open to them, so they can shut down the experiment if they don’t like where it’s going, potentially on a case-by-case basis.

In other words, using a restrictive but selectively enforceable license means that they can shift the liability onto the public and away from themselves. Even if they’re not planning on actual blanket enforcement.

Just speculation. But it’s the kind of hedged-bet conservatism that Canadian ministries have some reputation for.

Regardless, I am on balance very glad for this announcement, and thank you David for the part you have played in pushing us forward on this.

Hi Aaron, on second glance it looks like only historical datasets for which there may not be digital formats are available as pdfs/jpgs. For instance, here’s the 1906 1st Edition of the Atlas of Canada. (How cool is that?)

Great comments & insights as always. Glad to see Canada moving in the direction of open data. I have to say that so far I’ve only skimmed the data.gc.ca site, but the moment I saw the license it was a complete turn off. I didn’t even bother reading it — I simply thought, why would there be a license requirement for an open data site? Now that I read your comments about it, I can see that it’s bad news indeed. Hopefully the powers that be will see fit to make the data truly open.

In a word: to be ignored! It is totally meaningless. An interpretation is just that: an interpretation.

Of course if the feds continue to micro-manage the collection of data so that only Sanitized Approved data is released then … that is a problem.

Look at the way the federal government has skewed the Crime Statistics:
“Why Canadian Crime Statistics Don’t Add Up: Not the whole truth,” authored by Scott Newark and published in January 2011. The full report is available at http://macdonaldlaurier.ca/files/pdf/MLI-Crime_Statistics_Review-Web.pdf. The Macdonald-Laurier Institute for Public Policy is a national public policy think tank based in Ottawa that claims to be rigorously independent and non-partisan in nature.

I thought that the licence for open data was supposed to protect the user of the data, not limit the ways that it can be used or enable enforcement. The spirit of openness and sharing seems to have been lost due to the problematic wording mentioned here.

Agreed, the value here is that it exists at all. I am going to presume that the goal here was to make the government data available for use by citizens and support innovative Canadian developers and other open data enthusiasts. We’re keen, willing and able to create valuable solutions for ourselves and our fellow citizens.

Although the license proposed today is a non-starter, it can be replaced, and Canada can join the other Commonwealth countries enjoying the benefits of increased citizen participation.

I would encourage the Canadian Government to engage with the developer community to find out what we’re looking for in a license. In BC, citizens have created an open friendly online discussion group where we discuss these issues ( http://groups.google.com/group/opendatabc ). Our provincial and local government employees use this discussion group as a sounding board for ideas and issues related to open data policy, standards and strategies and to engage the community as partners and fellow citizens.

Though developers are just one of the stakeholders in this discussion, licensing issues are especially important to us as they make the difference between us being able to contribute our skills to create innovative and valuable solutions, or not. Thanks to the Internet and free software, when individual developers are enabled and supported to create these innovative solutions, many thousands of citizens can directly benefit from their work.

This is great news – thanks for all the work to help us get here, Dave.

It looks like the license is already evolving, so we at least have a point of reference in making this government adopt actual innovative and useful open data policies, rather than just the title. If the evolution is already starting, hours after its release, it’s a great sign.

Congratulations to everyone who has worked incessantly for this to pass – there is more potential than can be imagined, and while the license is a huge hurdle, it was kind of predictable. However, I don’t think it’s insurmountable, and I think that the community can work with government to come up with ways to ‘open’ the license but still address govt concerns.

Very good news overall. This morning I was wondering about the licence, which seemed to me a little restrictive, and this post confirms my understanding. Difficult situation given that this site is a pilot… if people start using it in a way that does not please gov, is there a risk that it will be closed?

Many of the datasets in the portal seem to be an aggregation of what’s already available on other sites. Other datasets seem to be missing from crucial areas. I think this may lead to some confusion. For instance, the 1906 National Atlas is listed here, yet the current National Topographic data is not. The data are available through the geogratis.ca portal. Do you think there will be a better, more streamlined way of making all these “portals” talk to each other rather than aggregating in the way that data.gc.ca seems to be functioning?

David, it’s great that the Federal government is trying out open data. Unfortunately it seems like you’ll be waiting a long time for mailbox location data. I’ve had an ATIA request in with Canada Post for well over a year for that info, with the hopes of making an app to find the nearest box. My case officer has had the data for most of that time, but has been delaying releasing it for ever-changing reasons. First they wanted to print it (!?) and couldn’t give me an electronic copy for “security” reasons, now it’s because they have a “backlog” and need to review each record first. From my experience, I wouldn’t expect much in the way of proactive data releases from Canada Post.

Wow, Paul, that is stunning. It’s also just stupid beyond belief for them. Someone wants to help people find post boxes? You’d think they’d WANT someone to build that app. Have tweeted this and will see if there is anyone I know that can reframe this to them.

That was my thinking too, David. Funny enough, in my most recent communication with them earlier this week, I pointed out that open government movements around the world are releasing exactly this sort of data, and shared a link to a site built around the USPS’s mailbox data. So I’m thrilled to see data.gc.ca launch as a domestic example to point them towards. Given how long the InfoCom takes to deal with complaints, I’d rather Canada Post see the benefit and release the data. With the site, hopefully openness by default will start permeating our federal culture. Thanks for tweeting to share my frustration, and for pushing so hard for open data in general.

I appreciate that this is just a pilot project, but the open data website is slow and poorly organized. It is not easy to browse and search the available datasets. I hope it evolves to be more like data.gov.uk.

Agreed. The top three criteria for a data site should be 1) simple presence of the data, 2) usable licensing and 3) discoverability. Canadian data sharing sites have typically suffered on that third criterion, and this doesn’t seem to be a particular exception.

The fact that virtually all Canadian federal websites seem to be locked into a many-years-outdated design template doesn’t help. I would assume that the content management system underlying it is just as many years outdated, and that can’t be helping them to develop and deploy good data catalogues.

Again, I’m pleased that they’re making efforts. All of these problems are solvable if there is will to do so, and the existence of a new data portal suggests there may just be.