On May 3, 2019 I was very pleased to give a keynote talk at the Go Open Data 2019 Conference in Toronto (video recordings of the conference proceedings are now available from the site). The following post includes the gist of my talk, along with hyperlinks to the different sources and examples I referenced. My talk was built around the theme of the conference: Inclusive, Equitable, Ethical, and Impactful.

In my talk this morning I am going to use the conference’s themes of Inclusive, Equitable, Ethical and Impactful to shape my remarks. In particular, I will apply these concepts to data in the smart cities context as this has been garnering so much attention lately. But it is also important to think about these in the artificial intelligence (AI) context which is increasingly becoming part of our everyday interactions with public and private sector actors, and is a part of smart cities as well.

As this is an open data conference, it might be fair to ask what smart cities and AI have to do with open data. In my view, these contexts extend the open data discussion because both depend upon vast quantities of data as inputs. They also complicate it. This is for three broad reasons:

First, the rise of smart cities means that there are expanding categories and quantities of municipal (and provincial) data that could be available as open data. There are also growing quantities of private sector data gathered in urban contexts in a variety of different ways over which arguments for sharing could be made. Thus, there is more and more data, and issues of ownership, control and access become complex and often conflictual. Open government data used to be about the operations and activities of government, and there were strong arguments for making it broadly open and accessible. But government data is changing in kind, quality and quantity, particularly in smart cities contexts. Open data may therefore be shifting towards a more nuanced approach to data sharing.

Second, smart cities and AI are just two manifestations of the expanding demand for access to data for multiple new uses. There is not just MORE data, there are more applications for that data and more demand from public, private sector and civil society actors for access to it. Yet the opacity of data-hungry analytics and AI contributes to a deepening unease about data sharing.

Third, there is a growing recognition that perhaps data sharing should not be entirely free and open. Open data, under an open licence, with few if any restrictions and with no registration requirement was a kind of ideal, and it fit with the narrower concept of government data described earlier. But it is one that may not be best suited to our current environment. Not only are there potential use restrictions that we might want to apply to protect privacy or to limit undesirable impacts on individuals or communities, but there might also be arguments for cost recovery as data governance becomes more complex and more expensive. This may particularly be the case if use is predominantly by private sector actors – particularly large foreign companies. The lack of a registration requirement limits our ability to fully understand who is using our data, and it reduces the possibility of holding users to account for misuse. Again this may be something we want to address.

I mentioned that I would use the themes of this conference as a frame for my comments. Let me start with the first – the idea of inclusiveness.

Inclusive

We hear a lot about inclusiveness in smart cities – and at the same time we hear about privacy. These are complicated and intertwined.

The more we move towards using technology as an interface for public and private sector services, for interaction with government, for public consultations, elections, and so on, the more we need to focus on the problem of the digital divide and what it means to include everyone in the benefits of technology. Narrowing the digital divide will require providing greater access to devices, access to WIFI/broadband services, access to computer and data literacy, and access in terms of inclusiveness of differently-abled individuals. These are all important goals, but their achievement will inevitably have the consequence of facilitating the collection of greater quantities and more detailed personal information about those formerly kept on the other side of the digital divide. The more we use devices, the more data we generate. The same can be said of the use of public WIFI. Moving from analog to digital increases our data exhaust, and we are more susceptible to tracking, monitoring, profiling, etc. Consider the controversial LinkNYC Kiosks in New York. These large sidewalk installations include WiFi access, Android tablets, charging stations, and free nation-wide calling. But they have also raised concerns about enhanced tracking and monitoring. This is in part because the kiosks are also equipped with cameras and a range of sensors.

No matter how inclusiveness is manifested, it comes with greater data collection. The more identifiable data collected, the greater the risks to privacy, dignity, and autonomy. But de-identified data also carries its own risks to groups and communities. While privacy concerns may prompt individuals to share less data and to avoid data capture, the value of inclusiveness may actually require having one’s data be part of any collection. In many ways, smart cities are about collecting vast quantities of data of many different kinds (including human behavioural data) for use in analytics in order to identify problems, understand them, and solve them. If one is invisible in the data, so are one’s particular needs, challenges and circumstances. In cases where decisions are made based on available data, we want that data to be as complete and comprehensive as possible in order to minimize bias and to make better diagnoses and decisions. Even more importantly, we want to be included/represented in the data so that our specificity is able to influence outcomes. Inclusiveness in this sense is being counted, and counting.

Yet this type of inclusion has privacy consequences – for individuals as well as groups. One response to this has been to talk about deidentification. And while deidentification may reduce some privacy risks, it does not reduce or eliminate all of them. It also does not prevent harmful or negative uses of the data (and it may evade the accountability provided by data protection laws). Nor does it address the dignity/autonomy issues that come from the sense of being under constant surveillance.

Equitable and Ethical

If we think about issues of equity and ethics in the context of the sharing of data it becomes clear that conventional open data models might not be ideal. These models are based on unrestricted data sharing, or data sharing with a bare minimum of restrictions. Equitable and ethical data sharing may require more restrictions to be placed on data sharing – it may require the creation of frameworks for assessing proposed uses to which the data may be put. And it may even require changing how access to data is provided.

In the privacy context we have already seen discussion about reforming the law to move away from a purely consent-based model to one in which there may be “no-go zones” for data use/processing. The idea is that if we can’t really control the collection of the information, we should turn our attention to identifying and banning certain inappropriate uses. Translated into the data sharing context, licence agreements could be used to put limits on what can be done with data that is shared. Some open data licences already explicitly prohibit any attempts to reidentify deidentified data. The Responsible Data Use Assessment process created by Sidewalk Labs for its proposed data governance framework for Toronto’s Quayside development similarly would require an ‘independent’ body to assess whether a proposed use of urban data is acceptable.

The problem, of course, is that licence-based restrictions require oversight and enforcement to have any meaning. I wrote about this a couple of years ago in the context of the use of social media data for analytics services provided to police services across North America. The analytics companies contracted for access to social media data but were prohibited in their terms of use from using this data in the way they ultimately did. The problem was uncovered after considerable effort by the ACLU and the Brennan Center for Justice – it was not discovered by the social media companies who provided access to their data or who set the terms of use. In the recent Report of Findings by the Privacy Commissioner of Canada into Facebook’s role in the Cambridge Analytica scandal, the Commissioner found that although Facebook’s terms of service with developers prohibited the kind of activities engaged in by Dr Kogan, who collected the data, Facebook failed in its duty to safeguard personal information and, in particular, ignored red flags that should have told it that there was a problem. Let’s face it: companies selling access to data may have no interest in policing the behaviour of their customers or in terminating their access. An ‘independent’ body set up to perform such functions may lack the resources and capacity to monitor and enforce compliance.

Another issue that exists with ethical approaches is, of course, whose ethics? Taking an ethical approach does not mean being value-neutral and it does not mean that there will not be winners and losers. It is like determining the public interest – an infinitely malleable concept. This is why the composition of decision-making bodies and the location of decision-making power, when it comes to data collection and data sharing, is so important and so challenging.

Impactful

In approaching this last of the conference’s themes – impactful – I think it is useful to talk about solutions. And since I am almost out of time and this is the start of the day’s events, I am going to be very brief as solutions will no doubt be part of the broader discussion today.

The challenges of big data, AI and smart cities have led to a broad range of different proposed data governance solutions. Some of these are partial; for example, deidentification/anonymization or privacy by design approaches address what data is collected and how, but they do not necessarily address uses.

Some are aspirational: for example, ethical approaches to AI such as the Montreal Declaration for a Responsible Development of Artificial Intelligence. Others attempt to embed both privacy and ethics into concrete solutions – for example the federal Directive on Automated Decision-Making for the public sector, which sets parameters for the adoption, implementation and oversight of AI deployment in government. In addition, there are a number of models emerging, including data trusts in all their variety (ODI), or bottom-up solutions such as Civic Data Trusts (see, e.g.: MaRS, Element AI, Sean McDonald), which involve access moderated by an independent (?), representative (?) body, in the public interest (?) according to set principles.

Safe sharing sites are another concept, discussed by Lisa Austin and David Lie of the University of Toronto – they are not necessarily independent of data trusts or civic data trusts. Michel Girard is currently doing very interesting work on the use of data standards (see his recent CIGI paper).

On November 23, 2018, Waterfront Toronto hosted a Civic Labs workshop in Toronto. The theme of the workshop was Smart City Data Governance. I was asked to give a 10-minute presentation on the topic. What follows is a transcript of my remarks.

Smart city governance relates to how smart cities govern themselves and their processes; how they engage citizens and how they are transparent and accountable to them. Too often the term “smart city” is reduced to an emphasis on technology and on technological solutionism – in other words “smart cities” are presented as a way in which to use technology to solve urban problems. In its report on Open Smart Cities, Open North observes that “even when driven in Canada by good intentions and best practices in terms of digital strategies, . . .[the smart city] remains a form of innovation and efficient driven technological solutionism that is not necessarily integrated with urban plans, with little or no public engagement and little to no relation to contemporary open data, open source, open science or open government practices”.

Smart cities governance puts the emphasis on the “city” rather than the “smart” component, focusing attention on how decisions are made and how the public is engaged. Open North’s definition of the Open Smart City is in fact a normative statement about digital urban governance:

An Open Smart City is where residents, civil society, academics, and the private sector collaborate with public officials to mobilize data and technologies when warranted in an ethical, accountable and transparent way to govern the city as a fair, viable and liveable commons and balance economic development, social progress and environmental responsibility.

This definition identifies the city government as playing a central role, with engagement from a range of different actors, and with particular economic, social and environmental goals in mind. This definition of a smart city involves governance in a very basic and central way – stakeholders are broadly defined and they are engaged not just in setting limits on smart cities technology, but in deciding what technologies to adopt and deploy and for what purposes.

There are abundant interesting international models of smart city governance – many of them arise in the context of specific projects, often of a relatively modest scale. Many involve attempts to find ways to include city residents in both identifying and solving problems, and the use of technology is relevant both to this engagement and to finding solutions.

The Sidewalk Toronto project is somewhat different since this is not a City of Toronto smart city initiative. Rather, it is the tri-governmental entity Waterfront Toronto that has been given the lead governance role. This has proved challenging since while Waterfront Toronto has a public-oriented mandate, it is not a democratically elected body, and its core mission is to oversee the transformation of specific brownfield lands into viable communities. This is important to keep in mind in thinking about governance issues. Waterfront Toronto has had to build public engagement into its governance framework in ways that are different from a municipal government. The participation of federal and provincial privacy commissioners and of representatives from federal and provincial governments feeds into governance, as does Waterfront Toronto’s Digital Strategy Advisory Panel (DSAP), and there has been public outreach. There will also be review of, and consultation on, the Master Innovation Development Plan (MIDP) once it is publicly released. But it is a different model from city government and this may set it apart in important ways from other smart cities initiatives in Canada and around the world.

Setting aside for a moment the smart cities governance issue, let’s discuss data governance. The two are related – especially with respect to the issue of what data is collected in the smart city and for what purposes.

Broadly speaking, data governance goes to the question of how data will be stewarded (and by whom) and for what purposes. Data governance is about managing data. As such, it is not a new concept. Data governance is a practice that is current in both private and public sector contexts. Most commonly it takes place within a single organization which develops practices and protocols to manage its existing and future data. Governance issues include considering who is responsible for the data, who is entitled to set the rules for access to and reuse of it, how those rules will be set, and who will profit/benefit from the data and on what terms. It also includes addressing issues such as data security, standards, interoperability, and localization. Where the data include personal information, compliance with privacy laws is an aspect of data governance. But governance is not limited to compliance – for example, an organization may adopt higher standards than those required by privacy law, or may develop novel approaches to managing and protecting personal information.

There are many different data governance models. Some (particularly in the public sector) are shaped by legislation, regulations and government policies. Others may be structured by internal policies, standards, industry practice, and private law instruments such as contracts or trusts. As the term is commonly used, data governance does not necessarily implicate citizen involvement or participation in the same way as “smart city governance” does – it is the “city” part of “smart city governance” that brings into focus democratic principles of transparency, accountability, engagement and so on. However, where there is a public sector dimension to the collection or control of data, then public sector laws, including those relating to transparency and accountability, may apply.

With the rise of the data economy, data sharing is becoming an important activity for both public and private sector actors. As a result, new models of data governance are needed to facilitate data sharing. There are many different benefits that flow from data sharing. It may be carried out for financial gain, or it may be done to foster innovation, enable new insights, stimulate the economy, increase transparency, solve thorny problems, and so on. There are also different possible beneficiaries. Data may be shared amongst a group of entities each of which will find advantages in the mutual pooling of their data resources. Or it may be shared broadly in the hope of generating new data-based solutions to existing problems. In some cases, data sharing has a profit motive. The diversity of actors, beneficiaries, and motivations, makes it necessary to find multiple, diverse and flexible frameworks and principles to guide data sharing arrangements.

Open government data regimes are an important example of a data governance model for data sharing. Many governments have decided that opening government data is a significant public policy goal, and have done a tremendous amount of work to create the infrastructure not just for sharing data, but for doing it in a useful, accessible and appropriate manner. This means the development of standards for data and metadata, and the development of portals and search functions. It has meant paying attention to issues of interoperability. It has also required governments to consider how best to protect privacy and confidential information, or information that might impact on security issues. Once open, the sharing frameworks are relatively straightforward – open data portals typically offer data to anyone, with no registration requirement, under a simple open licence.

Governments are not the only ones developing open data portals – research institutions are increasingly searching for ways in which to publicly share research outputs including publications and data. Some research data infrastructures support sharing, but not necessarily on fully open terms – this requires another level of consideration as to the policy reasons for limiting access, how to limit access effectively, and how to set and ensure respect for appropriate limits on reuse.

The concept of a data trust has also received considerable attention as a means of data sharing. The term data trust is now so widely and freely used that it does not have a precise meaning. In its publication “What is a Data Trust”, the ODI identifies at least 5 different concepts of a data trust, and they provide examples of each:

· A data trust as a repeatable framework of terms and mechanisms.

· A data trust as a mutual organisation.

· A data trust as a legal structure.

· A data trust as a store of data.

· A data trust as public oversight of data access.

The diversity of “data trusts” means that there are a growing number of models to study and consider. However, it also makes it a little dangerous to talk about “data trust” as if it has a precise meaning. With data trusts, the devil is very much in the details. If Sidewalk Labs is to propose a ‘data trust’ for the management of data gathered in the Sidewalk Toronto development, then it will be important to probe into exactly what the term means in this context.

What Sidewalk Labs is proposing is a particular vision of a data trust as a data governance model for data sharing in a smart cities development. It is admittedly a work in progress. It has some fairly particular characteristics. For example, not only is it a framework to set the parameters for sharing the subset “urban data” (defined by Sidewalk Labs) collected through the project, it also contemplates providing governance for any proposals by third parties who might want to engage in the collection of new kinds, categories or volumes of data.

In thinking about the proposed ‘trust’, some questions I would suggest considering are:

1) What is the relationship between the proposed trust and the vision for smart city governance? In other words, to what extent is the public and/or are public sector decision-makers engaged in determining what data will be governed by the trust, for whose benefit, and on what terms sharing will take place?

2) A data governance model does not make up for a lack of robust smart city governance up front (in identifying the problems to be solved, the data to be collected to solve them, etc.). If this piece is missing, then discussion of the trust may involve discussing the governance of data where there is no group consensus or input as to its collection. How should this be done (if at all)?

3) A data governance model can be created for the data of a single entity (e.g. an open government portal, or a data governance framework for a corporation); but it can also be developed to facilitate data sharing between entities, or even between a group of entities and a broader public. So an important question in the Sidewalk Toronto context is what model is this? Is this Sidewalk Labs data that is being shared? Or is it Waterfront’s? Or the City’s? Who has custody/control or ownership of the data that will be governed by the ‘trust’?

4) Data governance is crucial with respect to all data held by an entity. Not all data collected through the Sidewalk Toronto project will fall within Sidewalk’s definition of “urban data” (for which the ‘trust’ is proposed). If the data governance model under consideration only deals with a subset of data, then there must be some form of data governance for the larger set. What is it? And who determines its parameters?

Late in the afternoon of Monday, October 15, 2018, Sidewalk Labs released a densely-packed slide-deck which outlined its new and emerging data governance plan for the Sidewalk Toronto smart city development. The plan was discussed by Waterfront Toronto’s Digital Strategy Advisory Panel at their meeting on Thursday, October 18. I am a member of that panel, and this post elaborates upon the comments I made at that meeting.

Sidewalk Labs’ new data governance proposal builds upon the Responsible Data Use Policy Framework (RDUPF) document which had been released by Sidewalk Labs in May 2018. It is, however, far more than an evolution of that document – it is a different approach reflecting a different smart city concept. It is so different that Ann Cavoukian, advisor to Sidewalk Labs on privacy issues, resigned on October 19. The RDUPF had made privacy by design its core focus and promised the anonymization of all sensor data. Cavoukian cited the fact that the new data governance framework contemplated that not all personal information would be deidentified as a reason for her resignation.

Neither privacy by design nor data anonymization are privacy panaceas, and the RDUPF document had a number of flaws. One of them was that by championing deidentification of personal information as the key to responsible data use, it very clearly only addressed privacy concerns relating to a subset of the data that would inevitably be collected in the proposed smart city. In addition, by focusing on privacy by design, it did little to address the many other data governance issues the project faced.

The new proposal embraces a broader concept of data governance. It is cognizant of privacy issues but also considers issues of data control, access, reuse, and localization. In approaching data governance, Sidewalk is also proposing using a ‘civic data trust’ as a governance model. Sidewalk has made it clear that this is a work in progress and that it is open to feedback and comment. It received some at the DSAP meeting on Thursday, and more is sure to come.

My comments at the DSAP focused on two broad issues. The first was data and the second was governance. I prefaced my discussion of these by warning that in my view it is a mistake to talk about data governance using either of the Sidewalk Labs documents as a departure point. This is because these documents embed assumptions that need to be examined rather than simply accepted. They propose a different starting point for the data governance conversation than I think is appropriate, and as a result they unduly shape and frame that discussion.

Data

Both the RDUPF and the current data governance proposal discuss how the data collected by the Sidewalk Toronto development will be governed. However, neither document actually presents a clear picture of what those data are. Instead, both documents discuss a subset of data. The RDUPF discussed only depersonalized data collected by sensors. The second discussed only what it defines as “urban data”:

Urban Data is data collected in a physical space in the city, which includes:

● Public spaces, such as streets, squares, plazas, parks, and open spaces

● Private spaces accessible to the public, such as building lobbies, courtyards, ground-floor markets, and retail stores

● Private spaces not controlled by those who occupy them (e.g. apartment tenants)

This is very clearly only a subset of smart cities data. (It is also a subset that raises a host of questions – but those will have to wait for another blog post.)

In my view, any discussion of data governance in the Sidewalk Toronto development should start with a mapping out of the different types of data that will be collected, by whom, for what purposes, and in what form. It is understood that this data landscape may change over time, but at least a mapping exercise may reveal the different categories of data, the issues they raise, and the different governance mechanisms that may be appropriate depending on the category. By focusing on deidentified sensor data, for example, the RDUPF did not address personal information collected in relation to the consumption of many services that will require identification – e.g., for billing or metering purposes. In the proposed development, what types of services will require individuals to identify themselves? Who will control such data? How will it be secured? What will policies be with respect to disclosure to law enforcement without a warrant? What transparency measures will be in place? Will service consumption data also be deidentified and made available for research? In what circumstances? I offer this as an example of a different category of data that still requires governance, and that still needs to be discussed in the context of a smart cities development. This type of data would also fall outside the category of “urban data” in the second governance plan, making that plan only a piece of the overall data governance required, as there are many other categories of data that are not captured by “urban data”. The first step in data governance must be for all involved to understand what data is being collected, how, why, and by whom.

The importance of this is also made evident by the fact that between the RDUPF and the new governance plan, the very concept of the Sidewalk Toronto smart city seems to have changed. The RDUPF envisioned a city in which sensors were installed by Sidewalk and Sidewalk was committing to the anonymization of any collected personal information. In the new version, the model seems to be of the smart city as a technology platform on which any number of developers will be invited to build. As a result, the data governance model proposes an oversight body to provide approval for new data collection in public spaces, and to play some role in the sharing of the collected data if appropriate. This is partly behind the resignation of Ann Cavoukian. She objected to the fact that this model accepts that some new applications might require the collection of personal information and so deidentification could not be an upfront promise for all data collected.

The technology-platform model seems responsive to concerns that the smart city would effectively be subsumed by a single corporation. It allows other developers to build on the platform – and by extension to collect and process data. Yet from a governance perspective this is much messier. A single corporation can make bold commitments with respect to its own practices; it may be difficult or inappropriate to impose these on others. It also makes it much more difficult to predict what data will be collected and for what purposes. This does not mean that the data mapping exercise is not worthwhile – many kinds and categories of data are already foreseeable and mapping data can help to understand different governance needs. In fact, it is likely that a project this complex will require multiple data governance models.

Governance

The second point I tried to make in my 5 minutes at the Thursday meeting was about data governance. The new data governance plan raises more questions than it answers. One glaring issue seems to be the place for our already existing data governance frameworks. These include municipal and provincial Freedom of Information and Protection of Privacy Acts and PIPEDA. They may also include the City of Toronto’s open data policies and platforms. There are very real questions to be answered about which smart city data will be private sector data and which will be considered to be under the custody or control of a provincial or municipal government. Government has existing legal obligations about the management of data that are under its custody or control, and these obligations include the protection of privacy as well as transparency. A government that decides to implement a new data collection program (traffic cameras, GPS trackers on municipal vehicles, etc.) would be the custodian of this data, and it would be subject to relevant provincial laws. The role of Sidewalk Labs in this development challenges, at a very fundamental level, the understanding of who is ultimately responsible for the collection and governance of data about cities, their services and infrastructure. Open government data programs invite the private sector to innovate using public data. But what is being envisaged in this proposal seems to be a privatization of the collection of urban data – with some sort of ‘trust’ model put in place to soften the reality of that privatization.

The ‘civic data trust’ proposed by Sidewalk Labs is meant to be an innovation in data governance, and I am certainly not opposed to the development of innovative data governance solutions. However, the use of the word “trust” in this context feels wrong, since the model proposed is not a data trust in any real sense of the word. This view seems to be shared by civic data trust advocate Sean McDonald in an article written in response to the proposal. It is also made clear in this post by the Open Data Institute which attempts to define the concept of a civic data trust. In fact, it is hard to imagine such an entity being created and structured without significant government involvement. This perhaps is at the core of the problem with the proposal – and at the root of some of the pushback the Sidewalk Toronto project has been experiencing. Sidewalk Labs is a corporation – an American one at that – and it is trying to develop a framework to govern vast amounts of data collected about every aspect of city life in a proposed development. But smart cities are still cities, and cities are public institutions created and structured by provincial legislation and with democratically elected councils. If data is to be collected about the city and its residents, it is important to ask why government is not, in fact, much more deeply implicated in any development of both the framework for deciding who gets to use city infrastructure and spaces for data collection, and what data governance model is appropriate for smart cities data.

This post (and my presentation) explores the concept of the ‘smart’ city and lays the groundwork for a discussion of governance by exploring the different types of data collected in so-called smart cities.

Although the term ‘smart city’ is often bandied about, there is no common understanding of what it means. Anthony Townsend has defined smart cities as “places where information technology is combined with infrastructure, architecture, everyday objects, and even our bodies to address social, economic, and environmental problems.” (A. Townsend, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia. (New York: W.W. Norton & Co., 2013), at p. 15). This definition emphasizes the embedding of information technologies within cities with the goal of solving a broad range of urban problems. Still, there is uncertainty as to which cities are ‘smart’ or at what point a city passes the invisible ‘smart’ threshold.

Embedded technologies are multiple and ever-evolving, and many are already in place in the cities in which we live. Technologies that have become relatively commonplace include smart transit cards, GPS systems on public vehicles (e.g.: buses, snowplows, emergency vehicles, etc.), smart metering for utilities, and surveillance and traffic cameras. Many of the technologies just identified collect data; smart technologies also process data using complex algorithms to generate analytics that can be used in problem identification and problem solving. Predictive policing is an example of a technology that generates information based on input data and complex algorithms.

While it is possible for a smart city to be built from the ground up, this is not the most common type of smart city. Instead, most cities become ‘smarter’ by increments, as governments adopt one technology after another to address particular needs and issues. While both from-the-ground-up and incremental smart cities raise important governance issues, it is the from-the-ground-up projects (such as Sidewalk Toronto) that get the most public attention. With incremental smart cities, the piecemeal adoption of technologies often occurs quietly, without notice, and thus potentially without proper attention being paid to important overarching governance issues such as data ownership and control, privacy, transparency, and security.

Canada has seen two major smart cities initiatives launched in the last year. These are the federal government’s Smart Cities Challenge – a contest between municipalities to fund the development of smart cities projects – and the Sidewalk Toronto initiative to create a from-the-ground-up smart development in Toronto’s Quayside area. Although Canadian cities have been becoming ‘smart’ by increments for some time now, these two high-profile initiatives have sparked discussion of the public policy issues, bringing important governance issues to the forefront.

These initiatives, like many others, have largely been conceived of and presented to the public as technology, infrastructure, and economic development projects. Rather than acknowledging up-front the need for governance innovation to accompany the emerging technologies, governance tends to get lost in the hype. Yet it is crucial. Smart cities feed off data, and residents are primary sources. Much of the data collected in smart cities is personal information, raising obvious privacy issues. Issues of ownership and control over smart cities data (whether personal or non-personal) are also important. They are relevant to who gets to access and use the data, for what purposes, and for whose profit. The public outcry over the Sidewalk Toronto project (examples here, here and here) clearly demonstrates that cities are not just tech laboratories; they are the places where we try to live decent and meaningful lives.

The governance issues facing so-called smart cities are complex. They may be difficult to disentangle from the prevailing ‘innovate or perish’ discourse. They are also rooted in technologies that are rapidly evolving. Existing laws and legal and policy frameworks may not be fully adequate to address smart cities challenges. This means that the governance issues raised by smart cities may require a rethinking of the existing law and policy infrastructure almost at pace with the emerging and evolving technologies.

The complexity of the governance challenges may be better understood when one considers the kind of data collected in smart cities. The narrower the categories of data, the more manageable data governance in the smart city will seem. However, the nature of information technologies, including the types and locations of sensors, and the fact that many smart cities are built incrementally, require a broad view of the types of data at play in smart cities. Here are some kinds of data collected and used in smart cities:

· traditional municipal government data (e.g. data about registrants or applicants for public housing or permits; data about water consumption, infrastructure, waste disposal, etc.)

· data sourced from private sector companies (e.g. data about routes driven or cycled from companies such as Waze or Strava; social media data, etc.)

· data from individuals as sensors (e.g. data collected about the movements of individuals based on signals from their cell phones; data collected by citizen scientists; crowd-sourced data, etc.)

· data that is the product of analytics (e.g. predictive data, profiles, etc.)

Public sector access to information and protection of privacy legislation provides some sort of framework for transparency and privacy when it comes to public sector data, but clearly such legislation is not well adapted to the diversity of smart cities data. While some data will be clearly owned and controlled by the municipality, other data will not be. Further, the increasingly complex relationship between public and private sectors around input data and data analytics means that there will be a growing number of conflicts between rights of access and transparency on the one hand, and the protection of confidential commercial information on the other.

Given that few ‘smart’ cities will be built from the ground up (with the potential for integrated data governance mechanisms), the complexity and diversity of smart cities data and technologies creates a stark challenge for developing appropriate data governance.

(Sorry to leave a cliff hanger – I have some forthcoming work on smart cities data governance which I hope will be published by the end of this year. Stay tuned!)

Metrolinx is the Ontario government agency that runs the Prestocard service used by public transit authorities in Toronto, Ottawa and several other Ontario municipalities. It ran into some trouble recently after the Toronto Star revealed that the organization shared Prestocard data from its users with police without requiring warrants (judicial authorization). The organization has now published its proposals for revising its privacy policies and is soliciting comment on them. (Note: Metrolinx has structured its site so that you can only view one of the three proposed changes at a time and must indicate your satisfaction with it and/or your comments before you can view the next proposal. This is problematic because the changes need to be considered holistically. It is also, frankly, annoying.)

The new proposals do not eliminate the sharing of rider information with state authorities without a warrant. Under the new proposals, information will be shared without a warrant in certain exigent circumstances. It will also be shared without a warrant “in other cases, where we are satisfied it will aid in an investigation from which a law enforcement proceeding may be undertaken or is likely to result.” The big change is thus apparently in the clarity of the notice given to users of the sharing – not the sharing itself.

This flabby and open-ended language is taken more or less directly from the province’s Freedom of Information and Protection of Privacy Act (FOIPPA), which governs the public sector’s handling of personal information. As a public agency, Metrolinx is subject to FOIPPA. It is important to note that the Act permits (but does not require) government entities to share information with law enforcement in precisely the circumstances outlined in the policy. However, by adapting its policy to what it is permitted to do, rather than to what it should do, Metrolinx is missing two important points. The first is that the initial outrage over its practices was about information sharing without a warrant, and not about poor notice of such practices. The second is that doing a good job of protecting privacy sometimes means aiming for the ceiling and not the floor.

Location information is generally highly sensitive information as it can reveal a person’s movements, activities and associations. Police would normally need a warrant to obtain this type of information. It should be noted that police are not relieved of their obligations to obtain warrants when seeking information that raises a reasonable expectation of privacy just because a statute permits the sharing of the information. It would be open to the agency to require that a warrant be obtained prior to sharing sensitive customer location data. It is also important to note that some courts have found that the terms of privacy policies may actually alter the reasonable expectation of privacy – particularly when clear notice is given. In other words, even though we might have a reasonable expectation of privacy in location data about our movements, a privacy policy that tells us clearly that this information is going to be shared with police without a warrant could substantially undermine that expectation of privacy. And all of this happens without any ability on our part to negotiate for terms of service,[1] and in the case of a monopoly service such as public transportation, to choose a different provider.

Metrolinx no doubt expects its users to be comforted by the other changes to its policies. It already has some safeguards in place to minimize the information provided to police and to log any requests and responses. It plans to require, in addition, a sign-off by the requesting officer and supervisor. Finally, it plans to issue voluntary transparency reports as per the federal government’s Transparency Reporting Guidelines. Transparency reporting is certainly important, as it provides a window onto the frequency with which information sharing takes place. However, these measures do not correct for an upfront willingness to share sensitive personal information without judicial authorization – particularly in cases where there are no exigent circumstances.

As we move more rapidly towards sensor-laden smart cities in which the consumption of basic services and the living of our daily lives will leave longer and longer plumes of data exhaust, it is important to reflect not just on who is collecting our data and why, but on the circumstances in which they are willing to share that data with others – including law enforcement officials. The incursions on privacy are many and from all directions. Public transit is a basic municipal service. It is also one that is essential for lower-income residents, including students.[2] Transit users deserve more robust privacy protections.

Notes:

[1] A recent decision of the Ontario Court of Appeal does seem to consider that the inability to negotiate for terms of service should be taken into account when assessing the impact of those terms on the reasonable expectation of privacy. See: R. v. Orlandis-Habsburgo.

[2] Some universities and colleges have U-Pass agreements which require students to pay additional fees in exchange for Prestocard passes. Universities and colleges should, on behalf of their students, be insisting on more robust privacy.

Note: the following are my speaking notes for my appearance before the Standing Committee on Transport, Infrastructure and Communities, February 14, 2017. The Committee is exploring issues relating to Infrastructure and Smart Communities. I have added hyperlinks to relevant research papers or reports.

Thank you for the opportunity to address the Standing Committee on Transport, Infrastructure and Communities on the issue of smart cities. My research on smart cities is from a law and policy perspective. I have focused on issues around data ownership and control and the related issues of transparency, accountability and privacy.

The “smart” in “smart cities” is shorthand for the generation and analysis of data from sensor-laden cities. The data and its accompanying analytics are meant to enable better decision-making around planning and resource-allocation. But the smart city does not arise in a public policy vacuum. Developing almost in parallel with so-called smart cities is the growing open government movement, which champions open data and open information as keys to greater transparency, civic engagement and innovation. My comments speak to the importance of ensuring that the development of smart cities is consistent with the goals of open government.

In the big data environment, data is a resource. Where the collection or generation of data is paid for by taxpayers, it is surely a public resource. My research has considered the location of rights of ownership and control over data in a variety of smart-cities contexts, and raises concerns over the potential loss of control over such data, particularly rights to re-use the data, whether for innovation, civic engagement or transparency purposes.

Smart cities innovation will result in the collection of massive quantities of data and these data will be analyzed to generate predictions, visualizations, and other analytics. For the purposes of this very brief presentation, I will characterize this data as having 3 potential sources: 1) newly embedded sensor technologies that become part of smart cities infrastructure; 2) already existing systems by which cities collect and process data; and 3) citizen-generated data (in other words, data that is produced by citizens as a result of their daily activities and captured by some form of portable technology).

Let me briefly provide examples of these three situations.

The first scenario involves newly embedded sensors that become part of smart cities infrastructure. Assume that a municipal transit authority contracts with a private sector company for hardware and software services for the collection and processing of real-time GPS data from public transit vehicles. Who will own the data that is generated through these services? Will it be the municipality that owns and operates the fleet of vehicles, or the company that owns the sensors and the proprietary algorithms that process the data? The answer, which will be governed by the terms of the contract between the parties, will determine whether the transit authority is able to share this data with the public as open data. This example raises the issue of the extent to which ‘data sovereignty’ should be part of any smart cities plan. In other words, should policies be in place to ensure that cities own and/or control the data which they collect in relation to their operations? To go a step further, should federal funding for smart infrastructure be tied to obligations to make non-personal data available as open data?

The second scenario is where cities take their existing data and contract with the private sector for its analysis. For example, a municipal police service provides their crime incident data to a private sector company that offers analytics services such as publicly accessible crime maps. Opting to use the pre-packaged private sector platform may have implications for the availability of the same data as open data (which in turn has implications for transparency, civic engagement and innovation). It may also result in the use of data analytics services that are not appropriately customized to the particular Canadian local, regional or national contexts.

In the third scenario, a government contracts for data that has been gathered by sensors owned by private sector companies. The data may come from GPS systems installed in cars, from smart phones or their associated apps, from fitness devices, and so on. Depending upon the terms of the contract, the municipality may not be allowed to share the data upon which it is making its planning decisions. This will have important implications for the transparency of planning processes. There are also other issues. Is the city responsible for vetting the privacy policies and practices of the app companies from which they will be purchasing their data? Is there a minimum privacy standard that governments should insist upon when contracting for data collected from individuals by private sector companies? How can we reconcile private sector and public sector data protection laws where the public sector increasingly relies upon the private sector for the collection and processing of its smart cities data? Which normative regime should prevail and in what circumstances?

Finally, I would like to touch on a different yet related issue. This involves the situation where a city that collects a large volume of data – including personal information – through its operation of smart services is approached by the private sector to share or sell that data in exchange for either money or services. This could be very tempting for cash-strapped municipalities. For example, a large volume of data about the movement and daily travel habits of urban residents is collected through smart card payment systems. Under what circumstances is it appropriate for governments to monetize this type of data?