A Closer Look at Location Data: Privacy and Pandemics

In this series, Privacy and Pandemics, the Future of Privacy Forum explores the challenges posed by the COVID-19 crisis to existing ethical, privacy, and data protection frameworks, and will seek to provide information and guidance to companies and researchers interested in responsible data sharing to support public health response. Future posts will examine pandemic-tracking mobile apps, regulatory guidance across the world, and more.

In light of COVID-19, there is heightened global interest in harnessing location data held by major tech companies to track individuals affected by the virus, better understand the effectiveness of social distancing, or send alerts to individuals who might be affected based on their previous proximity to known cases. Governments around the world are considering whether and how to use mobile location data to help contain the virus: Israel’s government passed emergency regulations to address the crisis using cell phone location data; the European Commission requested that mobile carriers provide anonymized and aggregate mobile location data; and South Korea has created a publicly available map of location data from individuals who have tested positive.

Public health agencies and epidemiologists have long been interested in analyzing device location data to track diseases. In general, the movement of devices effectively mirrors movement of people (with some exceptions discussed below). However, its use comes with a range of ethical and privacy concerns.

To help policymakers address these concerns, we provide below a brief explainer guide to the basics: (1) what is location data, (2) who holds it, and (3) how is it collected? Finally, we discuss some preliminary ethical and privacy considerations for processing location data. Researchers and agencies should consider: how and in what context location data was collected; the fact and reasoning behind location data being classified as legally “sensitive” in most jurisdictions; challenges to effective “anonymization”; representativeness of the location dataset (taking into account potential bias and lack of inclusion of low-income and elderly subpopulations who do not own phones); and the unique importance of purpose limitation, or not re-using location data for other civil or law enforcement purposes after the pandemic is over.

What is precise location data?

Precise location data, or “mobility data,” involves information about how devices and people move through spaces over time. Most of this information comes from the devices we carry with us, with smartphones acting as proxies for people (according to Pew, smartphone ownership in 2019 was near-universal at 81% of Americans).

Why is this the case? Even the most basic connectivity, or the ability to send and receive wireless content on devices, has to involve information about where those devices are located. For example, providers of wireless services know where devices are located because they provide the service through local cell towers and networks. At a more general level, an IP address (an identifier that is freely and openly shared by devices to send and receive Internet traffic) is often sufficient to know a person’s city and state.

However, most researchers analyzing COVID-19 are interested in highly “precise” information about where devices (and therefore people) are located over time. The fact that an individual is located in “Washington, DC” is not sufficient for tracking an infectious disease, but information such as “works in the same building” or “visited the same restaurant at the same time as a diagnosed person” (precise location) can be very useful. Typically, we think of location data as having privacy implications when it is precise enough to single out an individual with reasonable specificity. This is often GPS-level specificity, and would usually not include information like an IP address. What counts as precise depends in part on context, such as population density (for example, in a rural or remote area, a lower level of specificity might more readily identify a person than if that same person were standing in Times Square). Recent legislative proposals have attempted to create strict cut-offs (such as a 1,640-foot radius under the U.S. House Energy and Commerce Discussion Draft, or a 1,850-foot radius under the California Privacy Rights Act ballot initiative of 2020).

Sometimes mobility or location data is tied to known individuals (such as a name associated with a cell phone subscription), and at other times it is tied only to a unique identifier associated with a device. In the latter case, the individualized data is often referred to as “anonymized.” In other cases, if a dataset has been modified to show movements of groups of people (and not individuals), it is often referred to as “aggregated.”

Who has access to location data?

Location data is held by a variety of commercial entities that provide different services, including as part of the core functionality of a device (mobile phone carriers and operating systems), as part of a consumer-facing feature (mobile apps), or as part of tracking in physical spaces that relies on device connectivity (Internet of Things):

Mobile phone carriers. Cell phone carriers know where phones are located because they direct calls to phones through local cell towers, which may be enhanced with GPS location data.

Operating Systems. Providers of mobile operating systems, Android (Google) and iOS (Apple), may know where devices are located as a result of providing services, improving functionality, or enabling opt-in location features. In addition, some users may have opted in to having cell tower and Wi-Fi data used to improve location services.

Left – iOS (Apple), middle and right – Android (Google)

Apps and App Partners. Many people have installed apps with location-based features, such as weather alerts, ridesharing, or grocery delivery. In many cases, this location data is shared with partners in order to provide personalized ads or to monetize the free app. Many apps use Software Development Kits (SDKs), or code developed by third parties. Frequently, location data is shared with these SDK providers to improve their service or in exchange for monetization or other services.

Location Analytics Providers (Internet of Things). Connected devices emit identifying information that allows them to be tracked, even when they are not actively connected to a network. This includes mobile phones (when Wi-Fi or Bluetooth is turned on), but also other Internet of Things (IoT) devices such as fitness trackers, smart toys, or vehicles. As a result, many airports, stadiums, and brick-and-mortar stores analyze this signal data to better understand when their busiest hours are, where the highest in-store foot traffic is, what products customers show an interest in, or how long people are waiting in lines.

How is location data collected?

When most people think of location data, they think of GPS (Global Positioning System). In fact, GPS is only one of many ways to infer where devices are located, most of which are used in some combination by carriers, operating systems, apps, and others. Commonly used methods include: GPS; Cell Towers; Wi-Fi Networks; and Beacons (among others). Each provides a different level of precision and can be used for different purposes:

GPS. Smartphones and other devices can detect location via satellite GPS independently of any telephone or internet reception, although a phone’s GPS chip is only one sensor among many. The accuracy of GPS signals varies widely, and can be affected by weather or physical interference. For example, it is much less accurate in urban areas, and especially poor for detecting specific locations inside large buildings. As a result, modern cell phones use GPS in combination with other forms of location signal (Wi-Fi, Bluetooth) at various times to create a more accurate location determination.

Cell Towers. The main function of cell towers is to let carriers provide cell service. As a result, mobile carriers (such as AT&T, Sprint, Verizon, T-Mobile, and many others in the United States) know approximately where devices are located because they know which cell towers the devices connect with. In addition to this core function, cell towers also emit unique “Cell Tower IDs,” which can be freely detected. There are many private and public databases that associate Cell Tower IDs with the mapped locations of known cell towers. As a result, the proximity of nearby cell towers (and the signal strength of their IDs) can be used to infer where a device is located. Local cell towers can be looked up in public databases such as OpenCellID.
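To make the idea of inferring location from nearby towers concrete, here is a minimal sketch in Python. It is an illustration only, not how any carrier or operating system actually computes location: the tower IDs, coordinates, and the `TowerObservation` and `estimate_position` names are all hypothetical, and weighting by linear signal power is just one simple choice among many.

```python
from dataclasses import dataclass


@dataclass
class TowerObservation:
    """One tower seen by the device (all field names are hypothetical)."""
    cell_id: str
    signal_dbm: float  # received signal strength, e.g. -70 (strong) to -110 (weak)


# Hypothetical lookup table standing in for a Cell Tower ID database.
TOWER_LOCATIONS = {
    "tower-a": (38.9000, -77.0400),
    "tower-b": (38.9100, -77.0300),
    "tower-c": (38.9050, -77.0200),
}


def estimate_position(observations, tower_locations=TOWER_LOCATIONS):
    """Estimate (lat, lon) as a signal-weighted centroid of known towers.

    Stronger signals get more weight. Averaging raw latitude and longitude
    is only reasonable over small areas, which is assumed here.
    """
    total_weight = 0.0
    lat_sum = 0.0
    lon_sum = 0.0
    for obs in observations:
        if obs.cell_id not in tower_locations:
            continue  # tower not in the database, so it cannot help
        lat, lon = tower_locations[obs.cell_id]
        weight = 10 ** (obs.signal_dbm / 10.0)  # convert dBm to milliwatts
        total_weight += weight
        lat_sum += weight * lat
        lon_sum += weight * lon
    if total_weight == 0.0:
        return None  # no known towers observed
    return (lat_sum / total_weight, lon_sum / total_weight)
```

With two towers heard at equal strength, the estimate falls at the midpoint between them; a much stronger signal from one tower pulls the estimate toward that tower.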

Wi-Fi Networks. Mobile devices can infer their location by scanning for nearby Wi-Fi networks. Nearby networks or “access points” might include, for example, neighbors’ Wi-Fi, or the Wi-Fi available in cafes and shops. Large databases exist of the unique identifiers (MAC addresses and SSIDs) of wireless routers and their known locations, with companies such as Mozilla and Combain reporting databases of millions of unique Wi-Fi networks. Despite the relatively public nature of these identifiers, most (but not all) commercial databases offer an opt-out mechanism for users who prefer that their own network not be included. In 2011, Google created an approach for opting out a particular access point from being included in its database, which involves appending the phrase “_nomap” to the end of the wireless router’s SSID. Mozilla similarly honors the _nomap method, but other databases do not, or offer their own opt-outs.
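As an illustration of how a database operator might honor the “_nomap” opt-out, consider the short Python sketch below. The function names and the (MAC address, SSID) tuple layout are assumptions made for this example, and we assume an exact, case-sensitive suffix match; actual providers may apply their own rules.

```python
NOMAP_SUFFIX = "_nomap"


def is_opted_out(ssid: str) -> bool:
    """Return True if the SSID ends with the "_nomap" opt-out suffix."""
    return ssid.endswith(NOMAP_SUFFIX)


def filter_observations(observations):
    """Drop scanned access points whose owners have opted out.

    `observations` is a list of (mac_address, ssid) tuples; this layout
    is hypothetical and chosen only for the example.
    """
    return [(mac, ssid) for mac, ssid in observations if not is_opted_out(ssid)]


scanned = [
    ("00:11:22:33:44:55", "CoffeeShopGuest"),
    ("66:77:88:99:aa:bb", "HomeNetwork_nomap"),
]
kept = filter_observations(scanned)  # only the CoffeeShopGuest entry remains
```

Filtering at the point of collection, before anything is stored, is the simpler design: an opted-out network then never enters the database at all.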

Bluetooth Beacons. Many apps are designed to detect their proximity to “beacons,” small radio transmitters that broadcast one-way Bluetooth signals. Beacons are inexpensive and can be attached to personal items (such as a person’s keys or wallet). They can also be installed at known locations, for example in a retail space or in front of a special display of products in a shop. In these cases, an app that a user has given permission to access Bluetooth can infer the device’s location or send proximity-based alerts or other content.

Combining Signals for Precision. Modern smartphones combine signals detected from the sources above to create a more precise location than any one signal (such as GPS) would provide alone. For example, iOS and Android harness the signals from many different sensors on the device, such as altimeter and accelerometer sensors, to provide a consolidated “Location Services” feature that offers highly precise location information to apps (with a user’s permission) and that users can control in Settings. Signals can also be combined to create predictive location services, for example to predict a future traffic jam, or show users upcoming attractions on their predicted path.
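One simple way to see why combining signals improves precision is inverse-variance weighting, sketched below in Python. This is a textbook technique chosen for illustration; we do not know how iOS or Android actually fuse their signals, and the function name, input format, and sample coordinates are all hypothetical.

```python
def fuse_estimates(estimates):
    """Combine independent position estimates by inverse-variance weighting.

    `estimates` is a list of (lat, lon, accuracy_m) tuples, where
    accuracy_m is each source's uncertainty radius in meters and must be
    greater than zero. Sources with a smaller radius get more weight.
    Returns (lat, lon, fused_accuracy_m).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / (acc ** 2) for _, _, acc in estimates]
    total = sum(weights)
    lat = sum(w * e[0] for w, e in zip(weights, estimates)) / total
    lon = sum(w * e[1] for w, e in zip(weights, estimates)) / total
    # If the sources' errors are independent, the fused uncertainty is
    # smaller than that of any single source.
    fused_accuracy = (1.0 / total) ** 0.5
    return lat, lon, fused_accuracy


# Hypothetical readings: a coarse cell-tower fix, a GPS fix, a Wi-Fi fix.
readings = [
    (38.9050, -77.0350, 500.0),
    (38.9003, -77.0365, 15.0),
    (38.9001, -77.0363, 30.0),
]
lat, lon, accuracy_m = fuse_estimates(readings)
```

The fused uncertainty is always smaller than that of the best single source, which is the sense in which a combined location is “more precise” than GPS alone. The independence assumption rarely holds exactly in practice, so this should be read as an intuition aid rather than a production method.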

Ethical and Privacy Considerations for Location Data

Lawmakers are beginning to navigate whether and how to make use of the many sources of commercial location data. As they do so, we recommend that they consider: how and in what context location data was collected (described above), as well as: the fact and reasoning behind location data being classified as legally “sensitive” in most jurisdictions; challenges to effective “anonymization”; representativeness of the location dataset (taking into account potential bias and lack of inclusion of low-income and elderly subpopulations who do not own phones); and the unique importance of purpose limitation, or not re-using location data for other civil or law enforcement purposes after the pandemic is over.

Precise location data is legally sensitive. In most jurisdictions, location data is treated as a special category of data subject to greater protections, such as heightened security standards and the requirement of affirmative express consent. For example, the longstanding approach of the US Federal Trade Commission (FTC) has been to require affirmative consent for location data. In 2016 the FTC settled with ad platform InMobi for failing to respect users’ choice not to agree to share location data with apps. Affirmative express consent is also a feature of most US legislative proposals from 2018-2020, such as the proposed California Privacy Rights Act of 2020 and U.S. Senator Cantwell’s proposed Consumer Online Privacy Rights Act. The U.S. Supreme Court has also held that location data carries unique sensitivities because of its ability to reveal highly sensitive data about people’s behaviors, patterns, and personal life, most recently in Carpenter v. United States (requiring law enforcement to obtain a warrant for cell site location data). In the EU, access to location data is normally regulated as a matter of confidentiality of telecommunications, by the strict provisions of the ePrivacy Directive, which require individual consent (with very narrow exceptions).

Precise location data is very challenging to fully “anonymize.” Many government entities are interested in gaining access to “anonymous” or “anonymous and aggregated” location data, to observe population-level trends and movements. While in some cases this is possible, it is very challenging to make any dataset of individual precise location data truly “anonymous.” Even if unique identifiers are used instead of names, most people’s behavior can be easily traced back to them — for example, from the location of their home (where the device “dwells” at night). These challenges are not insurmountable, but policymakers should be very careful not to overpromise, and should treat location datasets as private, sensitive information. This means it should be subject to administrative, technical, and legal controls to ensure it remains protected and limited in who can access it and for what purposes.
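The nighttime “dwell” example can be made concrete with a few lines of Python. The sketch below is a simplified illustration of why replacing names with unique identifiers does not anonymize a location trail; the function name, the night-hour window, the grid size, and the sample coordinates are all assumptions chosen for the example.

```python
from collections import Counter
from datetime import datetime


def infer_home_cell(pings, night_start=22, night_end=6, precision=3):
    """Guess a device's home as its most common nighttime location.

    `pings` is a list of (timestamp, lat, lon) tuples for ONE pseudonymous
    device ID. Coordinates are rounded to `precision` decimal places
    (3 places is roughly a 100 m grid) so nearby points group together.
    Returns the most frequent nighttime (lat, lon) cell, or None.
    """
    night_cells = Counter()
    for ts, lat, lon in pings:
        if ts.hour >= night_start or ts.hour < night_end:
            night_cells[(round(lat, precision), round(lon, precision))] += 1
    if not night_cells:
        return None
    return night_cells.most_common(1)[0][0]


# Hypothetical trail for a single device identifier.
trail = [
    (datetime(2020, 3, 1, 23, 0), 38.90012, -77.03620),
    (datetime(2020, 3, 2, 2, 0), 38.90008, -77.03618),
    (datetime(2020, 3, 2, 13, 0), 38.95000, -77.00000),
    (datetime(2020, 3, 2, 23, 30), 38.90011, -77.03622),
]
home = infer_home_cell(trail)
```

Once a likely home cell is known, it can often be matched against public address records, which is why datasets keyed only by device identifier should still be treated as personal data.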

Even fully “aggregate” location data can sometimes be revealing. At times, even highly aggregated data about patterns of large groups of people (such as high-level heat maps) can inadvertently reveal information. In 2017, an interactive “Global Heat Map” of movements of users of the Strava fitness app inadvertently revealed the locations of deployed military personnel at classified locations. This incident highlights some of the wider ethical issues associated with open data and default public data sharing. In FPF’s privacy assessment of the City of Seattle, we recommended that companies thoroughly analyze all risks, not only risks to privacy and re-identification, but also to “group privacy,” and impact on other values such as data quality, fairness, equity, and public trust.

Representativeness and bias are uniquely important for location datasets. Unfair data processing practices involving geolocation fall disproportionately on marginalized and vulnerable communities. As such, heightened privacy protections are especially critical for these groups. Voluntary apps are more likely to capture affluent communities. For example, a mobile app called “Street Bump” was released by a municipal authority in an attempt to crowdsource data to work out which roads it needed to repair. However, affluent citizens downloaded the app more than people in poorer neighborhoods. As such, the system reported a disproportionate number of potholes in wealthier neighborhoods, and could have led the city to distribute or prioritize its repair services inequitably. In contrast, mobile phone carrier data may be more representative, but may still miss many of the elderly, very young, or lowest-income people who may not own cellphones.

Purpose limitation is uniquely important in a crisis. Purpose limitation is a core guiding light of the US-based Fair Information Practice Principles (FIPPs) and the EU’s General Data Protection Regulation (GDPR). Because location data is sensitive and challenging to truly “de-identify” (i.e. to significantly reduce or eliminate all privacy risks), there is a serious concern that once collected by a public health agency for pandemic tracking, it could be retained or used for other purposes. Governments should consider how the location data was collected in the first instance (with users’ knowledge or consent?), and if the decision is made to repurpose it for pandemic tracking, it should be clearly siloed for that purpose and not re-used or retained for other civil or law enforcement uses. Researchers or agencies should have clear policies and procedures in place that describe the operational and technical aspects of data management.

Conclusion

As COVID-19 continues to spread, we are facing global challenges to existing norms and best practices for data collection and use. In some cases, location and mobility data might provide one path to better understanding and combatting the pandemic. Governments and researchers seeking to address concerns and risks should ask: how and in what context the location data was collected; whether it is necessary and appropriate to achieving their goals (including whether the data is truly representative of the overall population and takes into account vulnerable populations such as the elderly); whether those goals can be achieved through less invasive means; and how that data will be used, safely stored, retained, or re-purposed following the conclusion of the pandemic.

Image Attribution: “My New York heat map” by matteoc is licensed under CC BY-NC-SA 2.0.

FPF’s blog post on the FTC Settlement with InMobi (June 2016) (outlining the InMobi settlement for misrepresenting the fact that they were collecting location data via Wi-Fi networks).

FPF’s City of Seattle Open Data Risk Assessment (Jan. 30, 2018) (providing tools and guidance to the City of Seattle and other municipalities navigating the complex policy, operational, technical, organizational, and ethical standards that support privacy-protective open data programs).