Sunday, July 26, 2015

One of the most difficult privacy conundrums facing libraries today is how to deal with the data that their patrons generate in the course of using digital services. Commercial information services typically track usage in detail, keep the data indefinitely, and regard the data as a valuable asset. Data is used to make many improvements, often to personalize the service to best meet the needs of the user. User data can also be monetized; as I've written here before, many companies make money by providing web services in exchange for the opportunity to track users and help advertisers target them.

The downside to data collection is its impact on user privacy, something that libraries have a history of defending, even at the risk of imprisonment. Since the Patriot Act, many librarians have believed that the best way to defend user privacy against legally sanctioned intrusion is to avoid collecting any sensitive data. But as libraries move onto the web, that defense seems more and more like a Maginot Line, impregnable, but easy to get around. (I've written about an effort to shore up some weak points in library privacy defenses.)

At the same time, "big data" has clouded the picture of what constitutes sensitive data. The correlation of digital library use with web activity outside the library can impact privacy in ways that never would occur in a physical library. For example, I've found that many libraries unknowingly use Amazon cover images to enrich their online catalogs, so that even a user who is completely anonymous to the library ends up letting Amazon know what books they're searching for.
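One way to see whether a catalog page leaks in this way is to list the outside hosts it loads images from. Every such host receives a request from the patron's browser, and a cover-image URL usually identifies the book being displayed. The Python sketch below assumes you have saved the HTML of a results page and know that page's URL. The names ImageHostCollector and third_party_image_hosts are hypothetical. It checks only img tags, not scripts, stylesheets or fonts, which can leak in the same way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImageHostCollector(HTMLParser):
    """Records the host name of every <img src=...> found in a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative and protocol-relative URLs against the page URL.
            host = urlsplit(urljoin(self.page_url, src)).hostname
            if host:
                self.hosts.add(host)


def third_party_image_hosts(html, page_url):
    """Return, sorted, the image hosts other than the page's own host."""
    collector = ImageHostCollector(page_url)
    collector.feed(html)
    collector.close()
    own_host = urlsplit(page_url).hostname
    return sorted(h for h in collector.hosts if h != own_host)
```

Called with the saved HTML and a placeholder address such as "https://catalog.example.edu/search?q=odyssey", the function returns every image host other than catalog.example.edu. Each entry is a service that learns what the patron was looking at, whether or not the patron ever logged in to the library.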

Recently, I've been serving on the Steering Committee of an initiative of NISO to try to establish a set of principles that libraries, providers of services to libraries, and publishers can use to support patron privacy. We held an in-person meeting in San Francisco at the end of June. There was solid support from libraries, publishers and service companies for improving reader privacy, but some issues were harder than others. The issues around data collection and use attracted the widest divergence in opinion.

One approach that was discussed centered on classifying different types of data depending on the extent to which they impact user privacy. This is also the approach taken by most laws governing privacy of library records. They mostly apply only to "Personally Identifiable Information" (PII), which usually would mean a person's name, address, phone number, etc., but sometimes is defined to include the user's IP address. While it's important to protect this type of information, in practice this usually means that less personal information lacks any protection at all.

I find that the data classification approach is another Maginot privacy line. It encourages the assumption that collection of demographic data – age, gender, race, religion, education, profession, even sexual orientation – is fair game for libraries and participants in the library ecosystem. I raised some eyebrows when I suggested that demographic groups might deserve a level of privacy protection in libraries, just as individuals do.

OCLC's Andrew Pace gave an example that brought this home for us all. When he worked as a librarian at NC State, he tracked usage of the books and other materials in the collection. Every library needs to do this for many purposes. He noticed that materials placed on reserve for certain classes received little or no usage, and he thought that faculty shouldn't be putting so many things on reserve, effectively preventing students not taking the class from using these materials. And so he started providing usage reports to the faculty.

In retrospect, Andrew pointed out that, without thinking much about it, he might have violated the privacy of students by informing their teachers that they weren't reading the assigned materials. After all, if a library wants to protect a user's right to read, it also has to protect the right not to read. Nobody's personally identifiable information had been exposed, but library data – a list of books that hadn't circulated – had intersected with non-library data – the list of students enrolled in a class and the list of assigned reading – in a way that exposed individual reading behavior.

What this example illustrates is that libraries MUST collect at least SOME data that impinges on reader privacy. If reader privacy is to be protected, a "privacy impact assessment" must be made on almost all uses of that data. In today's environment, users expect that their data signals will be listened to and their expressed needs will be accommodated. Given these expectations, building privacy in libraries is going to require a lot of work and a lot of thought.

But I've also become a volunteer for the Library Freedom Project, run by radical librarian Alison Macrina. The project I'm working on is the "Library Digital Privacy Pledge."

The Library Digital Privacy Pledge is a result of discussions on several listservs about how libraries and the many organizations that serve libraries could work cooperatively to (putting it bluntly) start getting our shit together with regard to patron privacy.

I've talked to a lot of people about privacy in digital libraries, and there's remarkable unity about its importance. There's also a lot of confusion about some basic web privacy technology, like HTTPS. My view is that HTTPS sets a foundation for all the other privacy work that needs doing in libraries.

Someone asked me why I'm so passionate about working on this. After a bit of thought, I realized that the one thing that gives me the most satisfaction in my professional life is eliminating bugs. I hate bugs. Using HTTP for library services is a bug.

The draft of the Library Digital Privacy Pledge is open for comment and improvement for a few more weeks. We want all sorts of stakeholders to have a chance to improve it. The current text (July 12, 2015) is as follows:

The Library Digital Privacy Pledge of 2015

The Library Freedom Project is inviting the library community - libraries, vendors that serve libraries, and membership organizations - to sign the "Library Digital Privacy Pledge of 2015". For this first pledge, we're focusing on the use of HTTPS to deliver library services and the information resources offered by libraries. Building a culture of library digital privacy will not end with this 2015 pledge, but committing to this first modest step together will begin a process that won't turn back. We aim to gather momentum and raise awareness with this pledge, and we will develop similar pledges in the future as appropriate to advance digital privacy practices for library patrons.

We focus on HTTPS as a first step because of its timeliness. At the end of July the Let's Encrypt initiative of the Electronic Frontier Foundation will launch a new certificate infrastructure that will remove much of the cost and technical difficulty involved in the implementation of HTTPS, with general availability scheduled for September. Due to a heightened concern about digital surveillance, many prominent internet companies, such as Google, Twitter, and Facebook, have moved their services exclusively to HTTPS rather than relying on unencrypted HTTP connections. The White House has issued a directive that all government websites must move their services to HTTPS by the end of 2016. We believe that libraries must also make this change, lest they be viewed as technology and privacy laggards, and dishonor their proud history of protecting reader privacy.

We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.

It's not always clear how to interpret this broad mandate, especially when everything is done on the internet. However, one principle of implementation should be clear and uncontroversial:

Library services and resources should be delivered, whenever practical, over channels that are immune to eavesdropping.

The current best practice dictated by this principle is as follows:

Libraries and vendors that serve libraries and library patrons should require HTTPS for all services and resources delivered via the web.

The Pledge for Libraries:

1. All web services and resources that this library directly controls will use HTTPS by the end of 2015.

2. Starting in 2016, this library will not sign or renew any contracts for web services or information resources that do not commit to use HTTPS by the end of 2016.

The Pledge for Service Providers (Publishers and Vendors):

1. All web services that we (the signatories) control will enable HTTPS by the end of 2015.

2. All web services that we (the signatories) offer will require HTTPS by the end of 2016.

The Pledge for Membership Organizations:

1. All web services that this organization directly controls will use HTTPS by the end of 2015.

2. We encourage our members to support and sign the appropriate version of the pledge.

Schedule:

This document will be open for discussion and modification until finalized by July 27, 2015. The finalized pledge will be published on the website of the Library Freedom Project. We expect a number of discussions to take place at the Annual Conference of the American Library Association and associated meetings.

The Library Freedom Project will broadly solicit signatures from libraries, vendors and publishers.

In September, in coordination with the Let's Encrypt project, the list of charter signatories will be announced and broadly publicized to popular media.

FAQ

Q: What is HTTPS and what do we need to implement it?

A: When you use the web, your browser software communicates with a server computer through the internet. The messages back and forth pass through a series of computers (network nodes) that work together to relay them. Depending on where you and the server are, there might be 5 computers in that chain, or there might be 50, each possibly owned by a different service provider. When a website uses HTTP, the content of these messages is open to inspection by each intermediate computer, like a postcard sent through the postal system, as well as by any other computer that shares a network with those computers. If you’re connecting to the internet over wifi in a coffee shop, everyone else in the coffee shop can see the messages, too.

When a website uses HTTPS, the messages between your browser software and the server are encrypted so that none of the intermediate network nodes can see the content of the messages. It’s like sending sealed envelopes through the postal system.

Your web site and other library services may be sending sensitive patron data across the internet: often bar codes and passwords, but sometimes also catalog searches, patron names, contact information, and reading records. This kind of data ought to be inside a sealed envelope, not exposed on a postcard.

Most web server software supports HTTPS, but to implement it, you’ll need to get a certificate signed by a recognized authority. The certificate is used to verify that you are who you say you are. Certificates have added cost to HTTPS, but the Electronic Frontier Foundation is implementing a certificate authority that will give out certificates at no charge. To find out more, go to Let’s Encrypt.
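Requiring HTTPS, rather than merely enabling it, usually means answering every plain-HTTP request with a redirect to the HTTPS address. Here is a minimal Python sketch of such a redirector using the standard library's http.server module. It assumes a separate, properly configured HTTPS server (with a certificate, for instance one from Let's Encrypt) already serves the same paths. CANONICAL_HOST, https_location and RedirectToHTTPS are hypothetical names, and a production site would normally configure the redirect in its existing web server instead.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical host name: substitute the library's real HTTPS address.
CANONICAL_HOST = "catalog.example.edu"


def https_location(path, host=CANONICAL_HOST):
    """Return the https:// URL that a plain-HTTP request for `path` should go to."""
    if not path.startswith("/"):
        path = "/" + path
    return f"https://{host}{path}"


class RedirectToHTTPS(BaseHTTPRequestHandler):
    """Answers plain-HTTP GET and HEAD requests with a permanent redirect.

    Other methods fall through to BaseHTTPRequestHandler's 501 error,
    so no content is served over the unencrypted channel.
    """

    def _redirect(self):
        self.send_response(301)  # 301 Moved Permanently
        # self.path keeps the query string, e.g. "/search?q=privacy".
        self.send_header("Location", https_location(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect
```

To try it, import HTTPServer from http.server and run HTTPServer(("", 8080), RedirectToHTTPS).serve_forever(). Redirecting to a fixed host name, rather than echoing the request's Host header, keeps a forged header from sending patrons somewhere else. Note that a redirect protects only what follows it: the first request, including any search terms in its URL, still travels on a postcard. That is why HTTPS sites commonly also send, over HTTPS, a Strict-Transport-Security header, which tells browsers to go straight to HTTPS on later visits.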

Q: Why the focus on HTTPS?

A: We think this issue should not be controversial and is relatively easy to explain. Libraries understand that circulation information can’t be sent to patrons on postcards. Publishers don’t want their content scooped up by unauthorized entities. Service providers don’t want to betray the trust of their customers.

Q. How can my library/organization/company add our names to the list of signatories?

A. Email us at pledge@libraryfreedomproject.org. Please give us contact info so we can verify your participation.

Q. Is this the same as HTTPS Everywhere?

A. No, that's a browser plug-in which enforces use of HTTPS.

Q. My library won't be able to meet the implementation deadline. Can we add our name to the list once we've completed implementation?

A. Yes.

Q. A local school uses an internet filter that blocks https websites to meet legal requirements. Can we sign the pledge and continue to serve them?

A. Most of the filtering solutions include options that will whitelist important services. Work with the school in question to implement a work-around.

A. The developers behind the “Let’s Encrypt” initiative are ensuring that best practices are used in setting up the HTTPS configuration. If you are deploying HTTPS on your own, we encourage you to use the Qualys SSL Labs SSL Server Test service to review the performance of your implementation. You should strive for at least a “B” rating with no major security vulnerabilities identified in the scan.

Q. Our library subscribes to over 200 databases, only a fraction of which are currently delivered via HTTPS. We might be able to say we will not sign new contracts, but the renewal requirement could be difficult for an academic library like ours. Can we sign the pledge?

A. No one is going to penalize libraries that aren’t able to comply 100% with their pledge. One way to satisfy the ethical imperatives of the pledge would be to clearly label for users the small number of insecure library resources that remain after 2016 as being subject to surveillance.
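Labeling insecure resources starts with knowing which ones they are. A minimal Python sketch, assuming the library can export its database links as (name, URL) pairs, for example from an A-Z list (the function name split_by_https and that data shape are hypothetical), sorts resources by whether their link uses https://. It checks only the link itself: a resource can still redirect to plain HTTP or load insecure content once the link is followed, so a "secure" result is a starting point, not proof.

```python
from urllib.parse import urlsplit


def split_by_https(resources):
    """Split (name, url) pairs into HTTPS and non-HTTPS resource names.

    urlsplit() lowercases the scheme, so "HTTP://..." is caught too.
    Links with no scheme at all are treated as insecure for review.
    """
    secure, insecure = [], []
    for name, url in resources:
        if urlsplit(url.strip()).scheme == "https":
            secure.append(name)
        else:
            insecure.append(name)
    return secure, insecure
```

The insecure list is the set of resources the answer above suggests labeling for users as subject to surveillance, and it doubles as a checklist for contract renewals.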

Q. I/We can contribute to the effort in a way that isn’t covered well by the pledges. Can I add another pledge?

A. We want to keep this simple, but we welcome your support. Email us with your individualized statement, and we may include it on our website when signatories are announced.