Libraries, computing, metadata, and more

Menu

At the first face-to-face meeting of the LITA Patron Privacy Technologies Interest Group at Midwinter, one of the attendees mentioned that they had sent out an RFP last year for library databases. One of the questions on the RFP asked how user passwords were stored — and a number of vendors responded that their systems stored passwords in plain text.

This is a repeatable response, by the way — much like the way a hammer strike to the patellar ligament instigates a reflexive kick, mention of plain-text password storage will trigger an instinctual wail from programmers, sysadmins, and privacy and security geeks of all stripes.

Call it the Vanilla Password Reflex?

I’m not suggesting that you should whisper “plain text passwords” into the ear of your favorite system designer, but if you are the sort to indulge in low and base amusements…

A recent blog post by Eric Hellman discusses the problems with storing passwords in plain text in detail. The upshot is that it’s bad practice — if a system’s password list is somehow leaked, and if the passwords are stored in plain text, it’s trivially easy for a cracker to use those passwords to get into all sorts of mischief.

This matters, even “just” for library reference databases. If we take the right to reader privacy seriously, it has to extend to the databases offered by the library — particularly since many of them have features to store citations and search results in a user’s account.

As Eric mentions, the common solution is to use a one-way cryptographic hash function to transform the user’s password into a bunch of gobbledegook.

For example, “p@ssw05d” might be stored as the following hash:

d242b6313f32c8821bb75fb0660c3b354c487b36b648dde2f09123cdf44973fc

To make it more secure, I might add some random salt and end up with the following salted hash:

To log in, the user has to prove that they know the password by supplying it, but rather than compare the password directly, the result of the one-way function applied to the password is compared with the stored hash.

How is this more secure? If a hacker gets the list of password hashes, they won’t be able to deduce the passwords, assuming that the hash function is good enough. What counts as good enough? Well, relatively few programmers are experts in cryptography, but suffice it to say that there does exist a consensus on techniques for managing passwords and authentication.

The idea of one-way functions to encrypt passwords is not new; in fact, it dates back to the 1960s. Nowadays, any programmer who wants to be considered a professional really has no excuse for writing a system that stores passwords in plain text.

Back to the “Vanilla Password Reflex”. It is, of course, not actually a reflex in the sense of an instinctual response to a stimulus — programmers and the like get taught, one way or another, about why storing plain text passwords is a bad idea.

Where does this put the public services librarian? Particularly the one who has no particular reason to be well versed in security issues?

At one level, it just changes the script. If a system is well-designed, if a user asks what their password is, it should be impossible to get an answer to the question. How to respond to a patron who informs you that they’ve forgotten their password? Let them know that you can change it for them. If they respond by wondering why you can’t just tell them, if they’re actually interested in the answer, tell them about one-way functions — or just blame the computer, that’s fine too if time is short.

However, libraries and librarians can have a broader role in educating patrons about online security and privacy practices: leading by example. If we insist that the online services we recommend follow good security design; if we use HTTPS appropriately; if we show that we’re serious about protecting reader privacy, it can only buttress programming that the library may offer about (say) using password managers or avoiding phishing and other scams.

There’s also a direct practical benefit: human nature being what it is, many people use the same password for everything. If you crack an ILS’s password list, you’ve undoubtedly obtained a non-negligible set of people’s online banking passwords.

I’ll end this with a few questions. Many public services librarians have found themselves, like it or not, in the role of providing technical support for e-readers, smartphones, and laptops. How often does online security come up during such interactions? How often to patrons come to the library seeking help against the online bestiary of spammers, phishers, and worse? What works in discussing online security with patrons, who of course can be found at all levels of computer savvy? And what doesn’t?

I invite discussion — not just in the comments section, but also on the mailing list of the Patron Privacy IG.

Am I saying that Koha service providers are imaginary creatures? Not at all — at the moment, there are 54 paid support providers listed on the Koha project’s website.

But not a one of them is “authorized”.

I bring this up because a friend of mine in India (full disclosure: who himself offers Koha consulting services) ran across this flyer by Avior Technologies:

The bit that I’ve highlighted is puffery at best, misleading at worst. The Koha website’s directory of paid support providers is one thing, and one thing only: a directory. The Koha project does not endorse any vendors listed there — and neither the project nor the Horowhenua Library Trust in New Zealand (which holds various Koha trademarks) authorizes any firm to offer Koha services.

If you want your firm to get included in the directory, you need only do a few things:

Contributing code, documentation, or anything else to the Koha project.

Having any current customers who are willing to vouch for you.

Being alive at present (although eventually, your listing will get pulled for lack of response to inquiries from Koha’s webmasters).

What does this mean for folks interested in getting paid support services? There is no shortcut to doing your due diligence — it is on you to evaluate whether a provider you might hire is competent and able to keep their customers reasonably happy. The directory on the Koha website exists as a convenience for folks starting a search for a provider, but beyond that: caveat emptor.

I know nothing about Avior Technologies. They may be good at what they do; they may be terrible — I make no representation either way.

But I do know this: while there are some open source projects where the notion of an “authorized” or “preferred” support provider may make some degree of sense, Koha isn’t such a project.

And that’s generally to the good of all: if you have Koha expertise or can gain it, you don’t need to ask anybody’s permission to start helping libraries run Koha — and get paid for it. You can fill niches in the market that other Koha support providers cannot or do not fill.

You can in time become the best Koha vendor in your niche, however you choose to define it.

But authority? It will never be bestowed upon you. It is up to you to earn it by how well you support your customers, and by how much you contribute to the global Koha project.

Shortly after it came to light that Adobe Digital Editions was transmitting information about ebook reading activity in the clear, for anybody to snoop upon, I asked a loaded question: does ALA have a role in helping to verify that the software libraries use protect the privacy of readers?

As with any loaded question, I had an answer in mind: I do think that ALA and LITA, by virtue of their institutional heft and influence with librarians, can provide significant assistance in securing library software.

I waited a bit, wondering how the powers that be at ALA would respond. Then I remembered something: an institution like ALA is not, in fact, a faceless, inscrutable organism. Like Soylent Green, ALA is people!

Well, maybe not so much like Soylent Green. My point is that despite ALA’s reputation for being a heavily bureaucratic, procedure-bound organization, it does offer ways for members to take up and idea an run with it.

And that’s what I did — I floated a petition to form a new interest group within LITA, the Patron Privacy Technologies IG. Quite a few people signed it… and it now lives!

Here’s the charge of the IG:

The LITA Patron Privacy Technologies Interest Group will promote the design and implementation of library software and hardware that protects the privacy of library users and maximizes user ability to make informed decisions about the use of personally identifiable information by the library and its vendors.

Under this remit, activities of the Interest Group would include, but are not necessarily limited to:

Providing a conduit for responsible disclosure of defects in software that could lead to exposure of library patron information.

Providing sample publicity materials for libraries to use with their patrons in explaining the library’s privacy practices.

I am fortunate to have two great co-chairs, Emily Morton-Owens of the Seattle Public Library and Matt Beckstrom of the Lewis and Clark Library, and I’m happy to announce that the IG’s first face-to-face meeting will at ALA Midwinter 2015 — specifically tomorrow, at 8:30 a.m. Central Time in the Ballroom 1 of the Sheraton in Chicago.

We have two great speakers lined up — Alison Macrina of the Library Freedom Project and Gary Price of INFODocket, and I’m very much looking forward to it.

But I’m also looking forward to the rest of the meeting: this is when the IG will, as a whole, decide how far to reach. We have a lot of interest and the ability to do things that will teach library staff and our patrons how to better protect privacy, teach library programmers how to design and code for privacy, and verify that our tools match our ideals.

Despite the title of this blog post… it’s by no means my effort alone that will get us anywhere. Many people are already engaging in issues of privacy and technology in libraries, but I do hope that the IG will provide one more point of focus for our efforts.

Share this:

As some of you already know, Marlene and I are moving from Seattle to Atlanta in December. We’ve moved many (too many?) times before, so we’ve got most of the logistics down pat. Movers: hired! New house: rented! Mail forwarding: set up! Physical books: still too dang many!

We could do it in our sleep! (And the scary thing is, perhaps we have in the past.)

One thing that is different this time is that we’ll be driving across the country, visiting friends along the way. 3,650 miles, one car, two drivers, one Keurig, two suitcases, two sets of electronic paraphernalia, and three cats.

Who wants to lay odds on how many miles it will take each day for the cats to lose their voices?

Fortunately Sophia is already testing the cats’ accommodations:

I will miss the friends we made in Seattle, the summer weather, the great restaurants, being able to walk down to the water, and decent public transportation. I will also miss the drives up to Vancouver for conferences with a great bunch of librarians; I’m looking forward to attending Code4Lib BC next week, but I’m sorry to that our personal tradition of American Thanksgiving in British Columbia is coming to an end.

As far as Atlanta is concerned, I am looking forward to being back in MPOW’s office, having better access to a variety of good barbecue, the winter weather, and living in an area with less de facto segregation.

It’s been a good two years in the Pacific Northwest, but much to my surprise, I’ve found that the prospect of moving back to Atlanta feels a bit like a homecoming. So, onward!

Share this:

I recently circulated a petition to start a new interest group within LITA, to be called the Patron Privacy Technologies IG. I’ve submitted the formation petition to the LITA Council, and a vote on the petition is scheduled for early November. I also held an organizational meeting with the co-chairs; I’m really looking forward to what we all can do to help improve how our tools protect patron privacy.

But enough about the IG, let’s talk about the petition! To be specific, let’s talk about when the signatures came in.

I’ve been on Twitter since March of 2009, but a few months ago I made the decision to become much more active there (you see, there was a dearth of cat pictures on Twitter, and I felt it my duty to help do something about it). My first thought was to tweet the link to a Google Form I created for the petition. I did so at 7:20 a.m. Pacific Time on 15 October:

Since I wanted to gauge whether there was interest beyond just LITA members, I also posted about the petition on the ALA Think Tank Facebook group at 7:50 a.m. on the 15th.

By the following morning, I had 13 responses: 7 from LITA members, and 6 from non-LITA members. An interest group petition requires 10 signatures from LITA members, so at 8:15 on the 16th, I sent another tweet, which got retweeted by LITA:

By early afternoon, that had gotten me one more signature. I was feeling a bit impatient, so at 2:28 p.m. on the 16th, I sent a message to the LITA-L mailing list.

That opened the floodgates: 10 more signatures from LITA members arrived by the end of the day, and 10 more came in on the 17th. All told, a total of 42 responses to the form were submitted between the 15th and the 23rd.

The petition didn’t ask how the responder found it, but if I make the assumption that most respondents filled out the form shortly after they first heard about it, I arrive at my bit of anecdata: over half of the petition responses were inspired by my post to LITA-L, suggesting that the mailing list remains an effective way of getting the attention of many LITA members.

By the way, the petition form is still up for folks to use if they want to be automatically subscribed to the IG’s mailing list when it gets created.

The values of the various keys in that structure are obviously Base 64-encoded, but when run through a decoder, the result is just binary data, presumably the result of another layer of encryption.

Thus, we haven’t actually gotten much further towards verifying that ADE is sending only the data they claim to. That packet of data could be describing my progress reading that book purchased from Kobo… or it could be sending something else.

That extra layer of encryption might be done as protection against a real man-in-the-middle attack targeted at Adobe’s log server — or it might be obfuscating something else.

Either way, the result remains the same: reader privacy is not guaranteed. I think Adobe is now doing things a bit better than they were when they released ADE 4.0, but I could be wrong.

If we as library workers are serious about protection patron privacy, I think we need more than assurances — we need to be able to verify things for ourselves. ADE necessarily remains in the “unverified” column for now.

A couple hours ago, I saw reports from Library Journal and The Digital Reader that Adobe has released version 4.0.1 of Adobe Digital Editions. This was something I had been waiting for, given the revelation that ADE 4.0 had been sending ebook reading data in the clear.

ADE 4.0.1 comes with a special addendum to Adobe’s privacy statement that makes the following assertions:

It enumerates the types of information that it is collecting.

It states that information is sent via HTTPS, which means that it is encrypted.

It states that no information is sent to Adobe on ebooks that do not have DRM applied to them.

It may collect and send information about ebooks that do have DRM.

It’s good to test such claims, so I upgraded to ADE 4.0.1 on my Windows 7 machine and my OS X laptop.

First, I did a quick check of strings in the ADE program itself — and found that it contained an instance of “https://adelogs.adobe.com/” rather than “http://adelogs.adobe.com/”. That was a good indication that ADE 4.0.1 was in fact going to use HTTPS to send ebook reading data to that server.

Next, I fired up Wireshark and started ADE. Each time it started, it contacted a server called adeactivate.adobe.com, presumably to verify that the DRM authorization was in good shape. I then opened and flipped through several ebooks that were already present in the ADE library, including one DRM ebook I had checked out from my local library.

So far, it didn’t send anything to adelogs.adobe.com. I then checked out another DRM ebook from the library (in this case, Seattle Public Library and its OverDrive subscription) and flipped through it. As it happens, it still didn’t send anything to Adobe’s logging server.

Finally, I used ADE to fulfill a DRM ePub download from Kobo. This time, after flipping through the book, it did send data to the logging server. I can confirm that it was sent using HTTPS, meaning that the contents of the message were encrypted.

To sum up, ADE 4.0.1’s behavior is consistent with Adobe’s claims – the data is no longer sent in the clear and a message was sent to the logging server only when I opened a new commercial DRM ePub. However, without decrypting the contents of that message, I cannot verify that it only information about that ebook from Kobo.

But even then… why should Adobe be logging that information about the Kobo book? I’m not aware that Kobo is doing anything fancy that requires knowledge of how many pages I read from a book I purchased from them but did not open in the Kobo native app. Have they actually asked Adobe to collect that information for them?

Another open question: why did opening the library ebook in ADE not trigger a message to the logging server? Is it because the fulfillmentType specified in the .acsm file was “loan” rather than “buy”? More clarity on exactly when ADE sends reading progress to its logging server would be good.

Finally, if we take the privacy statement at its word, ADE is not implementing a page synchronization feature as some, including myself, have speculated – at least not yet. Instead, Adobe is gathering this data to “share anonymous aggregated information with eBook providers to enable billing under the applicable pricing model”. However, another sentence in the statement is… interesting:

While some publishers and distributors may charge libraries and resellers for 30 days from the date of the download, others may follow a metered pricing model and charge them for the actual time you read the eBook.

In other words, if any libraries are using an ebook lending service that does have such a metered pricing model, and if ADE is sending reading progress information to an Adobe server for such ebooks, that seems like a violation of reader privacy. Even though the data is now encrypted, if an Adobe ID is used to authorize ADE, Adobe itself has personally identifying information about the library patron and what they’re reading.

Adobe appears to have closed a hole – but there are still important questions left open. Librarians need to continue pushing on this.

It came to light on Monday that the latest version of Adobe Digital Editions is sending metadata on ebooks that are read through the application to an Adobe server — in clear text.

I’ve personally verified the claim that this is happening, as have lots of other people. I particularly like Andromeda Yelton’s screencast, as it shows some of the steps that others can take to see this for themselves.

In particular, it looks like any ebook that has been opened in Digital Editions or added to a “library” there gets reported on. The original report by Nate Hofffelder at The Digital Reader also said that ebook that were not known to Digital Editions were being reported, though I and others haven’t seen that — but at the moment, since nobody is saying that they’ve decompiled the program and analyzed exactly when Digital Editions sends its reports, it’s possible that Nate simply fell into a rare execution path. UPDATE 10 October 2014: Yesterday I was able to confirm that if an ereader device is attached to a PC and is recognized by ADE, metadata from the books on that device can also be sent in the clear.

This move by Adobe, whether or not they’re permanently storing the ebook reading history, and whether or not they think they have good intentions, is bad for a number of reasons:

By sending the information in the clear, anybody can intercept it and choose to act on somebody’s choice of reading material. This applies to governments, corporations, and unenlightened but technically adept parents. And as far as state actors are concerned – it actually doesn’t matter that Digital Editions isn’t sending information like name and email addresses in the clear; the user’s IP address and the unique ID assigned by Digital Editions will often be sufficient for somebody to, with effort, link a reading history to an individual.

The release notes from Adobe gave no hint that Digital Editions was going to start doing this. While Amazon’s Kindle platform also keeps track of reading history, at least Amazon has been relatively forthright about it.

The privacy policy and license agreement similarly did not explicitly mention this. There has been some discussion to the effect that if one looks at those documents closely enough, that there is an implied suggestion that Adobe can capture and log anything one chooses to do with their software. But even if that’s the case – and I’m not sure that this argument would fly in countries with stronger data privacy protection than the U.S. – sending this information in the clear is completely inconsistent with modern security practices.

Digital Editions is part of the toolchain that a number of library ebook lending platforms use.

The last point is key. Everybody should be concerned about an app that spouts reading history in the clear, but librarians in particular have a professional responsibility to protect our user’s reading history.

What does it mean in the here and now? Some specific immediate steps I suggest for libraries is to:

Publicize the problem to their patrons.

Officially warn their patrons against using Digital Editions 4.0, and point to work arounds like pointing “adelogs.adobe.com” to “127.0.0.1” in hosts files.

If they must use Digital Editions to borrow ebooks, to recommend the use of earlier versions, which do not appear to be spying on users.

However, there are things that also need to be done in the long term.

Accepting DRM has been a terrible dilemma for libraries – enabling and supporting, no matter how passively, tools for limiting access to information flies against our professional values. On the other hand, without some degree of acquiescence to it, libraries would be even more limited in their ability to offer current books to their patrons.

But as the Electronic Frontier Foundation points out, DRM as practiced today is fundamentally inimical to privacy. If, following Andromeda Yelton’s post this morning, we value our professional soul, something has to give.

In other words, we have to have a serious discussion about whether we can responsibly support any level of DRM in the ebooks that we offer to our patrons.

But there’s a more immediate step that we can take. This whole thing came to light because a “hacker acquaintance” of Nate’s decided to see what Digital Editions is sending home. And a key point? Once the testing starting, it probably didn’t take that hacker more than half an hour to see what was going on, and it may well have taken only five.

While the library profession probably doesn’t count very many professional security researchers among its ranks, this sort of testing is not black magic. Lots of systems librarians, sysadmins, and developers working for libraries already know how to use tcpdump and Wireshark and the like.

So what do we need to do? We need to stop blindly trusting our tools. We need to be suspicious, in other words, and put anything that we would recommend to our patrons to the test to verify that it is not leaking patron information.

This is where organizations like ALA can play an important role. Some things that ALA could do include:

Establishing a clearinghouse for reports of security and privacy violations in library software.

Distribute information on ways to perform security audits.

Do testing of library software in house and hire security researches as needed.

Provide institutional and legal support for these efforts.

That last point is key, and is why I’m calling on ALA in particular. There have been plenty of cases where software vendors have sued, or threatened to sue, folks who have pointed out security flaws. Rather than permitting that sort of chilling effect to be tolerated in the realm of library software, ALA can provide cover for individuals and libraries engaged in the testing that is necessary to protect our users.