Ann Kilzer, Arvind Narayanan, Ed Felten, Vitaly Shmatikov, and I have released a new research paper detailing the privacy risks posed by collaborative filtering recommender systems. To examine the risk, we use public data available from Hunch, LibraryThing, Last.fm, and Amazon in addition to evaluating a synthetic system using data from the Netflix Prize dataset. The results demonstrate that temporal changes in recommendations can reveal purchases or other transactions of individual users.

To help users find items of interest, sites routinely recommend items similar to a given item. For example, product pages on Amazon contain a “Customers Who Bought This Item Also Bought” list. These recommendations are typically public, and they are the product of patterns learned from all users of the system. If customers often purchase both item A and item B, a collaborative filtering system will judge them to be highly similar. Most sites generate ordered lists of similar items for any given item, but some also provide numeric similarity scores.

Although item similarity is only indirectly related to individual transactions, we determined that temporal changes in item similarity lists or scores can reveal details of those transactions. If you’re a Mozart fan and you listen to a Justin Bieber song, this choice increases the perceived similarity between Justin Bieber and Mozart. Because similarity lists and scores are based on perceived similarity, your action may result in changes to these scores or lists.

Suppose that an attacker knows some of your past purchases on a site: for example, past item reviews, social networking profiles, or real-world interactions are a rich source of information. New purchases will affect the perceived similarity between the new items and your past purchases, possibility causing visible changes to the recommendations provided for your previously purchased items. We demonstrate that an attacker can leverage these observable changes to infer your purchases. Among other things, these attacks are complicated by the fact that multiple users simultaneously interact with a system and updates are not immediate following a transaction.

To evaluate our attacks, we use data from Hunch, LibraryThing, Last.fm, and Amazon. Our goal is not to claim privacy flaws in these specific sites (in fact, we often use data voluntarily disclosed by their users to verify our inferences), but to demonstrate the general feasibility of inferring individual transactions from the outputs of collaborative filtering systems. Among their many differences, these sites vary dramatically in the information that they reveal. For example, Hunch reveals raw item-to-item correlation scores, but Amazon reveals only lists of similar items. In addition, we examine a simulated system created using the Netflix Prize dataset. Our paper outlines the experimental results.

While inference of a Justin Bieber interest may be innocuous, inferences could expose anything from dissatisfaction with a job to health issues. Our attacks assume that a victim reveals certain past transactions, but users may publicly reveal certain transactions while preferring to keep others private. Ultimately, users are best equipped to determine which transactions would be embarrassing or otherwise problematic. We demonstrate that the public outputs of recommender systems can reveal transactions without user knowledge or consent.

Unfortunately, existing privacy technologies appear inadequate here, failing to simultaneously guarantee acceptable recommendation quality and user privacy. Mitigation strategies are a rich area for future work, and we hope to work towards solutions with others in the community.

Worth noting is that this work suggests a risk posed by any feature that adapts in response to potentially sensitive user actions. Unless sites explicitly consider the data exposed, such features may inadvertently leak details of these underlying actions.

The Center for Information Technology Policy (CITP) at Princeton University seeks candidates for positions as visiting faculty members or researchers, or postdoctoral research associates for the 2010-2011 academic year.

About CITP

Digital technologies and public life are constantly reshaping each other—from net neutrality and broadband adoption, to copyright and file sharing, to electronic voting and beyond.

Realizing digital technology’s promise requires a constant sharing of ideas, competencies and norms among the technical, social, economic and political domains.

The Center for Information Technology Policy is Princeton University’s effort to meet this challenge. Its new home, which opened in September 2008, is a state of the art facility designed from the ground up for openness and collaboration. Located at the intellectual and physical crossroads of Princeton’s engineering and social science communities, the Center’s research, teaching and public programs are building the intellectual and human capital that our technological future demands.

To see what this mission can mean in practice, take a look at our website, at http://citp.princeton.edu.

About the Search

The Center has secured limited resources from a range of sources to support visiting faculty, scholars or policy experts for up to one-year appointments during the 2010-2011 academic year. We are interested in applications from academic faculty and researchers as well as from individuals who have practical experience in the policy arena. The rank and status of the successful applicant(s) will be determined on a case-by-case basis. We are particularly interested in hearing from faculty members at other universities and from individuals who have first-hand experience in public service in the technology policy area.

The successful applicant(s) will conduct research, engage in public programs, and may teach a seminar during their appointment subject to review and approval by the Dean of the Faculty. They’ll play an important role at a pivotal time in the development of this new center. They may be appointed to a visiting faculty or visiting fellow position, a term-limited research position, or a postdoctoral appointment, depending on qualifications.

We are happy to hear from anyone who works at the intersection of digital technology and public life. In addition to our existing strengths in computer science and sociology, we are particularly interested in identifying engineers, economists, lawyers, civil servants and policy analysts whose research interests are complementary to our existing activities.

If you are interested, please submit a CV and cover letter, stating background, intended research, and salary requirements, to https://jobs.princeton.edu.

Princeton University is an equal opportunity employer and complies with applicable EEO and affirmative action regulations. For information about applying to Princeton and voluntarily self-identifying, please see http://www.princeton.edu/dof/about_us/dof_job_openings/

It’s Election Day in New Jersey. As usual, I visited several polling places in Princeton over the last few days, looking for unguarded voting machines. It’s been well demonstrated that a bad actor who can get physical access to a New Jersey voting machine can modify its behavior to steal votes, so an unguarded voting machine is a vulnerable voting machine.

This time I visited six polling places. What did I find?

The good news — and there was a little — is that in one of the six polling places, the machines were properly secured. I’m not sure where the machines were, but I know that they were not visible anywhere in the accessible areas of the building. Maybe the machines were locked in a storage room, or maybe they hadn’t been delivered yet, but anyway they were probably safe. This is the first time I have ever found a local polling place, the night before the election, with properly secured voting machines.

At the other five polling places, things weren’t so good. At three places, the machines were unguarded in an area open to the public. I walked right up to them and had private time with them. In two other places, the machines were visible from outside the building and protected only by an outside door with an easily defeated lock. I didn’t defeat the locks myself — I wasn’t going to cross that line — but I’ll bet you could have opened them quickly with tools you probably have in your car.

The final scorecard: ten machines totally unprotected, eight machines poorly protected, two machines well-protected. That’s an improvement, but then again any protection at all would have been an improvement. We still have a long way to go.

Freedom to Tinker is hosted by Princeton's Center for Information Technology Policy, a research center that studies digital technologies in public life. Here you'll find comment and analysis from the digital frontier, written by the Center's faculty, students, and friends.