PITTSBURGH—Figuring out what information websites are gathering about their users can be daunting, but a new privacy research project led by Carnegie Mellon University will make that task easier with computer tools that leverage the power of crowdsourcing, machine learning and natural language processing.

Working with law school researchers at Fordham and Stanford universities, computer scientists and behavioral economists at Carnegie Mellon will teach computer systems how to read and evaluate the lengthy, often-confusing and subject-to-change privacy policies now posted by major websites. These systems, together with crowdworkers, will then create user-friendly digests that highlight policy elements that matter most to people.

The 42-month, $3.75 million Usable Privacy Policy Project[1] is sponsored by the National Science Foundation through its Secure and Trustworthy Cyberspace (SaTC) program[2]. It is one of three large Frontier awards the NSF recently announced in support of collaborative, multi-university research and education activities to protect critical infrastructure and enable a more secure information society.

"People are increasingly aware that information about them — the sites they visit, the products they buy — is being collected, used, shared and recombined in all sorts of ways," said Norman Sadeh[3], a Carnegie Mellon professor of computer science and leader of the Usable Privacy Policy Project. "But they feel helpless. They have no practical way of finding out about these practices and making informed decisions. Hardly anybody reads privacy policies and, when they do, they usually can't answer even the most trivial questions about those policies."

Earlier attempts to solve this dilemma, whether by encouraging websites to post privacy policies in machine-readable language or by getting website operators to abide by new rules, have encountered significant resistance. Instead, Sadeh and his collaborators aim to work with what is already available — those rarely read, plain English privacy policies.

Crowdsourcing will be used to identify and extract the policy features that matter most to people. By itself, however, crowdsourcing would not scale to cover the breadth of the Internet and keep pace with changing policies. The researchers will therefore rely on computers to routinely scan through the policies, even though computers can't yet understand all the nuances of human language.

"We are going to develop algorithms that can automatically or semi-automatically read a privacy policy well enough to answer a few questions likely to be of interest to many users and also to policymakers," said Noah Smith[4], associate professor of language technologies and machine learning at Carnegie Mellon. "This is an exciting opportunity to apply recent developments in robust natural language processing to an everyday dilemma."

One goal is to develop user interfaces or browser add-ons that can summarize the pertinent privacy characteristics of a website in a way that is easily understood. This might be as simple as a letter grade, said Joel Reidenberg, a Fordham University law professor. User studies will help fine-tune the new interfaces, ensuring that people understand and can effectively use privacy information.

Researchers also will develop methods using formal logic to provide deeper analysis of policies, identifying inconsistencies and conflicts that can inform ongoing legal and regulatory discussions. More information is available at the project website, http://www.usableprivacy.org[5].

The research team includes Lorrie Cranor[6], CMU associate professor of computer science and engineering and public policy, and Alessandro Acquisti[7], associate professor of information technology and public policy, who will contribute their expertise to the design and evaluation of novel privacy displays. Travis Breaux[8], CMU assistant professor of computer science, Aleecia McDonald, director of privacy at Stanford's Center for Internet & Society, and Fordham's Reidenberg will help in analyzing privacy policies and in informing ongoing public policy efforts in the area.