
Long-winded online privacy policies, which for all most consumers know could include permission to sell personal data or requests for the souls of their firstborn, could soon be stripped down to their bare essentials.

Consumers could see breakdowns of the rarely read policies within two years through the Usable Privacy Policy Project, an initiative of the National Science Foundation that is being spearheaded by Carnegie Mellon University.

The 42-month, $3.75 million project aims to use crowdsourcing, natural language processing and machine learning to build a program that analyzes pages of privacy policies for consumers. In addition to a research team led by CMU computer science professor Norman Sadeh, the project will feature work from law school researchers at Fordham and Stanford University.

Mr. Sadeh said a major Web browsing company already has expressed interest in using the final product that comes out of the research as an accessory to its products.

"We are going to develop algorithms that can automatically or semi-automatically read a privacy policy well enough to answer a few questions likely to be of interest to many users and also to policy makers," said Noah Smith, associate professor of language technologies and machine learning at Carnegie Mellon, in a press release. "This is an exciting opportunity to apply recent developments in robust natural language processing to an everyday dilemma."

Mr. Sadeh said the primary goal is to use crowdsourcing to determine the privacy issues that are most important to consumers, then create programs to examine whether companies' privacy policies address the issues in a meaningful manner.

"If you look at privacy policies you'll find very often that they have common patterns. They may be recycling text from a privacy policy from another site. We can look into patterns to identify text that deals with these types of issues, to see which sites engage in a certain practice, which one doesn't engage and which one does not say anything about the issue," he said.
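The three-way sorting Mr. Sadeh describes (a site engages in a practice, does not engage in it, or says nothing about it) can be illustrated with a rough sketch. The project itself plans to rely on natural language processing and machine learning; the simple keyword patterns below are a stand-in, and the function name, pattern names and wording rules are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical patterns for a single practice: selling or renting personal data.
# These are illustrative assumptions, not rules taken from the project.
DENIES = re.compile(
    r"\b(?:do|does|will)\s+not\s+(?:sell|rent)\b|\bnever\s+(?:sell|rent)\b",
    re.IGNORECASE,
)
AFFIRMS = re.compile(
    r"\b(?:sell|rent)s?\b[^.]*\b(?:personal|your)\s+(?:data|information)\b",
    re.IGNORECASE,
)


def classify_sale_practice(policy_text: str) -> str:
    """Label a policy 'does not engage', 'engages', or 'silent' on data sales."""
    # Split into rough sentences and return the label of the first match.
    for sentence in re.split(r"(?<=[.!?])\s+", policy_text):
        if DENIES.search(sentence):
            return "does not engage"
        if AFFIRMS.search(sentence):
            return "engages"
    # No sentence mentioned the practice at all.
    return "silent"
```

A keyword approach like this would miss paraphrases and hedged legal language, which is presumably why the researchers are turning to machine learning and crowdsourced readers instead.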

Mr. Sadeh said that by the project's end, the team expects to have created a user interface or browser add-on that scans a website's privacy policy and scores it on how well it explains the site's position on the sale of personal data and other privacy concerns.

"This gives, finally, power to customers to make informed decisions regarding what website they feel comfortable tracking them and which ones they don't," said Mr. Sadeh.

As the project grows and policies change, Mr. Sadeh said the program will use a feature that zooms in on important sections, then use crowdsourcing so that engaged users can read selections of a privacy policy and report the content of the message to researchers.

In addition to helping consumers, Mr. Sadeh said the project could help policy makers determine which privacy concerns most affect consumers and identify which companies' privacy policies comply with current laws.

He also said the project will help researchers and policy makers follow the ever-changing formats of privacy policies to study exactly how much consumers are being asked to share.

"Companies like Google and Facebook are very innovative and as they innovate they need to continually revise their privacy policies. This can help [researchers] have a historic perspective on how these privacy policies change, what's been added, what's been relaxed, and enable policy makers and civilian stakeholders to act faster to respond to consumers' needs," he said.