Machine Interpretable Privacy Policies -- A fresh take on P3P

This work was conducted as part of the
PrimeLife project with
funding from the European Union's 7th Framework Programme.
The work reported is experimental, and the examples shown are
fictitious and taken from a working demonstrator.

Introduction

The W3C Platform for Privacy Preferences (P3P) 1.0 was published
as a W3C Recommendation in April 2002 [1]. It defines a machine
interpretable format for websites to express their privacy
practices. A revised format (P3P 1.1) was published as a W3C
Note in November 2006, but failed to reach Recommendation status [2].

In summary, a P3P policy gives the name and address of the
business responsible for the website, the dispute resolution
procedures, the means (if any) for users to access personal data
collected by the website, the kinds of data collected, the purposes
it will be used for, the data retention policy, and the recipients
of the data.

P3P supports a notice and consent model of privacy, where websites
describe their privacy policies and users can review the policy
and decide whether to walk away or to proceed to interact with the
site, and by so doing indicate their consent to that policy.

Rather than expecting users to review the privacy policy for each
website that they visit, a P3P enabled web browser performs an
automatic comparison of the user's recorded preferences with the
website's policy, and only alerts the user if there is a mismatch.

P3P provides plenty of flexibility in the representation of
privacy policies. This flexibility poses huge challenges for
expressing user preferences in a practical way for the purposes of
automatic comparison of preferences with policies. This problem
was recognized early on in the development of P3P, and partially
addressed through the introduction of compact policies. These were
intended to enable an efficient comparison process, but only cover
policy information related to cookies. The full P3P policy remains
the authoritative statement of policy.

Browser support for P3P has been largely limited to Microsoft's
Internet Explorer, which has included support for P3P compact
policies since IE6. Microsoft's dominant market share has encouraged
websites to implement P3P despite the lack of support from other
browser vendors.

A fresh take on P3P

With increasing public awareness of the amount of information
being collected by websites, it seems timely to consider new
approaches covering more than just cookies, whilst enabling a
practical treatment of the user interface for expressing privacy
preferences.

To investigate this, a Firefox extension was developed to look
at the issues involved. This had to support:

auto-generation of a human readable version of the policy

automatic comparison of the user preferences with the policy

automatic generation of a human readable report on any mismatches

user interface for viewing and changing user preferences

The scope was taken as the data that websites can collect from
HTTP request headers during a session. This includes the IP
address, cookies, the user agent header, information on user
preferences for language and data formats, the requested URL, the
date and time of day, and more.

To simplify the user interface for preferences, a subset of P3P
was chosen. This has the following object model:

The URI for the site's full (human readable) policy

The URI for instructions that users can follow to request or
decline to have their data used for a particular purpose
(optional)

The name of the business responsible for the website

The set of categories of collected data as defined by P3P 1.1

The set of purposes collected data can be used for as defined by P3P 1.1

The set of recipient types as defined by P3P 1.1

The data retention policy type as defined by P3P 1.1

Note that this uses P3P's data categories rather than its taxonomy
of data items. The categories were found to be a much better fit for
describing the kinds of data collected from HTTP requests.
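The object model above can be sketched as a plain JavaScript object together with a small validator. This is an illustration under stated assumptions, not the demonstrator's code: the field names (`policyUri`, `choiceUri`, `business`, and so on) are hypothetical, and the vocabulary lists are the P3P element names as commonly cited, which should be checked against the P3P 1.1 specification before use.

```javascript
// Vocabulary values follow P3P element names. Their exact spelling in the
// demonstrator is not given in the text, so treat these lists as assumed.
const VOCAB = {
  categories: ["physical", "online", "uniqueid", "purchase", "financial",
    "computer", "navigation", "interactive", "demographic", "content",
    "state", "political", "health", "preference", "location",
    "government", "other-category"],
  purposes: ["current", "admin", "develop", "tailoring", "pseudo-analysis",
    "pseudo-decision", "individual-analysis", "individual-decision",
    "contact", "historical", "telemarketing", "other-purpose"],
  recipients: ["ours", "delivery", "same", "other-recipient", "unrelated",
    "public"],
  retention: ["no-retention", "stated-purpose", "legal-requirement",
    "business-practices", "indefinitely"],
};

// Returns a list of problems; an empty list means the policy is well formed.
function validatePolicy(policy) {
  const problems = [];
  if (typeof policy.policyUri !== "string") problems.push("policyUri missing");
  if (typeof policy.business !== "string") problems.push("business missing");
  // choiceUri (hypothetical name) is the optional opt-in/opt-out link.
  if (policy.choiceUri !== undefined && typeof policy.choiceUri !== "string") {
    problems.push("choiceUri must be a string when present");
  }
  for (const field of ["categories", "purposes", "recipients"]) {
    const values = policy[field];
    if (!Array.isArray(values)) {
      problems.push(`${field} must be a list`);
      continue;
    }
    for (const value of values) {
      if (!VOCAB[field].includes(value)) {
        problems.push(`unknown ${field} value: ${value}`);
      }
    }
  }
  // Retention is a single value rather than a set.
  if (!VOCAB.retention.includes(policy.retention)) {
    problems.push(`unknown retention value: ${policy.retention}`);
  }
  return problems;
}
```

Because every field is either a string or a set drawn from a short fixed vocabulary, each vocabulary list corresponds directly to one group of checkboxes.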

The simple object model allows the preferences user interface
to be provided as a set of grouped checkboxes, as shown below:

Accessing the policy and generating a human readable version

To reach a website, the user can type in a URL, follow a bookmark,
or follow a link from another site, e.g. on the results page from a
query on a search engine like Google. The browser extension intercepts
the Firefox location change event and cancels the HTTP request before
it is sent. The extension then sends an HTTP HEAD request to the
website's root. The response is examined to find a reference to
the site's generic privacy policy. This is represented as an HTTP
Link header (analogous to the HTML link element), e.g.

This header is easy to add to pages generated via PHP. The URI
for the policy is then dereferenced to obtain the policy itself.
Note that P3P 1.0 defines a dedicated P3P HTTP header rather than
using the generic Link header. This choice could be revisited
if and when this work is brought into the standards track.
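The discovery step can be sketched as a function that scans a Link header value for the policy reference. The relation name `privacy-policy` is an assumption made for this sketch, since the relation used by the demonstrator is not stated here, and `findPolicyLink` is a hypothetical helper. The parser is deliberately simple and does not handle commas or angle brackets inside quoted parameter values.

```javascript
// Find the URI of the link whose rel matches, resolved against the site root.
// Returns null when no such link is present.
function findPolicyLink(linkHeader, siteRoot, rel = "privacy-policy") {
  // Each link has the form: <uri>; param=value; param="value"
  const linkPattern = /<([^>]*)>([^<]*)/g;
  let match;
  while ((match = linkPattern.exec(linkHeader)) !== null) {
    const uri = match[1];
    const params = match[2];
    const relMatch = /;\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))/i.exec(params);
    if (!relMatch) continue;
    const relValue = relMatch[1] !== undefined ? relMatch[1] : relMatch[2];
    // A rel parameter may carry several space-separated relation names.
    const rels = relValue.toLowerCase().split(/\s+/);
    if (rels.includes(rel)) {
      return new URL(uri, siteRoot).href; // resolves relative references
    }
  }
  return null;
}
```

In a runtime with the Fetch API, the header value could be obtained with a HEAD request and `response.headers.get("Link")`. A return value of null corresponds to the case where the site has no machine readable policy.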

The object model for policies is decoupled from the on-the-wire
transfer format, but from a practical point of view it was easiest
to implement the transfer format with JSON [3]. Here is an
example policy in JSON:
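As an illustration only, the sketch below shows how such a policy might be serialised as JSON and read back into the object model in JavaScript. The field names and the site `shop.example.org` are hypothetical, and the vocabulary values are P3P element names. JSON has no set type, so sets are carried as arrays and converted on parsing.

```javascript
// A fictitious policy in a hypothetical JSON layout.
const policyText = `{
  "policyUri": "http://shop.example.org/privacy.html",
  "choiceUri": "http://shop.example.org/privacy-choices.html",
  "business": "Example Shop Ltd",
  "categories": ["computer", "navigation", "state"],
  "purposes": ["current", "admin"],
  "recipients": ["ours"],
  "retention": "stated-purpose"
}`;

// Convert the transfer format into the in-memory object model.
function parsePolicy(text) {
  const raw = JSON.parse(text);
  return {
    policyUri: raw.policyUri,
    choiceUri: raw.choiceUri || null, // optional in the object model
    business: raw.business,
    categories: new Set(raw.categories),
    purposes: new Set(raw.purposes),
    recipients: new Set(raw.recipients),
    retention: raw.retention,
  };
}

const policy = parsePolicy(policyText);
```

Keeping the parse step separate reflects the decoupling of the object model from the transfer format: another serialisation could be supported by swapping `parsePolicy` alone.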

Generating a human readable version of the privacy policy

The P3P 1.1 specification includes suggested text for each
element in the taxonomy. This was copied into JavaScript and used
to generate a human readable version of the policy. Here is an
example:
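The generation step can be sketched as a lookup from vocabulary values to phrases. The phrases below are placeholders written for this sketch, not the suggested text from the P3P 1.1 specification, and the policy field names are assumed rather than taken from the demonstrator.

```javascript
// Placeholder phrases; a real implementation would use the specification's
// suggested text and cover the whole vocabulary.
const PHRASES = {
  categories: {
    computer: "information about your computer",
    navigation: "the pages you visit",
    state: "cookies",
  },
  purposes: {
    current: "to complete the activity you requested",
    admin: "to administer the website",
  },
  recipients: { ours: "the website operator and its agents" },
  retention: { "stated-purpose": "only as long as needed for the stated purposes" },
};

// Look up the phrase for a value, falling back to the quoted raw value.
function describe(group, value) {
  const phrase = PHRASES[group][value];
  return phrase !== undefined ? phrase : `"${value}"`;
}

// Join phrases as "a", "a and b", or "a, b and c".
function joinList(items) {
  if (items.length <= 1) return items.join("");
  return items.slice(0, -1).join(", ") + " and " + items[items.length - 1];
}

// Render one sentence per part of the object model.
function renderPolicy(policy) {
  const list = (group) => joinList(policy[group].map((v) => describe(group, v)));
  return [
    `${policy.business} collects ${list("categories")}.`,
    `This data is used ${list("purposes")}.`,
    `It is shared with ${list("recipients")}.`,
    `It is kept ${describe("retention", policy.retention)}.`,
    `Full policy: ${policy.policyUri}`,
  ].join("\n");
}
```

Falling back to the raw value keeps the output complete when a policy uses a term for which no phrase has been supplied.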

The same text was also used for constructing a dialog
summarising the mismatch between the user's preferences and
the website's policy, for example:
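The comparison behind such a dialog is not spelled out in this paper. One simple reading, assumed here, treats each ticked checkbox as an accepted value and reports every value in the site's policy that the user has not accepted. The names `findMismatches`, `preferences`, and the field names are hypothetical.

```javascript
// Return the policy values that fall outside the user's accepted sets.
// An empty result means the policy matches the preferences.
function findMismatches(preferences, policy) {
  const mismatches = [];
  for (const group of ["categories", "purposes", "recipients"]) {
    const accepted = new Set(preferences[group]);
    for (const value of policy[group]) {
      if (!accepted.has(value)) mismatches.push({ group, value });
    }
  }
  // Retention is a single value, checked against the accepted retention types.
  if (!new Set(preferences.retention).has(policy.retention)) {
    mismatches.push({ group: "retention", value: policy.retention });
  }
  return mismatches;
}
```

Each entry in the result names a group and a value, which is enough to look up the corresponding human readable text when building the mismatch report.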

If the site's policy matches the user's preferences, or the
user decides to override the mismatch, the browser extension
relaunches the HTTP request for the original URL.

The Firefox notification bar is shown when a site is found to
lack a privacy policy.

The Firefox notification bar is shown when a mismatch
is found.

Clicking "View details" brings up the warning dialog shown
earlier.

A local SQLite database was used to capture the user's
preferences, and to cache the policy for sites as a performance
optimization.
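The caching behaviour can be sketched with an in-memory map standing in for the SQLite table. The expiry rule is an assumption added for the sketch, since the text does not say how long cached policies are kept, and `createPolicyCache` is a hypothetical helper.

```javascript
// A policy cache keyed by site origin. The clock is injectable so that
// expiry can be exercised without waiting.
function createPolicyCache(maxAgeMs, now = Date.now) {
  const entries = new Map(); // origin -> { policy, fetchedAt }
  return {
    put(origin, policy) {
      entries.set(origin, { policy, fetchedAt: now() });
    },
    // Returns the cached policy, or null when absent or older than maxAgeMs.
    get(origin) {
      const entry = entries.get(origin);
      if (!entry) return null;
      if (now() - entry.fetchedAt > maxAgeMs) {
        entries.delete(origin);
        return null;
      }
      return entry.policy;
    },
  };
}
```

A null result signals that the extension needs to fetch the policy from the site again before comparing it with the user's preferences.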

Anonymising Proxies

The act of making an HTTP HEAD request on a website's root
discloses the browser's external IP address. This can be avoided
by routing the request through an HTTP proxy. This could be
configured via a user preference.
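For plain HTTP, routing the HEAD request through a proxy amounts to connecting to the proxy and placing the absolute URI of the target in the request line. The sketch below builds request options in the shape accepted by Node's `http.request`. The helper name and the proxy address are hypothetical, and a browser extension would more likely rely on the browser's own proxy settings. HTTPS targets would need a CONNECT tunnel, which is not shown.

```javascript
// Build options for a HEAD request to siteRoot sent via an HTTP proxy.
function headViaProxyOptions(siteRoot, proxy) {
  const target = new URL(siteRoot);
  return {
    host: proxy.host,        // connect to the proxy, not to the website
    port: proxy.port,
    method: "HEAD",
    path: target.href,       // absolute URI, as proxies expect for plain HTTP
    headers: { Host: target.host },
  };
}

const options = headViaProxyOptions("http://example.org", {
  host: "proxy.example.net", // hypothetical proxy
  port: 8080,
});
```

With this arrangement the website sees the proxy's address rather than the browser's, although the proxy operator still learns which site is being checked.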

Summary and suggestions for further work

This paper has described a fresh take on P3P that goes beyond
the limitations of compact policies, whilst still enabling a
simple user interface for setting preferences. The object model
lends itself to the use of JSON as a policy transfer format.
The restricted semantics of a machine readable policy covering
data collected in HTTP requests are supplemented by a link to
the site's full human readable policy.

A further consideration is the privacy policy for other kinds
of personal information collected by websites, for example,
credentials coupled to a user's public or partial identity. Can
the P3P taxonomies be extended to support these?

P3P and the approach described in this paper are couched in
legal terms relevant to the obligations extended by websites to
their users. Websites also face the challenge of operationalizing
privacy policies when it comes to controlling access to, and use
of, personal data in the website's backend. This suggests the need
for transforming privacy policies into data handling policies.
The PrimeLife project is looking at extending the XACML access
control language to cover data handling policies, see H5.3.2 [4].

Widespread support for machine readable privacy policies is
likely to involve a legislative mandate with measures in place to
ensure that sites conform to the policies they disclose. However,
this would only apply to the countries with the corresponding
laws. A way is needed to allow the browser to verify the
jurisdiction a given website is subject to. This could take the
form of digital certificates issued by national agencies.

A separate issue is that many people aren't sufficiently motivated
to set privacy preferences. One reason is the desire to just get
to the website in question without having to bother with reviewing
the policy. Another is a lack of knowledge sufficient for an
informed decision. This points the way to the use of independent
third parties for help with setting privacy preferences, and for
monitoring the data handling practices of websites. Some progress
has been made with the latter in terms of a browser extension
(Privacy Dashboard) that tracks what information is collected by
the websites you visit, together with a means to set your
preferences on a site by site basis [5].