Abstract

This document presents a framework for expressing P3P policies, with the
semantics defined in the P3P 1.0 specification, using the technologies of
W3C's Semantic Web initiative. It is accompanied by an open-source
demonstration implementation of a P3P user agent built on the framework.

Status of This Document

This is a draft for consideration by the P3P Specification Working
Group and by interested parties from the Semantic Web community. This Note is
not intended to be a normative specification. Instead, it describes how a
Semantic Web architecture may be applied to P3P. Its principal contributions
are an ontology which attempts to capture the semantics of P3P, a description
of some of the decisions taken in this process, and an architecture for
user agents using this code. It also shows how to implement this ontology in
RDF, how to deal with policy references, and a suggested new rule exchange
format using RDQL, which I have called SWAPPEL. An example implementation
based on a proxy architecture has also been built.

No commitment is made to update this Note. However, if you have comments,
please send them to <giles.hogben@[non-spammers delete]jrc.it>.

A list of current W3C Recommendations and other technical documents can be
found at http://www.w3.org/TR/.

1.0 Introduction

This document specifies an ontology and an RDQL rule format for P3P, and a
framework for using P3P within Semantic Web applications. The following
summary of P3P is taken from the P3P 1.0 specification.

The Platform for Privacy Preferences Project (P3P) enables Web sites to
express their privacy practices in a standard format that can be retrieved
automatically and interpreted easily by user agents. P3P user agents will
allow users to be informed of site practices (in both machine- and
human-readable formats) and to automate decision-making based on these
practices when appropriate. Thus users need not read the privacy policies
at every site they visit.... It provides a way for a Web site to encode its
data-collection and data-use practices in a machine-readable XML format
known as a P3P policy.

The use of a formal ontology format such as OWL in conjunction with RDQL
(or another RDF query language) has several advantages over the standard
implementations of P3P semantics.

It allows for a flexible and general rule format which can search
for arbitrary RDF triples. These triples are easily mappable to English
language sentences in various registers (e.g. legal, business,
end-user).

It provides rich possibilities for reasoning about policies using
derivations from the P3P ontology or from additional ontologies (e.g.
jurisdictional, data typing, legal). For example, legal hints and
auditing engines may be more easily created. This would also provide a
more solid basis for the several applications which have proposed using
P3P in negotiation scenarios.

It allows the display of automated derivation explanations (why a
particular decision was made) using reasoning traces.

Ontologies can formally define equivalences between concepts in
different domains (e.g. legal, business and end-user).

OWL has built-in extensibility, so user engines can plug in
other ontologies to add semantic richness without altering the P3P
specification (e.g. custom data types, geographical and jurisdictional
regions).

OWL and RDF can describe much richer semantics than XML alone. For example,
OWL describes the problematic relationships of the P3P 1.0 data schema
with ease and in standard syntax.

This document builds on the work done in "An RDF Schema for P3P". Perhaps
the most notable improvement is the modelling of the P3P Base Data Schema as
a class hierarchy, which has cut the size of the ontology approximately
fivefold.

1.1 Background to standards used: P3P, RDF, RDFS, OWL and RDQL

These are the specifications used in this document. The diagram at the end
of this section gives some idea of how they fit together.

The Platform for Privacy Preferences
Project (P3P) enables Web sites to express their privacy practices in a
standard format that can be retrieved automatically and interpreted easily by
user agents. P3P user agents will allow users to be informed of site
practices (in both machine- and human-readable formats) and to automate
decision-making based on these practices when appropriate. Thus users need
not read the privacy policies at every site they visit.... It provides a way
for a Web site to encode its data-collection and data-use practices in a
machine-readable XML format known as a P3P
policy.

Resource Description Framework (RDF)
This is the language used to describe P3P
policies. It can be used to express arbitrary statements in a logically
rigorous manner. It is the basis of the Semantic Web because it is the syntax
for describing how resources relate to each other.

RDF Vocabulary Description Language (RDFS)
Used as part of the P3P ontology to represent the relations between the
classes.
RDFS provides a very simple description logic vocabulary based on RDF. It
defines classes and properties that may be used to describe classes,
properties and other resources (e.g. rdfs:subClassOf, rdfs:subPropertyOf).

Web Ontology Language (OWL)
OWL is a more sophisticated description logic language for describing RDF
vocabularies. While RDFS provides a basic vocabulary for describing classes
and subclasses, OWL provides a more complete description framework. In the
P3P ontology, we have used OWL's disjointness construct (owl:disjointWith)
to describe certain properties of the P3P data schema: for example, if a
data type is Online, it cannot be Physical. This cannot be described using
RDFS alone.
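What the disjointness constraint buys can be illustrated with a small consistency check. The following Python sketch is not an OWL reasoner, and the class names and subclass table are hypothetical stand-ins for the ontology's identifiers. It reports any pair of classes declared disjoint that an individual falls into, counting inherited superclasses.

```python
# Minimal sketch of an owl:disjointWith-style consistency check.
# Class names are hypothetical stand-ins, not the P3P ontology's identifiers.
SUBCLASS_OF = {
    "Email": {"Online"},
    "Loccode": {"Physical"},
}
DISJOINT = {frozenset({"Online", "Physical"})}

def superclasses(cls):
    """Return cls plus all of its transitive superclasses."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUBCLASS_OF.get(c, ()))
    return seen

def violations(asserted_types):
    """Return the disjoint class pairs an individual with these types falls into."""
    all_types = set()
    for t in asserted_types:
        all_types |= superclasses(t)
    return [pair for pair in DISJOINT if pair <= all_types]

print(violations({"Email"}))             # []
print(violations({"Email", "Loccode"}))  # one violating pair: Online/Physical
```

An RDFS-only model could state the two subclass edges but has no way to make the second case an inconsistency.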

SWAPPEL is a prototype specification for privacy
preference rules for P3P policy evaluation and for preference exchange.
Its syntax is based on that of APPEL, with two differences. It represents
the behavior, description, prompt and promptmessage as child tags of the
rule BODY instead of as attributes of the RULE tag, and, instead of APPEL's
bespoke policy query matching syntax, it uses a standard query language to
match elements of the P3P policy. The query language may be specified,
but it is RDQL by default.

2.0 Overview and Examples

This section provides an overview of an RDF Policy and SWAPPEL ruleset and
describes how they are used within a typical interaction. Note that the
concepts used in both the RDF Policy and in the SWAPPEL ruleset's RDQL query
are defined by the OWL ontology. The following diagram shows how the above
standards are used within the experimental implementation of P3P.

Figure 1. Component interactions

In short, the OWL ontology specifies the concepts that may be used within
any RDF P3P policy, in much the same way that the P3P 1.0 XML Schema
specifies the way XML tags may be used within P3P 1.0. One important
difference is that the OWL ontology makes statements about the semantics of
the concepts used and is not purely syntactic. RDF P3P policies are written
using the OWL ontology and then linked to resources using standard
P3P policy reference files. We have not specified an RDF syntax for policy
reference files because we do not think their semantics are complex
enough to justify using RDF. The lighter-weight XML syntax is adequate in
this case.

User agents express preferences as SWAPPEL rulesets: sets of rules
for matching against a resource's RDF P3P policy. Each rule performs an RDQL
query on the policy and, if the query matches, the rule's behavior is
executed.
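To make "if the query matches" concrete, here is a minimal Python sketch of conjunctive triple-pattern matching, the core of what an RDQL WHERE clause does. It is not an RDQL implementation (no filters, no USING prefixes, no inference), and the terms used are illustrative only; terms starting with "?" are variables.

```python
def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all (s, p, o) patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)

def fires(patterns, graph):
    """A rule fires when its query returns a non-empty result set."""
    return next(match(patterns, graph), None) is not None

graph = {("_:p", "purpose", "marketing"), ("_:p", "collects", "_:d")}
print(fires([("?p", "purpose", "marketing")], graph))      # True
print(fires([("?p", "purpose", "telemarketing")], graph))  # False
```

With an empty pattern list, `fires` returns True, which mirrors a default rule that has no condition.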

3.0 Sample RDF Policy and Explanation

The following is an example RDF policy which uses the concepts defined in
the OWL ontology. RDF is a directed-graph language for describing arbitrary
statements based on well-defined semantics. The following is a sketch of the
graph that is represented by the RDF below. Note that the unnamed nodes are
blank nodes, which say, e.g., “a resource of type 'Entity'”. These are
represented in the RDF with

The policy describes an entity with a certain address and contact
information, which collects Http and Clickstream information, keeps it only
for the stated purpose and does not distribute it to third parties.
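For readers who think in terms of triples, the policy summarized above can be sketched as a small set of subject-predicate-object tuples in Python. All p3p: property names here are hypothetical (only controller, states, collects, may-include-members-of and the Http/Clickstream data classes echo terms used in this note), and strings beginning with "_:" stand for blank nodes. The recipient value "ours" and retention value "stated-purpose" are the P3P 1.0 values closest to the prose description.

```python
# Hypothetical triple rendering of the sample policy. "_:" marks blank nodes.
POLICY = {
    ("_:policy", "rdf:type", "p3p:Policy"),
    ("_:policy", "p3p:controller", "_:entity"),
    ("_:entity", "rdf:type", "p3p:Entity"),
    ("_:policy", "p3p:states", "_:practice"),
    ("_:practice", "p3p:collects", "_:data"),
    ("_:data", "p3p:may-include-members-of", "p3p:Http"),
    ("_:data", "p3p:may-include-members-of", "p3p:Clickstream"),
    ("_:practice", "p3p:retention", "p3p:stated-purpose"),
    ("_:practice", "p3p:recipient", "p3p:ours"),  # no third parties
}

def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}

print(sorted(objects(POLICY, "_:data", "p3p:may-include-members-of")))
# ['p3p:Clickstream', 'p3p:Http']
```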

4.0 Example SWAPPEL Ruleset

This ruleset looks for any policy which describes an entity collecting data
of type "Dynamic" for marketing purposes. The rule follows the Event
Condition Action (ECA) model, with the Event assumed to be a resource
access. The condition is contained in the HEAD section of the rule, which is
an RDQL query. The action is contained in the BODY section of the rule, which
specifies what action is to be performed if the HEAD section of the rule
matches. For simplicity I have used a first-come-first-served model of
conflict resolution (i.e. the first rule in the sequence that fires is the
one which is executed). The specification follows the same conflict
resolution model as APPEL. The behaviors and the prompt mechanism are also
the same as in APPEL, except that they are contained as CDATA within tags
named behavior, description, etc. This gives more flexibility for extension.
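The first-come-first-served model can be sketched in a few lines of Python. Here conditions are plain predicates over a set of triples, standing in for RDQL queries, and the behavior strings follow APPEL's "request"/"block" convention; the rule contents and triple names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    behavior: str  # APPEL-style behavior, e.g. "block" or "request"
    description: str = ""
    # None marks the default rule, which has no HEAD (condition).
    condition: Optional[Callable[[set], bool]] = None

def evaluate(ruleset, policy):
    """Return the first rule that fires (first-come-first-served)."""
    for rule in ruleset:
        if rule.condition is None or rule.condition(policy):
            return rule
    return None  # only reached if the ruleset has no default rule

RULESET = [
    Rule("block", "marketing use of data",
         lambda g: any(p == "p3p:purpose" and o == "p3p:marketing" for _, p, o in g)),
    Rule("request", "default rule"),  # always last, always fires
]

print(evaluate(RULESET, {("_:x", "p3p:purpose", "p3p:marketing")}).behavior)  # block
print(evaluate(RULESET, set()).behavior)                                      # request
```

Because the default rule has no condition, placing it last guarantees that every policy yields some behavior.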

Note that since the Http and Clickstream data described in the policy are
subclasses of Dynamic data in the ontology, the decision engine (which can
make inferences based on the ontology) automatically returns a match for the
first rule against the above policy.
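The inference step can be sketched without a full OWL reasoner. The following Python sketch hard-codes the subclass edges the note describes (Http and Clickstream under Dynamic) and a simplified, hypothetical stand-in for the first rule's condition. It shows how a transitive is-a check lets a rule written against Dynamic match a policy that mentions only Http and Clickstream.

```python
# Subclass edges from the note: Http and Clickstream are subclasses of Dynamic.
SUBCLASS_OF = {
    "Http": {"Dynamic"},
    "Clickstream": {"Dynamic"},
}

def is_a(cls, target):
    """True if cls is target or a transitive subclass of it."""
    stack, seen = [cls], set()
    while stack:
        c = stack.pop()
        if c == target:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(SUBCLASS_OF.get(c, ()))
    return False

def first_rule_matches(collected, purposes):
    """Hypothetical stand-in for the first rule's condition:
    some collected data type is Dynamic and a stated purpose is marketing."""
    return "marketing" in purposes and any(is_a(c, "Dynamic") for c in collected)

print(first_rule_matches({"Http", "Clickstream"}, {"marketing"}))  # True
print(first_rule_matches({"Http"}, {"admin"}))                     # False
```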

5.0 OWL P3P Ontology Notes

The following were important design decisions taken in modelling P3P
semantics with OWL. The OWL ontology is given in section 9.

Concepts used in policy reference files are not included. There
appears to be little added value in changing the policy reference file
syntax and associated protocols. Policies in RDF documents may still be
referred to using standard P3P policy reference files.

Inclusion of the policy concept. We decided to include the concept of a
policy because of the need to describe the expiration date for the
policy. We link the policy with the statements it makes through the
relationships Policy DESCRIBES Collectionpractice and Policy
Controller Entity. P3P 1.0 statements are then RDF statements about the
data collection practices of this entity in the given context.

From a formal semantics point of view, ideally, a policy would be
linked to the statements in the policy by reifying the statements within
the policy and then specifying Policy <states>
<rdf:Statement>. However, we considered this too cumbersome for
policy developers and parsers. We have therefore adopted the semantics of
Policy <states> Collection-practice <collects> Data, etc. We
considered the semantics Policy <applies-to> Entity <performs>
Collection, etc., but this does not allow for the case where the same
document contains different policies which make different statements
about the same entity.
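The cost of the reified alternative can be made concrete by counting triples. In this Python sketch the p3p: property names are hypothetical, while rdf:Statement, rdf:subject, rdf:predicate and rdf:object are the standard RDF reification vocabulary. For a single claim, reification costs five triples against two for the adopted pattern, and, as with any RDF reification, the reified claim is described but not itself asserted.

```python
from itertools import count

_fresh = count()  # source of fresh blank-node labels

def reify(policy, claim):
    """Policy <states> rdf:Statement: describe the claim via RDF reification."""
    s, p, o = claim
    stmt = f"_:stmt{next(_fresh)}"
    return [
        (policy, "p3p:states", stmt),
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
    ]

def direct(policy, practice, claim):
    """Adopted pattern: Policy <states> Collection-practice, plus the plain claim."""
    return [(policy, "p3p:states", practice), claim]

claim = ("_:practice", "p3p:collects", "_:data")
print(len(reify("_:policy", claim)))                 # 5
print(len(direct("_:policy", "_:practice", claim)))  # 2
print(claim in reify("_:policy", claim))             # False: not asserted
```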

We wanted to avoid at all costs having a concept
for every possible data type in the P3P 1.0 Base Data Schema, because this
is highly redundant and makes the ontology extremely cumbersome. At first
sight, the BDS is just a hierarchy of subclasses. However, on closer
inspection, this is not so. The BDS is modelled on “data structures”:
structured data types which are subsumed by multiple superstructures.
For example, Email is subsumed by both User and ThirdParty. It is
therefore not a subclass of either. Furthermore, the semantics of a P3P
policy is not “we collect all/some values from” a data class, but
“we may collect any value from” a data class. This turns out to be
very complex, if not impossible, to model using pure OWL, as it involves
hypothetical individuals.

We therefore decided to model it instead using a special P3P relation,
“may-include-members-of”, which holds between the classes of the data
typing ontology. We used owl:TransitiveProperty to model the resulting
Base Data Schema hierarchy.
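What declaring may-include-members-of as an owl:TransitiveProperty gives a reasoner can be sketched as a plain transitive closure. The class names in this Python sketch are hypothetical, loosely echoing the P3P 1.0 paths user.home-info.online.email and thirdparty.home-info.online.email.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing pairs (naive fixpoint iteration)."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

# Hypothetical may-include-members-of edges; HomeInfo has two superstructures.
MAY_INCLUDE = {
    ("User", "HomeInfo"),
    ("ThirdParty", "HomeInfo"),
    ("HomeInfo", "Online"),
    ("Online", "Email"),
}
closure = transitive_closure(MAY_INCLUDE)
print(("User", "Email") in closure)        # True
print(("ThirdParty", "Email") in closure)  # True
```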

One further requirement was still missing, however: the ability
to specialize the data types. Statements such as Data-Group x
<may-include-members-of> User && Data-Group x
<may-include-members-of> Email do not, in themselves, carry the semantics
that Data-Group x therefore does not include members of every class
transitively included by User (e.g. contact info). To get round this
problem, we added the following custom rule to Jena's OWL Micro
ruleset:

This rule states that only the transitive closure of the leaf data-type
classes should be included in inference models.
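Under one reading of the description above (an interpretation for illustration, not the author's Jena rule), the classes asserted for a data group are first reduced to the most specific ones, namely those that do not transitively include another asserted class, and only their transitive closure is kept. A Python sketch with hypothetical class names:

```python
# Hypothetical may-include-members-of edges.
EDGES = {
    ("User", "HomeInfo"),
    ("HomeInfo", "Online"),
    ("Online", "Email"),
    ("HomeInfo", "Postal"),
    ("Postal", "Street"),
}

def descendants(cls, edges):
    """All classes reachable from cls via may-include-members-of."""
    out, stack = set(), [cls]
    while stack:
        c = stack.pop()
        for a, b in edges:
            if a == c and b not in out:
                out.add(b)
                stack.append(b)
    return out

def leaf_closure(asserted, edges):
    """Keep the most specific asserted classes and everything they may include."""
    leaves = {c for c in asserted if not descendants(c, edges) & (asserted - {c})}
    result = set(leaves)
    for leaf in leaves:
        result |= descendants(leaf, edges)
    return result

print(sorted(leaf_closure({"User", "Email"}, EDGES)))  # ['Email']
```

Asserting User alone would still yield all of User's contents; asserting User together with Email narrows the group to Email.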

This is a somewhat inefficient method of achieving the required results, as
it involves creating the transitive closure over the may-include-members-of
property and then deleting a large number of the resulting statements. We are
in the process of developing a rule which will replace this add-and-remove
method with one that selects only the part of the transitive closure which
is needed.

A prototype rule to achieve this is approximately [still to be tested]:

    ?X type Data-Group,
    ?X may-include A,
    ?X may-include B,
    [A may-include C, C may-include D -> A may-include D],
    A may-include E,
    F may-include G,
    notEqual(E,A),
    notEqual(E,B)
    -> X may-include E

We also still need to add validation rules to this section (e.g. check
that only allowed members of the hierarchy are combined).


In the P3P 1.0 base data schema, categories are modelled as a different kind
of beast from data types, but in fact they are just another class of
data. The bubble-up rule used for categories in the P3P 1.0 BDS appears to
be problematic, but in fact it can be modelled simply by assigning the
leaves of the main hierarchy as “possible members” of the category
classes. For example, loccode is subclassed to the physical category and
URI is assigned to the online category. The bubble-up rule then comes for
free, as properties (including subclassing) are inherited upwards.
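Assigning categories only to leaves and letting them propagate upwards can be sketched as a recursive union. In this Python sketch the schema slice and category table are hypothetical simplifications; only the loccode/physical and URI/online assignments come from the text. A parent's categories are computed rather than stored, which is the "for free" behavior described above.

```python
# Hypothetical, simplified slice of the base data schema (may-include edges).
MAY_INCLUDE = {
    "HomeInfo": {"Telephone", "Online"},
    "Telephone": {"Loccode", "Number"},
    "Online": {"Uri"},
}
# Categories are attached only to leaves, as proposed in the text.
LEAF_CATEGORIES = {
    "Loccode": {"physical"},
    "Uri": {"online"},
}

def categories(cls):
    """Categories of a class: the union of the categories of the leaves below it."""
    children = MAY_INCLUDE.get(cls)
    if not children:  # leaf
        return set(LEAF_CATEGORIES.get(cls, ()))
    result = set()
    for child in children:
        result |= categories(child)
    return result

print(sorted(categories("HomeInfo")))  # ['online', 'physical']
```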

URI is modelled as a separate class when it is a data type within
the data typing schema, but when it is, for example, the address of the
dispute resolution service, it is modelled using rdf:Resource. The
reason is that we wanted to maintain the consistency of the data typing
schema, but also to make the URIs used elsewhere immediately
machine-processable.

6.0 SWAPPEL Specification

6.1 Introduction

SWAPPEL is a prototype specification for privacy preference
rules for P3P policy evaluation and for preference exchange. Its syntax is
based on that of APPEL, with two differences. It represents the behavior,
description, prompt and promptmessage as child tags of the rule BODY instead
of as attributes of the RULE tag, and, instead of APPEL's bespoke policy
query matching syntax, it uses a standard query language to match elements
of the P3P policy. The query language may be specified, but it is RDQL by
default. The format is also conceived as a possible substitute for APPEL
with XML P3P 1.0 policies, by using XPath instead of RDQL as the query
language. This is in any case configurable.

6.2 Rule System

APPEL rules will be replaced by ECA rules. Within the Event Condition
Action conceptual framework, the rule format is summarized in pseudo-code
as follows:

A default rule is always included in the set. This is a rule without a
condition (head) section.

We may consider adding a priority to the behavior to replace
the first-come-first-served conflict resolution algorithm currently used. For
a sample ruleset, see section 4.0 of this specification.

Rule Syntax

The RULESET Element

Description attribute

A short natural language explanation that can be displayed by
the user agent when the ruleset gets selected, or used to help debug a
rule file.

The RULE Element

The basic element of the ruleset, containing a HEAD (unless it is the
default rule) and a BODY. It specifies the conditions under which a certain
behavior should be carried out by the calling program.

The behavior Element (mandatory element)

The value of this element denotes the behavior that should be
carried out by the calling program if the rule fires.

The description Element

A short natural language explanation that can be displayed by
the user agent when the rule gets executed, or used to help debug a rule
file. Note that a separate promptmsg should be used if the user is to be
prompted for a decision.

The prompt Element

Indicates whether a prompt message should be displayed
to the user. If this element is not present, no prompt message is displayed.
If it is present, then its value is a short natural
language explanation or question that can be displayed by the user agent when
the user should be prompted for a decision. Note that the description field
can be used to hold a brief summary of the rule for debugging or
informational purposes.

The HEAD element

This expresses the condition which must be fulfilled for the
BODY element to be executed. The condition, contained in the value of this
tag, is expressed as a query in the query language given by the language
attribute. If the query returns a non-empty result set, the rule is
determined to have fired. The default language is RDQL.

Language attribute

The language of the query which determines the satisfaction
of the rule condition.
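The following Python sketch shows one plausible serialization of a two-rule SWAPPEL ruleset using the element and attribute names described above (RULESET with its description attribute, RULE, HEAD with its language attribute, BODY, behavior, description, prompt), read with the standard library's xml.etree.ElementTree. The exact nesting, the example.org namespace and the query are assumptions for illustration, not the normative format.

```python
import xml.etree.ElementTree as ET

# Hypothetical ruleset; the namespace and query are illustrative only.
RULESET_XML = """<RULESET description="Block marketing use of data">
  <RULE>
    <HEAD language="RDQL"><![CDATA[
      SELECT ?p WHERE (?p, <http://example.org/p3p#purpose>, <http://example.org/p3p#marketing>)
    ]]></HEAD>
    <BODY>
      <behavior>block</behavior>
      <description>Site uses data for marketing</description>
      <prompt>Allow marketing use of your data?</prompt>
    </BODY>
  </RULE>
  <RULE>
    <BODY><behavior>request</behavior></BODY>
  </RULE>
</RULESET>"""

def parse_ruleset(text):
    """Return (ruleset description, list of rule dicts) in document order."""
    root = ET.fromstring(text)
    rules = []
    for rule in root.findall("RULE"):
        head = rule.find("HEAD")  # None for the default rule
        body = rule.find("BODY")
        rules.append({
            "language": head.get("language", "RDQL") if head is not None else None,
            "query": head.text.strip() if head is not None and head.text else None,
            "behavior": body.findtext("behavior"),
            "description": body.findtext("description"),
            "prompt": body.findtext("prompt"),
        })
    return root.get("description"), rules

description, rules = parse_ruleset(RULESET_XML)
print([r["behavior"] for r in rules])  # ['block', 'request']
```

Keeping rules in document order preserves the first-come-first-served conflict resolution; the HEAD-less second rule plays the part of the default rule.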

Issues

RDQL's syntax for URI terms involves angle brackets, which must be
escaped in XML; this is cumbersome for rule writers.
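One possible mitigation, consistent with the note's use of CDATA for behaviors in Section 4.0, is to wrap the HEAD query in a CDATA section. The Python sketch below (the query and its example.org namespace are hypothetical) shows that an entity-escaped HEAD and a CDATA HEAD parse to the same query text with the standard library's ElementTree, while only the CDATA form stays readable. A CDATA section cannot itself contain the sequence "]]>", though an RDQL query is unlikely to need it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical RDQL condition; the example.org namespace is illustrative only.
QUERY = ("SELECT ?p WHERE (?p, <http://example.org/p3p#purpose>, "
         "<http://example.org/p3p#marketing>)")

escaped_head = '<HEAD language="RDQL">' + escape(QUERY) + "</HEAD>"
cdata_head = '<HEAD language="RDQL"><![CDATA[' + QUERY + "]]></HEAD>"

# Both forms parse back to the same query text...
assert ET.fromstring(escaped_head).text == QUERY
assert ET.fromstring(cdata_head).text == QUERY
# ...but only the CDATA form can be read and edited as written.
print(escaped_head)
```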