5. Data Schemas

A data schema is a
description of a set of allowed data types. P3P includes a way to
describe data schemas so that services can communicate to user agents
about the types of data they collect. A data schema is a hierarchical
set of data types, which are specific classes of data a
service might collect.

P3P 1.1 provides a new format for expressing P3P data schemas in a
simpler and more standardized way than P3P 1.0. The new format is based
on the XML Schema Definition (XSD) standard, so data elements can be
validated against an XML schema. Backward compatibility is addressed as
follows:
1. New custom data schemas SHOULD be written using only the new format.
2. Policy instances conforming to custom schemas written in the new format may be written using only the new format data elements, but MUST be published with the backward compatibility transform of the data elements in the policy so that the data elements are readable by P3P1.0 user agents. The schema itself may be published only in the new format however. This means that P3P1.0 user agents will be able to parse data elements but they will not be able to validate them. The backward compatibility transform translates the extension datatype elements into P3P1.0 compatible elements.

Datatype sub-elements are organized into a class-like hierarchy of
increasing specificity. A data element automatically includes all of
the data elements below it in the hierarchy. For example, the data
element representing "the user's name" includes the data elements
representing "the user's given name", "the user's family name", and so
on. The hierarchy is mirrored by the hierarchy of XML elements used to
express them, for example
<user><name><given/></name></user> and
<user><name><family/></name></user>.
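The inclusion rule can be illustrated with a minimal Python sketch. The
dictionary layout and the function name are assumptions made for
illustration; only the user, name, given and family elements come from
the text above.

```python
# Hypothetical fragment of a data schema hierarchy (parent -> children).
# The element names come from the example above; the layout is assumed.
HIERARCHY = {
    "user": ["name"],
    "name": ["given", "family"],
    "given": [],
    "family": [],
}


def included_elements(element, hierarchy=HIERARCHY):
    """Return the element followed by every element below it."""
    result = [element]
    for child in hierarchy.get(element, []):
        result.extend(included_elements(child, hierarchy))
    return result
```

Under these assumptions, a policy that references name is treated as
also covering given and family.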

P3P has defined an XML schema called the P3P base data schema that
includes a large number of data elements commonly used by services.
Note that the data element names specified in the base data schema or
in extension data schemas may be used for purposes other than P3P
policies. For example, Web sites may use these names to label HTML
form fields. By referring to data the same way in P3P policies and
forms, automated form-filling tools can be better integrated with P3P
user agents.

5.1. How to express data types in P3P Policies

5.1.1

The following is an example instance of a datatype element that
complies with a data schema. Full details of the data schema hierarchy
are given in section 5.4.

This example shows the following aspects of how to use a data schema
in P3P 1.1:

The root is always an element named datatype.

Under this are nested elements describing the types of data that the
P3P Statement is about. The hierarchy of these elements is described
in detail in section 5.4.

Greater levels of detail in the specification of a data type are
expressed by using the allowed children of a particular class,
according to the scheme expressed in the data schema. Nesting is taken
to mean subclassing, so for instance in the above example, clickstream
data is a subclass of dynamic data.

Natural language descriptors of the meaning of these elements are
found in the Data Schema and therefore should not be included in
policy instances. The human readable descriptor corresponds to an
XSD comment beneath the element it refers to, of format:

<annotation>
  <documentation>
    This element can describe the capture of HTTP Protocol Information
    such as header values
  </documentation>
  <appinfo>
    HTTP Protocol Information
  </appinfo>
</annotation>

User agents should use these descriptors when rendering data types
in human readable format (for example in a human readable
translation of a P3P policy).

Most data elements have categories assigned to them when they are
defined in a data schema. See 5.1.2 Categories for more information on
categories.

5.1.2 Categories in P3P Data Schemas

Categories may be
assigned at the lowest level of a data type definition. The purpose
of a category is to delimit a certain data type – for instance
clickstream data may be delimited as navigation, computer or
demographic type clickstream data.

In this capacity as delimiters, categories can only be assigned as
leaf children of the hierarchy, with only other categories as
siblings. For example, the following is correct syntax:

<datatype>
  <dynamic>
    <cookies>
      <CATEGORY>preference</CATEGORY>
    </cookies>
  </dynamic>
</datatype>

Whereas the following is
not:

<datatype>
  <dynamic>
    <cookies/>
    <CATEGORY>preference</CATEGORY>
  </dynamic>
</datatype>
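The placement rule can be checked mechanically. The following Python
sketch is an illustration only: the function name is hypothetical, and
it ignores XML namespaces, which real policies would carry. It accepts
a datatype fragment only if every CATEGORY element is a leaf whose
siblings are all CATEGORY elements.

```python
import xml.etree.ElementTree as ET


def category_placement_ok(xml_text):
    """Return True if every CATEGORY is a leaf with only CATEGORY siblings."""
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        children = list(parent)
        categories = [c for c in children if c.tag == "CATEGORY"]
        if not categories:
            continue
        # A CATEGORY element must not have child elements of its own.
        if any(len(c) > 0 for c in categories):
            return False
        # All siblings of a CATEGORY element must be CATEGORY elements.
        if len(categories) != len(children):
            return False
    return True


# The two fragments shown above, written on one line each.
CORRECT = ("<datatype><dynamic><cookies>"
           "<CATEGORY>preference</CATEGORY>"
           "</cookies></dynamic></datatype>")
INCORRECT = ("<datatype><dynamic><cookies/>"
             "<CATEGORY>preference</CATEGORY>"
             "</dynamic></datatype>")
```

The second fragment is rejected because CATEGORY sits next to the
cookies element instead of inside it.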

Categories are simply appended to the lowest level of the hierarchy
specified. Most of the elements in the base data schema are so-called
"fixed" data elements: they belong to one or at most two category
classes. By assigning a category invariably to elements or structures
in the base data schema, services and users are able to refer to
entire groups of elements simply by referencing the corresponding
category. For example, using [APPEL], the privacy preferences exchange
language, users can write rules that warn them when they visit a site
that collects any data element in a certain category.

If an element or structure belongs to multiple categories, multiple
CATEGORY elements referencing the appropriate categories can be used.
For example, the following piece of XML can be used to declare that
the cookies data element has both category "preference" and
"demographic":

<datatype>
  <dynamic>
    <cookies>
      <CATEGORY>preference</CATEGORY>
      <CATEGORY>demographic</CATEGORY>
    </cookies>
  </dynamic>
</datatype>

Please note that the category classes of fixed data
elements/structures cannot be overridden, for example by writing APPEL
rules or policies that assign a different category to a known fixed
base data element. User agents MUST ignore such categories and instead
use the original category (or set of categories) listed in the schema
definition. User agents MAY preferably alert the user that a fixed
data element is used together with a non-standard category class.
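A user agent could apply this rule roughly as in the following Python
sketch. The lookup table, the dotted element names and the function
name are assumptions for illustration, not part of the specification.

```python
# Hypothetical table of fixed categories taken from a schema definition.
FIXED_CATEGORIES = {
    "user.name.given": {"physical"},
}


def effective_categories(element, declared, fixed=FIXED_CATEGORIES):
    """Return (categories to use, alert flag) for a declared element.

    For a fixed element the schema categories win and the flag reports
    whether the policy tried to use a non-standard category. For an
    element without fixed categories the declared ones are used as is.
    """
    schema_categories = fixed.get(element)
    if schema_categories is None:
        return set(declared), False
    non_standard = bool(set(declared) - schema_categories)
    return set(schema_categories), non_standard
```

The alert flag corresponds to the optional user notification mentioned
above; whether and how to surface it is left to the user agent.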

5.7.2 Variable-Category Data
Elements/Structures

Not all data elements in the base data schema belong to a
pre-determined category. Some can contain information from a range of
categories, depending on a particular situation. Such
elements/structures are called variable-category data
elements/structures (or "variable data element" for short). Although
most variable data elements in the P3P base data schema are combined
in the dynamic element set, they can appear in any data set, even
mixed with fixed-category data elements.

Variable-category elements are distinguished by the fact that the
schema defining them (including the base data schema) does not list an
explicit category for them; otherwise the element/structure becomes
fixed. For example, when specifying the "Year" data element, which can
take various categories depending on the situation (e.g. when used for
a credit card expiration date vs. for a birth date), the following
schema definition can be used:

<element name="year"/>
<!-- Variable Data Structure-->

5.7.3 Referencing External Schemas

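External schemas may reference elements in other schemas simply by
referring to another namespace. For example:

<xs:schema
    targetNamespace="http://www.example.com/Report"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:p3pBDS="http://www.w3.org/P3P/BDS.xml">

  <!-- Import the namespace whose elements are referenced below -->
  <xs:import namespace="http://www.w3.org/P3P/BDS.xml"/>

  <xs:element name="creditCard">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="p3pBDS:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>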

Note, however, that in contrast to the P3P 1.0 data schema format, it
is not possible to alter the category assignments of reused elements.
If this is required, the elements must be redefined. Schema extensions
may still reference elements in other schemas (e.g. reuse of the date
element), but they may not assign a category, because the properties
of a referenced element cannot be changed within XSD.

Note that while user preferences can list such variable data elements
without any additional category information (effectively expressing
preferences over any usage of this element), services MUST always
explicitly specify the categories that apply to the usage of a
variable data element in their particular policy. This information has
to appear as a category element in the corresponding DATA element
listed in the policy, for example as in:

<datatype>
  <dynamic>
    <cookies>
      <CATEGORY>uniqueid</CATEGORY>
    </cookies>
  </dynamic>
</datatype>

where a service declares that cookies are used to recognize the user
at this site (i.e. category Unique Identifiers).

If a service wants to
declare that a data element is in multiple categories, it simply
declares the corresponding categories as in:

<datatype>
  <dynamic>
    <cookies>
      <CATEGORY>preference</CATEGORY>
      <CATEGORY>uniqueid</CATEGORY>
    </cookies>
  </dynamic>
</datatype>

With the above declaration a service announces that it uses cookies
both to recognize the user at this site and for storing user
preference data. Note that for the purpose of P3P there is no
difference whether this information is stored in two separate cookies
or in a single one.

Finally, note that categories can be inherited as well: categories
inherit downward when a field is structured, but only into fields
which have no predefined category. Therefore, we suggest that schema
authors do their best to ensure that all applicable categories are
applied to new data elements they create.
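The downward inheritance rule can be sketched in Python as follows.
The nested-dictionary representation and the element names card,
expiry and holder are hypothetical; only the rule itself (inherit into
fields with no predefined category) is taken from the text.

```python
def resolve_categories(node, path="", inherited=frozenset()):
    """Map each dotted element path to its effective categories.

    A field with predefined categories keeps them; a field without any
    inherits the effective categories of its parent.
    """
    name = path + "." + node["name"] if path else node["name"]
    own = node.get("categories")
    effective = frozenset(own) if own else inherited
    result = {name: set(effective)}
    for child in node.get("children", []):
        result.update(resolve_categories(child, name, effective))
    return result


# Hypothetical structured element used only for illustration.
CARD = {
    "name": "card",
    "categories": ["purchase"],
    "children": [
        {"name": "expiry"},                            # inherits
        {"name": "holder", "categories": ["physical"]},  # keeps its own
    ],
}
```

Here expiry picks up the category of card, while holder keeps its
predefined category.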

5.1.3 Natural Language description of data
elements

Natural language descriptions of the meaning of data elements may be
found within <annotation> children of the element definition in the
schema. These may be of two kinds:

<documentation>: a long description for documentation purposes

<appinfo>: a short description for display in data capture summaries.

These descriptions are intended to be used by user agents in creating
translations to human-readable policies. They should not, however, be
included in machine-readable policies.

5.2 Defining new Schemas

Services may declare new data elements by creating and publishing
their own data schemas, expressed using XSD according to certain rules
over and above the rules of correct XSD syntax. This section describes
the rules for creating these schemas. These new data schemas are then
referred to simply by using the xmlns attribute on the elements below
the datatype element. [NOTE – do we need to declare the BDS namespace
every time we use it]

5.2.1 Allowed Structure of XSD Schemas

New schemas can be defined using the XML Schema Definition language
[ref]. They must follow the following rules:

Elements which are children of a given element must correspond to data
subclasses. For example, <classicalmusicpreference> would be defined
as an allowed subelement of <musicalpreference>. Subclassing is
inherited, so <baroquemusicpreference> is understood to be a subclass
of <musicalpreference> because it is a subelement of
<classicalmusicpreference>, which is a subelement of
<musicalpreference>.

5.2.2 Defining categories in custom schemas.

To declare that any category may be used, use the following:

<element name="CATEGORY" minOccurs="0" maxOccurs="unbounded"
type="allCategories" />

Categories must follow a
bubble-up rule in that any categories assigned to a lower level
element must also be assigned to any of its ancestors. This mirrors
real-world semantics where possible categories assigned to classes
are upwards inherited.
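A schema author could check the bubble-up rule with a small Python
sketch like the one below. The nested-dictionary representation and
the function name are assumptions; the element names reuse the
musicalpreference example from 5.2.1, and the category assignments are
invented for illustration.

```python
def check_bubble_up(node, path=""):
    """Return (categories in this subtree, list of violations).

    A violation is a pair (element path, missing categories) for an
    element that lacks a category assigned somewhere below it.
    """
    name = path + "." + node["name"] if path else node["name"]
    own = set(node.get("categories", []))
    subtree = set(own)
    violations = []
    for child in node.get("children", []):
        child_subtree, child_violations = check_bubble_up(child, name)
        violations.extend(child_violations)
        missing = child_subtree - own
        if missing:
            violations.append((name, sorted(missing)))
        subtree |= child_subtree
    return subtree, violations


# Hypothetical category assignments that break the rule.
TREE = {
    "name": "musicalpreference",
    "categories": ["preference"],
    "children": [
        {"name": "classicalmusicpreference",
         "categories": ["preference", "demographic"]},
    ],
}
```

In this invented tree, musicalpreference is reported because its child
carries a category that the parent does not.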

Categories must be defined as restrictions on a global data type for
the schema, which is defined as a root-level element. For example in
the BDS, the following element is defined:

Definitions of allowed categories for a particular datatype are then
taken as restrictions of this basic data type, for example:

<element name="CATEGORY" minOccurs="0" maxOccurs="unbounded">
  <simpleType>
    <restriction base="allCategories">
      <pattern value="computer"/>
    </restriction>
  </simpleType>
</element>

Note that this must be
placed as a leaf node in the hierarchy (i.e. it should have no
children) and it must only have category siblings.

5.2.3 Natural Language Annotations in Data
Schemas

P3P data typing XSDs allow two annotation fields for natural language
description. Below each element description, schema creators MAY place
an xsd:annotation element with the following child elements:

<documentation>: a long description for documentation purposes

<appinfo>: a short description for display in policy summaries.

Services publishing a
data schema MAY wish to translate these fields into multiple
languages. The annotation element's contents MAY be translated, but
the element name MUST NOT be translated - this field needs to stay
constant across translations of a data schema.

If a service is
going to provide a data schema in multiple natural languages, then it
SHOULD examine the Accept-Language
HTTP request-header on requests for that data schema to pick the best
available alternative.
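As a rough illustration of such language selection, the following
Python sketch picks a schema translation from an Accept-Language
value. It is deliberately simplified (it only understands the q
parameter and falls back from a regional tag such as en-US to its
primary tag), and the function name and arguments are hypothetical.

```python
def pick_schema_language(accept_language, available, default):
    """Choose the best available language for a data schema."""
    ranges = []
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        ranges.append((tag, quality))
    # Highest quality first; the sort is stable, so ties keep header order.
    ranges.sort(key=lambda pair: pair[1], reverse=True)
    by_lower = {lang.lower(): lang for lang in available}
    for tag, quality in ranges:
        if quality <= 0:
            continue
        if tag == "*":
            return default
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]
        if primary in by_lower:
            return by_lower[primary]
    return default
```

A production server would more likely rely on the content negotiation
support of its HTTP framework than on hand-written parsing.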

5.3 Persistence of data schemas

An essential requirement on data schemas is persistence: a data schema
that can be fetched at a certain URI can only be changed by extending
it in a backward-compatible way (that is to say, changing the data
schema does not change the meaning of any policy using that schema).
This way, the URI of a policy acts in a sense like a unique identifier
for the data elements and structures contained therein: any data
schema that is not backward-compatible must therefore use a new,
different URI.

Note that a useful
application of the persistence of data schema is given for example in
the case of multi-lingual sites: multiple language versions
(translations) of the same data schema can be offered by the server,
using the HTTP "Content-Language"
response header field to properly indicate that a particular language
has been used for the data schema.

5.4 Structure of Base Data Schema

The XML schema is not designed to be human readable, but when writing
policies it is convenient to have a picture of the hierarchy of
categories available. The following gives a description of the
different elements available. This hierarchy and the XML schema
specify for each element what its possible parents and children may
be. This is clarified in the diagrams in Section 5.5. All
P3P-compliant user agent implementations MUST be aware of the Base
Data Schema. Each table below specifies a level of the Base Data
Schema. A diagrammatic representation of the tree can be found at
http://p3p.jrc.it/P3PTaxonomy/dataschematransformer.xml. The following
also specifies the associated categories and the display names shown
to users. More than one category may be associated with a fixed data
element. However, each base data element is assigned to only one
category whenever possible. Data schema designers are recommended to
do the same.

The following four classes are the root classes:

5.4.1 Level 1.

5.4.1.1 Level 1 Elements 1. - User

The user data set includes general information about the user. The
following table gives its allowed subclasses.

Note that each of these possible subelements includes elements that
have further possible subclasses, which come under level 2.

5.4.1.2 Level 1 Elements 2. - Thirdparty

The thirdparty data set allows users and businesses to provide values
for a related third party. This can be useful whenever third party
information needs to be exchanged, for example when ordering a present
online that should be sent to another person, or when providing
information about one's spouse or business partner. Such information
could be stored in a user repository alongside the user data set. User
agents may offer to store multiple such thirdparty data sets and allow
users to select the appropriate values from a list when necessary.

The allowed subclasses of the thirdparty data set are identical to
those of the user data set. See section 5.4.1.1 (Level 1 Elements 1. -
User) for details.

5.4.1.3 Level 1 Elements 3. – Business

The business data set features a subset of the user data set relevant
for describing legal entities. In P3P1.0, this data set is primarily
used for declaring the policy entity, although it should also be
applicable to business-to-business interactions. The following table
gives its allowed subclasses.

5.4.1.4 Level 1 Elements 4. – Dynamic

In some cases, there is a need to specify data elements that do not
have fixed values that a user might type in or store in a repository.
In the P3P base data schema, all such elements are grouped under the
dynamic data set. Sites may refer to the types of data they collect
using the dynamic data set only, rather than enumerating all of the
specific data elements.

These elements are often
implicit in navigation or Web interactions. They should be used with
categories to describe the type of information collected through
these methods. A brief description of each element follows.

clickstream

The clickstream element is expected to apply to practically all Web
sites. It represents the combination of information typically found in
Web server access logs: the IP address or hostname of the user's
computer, the URI of the resource requested, the time the request was
made, the HTTP method used in the request, the size of the response,
and the HTTP status code in the response. Web sites that collect
standard server access logs as well as sites which do URI path
analysis can use this data element to describe how that data will be
used. Web sites that collect only some of the data elements listed for
the clickstream element MAY choose to list those specific elements
rather than the entire dynamic.clickstream element. This allows sites
with more limited data-collection practices to accurately present
those practices to their visitors.

http

The http element contains additional information contained in the HTTP
protocol. See the definition of the httpinfo structure for
descriptions of specific elements. Sites MAY use the dynamic.http
field as a shorthand to cover all the elements in the httpinfo
structure if they wish, or they MAY reference the specific elements in
the httpinfo structure.

clientevents

The clientevents element represents data about how the user interacts
with their Web browser while interacting with a resource. For example,
an application may wish to collect information about whether the user
moved their mouse over a certain image on a page, or whether the user
ever brought up the help window in a Java applet. This kind of
information is represented by the dynamic.clientevents data element.
Much of this interaction record is represented by the events and data
defined by the Document Object Model (DOM) Level 2 Events
[DOM2-Events]. The clientevents data element also covers any other
data regarding the user's interaction with their browser while the
browser is displaying a resource. The exception is events which are
covered by other elements in the base data schema. For example,
requesting a page by clicking on a link is part of the user's
interaction with their browser while viewing a page, but merely
collecting the URL the user has clicked on does not require declaring
this data element; clickstream covers that event. However, the DOM
event DOMFocusIn (representing the user moving their mouse over an
object on a page) is not covered by any other existing element, so if
a site is collecting the occurrence of this event, then it needs to
state that it collects the dynamic.clientevents element. Items covered
by this data element are typically collected by client-side scripting
languages, such as JavaScript, or by client-side applets, such as
ActiveX or Java applets. Note that while the previous discussion has
been in terms of a user viewing a resource, this data element also
applies to Web applications which do not display resources visually,
for example audio-based Web browsers.

cookies

The cookies element should be used whenever HTTP cookies are set or
retrieved by a site. Please note that cookies is a variable data
element and requires the explicit declaration of usage categories in a
policy.

miscdata

The miscdata element references information collected by the service
that the service does not reference using a specific data element.
Categories have to be used to better describe these data: sites MUST
reference a separate miscdata element in their policies for each
category of miscellaneous data they collect.

searchtext

The searchtext element references a specific type of solicitation used
for searching and indexing sites. For example, if the only fields on a
search engine page are search fields, the site only needs to disclose
that data element.

interactionrecord

The
interactionrecord
element should be used if the server is keeping track of the
interaction it has with the user (i.e. information other than
clickstream data, for example account transactions, etc).

5.4.2 Lower Level Reusable Elements

The following summarises the elements and their allowed children which
are used at various levels of the taxonomy to provide a more detailed
description of data elements. Note that the full structure of allowed
elements is summarized graphically in 5.5.

5.4.2.1 Date

The date element and its children refer to dates. Since date
information can be used in different ways, depending on the context,
all date information is tagged as being of "variable" category (see
Section 5.7.2). For example, schema definitions can explicitly set the
corresponding category in the element referencing this data structure:
soliciting the birthday of a user might be "Demographic and
Socioeconomic Data", while the expiration date of a credit card might
belong to the "Purchase Information" category.

The "time zone" information is described, for example, in the time
standard [ISO8601]. Note that "date.ymd" and "date.hms" can be used as
shorthand to reference the year/month/day and hour/minute/second
blocks respectively.

5.4.2.2 Names

This element and its children specify information about the naming of
a person or organization. An organization name will generally not
extend to the subelements listed below.

| Name | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| prefix | Demographic and Socioeconomic Data | unstructured | Name Prefix |
| given | Physical Contact Information | unstructured | Given Name (First Name) |
| family | Physical Contact Information | unstructured | Family Name (Last Name) |
| middle | Physical Contact Information | unstructured | Middle Name |
| suffix | Demographic and Socioeconomic Data | unstructured | Name Suffix |
| nickname | Demographic and Socioeconomic Data | unstructured | Nickname |

5.4.2.3 Logins

The login element and its children refer to information (IDs and
passwords) for computer systems and Web sites which require
authentication. Note that this data element should not be used for
computer systems or Web sites which use digital certificates for
authentication: in those cases, the certificate structure should be
used.

| login | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| id | Unique Identifiers | unstructured | Login ID |
| password | Unique Identifiers | unstructured | Login Password |

The "id" field
represents the ID portion of the login information for a computer
system. Often, user IDs are made public, while passwords are kept
secret. IDs do not include any type of biometric authentication
mechanisms.

The "password" field represents the password portion of the login
information for a computer system. This is a secret data value,
usually a character string, that is used in authenticating a user.
Passwords are typically kept secret, and are generally considered to
be sensitive information.

5.4.2.4 Certificates

The certificate element and its children refer to identity
certificates (like, for example, X.509).

| certificate | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| key | Unique Identifiers | unstructured | Certificate Key |
| format | Unique Identifiers | unstructured | Certificate Format |

The "format"
field is used to represent the information of an IANA registered
public key or authentication certificate format, while the "key"
field is used to represent the corresponding certificate key.

5.4.2.6 Contact Information

The contact element and its children refer to contact information.
Services can specify precisely which set of data they need: postal,
telecommunication, or online address information.

| contact | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| postal | Physical Contact Information, Demographic and Socioeconomic Data | postal | Postal Address Information |
| telecom | Physical Contact Information | telecom | Telecommunications Information |
| online | Online Contact Information | online | Online Address Information |

5.4.2.7 Telephone Numbers

The telephonenum element and its children refer to the characteristics
of a telephone number.

| telephonenum | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| intcode | Physical Contact Information | unstructured | International Telephone Code |
| loccode | Physical Contact Information | unstructured | Local Telephone Area Code |
| number | Physical Contact Information | unstructured | Telephone Number |
| ext | Physical Contact Information | unstructured | Telephone Extension |
| comment | Physical Contact Information | unstructured | Telephone Optional Comments |

5.4.2.8 Postal Information

The postal element and its children refer to a postal mailing address.

| postal | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| name | Physical Contact Information, Demographic and Socioeconomic Data | personname | Name |
| street | Physical Contact Information | unstructured | Street Address |
| city | Demographic and Socioeconomic Data | unstructured | City |
| stateprov | Demographic and Socioeconomic Data | unstructured | State or Province |
| postalcode | Demographic and Socioeconomic Data | unstructured | Postal Code |
| country | Demographic and Socioeconomic Data | unstructured | Country Name |
| organization | Demographic and Socioeconomic Data | unstructured | Organization Name |

The
"country" field represents the information of the name of
the country (for example, one among the countries listed
in [ISO3166]).

5.4.2.9 Telecommunication Information

The telecom structure specifies telecommunication information about a
person.

| telecom | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| telephone | Physical Contact Information | telephonenum | Telephone Number |
| fax | Physical Contact Information | telephonenum | Fax Number |
| mobile | Physical Contact Information | telephonenum | Mobile Telephone Number |
| pager | Physical Contact Information | telephonenum | Pager Number |

5.4.2.10 Online Information

The online element and its children refer to online information about
a person or legal entity.

| online | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| email | Online Contact Information | unstructured | Email Address |
| uri | Online Contact Information | unstructured | Home Page Address |

Elements for Access Logs

Two structures used for representing forms of Internet addresses are
provided. The uri element and its children refer to Uniform Resource
Identifiers (URI), which are defined in [URI]. The ipaddr element and
its children refer to IP addresses and Domain Name System (DNS)
hostnames.

The authority of a URI is defined as the authority component in [URI].
The stem of a URI is defined as the information contained in the
portion of the URI after the authority and up to (and including) the
first '?' character in the URI, and the querystring is the information
contained in the portion of the URI after the first '?' character. For
URIs which do not contain a '?' character, the stem is the entire URI,
and the querystring is empty.
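These definitions can be made concrete with a short Python sketch. It
follows the wording above literally, including the special case for
URIs without a '?' character. The function name is hypothetical, and
the standard urllib.parse module is used only to locate the authority
component.

```python
from urllib.parse import urlsplit


def split_uri(uri):
    """Return (authority, stem, querystring) as defined above."""
    authority = urlsplit(uri).netloc
    if "?" not in uri:
        # No '?': the stem is the entire URI, the querystring is empty.
        return authority, uri, ""
    before, _, querystring = uri.partition("?")
    if authority:
        # Keep only what follows the authority component.
        marker = "//" + authority
        start = before.index(marker) + len(marker)
        after_authority = before[start:]
    else:
        after_authority = before
    # The stem runs up to and including the first '?'.
    return authority, after_authority + "?", querystring
```

For instance, under this reading the URI
http://www.example.com/search?q=p3p has the stem /search? and the
querystring q=p3p.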

Since URI information can be used in different ways, depending on the
context, all the fields in the uri structure are tagged as being of
"variable" category. Schema definitions MUST explicitly set the
corresponding category in the element referencing this data structure.

5.4.2.12 ipaddr

The ipaddr element and its children refer to the hostname and IP
address of a system.

| ipaddr | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| hostname | Computer Information | unstructured | Complete Host and Domain Name |
| partialhostname | Demographic | unstructured | Partial Hostname |
| fullip | Computer Information | unstructured | Full IP Address |
| partialip | Demographic | unstructured | Partial IP Address |

The hostname element is used to represent collection of either the
simple hostname of a system, or the full hostname including domain
name. The partialhostname element represents the information of a
fully-qualified hostname which has had at least the host portion
removed from the hostname. In other words, everything up to the first
'.' in the fully-qualified hostname MUST be removed for an address to
qualify as a "partial hostname".

The fullip element represents the information of a full IP version 4
or IP version 6 address. The partialip element represents an IP
version 4 address (only, not a version 6 address) which has had at
least the last 7 bits of information removed. This removal MUST be
done by replacing those bits with a fixed pattern for all visitors
(for example, all 0's or all 1's).

Certain Web sites are known to make use not of the visitor's entire IP
address or hostname, but rather of a reduced form of that information.
By collecting only a subset of the address information, the site gives
the visitor some measure of anonymity. It is certainly not the intent
of this specification to claim that these "stripped" IP addresses or
hostnames are impossible to associate with an individual user, but
rather that it is significantly more difficult to do so. Sites which
perform this data reduction MAY wish to declare this practice in order
to reflect their practices more accurately.
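The two reductions described above can be sketched in Python. The
function names are hypothetical, and zero is chosen here as the fixed
replacement pattern; the text equally allows other fixed patterns such
as all 1's.

```python
import ipaddress


def partial_ip(address, bits_removed=7):
    """Zero out the last bits of an IPv4 address (at least 7 bits)."""
    if not 7 <= bits_removed <= 32:
        raise ValueError("between 7 and 32 bits must be removed")
    # IPv4Address rejects IPv6 and malformed input with a ValueError.
    value = int(ipaddress.IPv4Address(address))
    mask = (0xFFFFFFFF << bits_removed) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))


def partial_hostname(hostname):
    """Remove everything up to the first '.' of a fully-qualified name."""
    host, dot, rest = hostname.partition(".")
    if not dot or not rest:
        raise ValueError("not a fully-qualified hostname")
    return rest
```

With these assumptions, 192.0.2.200 is reduced to 192.0.2.128 and
www.example.com is reduced to example.com.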

5.4.2.13 Log Info

The loginfo element and its children refer to information typically
stored in Web-server access logs.

| loginfo | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| uri | Navigation and click-stream data | uri | URI of Requested Resource |
| timestamp | Navigation and click-stream data | date | Request Timestamp |
| clientip | Computer Information, Demographic and Socioeconomic Data | ipaddr | Client's IP Address or Hostname |
| other.httpmethod | Navigation and click-stream data | unstructured | HTTP Request Method |
| other.bytes | Navigation and click-stream data | unstructured | Data Bytes in Response |
| other.statuscode | Navigation and click-stream data | unstructured | Response Status Code |

The resource in the HTTP request is captured by the uri field. The
time at which the server processes the request is represented by the
timestamp field. Server implementations are free to define this field
as the time the request was received, the time that the server began
sending the response, the time that sending the response was complete,
or some other convenient representation of the time the request was
processed. The IP address of the client system making the request is
given by the clientip field.

The other data fields represent other information commonly stored in
Web server access logs. other.httpmethod is the HTTP method (such as
GET, POST, etc.) in the client's request. other.bytes indicates the
number of bytes in the response-body sent by the server.
other.statuscode is the HTTP status code on the request, such as 200,
302, or 404 (see section 6.1.1 of [HTTP1.1] for details).
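To show how these fields relate to a typical access log, the following
Python sketch maps one log line onto the loginfo field names. It
assumes the widely used Common Log Format, which this specification
does not mandate; the function name and the sample line are invented
for illustration.

```python
import re

# Assumed layout: host ident user [time] "METHOD uri protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def loginfo_from_line(line):
    """Return the loginfo fields found in one log line, or None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    size = match.group("bytes")
    return {
        "uri": match.group("uri"),
        "timestamp": match.group("timestamp"),
        "clientip": match.group("clientip"),
        "other.httpmethod": match.group("method"),
        "other.bytes": None if size == "-" else int(size),
        "other.statuscode": int(match.group("status")),
    }


SAMPLE = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
```

A site that stores only some of these values could declare just the
corresponding elements rather than the whole loginfo structure.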

5.4.2.14 Other HTTP Protocol Information

The httpinfo element and its children refer to information carried by
the HTTP protocol which is not covered by the loginfo structure.

| httpinfo | Category | Allowed Descendants defined by | Appinfo |
|---|---|---|---|
| referer | Navigation and click-stream data | uri | Last URI Requested by the User |
| useragent | Computer Information | unstructured | User Agent Information |

The useragent field represents the information in the HTTP User-Agent
header (which gives information about the type and version of the
user's Web browser), and/or the HTTP accept* headers.

The referer field represents the information in the HTTP Referer
header, which gives information about the previous page visited by the
user. Note that this field is misspelled in exactly the same way as
the corresponding HTTP header.

The full structure of the
allowed hierarchy expressed by the data schema is summarized in the
following four tables:

5.5 P3P Base Data Schema Visual Representation

The following gives a
human readable view of the XML schema and shows which elements are
permitted in policies.

5.6 Using Data Elements

P3P offers Web sites a
lot of flexibility in how they describe the types of data they
collect.

Sites may describe data generally using the
<dynamic><miscdata/></dynamic> element and the appropriate categories.

Sites may describe data specifically
using the data elements defined in the base data schema.

Sites may describe data specifically
using data elements defined in new data schemas.

Any of these three
methods may be combined within a single policy.

By using the <dynamic><miscdata/></dynamic> element, sites can specify
the types of data they collect without having to enumerate every
individual data element. This may be convenient for sites that collect
a lot of data or sites belonging to large organizations that want to
offer a single P3P policy covering the entire organization. However,
the disadvantage of this approach is that user agents will have to
assume that the site might collect any data element belonging to the
categories referenced by the site. So, for example, if a site's policy
states that it collects <dynamic><miscdata/></dynamic> of the physical
contact information category, but the only physical contact
information it collects is business address, user agents will
nonetheless assume that the site might also collect telephone numbers.
If the site wishes to be clear that it does not collect telephone
numbers or any other physical contact information other than business
address, then it should disclose that it collects
<user><business-info><contact><postal/></contact></business-info></user>.
Furthermore, as user agents are developed with automatic form-filling
capabilities, it is likely that sites that enumerate the data they
collect will be able to better integrate with these tools.

By defining new data
schemas, sites can precisely specify the data they collect beyond the
base data set. However, if user agents are unfamiliar with the
elements defined in these schemas, they will be able to provide only
minimal information to the user about these new elements. The
information they provide will be based on the category and display
names specified for each element.

Regardless of whether a site wishes to make general or specific data
disclosures, there are additional advantages to disclosing specific
elements from the <dynamic/> data set. For example, by disclosing
<dynamic><cookies/></dynamic> a site can indicate that it uses cookies
and explain the purpose of this use. User agent implementations that
offer users cookie control interfaces based on this information are
encouraged. Likewise, user agents that by default do not send the HTTP
Referer header might look for the
<dynamic><http><referer/></http></dynamic> element in P3P policies and
send the header if it will be used for a purpose the user finds
acceptable.

5.7 Semantics of P3P Data Schemas

XML does not define a formal semantics except when used within RDF.
The use of XSD to define classes of data types, however, necessarily
implies a correspondence between XML elements and real-world classes
of data, which is therefore a form of semantics. We have chosen not to
use a language with formal semantics because of the sparse support for
languages such as RDF and OWL. [This section could do with some
elaboration]. Some attempt has been made to translate the BDS into
RDF, and this issue will be examined by the P3P 2.0 specification WG.