Original of 15 August 1997, revs. Sep 1999, Dec 2005, Aug 2006, 21 October
2013, 24 July 2016

This document is published only in this form - as a page on my personal
web-site. Its Google citation-count passed 200 in mid-2016.

This paper provides a brief introduction to the topics of data
surveillance and information privacy, and contains my definitions of key terms
in the area. It is intended as a starter-resource for people who want to break
into the area; and as a reference resource for people who've already broken
in.

Introduction

We use a lot of words without thinking about what we mean by them. When the
words are 'eat' or 'zebra', it probably doesn't matter too much. But when we
use abstract words that are full of contentiousness, like 'discrimination' and
'ethnicity', all hope of a rational discussion disappears unless we achieve
some degree of similarity between our understanding of the terms.

'Privacy' is an abstract and contentious notion. This document provides
definitions of privacy and related terms, in a sequence that is intended to
provide an introduction to the topic.

Privacy

A more substantial treatment of what privacy means is in Clarke (2006). The
purpose of the document that you're reading is to provide a brief overview.

People often think of privacy as some kind of right. Unfortunately, the
concept of a 'right' is a problematical way to start, because a right seems to
be some kind of absolute standard. What's worse, it's very easy to get
confused between legal rights, on the one hand, and natural or moral rights, on
the other. It turns out to be much more useful to think about privacy as one
kind of thing (among many kinds of things) that people like to have lots of.

Privacy is the interest that individuals have in
sustaining 'personal space', free from interference by other people and
organisations.

[Clarification of July 2016: I extracted the definition in the early 1990s,
directly from Morison (1973). The original referred to "a 'personal space'".
But that could be misinterpreted as being a singular space, whereas Morison
meant it in a generic sense, or perhaps in a multi-dimensional sense. I
therefore prefer to omit the "a", in order to convey the broader notion of
"sustaining personal space".]

Drilling down to a deeper level, privacy turns out not to be a single interest,
but rather has multiple dimensions:

privacy of the person, sometimes referred to as 'bodily
privacy'. This is concerned with the integrity of the individual's body.
Issues include compulsory immunisation, blood transfusion without consent,
compulsory provision of samples of body fluids and body tissue, and compulsory
sterilisation;

privacy of personal behaviour. This relates to all
aspects of behaviour, but especially to sensitive matters, such as sexual
preferences and habits, political activities and religious practices, both in
private and in public places. It includes what is sometimes referred to as
'media privacy';

privacy of personal communications. Individuals claim an
interest in being able to communicate among themselves, using various media,
without routine monitoring of their communications by other persons or
organisations. This includes what is sometimes referred to as 'interception
privacy'; and

privacy of personal data. Individuals claim that data
about themselves should not be automatically available to other individuals and
organisations, and that, even where data is possessed by another party, the
individual must be able to exercise a substantial degree of control over that
data and its use. This is sometimes referred to as 'data privacy' and
'information privacy'.

With the close coupling that has occurred between computing and
communications, particularly since the 1980s, the last two aspects have become
closely linked. This is the primary focus of public attention, and of this
document. It is useful to use the term 'information privacy'
to refer to the combination of communications privacy and data privacy.

[Addition of October 2013:] During the period since about 2005, a further
disturbing trend has emerged, which gives rise to a fifth dimension that wasn't
apparent when I structured this in the mid-1990s:

privacy of personal experience. Individuals gather
experience through buying books and newspapers and reading the text and images
in them, buying or renting recorded video, conducting conversations with other
individuals both in person and on the telephone, meeting people in small
groups, and attending live and cinema events with larger numbers of people.
Until very recently, all of these were ephemeral, none of them generated
records, and hence each individual's small-scale experiences, and their
consolidated large-scale experience, were not visible to others. During the
first decade of the 21st century, reading and viewing activities have migrated
to screens, are performed under the control of corporations, and are recorded;
most conversations have become 'stored electronic communications', each event
is recorded and both 'call records' and content may be retained; many
individuals' locations are tracked, and correlations are performed to find out
who is co-located with whom and how often; and events tickets are paid for
using identified payment instruments. This massive consolidation of
individuals' personal experience is available for exploitation, and is exploited.

With the patterns becoming more complex, a list may no longer be adequate,
and a diagram may help understanding:

Why Privacy is Important

There are many different reasons that people put forward to support the
proposition that privacy is important. The following provides a classification
and brief overview:

psychologically, people need private space. This applies
in public as well as behind closed doors and drawn curtains. We need to be
able to glance around, judge whether the people in the vicinity are a threat,
and then perform actions that are potentially embarrassing, such as breaking
wind, and jumping for joy;

sociologically, people need to be free to behave, and to
associate with others, subject to broad social mores, but without the continual
threat of being observed. Otherwise we reduce ourselves to the appalling,
unhuman, constrained context that was imposed on people in countries behind the
Iron Curtain and the Bamboo Curtain;

economically, people need to be free to innovate.
International competition is fierce, so countries with high labour-costs need
to be clever if they want to sustain their standard-of-living. And cleverness
has to be continually reinvented;

politically, people need to be free to think, and argue,
and act. Surveillance chills behaviour and speech, and threatens democracy.

Privacy Protection

An important implication of the definition of privacy as an interest is that
it has to be balanced against many other, often competing, interests:

the privacy interests of one person or category of people may conflict
with some other interest of their own, and the two may have to be traded off
(e.g. privacy against access to credit, or quality of health care);

the privacy interest of one person or category of people may conflict with
the privacy interests of another person, or another category of people (e.g.
health care information that is relevant to multiple members of a family); and

the privacy interest of one person or category of people may conflict with
other interests of another person, category of people, organisation, or society
as a whole (e.g. creditors, an insurer, and protection of the public against
serious diseases).

Hence:

Privacy Protection is a process of finding
appropriate balances between privacy and multiple competing interests.

Because there are so many dimensions of the privacy interest, and so many
competing interests, at so many levels of society, the formulation of detailed,
operational rules about privacy protection is a difficult exercise. The most
constructive approach is to:

establish general principles;

apply the principles to all organisations;

create effective sanctions against non-compliance;

develop operational codes of practice, consistent with the principles,
applying to specific industry sectors and to particular applications of
technology;

establish dispute-resolution procedures at the levels of individual
organisations and industry sectors; and

bind the framework together by making the principles, codes and sanctions
enforceable through quasi-judicial (tribunal) and court procedures.

Information Privacy

The previous section introduced information privacy as a combination of the
privacy of personal communications and of personal data:

Information Privacy is the interest an individual
has in controlling, or at least significantly influencing, the handling of data
about themselves.

The term 'data privacy' is sometimes used in the same way. 'Data' refers to
inert symbols, signs or measures, whereas 'information' implies the use of data
by humans to extract meaning. Hence 'information privacy' is arguably the more
descriptive of the two alternatives.

The notion emerged during the mid-1960s, and the growth in its importance is
often perceived to be directly linked to concern about the accelerating
capability of computers, and their application to the processing of data about
people. In fact, there has been a tendency throughout the twentieth century
for organisations to increase:

in size;

in their 'social distance' from people; and

in the 'information intensity' which they apply in order to manage their
dealings with people.

The continuing increase in public concern about information privacy should
therefore be seen as a reaction to the ways in which information technology is
used by organisations, rather than to information technology itself.

Legislatures of countries on the Continent of Europe, and to some extent also
in North America, passed laws addressing information privacy, primarily during
the 1970s, though with some laggards deferring action until the 1980s or even
1990s. These laws mostly focus on 'data protection', i.e.
they protect data about people, rather than people themselves. This is
unfortunate because, although data protection is a more pragmatic concept than
the abstract notion of privacy (and it's therefore easier to produce results),
it's not what humans actually need.

Another term that has been used to describe current information privacy
protections is 'fair information practices legislation'.
Although it sounds good, this approach has failed to satisfy the need. It has
been used to legitimise existing practices, and to permit virtually any use of
data by virtually any organisation for virtually any reason, provided that it
is handled 'fairly'. Genuine privacy protection forces uses of data to be
justified.

Another related notion derives from the law of confidence:

Confidentiality is the legal duty of individuals
who come into the possession of information about others, especially in the
course of particular kinds of relationships with them.

Confidentiality is an incidental, and wholly inadequate, substitute for
proper information privacy protection.

Like 'privacy', 'confidentiality' is an Anglo-Saxon notion, and is not readily
translatable into other languages. The German Federal Constitutional Court, on
the other hand, has read into that country's constitution a right of
individuals to 'informational self-determination'. This would
appear to bring European thinking somewhat closer to the ideas discussed in
English-speaking countries.

A Common Misuse of the Term 'Privacy'

The term 'privacy' is used by some people, particularly security specialists
and computer scientists, and especially in the United States, to refer to the
security of data against various risks, such as the risks of data being
accessed or modified by unauthorised persons. In some cases, it is used even
more restrictively, to refer only to the security of data during
transmission.

These aspects are only a small fraction of the considerations within the field
of 'information privacy'. More appropriate terms to use for those concepts are
'data security' and 'data transmission
security'.

The term 'confidentiality' is also sometimes used by computer scientists to
refer to 'data transmission security', risking confusion with obligations under
the law of confidence.

Data Surveillance

Information privacy is valued very highly by individuals. But it is under
threat from particular kinds of management practices, and from advances in
technology. This section explains the concept of 'data surveillance'. To do
so, it is first necessary to define some underlying terms.

Surveillance is the systematic investigation or
monitoring of the actions or communications of one or more persons.

The primary purpose of surveillance is generally to collect information
about the individuals concerned, their activities, or their associates. There
may be a secondary intention to deter a whole population from undertaking some
kinds of activity.

Two separate classes of surveillance are usefully identified:

Personal Surveillance is the surveillance of an
identified person. In general, a specific reason exists for the investigation
or monitoring. It may also, however, be applied as a means of deterrence
against particular actions by the person, or repression of the person's
behaviour.

Mass Surveillance is the surveillance of groups
of people, usually large groups. In general, the reason for investigation or
monitoring is to identify individuals who belong to some particular class of
interest to the surveillance organization. It may also, however, be used for
its deterrent effects.

The basic form, physical surveillance, comprises watching
(visual surveillance) and listening (aural surveillance). Monitoring may be
undertaken remotely in space, with the aid of image-amplification devices like
field glasses, infrared binoculars, light amplifiers, and satellite cameras,
and sound-amplification devices like directional microphones; and remotely in
time, with the aid of image and sound-recording devices.

The notion of the 'panopticon' (Jeremy Bentham's 18th century
proposal for efficient prisons as an alternative to transporting felons to
colonies like what is now Australia) has re-surfaced in the writings of
Foucault, as a metaphor for what he sees as the prison-like nature of late 20th
century societies.

In addition to physical surveillance, several kinds of communications
surveillance are practised, including mail covers and telephone
interception.

The popular term electronic surveillance refers to both
augmentations to physical surveillance (such as directional microphones and
audio bugs) and to communications surveillance, particularly telephone taps.

These forms of direct surveillance are commonly augmented by the collection of
data from interviews with informants (such as neighbours, employers, workmates,
and bank managers). As the volume of information collected and maintained has
increased, the record collections of organizations have become an increasingly
important source. These are often referred to as 'personal data
systems'. This has given rise to an additional form of surveillance:

Data Surveillance (or Dataveillance)
is the systematic use of personal data systems in the investigation or
monitoring of the actions or communications of one or more persons.

Dataveillance is significantly less expensive than physical and electronic
surveillance, because it can be automated. As a result, the economic
constraints on surveillance are diminished, and more individuals, and larger
populations, are capable of being monitored.

Like surveillance more generally, dataveillance is of two
kinds:

Personal Dataveillance is the systematic use of
personal data systems in the investigation or monitoring of the actions or
communications of an identified person. In general, a specific reason exists
for the investigation or monitoring of an identified individual. It may also,
however, be applied as a means of deterrence against particular actions by the
person, or repression of the person's behaviour.

Mass Dataveillance is the systematic use of
personal data systems in the investigation or monitoring of the actions or
communications of groups of people. In general, the reason for investigation
or monitoring is to identify individuals who belong to some particular class of
interest to the surveillance organization. It may also, however, be used for
its deterrent effects.

Dataveillance comprises a wide range of techniques. These include:

Front-End Verification. This is the cross-checking of data in an
application form, against data from other personal data systems, in order to
facilitate the processing of a transaction.

Computer Matching. This is the expropriation of data maintained by two or more
personal data systems, in order to merge previously separate data about large
numbers of individuals.

Profiling. This is a technique whereby a set of characteristics of a
particular class of
person is inferred from past experience, and data-holdings are then searched
for individuals with a close fit to that set of characteristics.
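
As an illustration only, computer matching and profiling can be sketched in a few lines of Python. The records, field names, match key and profile below are all invented for the purpose, and the profile search demands an exact fit, whereas a real scheme would look for a 'close fit' by scoring each individual against the profile.

```python
# Toy records; every name, field and value here is invented for illustration.
tax_records = [
    {"name": "A. Example", "dob": "1970-01-01", "declared_income": 20000},
    {"name": "B. Sample", "dob": "1985-06-15", "declared_income": 55000},
]
benefit_records = [
    {"name": "A. Example", "dob": "1970-01-01", "benefit": "unemployment"},
    {"name": "C. Person", "dob": "1990-03-09", "benefit": "housing"},
]


def match_key(record):
    """Compound key (an assumption) used to decide two records are one person."""
    return (record["name"].strip().lower(), record["dob"])


def computer_match(system_a, system_b):
    """Merge records from two separate systems that share a match key."""
    index = {match_key(r): r for r in system_a}
    merged = []
    for r in system_b:
        other = index.get(match_key(r))
        if other is not None:
            merged.append({**other, **r})
    return merged


def profile_search(records, profile):
    """Return the records that satisfy every test in the profile (exact fit)."""
    return [r for r in records if all(test(r) for test in profile)]


matched = computer_match(tax_records, benefit_records)

# Hypothetical profile: on unemployment benefit while declaring income
# above an arbitrary threshold.
profile = [
    lambda r: r.get("benefit") == "unemployment",
    lambda r: r.get("declared_income", 0) > 15000,
]
of_interest = profile_search(matched, profile)
```

Note how little is needed: once two data-holdings share enough fields to form a match key, previously separate data about the same person can be merged and searched.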

Dataveillance depends on the availability of data. Whereas the term
'personal data system' reflects the interests of a data collector, a term is
needed that relates to the interests of individuals:

Data Trail. This is a succession of identified
transactions, which reflect real-world events in which a person has
participated.

Human Identification

Identification is a process whereby a real-world entity is
recognised, and its 'identity' established. Identity is
operationalised in the abstract world of information systems as a set of
information about an entity that differentiates it from other, similar
entities. The set of information may be as small as a single code,
specifically designed as an identifier, or may be a compound
of such data as given and family name, date-of-birth and postcode of residence.
An organisation's identification process comprises the acquisition of the
relevant identifying information.
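
The idea that an identity is a set of information that differentiates an entity from similar ones can be shown with a small sketch. The people and field names below are invented; the point is only that a name on its own may fail to differentiate, whereas a compound of name, date-of-birth and postcode may succeed.

```python
# Invented records for illustration only.
people = [
    {"given": "Kim", "family": "Lee", "dob": "1980-02-03", "postcode": "2600"},
    {"given": "Kim", "family": "Lee", "dob": "1992-11-20", "postcode": "3000"},
    {"given": "Pat", "family": "Ng", "dob": "1980-02-03", "postcode": "2600"},
]


def differentiates(records, fields):
    """True if the chosen fields give every record a distinct combination."""
    keys = [tuple(r[f] for f in fields) for r in records]
    return len(set(keys)) == len(keys)


# Two people share the same given and family name, so the name alone fails.
name_only = differentiates(people, ["given", "family"])

# The compound of name, date-of-birth and postcode separates all three.
compound = differentiates(people, ["given", "family", "dob", "postcode"])
```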

Contrary to the presumptions made in many information systems, an
entity does not necessarily have a single identity, but may have multiple
identities. For example, a company may have many business units,
divisions, branches, trading-names, trademarks and brandnames. And many people
are known by different names in different contexts.

A variety of person-identification techniques are available,
which can assist in associating data with them. Important examples of these
techniques are:

names - or what the person is called by other people;

codes - or what the person is called by an organisation;

knowledge - or what the person knows;

tokens - or what the person has;

biometrics - or what the person is, does, or looks like.

The term 'biometrics' is used to refer to those
person-identification techniques that are based on some physical and
difficult-to-alienate characteristic, such as:

social behaviour - how the person interacts with others
(e.g. habituated body-signals; general voice characteristics; style of
speech; visible handicaps; supported by video-film);

bio-dynamics - what the person does (e.g. the manner in
which one's signature is written; statistically-analysed voice
characteristics; keystroke dynamics, particularly in relation to login-id and
password);

Such schemes are used in many European countries for a defined set of
purposes, typically the administration of taxation, national superannuation and
health insurance. In some countries, they are used for multiple additional
purposes. There is deep concern in Anglo-American countries about such
schemes, as evidenced by the demise of the Australia Card proposal.

Unlike physical, communications and electronic surveillance, dataveillance does
not monitor the individual, but merely the shadow that the person casts in
data. A term is needed to refer to the subject of dataveillance:

The Digital Persona is the model of an individual's public personality
based on data, and maintained by transactions, and used as a proxy for the
individual.

Like any mere model, a digital persona is a partial and inaccurate
reflection of a complex reality. Serious dangers arise when determinations are
made, and actions taken, about an individual, based on their digital persona.

A token that holds particular attraction as a tool in person-identification is
a chip-based card (smart-card). This may carry a person's Private Key,
enabling them to attach a Digital Signature to an electronic message. This has
substantial privacy implications.

Authentication

Authentication is the process whereby a degree of
confidence is established about the truth of an assertion.

A common application of the idea is to the authentication of
identity (Clarke 1995, 1996e).
This is the process whereby an organisation establishes that a party it is
dealing with is:

a previously known real-world entity (in which case it can associate
transactions with an existing record in the relevant information system); or

a previously unknown real-world entity (in which case it may be
appropriate to create a new record in the relevant information system, and
perhaps also to create an organisational identifier for that party).

In addition, there are many circumstances in which organisations undertake
authentication of value, e.g. by checking a banknote for
forgery-resistant features like metal wires or holograms, and seeking
pre-authorisation of credit-card payments.

Another approach is the authentication of attributes, credentials or
eligibility. In this case, it is not the person's identity that is in
focus, but rather the capacity of that person to perform some function, such as
being granted a discount applicable only to tradesmen or club-members, or a
concessional fee only available to senior citizens or school-children, or entry
to premises that are restricted to adults only.
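
A minimal sketch of the authentication of eligibility, assuming a hypothetical credential that carries a date of birth, might look as follows. The function names, the credential layout and the log format are my own inventions for illustration. The check yields only a yes/no answer, and the log records the outcome rather than who was checked.

```python
from datetime import date


def is_adult(date_of_birth, today, minimum_age=18):
    """Return True if the person has reached minimum_age on the given day."""
    years = today.year - date_of_birth.year
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1  # birthday not yet reached this year
    return years >= minimum_age


admissions_log = []


def admit(credential, today):
    """Check eligibility; record only the outcome, never the identity."""
    eligible = is_adult(credential["date_of_birth"], today)
    admissions_log.append({"date": today.isoformat(), "admitted": eligible})
    return eligible


# Hypothetical credential; the name is present but is never read or stored.
credential = {"name": "Jane Citizen", "date_of_birth": date(2000, 5, 10)}
day_before = admit(credential, date(2018, 5, 9))
on_birthday = admit(credential, date(2018, 5, 10))
```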

Anonymity, Identification and Pseudonymity

An anonymous record or transaction is one whose data cannot be
associated with a particular individual, either from the data itself, or by
combining the transaction with other data.

A great many transactions that people undertake are entirely anonymous,
including:

barter transactions;

visits to enquiry counters in government agencies and shops;

telephone enquiries;

cash transactions such as the myriad daily payments for inexpensive goods
and services, gambling and road-tolls; and

Some of the reasons that people use anonymity are of dubious social value,
such as avoiding detection of their whereabouts in order to escape
responsibilities. Other reasons are of arguably significant social value, such
as avoiding physical harm, enabling 'whistle-blowing', avoiding unwanted and
unjustified public exposure, and keeping personal data out of the hands of
intrusive marketers and governments.

Some categories of transactions, however, are difficult to conduct on an
anonymous basis, without one or perhaps both of the parties being known to the
other. Examples of transactions where there is a strong argument for
identification include:

an undertaking by a person to perform some action in the future, such as
repaying a loan;

the provision of a surety to enable the freeing of a person charged with a
crime;

the collection of a token intended to serve as evidence of identity (such
as a passport); and

the collection of a privacy-sensitive document (such as medical
information).

An identified record or transaction is one in
which the data can be readily related to a particular
individual. This may be because it carries a direct identifier of the person
concerned, or because it contains data which, in combination with other
available data, links the data to a particular person.

There is a current tendency for organisations to try to convert anonymous
transactions (e.g. visits to counters, telephone enquiries and low-value
payments) into identified transactions. The reasons for this trend include:

the increased scale of social institutions;

the increased distance of those institutions from people;

the reduced trust that each side exhibits in its dealings with the other;

the increased capability of information technology to support data
collection, storage, processing, discovery, use and disclosure; and

Hence a transaction is pseudonymous in relation to a particular party if the
transaction data contains no direct identifier for that party, and can only be
related to them in the event that a very specific piece of additional data is
associated with it. The data may, however, be indirectly associated with the
person, if particular procedures are followed, e.g. the issuing of a search
warrant authorising access to an otherwise closed index.

To be effective, pseudonymous mechanisms must involve legal,
organisational and technical protections, such that the link can only
be made (e.g. the index can only be accessed) under appropriate
circumstances.
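
The technical part of such a mechanism can be sketched as a closed index that maps pseudonyms to identities and refuses to make the link without authority. This is a sketch under stated assumptions: the class name, the methods and the `warrant` argument are hypothetical, and the check on the warrant is a stand-in for what would in practice be legal and organisational controls, not a line of code.

```python
import secrets


class PseudonymIndex:
    """Closed index linking pseudonyms to identities (hypothetical design)."""

    def __init__(self):
        self._index = {}

    def issue(self, identity):
        """Create a random pseudonym that reveals nothing about the identity."""
        pseudonym = secrets.token_hex(8)
        self._index[pseudonym] = identity
        return pseudonym

    def resolve(self, pseudonym, warrant=None):
        """Make the link only when a valid authority is presented."""
        if warrant is None or not warrant.get("valid"):
            raise PermissionError("link may only be made under authority")
        return self._index[pseudonym]


index = PseudonymIndex()
pseudonym = index.issue("Jane Citizen")

# The transaction record carries the pseudonym, not a direct identifier.
transaction = {"party": pseudonym, "item": "test result"}
```

Calling `index.resolve(pseudonym)` without a warrant raises `PermissionError`; with `warrant={"valid": True}` it returns the identity.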

Two closely related techniques are:

the authentication of people's eligibility rather than their
identity; and

the authentication of people's identity but without recording
it.

Pseudonymity is used in some situations to enable conflicting interests to
be satisfied; for example in collections of highly sensitive personal data
such as that used in research into HIV/AIDS. It is capable of being applied in
a great many more situations than it is at present.

Generalising from this example, pseudonymity is used to enable
the protection of individuals who are at risk of undue
embarrassment or physical harm. Categories of such people range from
celebrities and VIPs (who are subject to widespread but excessive interest
among sections of the media and the general public) to protected witnesses,
'battered wives', celebrities under threat from 'stalkers', and people in
security-sensitive occupations.

Another application of pseudonymity is to reflect the various roles
that people play. For example, a person may act as their private
self, an employee of an organisation, an officer of a professional
association, and an officer of a community organisation. In addition, a person
may have multiple organisational roles (e.g. substantive position, acting
position, various roles on projects and cross-organisational committees, bank
signatory, first-aid officer, fire warden), and multiple personal roles (e.g.
parent, child, spouse, scoutmaster, sporting team-coach, participant in
professional and community committees, writer of
letters-to-the-newspaper-editor, chess-player, participant in newsgroups,
e-lists, chat-channels).

In this context, the terms 'identity' and 'identifier' become awkward, because
a person may have multiple roles, and hence 'identities' and 'identifiers'. It
is therefore useful to have another word available that can be used to refer to
each of these virtual entities. The terms 'pseudonym' or 'pseudo-identity' are
tenable; but the term 'nym' appears to be gaining currency.

Moreover, nyms may not be associable with a specific person, or may only be
associable with a specific person if particular conditions are fulfilled.
Hence a further and important application of pseudonymity is the use of
information technology to support multiple nyms. Under such
arrangements, a person sustains separate relationships with multiple
organisations, using separate identifiers, and generating separate data trails.
These are designed to be very difficult to link, but, subject to appropriate
legal authority, a mechanism exists whereby they can be linked.
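
One possible way of supporting multiple nyms, offered here as an assumption rather than as an established scheme, is to derive a separate identifier for each organisation from a secret held by the person. The sketch below uses a keyed hash (HMAC) from the Python standard library; the function names and the secret are hypothetical. Without the secret the nyms are hard to link, but a party that has been given it, for example under legal authority, can make the link.

```python
import hashlib
import hmac


def nym_for(person_secret, organisation):
    """Derive a stable, organisation-specific nym from the person's secret."""
    digest = hmac.new(person_secret, organisation.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def same_person(person_secret, nym_a, org_a, nym_b, org_b):
    """Linking step: only possible for a party that holds the secret."""
    return (hmac.compare_digest(nym_for(person_secret, org_a), nym_a)
            and hmac.compare_digest(nym_for(person_secret, org_b), nym_b))


secret = b"hypothetical secret held by the person"
bank_nym = nym_for(secret, "bank")
insurer_nym = nym_for(secret, "insurer")
```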

In addition, a person may be able to establish multiple relationships
with the same organisation, with a separate digital persona for each
relationship. This may be to reflect the various roles the person
plays when they interact with that organisation (e.g. contractor, beneficiary,
customer, lobbyist, debtor, creditor). Alternatively, it may merely be to put
at rest the minds of people who are highly nervous about the power of
organisations to bring pressure to bear on them.

In the new contexts of highly data-intensive relationships, and
Internet-mediated communications, pseudonymity and multiple digital personae
are especially important facets of human identification and information privacy.

Public Opinion

Privacy Protections

Governments throughout the world recognised that a problem existed, and a
great deal of legislation has been passed since the first statute in 1970.
Background is provided in Clarke (1998).

Technology is bringing with it ever more challenges, and privacy advocates
worldwide are actively seeking to sustain and extend privacy protections, in
order to cope with these intrusions.

Australian Privacy Protection Laws

Australia was an early participant in and contributor to discussions about
privacy. Australian Parliaments, on the other hand, have been among the
tardiest in the world to establish legislative protections.

The content and infrastructure for these community service pages are provided by Roger Clarke through his consultancy company, Xamax.
