Facial recognition takes previously anonymous images and conjures peoples' identities. It's an invaluable capability. Once they can pick out faces in crowds, trawling surreptitiously through anyone and everyone's photos, the social network businesses can work out what we're doing, when and where we're doing it, and who we're doing it with. The companies figure out what we like to do without us having to 'like' or favorite anything.

So Google, Facebook, Apple at al have invested hundreds of megabucks in face recognition R&D and buying technology start-ups. And they spend billions of dollars buying images and especially faces, going back to Google's acquisition of Picasa in 2004, and most recently, Facebook's ill-fated $3 billion offer for Snapchat.

But if most people find face recognition rather too creepy, then there is cause for optimism. The technocrats have gone too far. What many of them still don't get is this: If you take anonymous data (in the form of photos) and attach names to that data (which is what Facebook photo tagging does - it guesses who people are in photos are, attaches putative names to records, and invites users to confirm them) then you Collect Personal Information. Around the world, existing pre-biometrics era black letter Privacy Law says you can't Collect PII even indirectly like that without am express reason and without consent.

When automatic facial recognition converts anonymous data into PII, it crosses a bright line in the law.

A repeated refrain of cynics and “infomopolists” alike is that privacy is dead. People are supposed to know that anything on the Internet is up for grabs. In some circles this thinking turns into digital apartheid; some say if you’re so precious about your privacy, just stay offline.

But socialising and privacy are hardly mutually exclusive; we don’t walk around in public with our names tattooed on our foreheads. Why can’t we participate in online social networks in a measured, controlled way without submitting to the operators’ rampant X-ray vision? There is nothing inevitable about trading off privacy for conviviality.

The privacy dangers in Facebook and the like run much deeper than the self-harm done by some peoples’ overly enthusiastic sharing. Promiscuity is actually not the worst problem, neither is the notorious difficulty of navigating complex and ever changing privacy settings.

The advent of facial recognition presents far more serious and subtle privacy challenges.

Facebook has invested heavily in face recognition technology, and not just for fun. Facebook uses it in effect to crowd-source the identification and surveillance of its members. With facial recognition, Facebook is building up detailed pictures of what people do, when, where and with whom.

You can be tagged without consent in a photo taken and uploaded by a total stranger.

The majority of photos uploaded to personal albums over the years were not intended for anything other than private viewing.

Under the privacy law of Australia and data protection regulations in dozens of other jurisdictions, what matters is whether data is personally identifiable. The Commonwealth Privacy Act 1988 (as amended in 2014) defines “Personal Information” as: “information or an opinion about an identified individual, or an individual who is reasonably identifiable”.

Whenever Facebook attaches a member’s name to a photo, they are converting hitherto anonymous data into Personal Information, and in so doing, they become subject to privacy law. Automated facial recognition represents an indirect collection of Personal Information. However too many people still underestimate the privacy implications; some technologists naively claim that faces are “public” and that people can have no expectation of privacy in their facial images, ignoring that information privacy as explained is about the identifiability and identification of data; the words “public” and “private” don’t even figure in the Privacy Act!

If a government was stealing into our photo albums, labeling people and profiling them, there would be riots. It's high time that private sector surveillance - for profit - is seen for what it is, and stopped.

A Social Media Week Sydney event #SMWSydney
Law Lounge, Sydney University Law School
New Law School Building
Eastern Ave, Camperdown
Fri, Sep 26 - 10:00 AM - 11:30 AM

How can you navigate privacy fact and fiction, without the geeks and lawyers boring each other to death?

It's often said that technology has outpaced privacy law. Many digital businesses seem empowered by this brash belief. And so they proceed with apparent impunity to collect and monetise as much Personal Information as they can get their hands on.

But it's a myth!

Some of the biggest corporations in the world, including Google and Facebook, have been forcefully brought to book by privacy regulations. So, we have to ask ourselves:

what does privacy law really mean for social media in Australia?

is privacy "good for business"?

is privacy "not a technology issue"?

how can digital businesses navigate fact & fiction, without their geeks and lawyers boring each other to death?

In this Social Media Week Master Class I will:

unpack what's "creepy" about certain online practices

show how to rate data privacy issues objectively

analyse classic misadventures with geolocation, facial recognition, and predicting when shoppers are pregnant

critique photo tagging and crowd-sourced surveillance

explain why Snapchat is worth more than three billion dollars

analyse the regulatory implications of Big Data, Biometrics, Wearables and The Internet of Things.

We couldn't have timed this Master Class better, coming two weeks after the announcement of the Apple Watch, which will figure prominently in the class!

So please come along, for a fun and in-depth a look at social media, digital technology, the law, and decency.

Steve Wilson is a technologist, who stumbled into privacy 12 years ago. He rejected those well meaning slogans (like "Privacy Is Good For Business!") and instead dug into the relationships between information technology and information privacy. Now he researches and develops design patterns to help sort out privacy, alongside all the other competing requirements of security, cost, usability and revenue. His latest publications include:

The root cause of much identity theft and fraud today is the sad fact that customer reference numbers, personal identifiers and attributes generally are so easy to copy and replay without permission and without detection. Simple numerical attributes like bank account numbers and health IDs can be stolen from many different sources, and replayed with impunity in bogus transactions.

Our personal data nowadays is leaking more or less constantly, through breached databases, websites, online forms, call centres and so on, to such an extent that customer reference numbers on their own are no longer reliable. Privacy consequentially suffers because customers are required to assert their identity through circumstantial evidence, like name and address, birth date, mother’s maiden name and other pseudo secrets. All this data in turn is liable to be stolen and used against us, leading to spiraling identity fraud.

To restore the reliability of personal attribute data, we need to know their pedigree. We need to know that a presented data item is genuine, that it originated from a trusted authority, it’s been stored safely by its owner, and it’s been presented with the owner’s consent. If confidence in single attributes can be restored then we can step back from all the auxiliary proof-of-identity needed for routine transactions, and thus curb identity theft.

We could protect people against having their stolen identifiers used behind their backs. It shouldn't actually be necessary to re-issue every Korean's ID. Improvements may be made to the reliability of identification data without dramatically changing Relying Parties' backend processes. If for instance a service provider has always used SSN as part of its identification regime, they could continue to do so, if only the actual Social Security Numbers being received were reliable!

The trick is to be able to tell "original" ID numbers from "copies". But what does "original" even mean in the digital world? A more precise term for what we really want is pedigree. What we need is to be able to present attribute data in such a way that the receiver may be sure of their pedigree; that is, know that the attributes were originally issued by an authoritative body, that the data has been kept safe, and that each presentation of the attribute has occurred under the owner's control.

These objectives can be met with the help of smart cryptographic technologies which today are built into most smart phones and smartcards, and which are finally being properly exploited by initiatives like the FIDO Alliance.

"Notarising" attributes in chip devices

There are ways of issuing attributes to a smart chip device that prevent them from being stolen, copied and claimed by anyone else. One way to do so is to encapsulate and notarise attributes in a unique digital certificate issued to a chip. Today, a great many personal devices routinely embody cryptographically suitable chips for this purpose, including smart phones, SIM cards, "Secure Elements", smartcards and many wearable computers.

Consider an individual named Smith to whom Organisation A has issued a unique attribute N (which could be as simple as a customer reference number). If N is saved in ordinary computer memory or something like a magnetic stripe card, then it has no pedigree. Once the number N is presented by the cardholder in a transaction, it has the same properties as any other number. To better safeguard N in a chip device, it can be sealed into a digital certificate, as follows:

1. generate a fresh private-public key pair inside Smith’s chip
2. export the public key
3. create a digital certificate around the public key, with an attribute corresponding to N
4. have the certificate signed by (or on behalf of) organisation A.

The result of coordinating these processes and technologies is a logical triangle that inextricably binds cardholder Smith to their attribute N and to a specific personally controlled device. The certificate signed by organisation A attests to both Smith’s entitlement to N and Smith's control of a particular key unique to the device. Keys generated inside the chip are retained internally, never divulged to outsiders. It is not possible to copy the private key to another device, so the logical triangle cannot be reproduced or counterfeited.

Note that this technique lies at the core of the EMV "Chip-and-PIN" system where the smart payment card digitally signs cardholder and transaction data, rendering it immune to replay, before sending it to the merchant terminal. See also my 2012 paper Calling for a uniform approach to card fraud, offline and on. Now we should generalise notarised personal data and digitally signed transactions beyond Card-Present payments into as much online business as possible.

Restoring privacy and consumer control

When Smith wants to present their attribute N in an electronic transaction, instead of simply copying N out of memory (at which point it would lose its pedigree), Smith’s transaction software digitally signs the transaction using the certificate containing N. With standard security software, any third party can then verify that the transaction originated from a genuine chip holding the unique key certified by A as containing the attribute N.

Note that N doesn't have to be a customer number or numeric identifier; it could be any personal data, such as a biometric template, or a package of medical information like an allergy alert, or an interesting isolated (and anonymous) property of the user such as their age.

The capability to manage multiple key pairs and certificates, and to sign transactions with a nominated private key, is increasingly built into smart devices today. By narrowing down what you need to know about someone to a precise attribute or personal data item, we will reduce identity theft and fraud while radically improving privacy. This sort of privacy enhancing technology is the key to a safe Internet of Things, and fortunately now is widely available.

Addressing ID theft

Perhaps the best thing governments could do immediately is to adopt smartcards and equivalent smart phone apps for holding and presenting such attributes as official ID numbers. The US government has actually come close to such a plan many times; Chip-based Social Security Cards and Medicare Cards have been proposed before, without realising their full potential. These devices would best be used as above to hold a citizen's identifiers and present them cryptographically, without vulnerability to ID theft and takeover. We wouldn't have to re-issue compromised SSNs; we would instead switch from manual presentation of these numbers to automatic online presentation, with a chip card or smart phone app conveying the data through digitally signatures.

Follow along on Twitter at #CISmcc (for the Monterey Conference Centre).

The Cloud Identity Summit really is the top event on the identity calendar. The calibre of the speakers, the relevance and currency of the material, the depth and breadth of the cohort, and the international spread are all unsurpassed. It's been great to meet old cyber-friends in "XYZ Space" at last -- like Emma Lindley from the UK and Lance Peterman. And to catch up with such talented folks like Steffen Sorensen from New Zealand once again.

A day or two before, Ian Glazer of Salesforce asked in a tweet what we were expecting to get out of CIS. And I replied that I hoped to change my mind about something. It's unnerving to have your understanding and assumptions challenged by the best in the field ... OK, sometimes it's outright embarrassing ... but that's what these events are all about. A very wise lawyer said to me once, around 1999 at the dawn of e-commerce, that he had changed his mind about authentication a few times up to that point, and that he fully expected to change his mind again and again.

But seriously, one of the best (and pleasantly surprising) things about HIE Connect as the OIDF folks tell it, is the way its leaders unflinchingly take for granted the importance of privacy in the exchange of patient health records. Because honestly, privacy is not a given in e-health. There are champions on the new frontiers like genomics that actually say privacy may not be in the interests of the patients (or more's the point, the genomics businesses). And too many engineers in my opinion still struggle with privacy as something they can effect. So it's great -- and believe me, really not obvious -- to hear the HIE Connects folks -- including Debbie Bucci from the US Dept of Health and Human Services, and Justin Richer of Mitre and MIT -- dealing with it head-on. There is a compelling fit for the OAUTH and OIDC protocols here, with their ability to manage discrete pieces of information about users (patients) and to permission them all separately. Having said that, Don and I agree that e-health records permissioning and consent is one of the great UI/UX challenges of our time.

Justin also highlighted that the RESTful patterns emerging for fine-grained permissions management in healthcare are not confined to healthcare. Debbie added that the ability to query rare events without undoing privacy is also going to be a core defining challenge in the Internet of Things.

MyPOV: We may well see tremendous use cases for the fruits of HIE Exchange before they're adopted in healthcare!

In the afternoon, we heard from Canadian and British projects that have been working with the Open Identity Exchange (OIX) program now for a few years each.

Emma Lindley presented the work they've done in the UK Identity Assurance Program (IDAP) with social security entitlements recipients. These are not always the first types of users we think of for sophisticated IDAM functions, but in Britain, local councils see enormous efficiency dividends from speeding up the issuance of eg disabled parking permits, not to mention reducing imposters, which cost money and lead to so much resentment of the well deserved. Emma said one Attributes Exchange beta project reduced the time taken to get a 'Blue Badge' permit from 10 days to 10 minutes. She went on to describe the new "Digital Sources of Trust" initiative which promises to reconnect under-banked and under-documented sections of society with mainstream financial services. Emma told me the much-abused word "transformational" really does apply here.

MyPOV: The Digital Divide is an important issue for me, and I love to see leading edge IDAM technologies and business processes being used to do something about it -- and relatively quickly.

Then Andre Boysen of SecureKey led a discussion of the Canadian identity ecosystem, which he said has stabilised nicely around four players: Federal Government, Provincial Govt, Banks and Carriers. Lots of operations and infrastructure precedents from the payments industry have carried over.
Andre calls the smart driver license of British Columbia the convergence of "street identity and digital identity".

MyPOV: That's great news - and yet comparable jurisdictions like Australia and the USA still struggle to join governments and banks and carriers in an effective identity synthesis without creating great privacy and commercial anxieties. All three cultures are similarly allergic to identity cards, but only in Canada have they managed to supplement drivers licenses with digital identities with relatively high community acceptance. In nearly a decade, Australia has been at a standstill in its national understanding of smartcards and privacy.

For mine, the CIS Quote of the Day came from Scott Rice of the Open ID Foundation. We all know the stark problem in our industry of the under-representation of Relying Parties in the grand federated identity projects. IdPs and carriers so dominate IDAM. Scott asked us to imagine a situation where "The auto industry was driven by steel makers". Governments wouldn't put up with that for long.

Can someone give us the figures? I wonder if Identity and Access Management is already more economically ore important than cars?!

We live in an age where billionaires are self-made on the back of the most intangible of assets – the information they have amassed about us. That information used to be volunteered in forms and questionnaires and contracts but increasingly personal information is being observed and inferred.

The modern world is awash with data. It’s a new and infinitely re-usable raw material. Most of the raw data about us is an invisible by-product of our mundane digital lives, left behind by the gigabyte by ordinary people who do not perceive it let alone understand it.

Many Big Data and digital businesses proceed on the basis that all this raw data is up for grabs. There is a particular widespread assumption that data in the "public domain" is free-for-all, and if you’re clever enough to grab it, then you’re entitled to extract whatever you can from it.

In the webinar, I'll try to show how some of these assumptions are naive. The public is increasingly alarmed about Big Data and averse to unbridled data mining. Excessive data mining isn't just subjectively 'creepy'; it can be objectively unlawful in many parts of the world. Conventional data protection laws turn out to be surprisingly powerful in in the face of Big Data. Data miners ignore international privacy laws at their peril!

Today there are all sorts of initiatives trying to forge a new technology-privacy synthesis. They go by names like "Privacy Engineering" and "Privacy by Design". These are well meaning efforts but they can be a bit stilted. They typically overlook the strengths of conventional privacy law, and they can miss an opportunity to engage the engineering mind.

It’s not politically correct but I believe we must admit that privacy is full of contradictions and competing interests. We need to be more mature about privacy. Just as there is no such thing as perfect security, there can never be perfect privacy either. And is where the professional engineering mindset should be brought in, to help deal with conflicting requirements.

If we’re serious about Privacy by Design and Privacy Engineering then we need to acknowledge the tensions. That’s some of the thinking behind Constellation's new Big Privacy compact. To balance privacy and Big Data, we need to hold a conversation with users that respects the stresses and strains, and involves them in working through the new privacy deal.

The latest Snowden revelations include the NSA's special programs for extracting photos and identifying from the Internet. Amongst other things the NSA uses their vast information resources to correlate location cues in photos -- buildings, streets and so on -- with satellite data, to work out where people are. They even search especially for passport photos, because these are better fodder for facial recognition algorithms. The audacity of these government surveillance activities continues to surprise us, and their secrecy is abhorrent.

Yet an ever greater scale of private sector surveillance has been going on for years in social media. With great pride, Facebook recently revealed its R&D in facial recognition. They showcased the brazenly named "DeepFace" biometric algorithm, which is claimed to be 97% accurate in recognising faces from regular images. Facebook has made a swaggering big investment in biometrics.

Data mining needs raw material, there's lots of it out there, and Facebook has been supremely clever at attracting it. It's been suggested that 20% of all photos now taken end up in Facebook. Even three years ago, Facebook held 10,000 times as many photographs as the Library of Congress:

[Picture courtesy of the now retired 1000memories.com blog]

And Facebook will spend big buying other photo lodes. Last year they tried to buy Snapchat for the spectacular sum of three billion dollars. The figure had pundits reeling. How could a start-up company with 30 people be worth so much? All the usual dot com comparisons were made; the offer seemed a flight of fancy.

But no, the offer was a rational consideration for the precious raw material that lies buried in photo data.

Snapchat generates at least 100 million new images every day. Three billion dollars was, pardon me, a snap. I figure that at a ballpark internal rate of return of 10%, a $3B investment is equivalent to $300M p.a. so even if the Snapchat volume stopped growing, Facebook would have been paying one cent for every new snap, in perpetuity.

These days, we have learned from Snowden and the NSA that communications metadata is just as valuable as the content of our emails and phone calls. So remember that it's the same with photos. Each digital photo comes from a device that embeds within the image metadata usually including the time and place of when the picture was taken. And of course each Instagram or Snapchat is a social post, sent by an account holder with a history and rich context in which the image yields intimate real time information about what they're doing, when and where.

The hallmark of the Snapchat service is transience: all those snaps are supposed to flit from one screen to another before vaporising. Now of course that idea is contestable; enthusiasts worked out pretty quickly how to retrieve snaps from old memory. And in any case, transience is a red herring, perhaps a deliberate distraction, because the metadata matters more, and Snapchat admits in its Privacy Policy that it pretty well keeps the lot:

When you access or use our Services, we automatically collect information about you, including:

Usage Information: When you send or receive messages via our Services, we collect information about these messages, including the time, date, sender and recipient of the Snap. We also collect information about the number of messages sent and received between you and your friends and which friends you exchange messages with most frequently.

Log Information: We log information about your use of our websites, including your browser type and language, access times, pages viewed, your IP address and the website you visited before navigating to our websites.

Device Information: We may collect information about the computer or device you use to access our Services, including the hardware model, operating system and version, MAC address, unique device identifier, phone number, International Mobile Equipment Identity ("IMEI") and mobile network information. In addition, the Services may access your device's native phone book and image storage applications, with your consent, to facilitate your use of certain features of the Services.

Location Information: With your consent, we may collect information about the location of your device to facilitate your use of certain features of our Services, determine the speed at which your device is traveling, add location-based filters to your Snaps (such as local weather), and for any other purpose described in this privacy policy.

Snapchat goes on to declare it may use any of this information to "personalize and improve the Services and provide advertisements, content or features that match user profiles or interests" and it reserves the right to share any information with "vendors, consultants and other service providers who need access to such information to carry out work on our behalf".

So back to the data mining: nothing stops Snapchat -- or a new parent company -- running biometric facial recognition over the snaps as they pass through the servers, to extract additional "profile" information. And there's an extra kicker that makes Snapchats extra valuable for biometric data miners. The vast majority of Snapchats are selfies. So if you extract a biometric template from a snap, you already know who it belongs to, without anyone having to tag it. Snapchat would provide a hundred million auto-calibrations every day for facial recognition algorithms! On Facebook, the privacy aware turn off photo tagging, but with Snapchats, self identification is inherent to the experience and is unlikely to be ever be disabled.

As I've discussed before, the morbid thrill of Snowden's spying revelations has tended to overshadow his sober observations that when surveillance by the state is probably inevitable, we need to be discussing accountability.

While we're all ventilating about the NSA, it's time we also attended to private sector spying and properly debated the restraints that may be appropriate on corporate exploitation of social data.

Personally I'm much more worried that an infomopoly has all my selfies.

Many Big Data and online businesses proceed on a naive assumption that data in the "public domain" is up for grabs; technocrats are often surprised that conventional data protection laws can be interpreted to cover the extraction of PII from raw data. On the other hand, orthodox privacy frameworks don't cater for the way PII can be created in future from raw data collected today. This presentation will bridge the conceptual gap between data analytics and privacy, and offer new dynamic consent models to civilize the trade in PII for goods and services.

Abstract

It’s often said that technology has outpaced privacy law, yet by and large that's just not the case. Technology has certainly outpaced decency, with Big Data and biometrics in particular becoming increasingly invasive. However OECD data privacy principles set out over thirty years ago still serve us well. Outside the US, rights-based privacy law has proven effective against today's technocrats' most worrying business practices, based as they are on taking liberties with any data that comes their way. To borrow from Niels Bohr, technologists who are not surprised by data privacy have probably not understood it.

The cornerstone of data privacy in most places is the Collection Limitation principle, which holds that organizations should not collect Personally Identifiable Information beyond their express needs. It is the conceptual cousin of security's core Need-to-Know Principle, and the best starting point for Privacy-by-Design. The Collection Limitation principle is technology neutral and thus blind to the manner of collection. Whether PII is collected directly by questionnaire or indirectly via biometric facial recognition or data mining, data privacy laws apply.

It’s not for nothing we refer to "data mining". But few of those unlicensed data gold diggers seem to understand that the synthesis of fresh PII from raw data (including the identification of anonymous records like photos) is merely another form of collection. The real challenge in Big Data is that we don’t know what we’ll find as we refine the techniques. With the best will in the world, it is hard to disclose in a conventional Privacy Policy what PII might be collected through synthesis down the track. The age of Big Data demands a new privacy compact between organisations and individuals. High minded organizations will promise to keep people abreast of new collections and will offer ways to opt in, and out and in again, as the PII-for-service bargain continues to evolve.

With a bunch of exciting new members joining up on the eve of the RSA Conference, the FIDO Alliance is going from strength to strength. And they've just published the first public review drafts of their core "universal authentication" protocols.

An update to my Constellation Research report on FIDO is now available. Here's a preview.

The Go-To standards alliance in protocols for modern identity management

The FIDO Alliance – for Fast IDentity Online – is a fresh, fast growing consortium of security vendors and end users working out a new suite of protocols and standards to connect authentication endpoints to services. With an unusual degree of clarity in this field, FIDO envisages simply "doing for authentication what Ethernet did for networking".

Launched in early 2013, the FIDO Alliance has already grown to nearly 100 members, amongst which are heavyweights like Google, Lenovo, MasterCard, Microsoft and PayPal as well as a couple of dozen biometrics vendors, many of the leading Identity and Access Management solutions and service providers and several global players in the smartcard supply chain.

FIDO is different. The typical hackneyed elevator pitch in Identity and Access Management promises to "fix the password crisis" – usually by changing the way business is done. Most IDAM initiatives unwittingly convert clear-cut technology problems into open-ended business transformation problems. In contrast, FIDO's mission is refreshingly clear cut: it seeks to make strong authentication interoperable between devices and servers. When users have activated FIDO-compliant endpoints, reliable fine-grained information about their client environment becomes readily discoverable by any servers, which can then make access control decisions, each according to its own security policy.

In February 2014, the FIDO Alliance announced the release of its first two protocol drafts, and a clutch of new members including powerful players in financial services, the cloud and e-commerce. Constellation notes in particular the addition to the board of security leader RSA and another major payments card, Discover. And FIDO continues to strengthen its vital “Relying Party” (service provider) representation with the appearance of Aetna, Goldman Sachs, Netflix and Salesforce.com.

It's time we fixed the Authentication plumbing

In my view, the best thing about FIDO is that it is not about federated identity but instead it operates one layer down in what we call the digital identity stack. This might seem to run against the IDAM tide, but it's refreshing, and it may help the FIDO Alliance sidestep the quagmire of identity policy mapping and legal complexities. FIDO is not really about the vexed general issue of "identity" at all! Instead, it's about low level authentication protocols; that is, the plumbing.

The engineering problem underlying Federated Identity is actually pretty simple: if we want to have a choice of high-grade physical, multi-factor "keys" used to access remote services, how do we convey reliable cues to those services about the type of key being used and the individual who's said to be using it? If we can solve that problem, then service providers and Relying Parties can sort out for themselves precisely what they need to know about the users, sufficient to identify and authenticate them.

All of these leaves the 'I' in the acronym "FIDO" a little contradictory. It's such a cute name (alluding of course to the Internet dog) that it's unlikely to change. Instead, I overheard that the acronym might go the way of "KFC" where eventually it is no longer spelled out and just becomes a word in and of itself.

Facebook's business practices pose a risk of non-compliance with the Collection Limitation Principle (OECD Privacy Principle No. 1, and corresponding Australian National Privacy Principles NPP 1.1 through 1.4).

Privacy problems will likely remain while Facebook's business model remains unsettled, for the business is largely based on collecting and creating as much Personal Information as it can, for subsequent and as yet unspecified monetization.

If an OSN business doesn't know how it is eventually going to make money from Personal Information, then it has a fundamental difficulty with the Collection Limitation principle.

Introduction

Facebook is an Internet and societal phenomenon. Launched in 2004, in just a few years it has claimed a significant proportion of the world's population as regular users, becoming by far the most dominant Online Social Network (OSN). With its success has come a good deal of controversy, especially over privacy. Does Facebook herald a true shift in privacy values? Or, despite occasional reckless revelations, are most users no more promiscuous than they were eight years ago? We argue it's too early to draw conclusions about society as a whole from the OSN experience to date. In fact, under laws that currently stand, many OSNs face a number of compliance risks in dozens of jurisdictions.

Over 80 countries worldwide now have enacted data privacy laws, around half of which are based on privacy principles articulated by the OECD. Amongst these are the Collection Limitation Principle which requires businesses to not gather more Personal Information than they need for the tasks at hand, and the Use Limitation Principle which dictates that Personal Information collected for one purpose not be arbitrarily used for others without consent.
Overt collection, covert collection (including generation) and "innovative" secondary use of Personal Information are the lifeblood of Facebook. While Facebook's founder would have us believe that social mores have changed, a clash with orthodox data privacy laws creates challenges for the OSN business model in general.

This article examines a number of areas of privacy compliance risk for Facebook. We focus on how Facebook collects Personal Information indirectly, through the import of members' email address books for "finding friends", and by photo tagging. Taking Australia's National Privacy Principles from the Privacy Act 1988 (Cth) as our guide, we identify a number of potential breaches of privacy law, and issues that may be generalised across all OECD-based privacy environments.

Terminology

Australian law tends to use the term "Personal Information" rather than "Personally Identifiable Information" although they are essentially synonymous for our purposes.

Terms of reference: OECD Privacy Principles and Australian law

The Organisation for Economic Cooperation and Development has articulated eight privacy principles for helping to protect personal information. The OECD Privacy Principles are as follows:

1. Collection Limitation Principle

2. Data Quality Principle

3. Purpose Specification Principle

4. Use Limitation Principle

5. Security Safeguards Principle

6. Openness Principle

7. Individual Participation Principle

8. Accountability Principle

Of most interest to us here are principles one and four:

Collection Limitation Principle: There should be limits to the collection of personal data and any such data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject.

Use Limitation Principle: Personal data should not be disclosed, made available or otherwise used for purposes other than those specified in accordance with [the Purpose Specification] except with the consent of the data subject, or by the authority of law.

At least 89 counties have some sort of data protection legislation in place [Greenleaf, 2012]. Of these, in excess of 30 jurisdictions have derived their particular privacy regulations from the OECD principles. One example is Australia.

We will use Australia's National Privacy Principles NPPs in the Privacy Act 1988 as our terms of reference for analysing some of Facebook's systemic privacy issues. In Australia, Personal Information is defined as: information or an opinion (including information or an opinion forming part of a database), whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion.

Indirect collection of contacts

One of the most significant collections of Personal Information by Facebook is surely the email address book of those members that elect to have the site help "find friends". This facility provides Facebook with a copy of all contacts from the address book of the member's nominated email account. It's the very first thing that a new user is invited to do when they register. Facebook refer to this as "contact import" in the Data Use Policy (accessed 10 August 2012).

"Find friends" is curtly described as "Search your email for friends already on Facebook". A link labelled "Learn more" in fine print leads to the following additional explanation:

"Facebook won't share the email addresses you import with anyone, but we will store them on your behalf and may use them later to help others search for people or to generate friend suggestions for you and others. Depending on your email provider, addresses from your contacts list and mail folders may be imported. You should only import contacts from accounts you've set up for personal use." [underline added by us].

Without any further elaboration, new users are invited to enter their email address and password if they have a cloud based email account (such as Hotmail, gmail, Yahoo and the like). These types of services have an API through which any third party application can programmatically access the account, after presenting the user name and password.

It is entirely possible that casual users will not fully comprehend what is happening when they opt in to have Facebook "find friends". Further, there is no indication that, by default, imported contact details are shared with everyone. The underlined text in the passage quoted above shows Facebook reserves the right to use imported contacts to make direct approaches to people who might not even be members.

Importing contacts represents an indirect collection by Facebook of Personal Information of others, without their authorisation or even knowledge. The short explanatory information quoted above is not provided to the individuals whose details are imported and therefore does not constitute a Collection Notice. Furthermore, it leaves the door open for Facebook to use imported contacts for other, unspecified purposes. The Data Use Policy imposes no limitations as to how Facebook may make use of imported contacts.

Privacy harms are possible in social networking if members blur the distinction between work and private lives. Recent research has pointed to the risky use of Facebook by young doctors, involving inappropriate discussion of patients [Moubarak et al, 2010]. Even if doctors are discreet in their online chat, we are concerned that they may run foul of the Find Friends feature exposing their connections to named patients. Doctors on Facebook who happen to have patients in their web mail address books can have associations between individuals and their doctors become public. In mental health, sexual health, family planning, substance abuse and similar sensitive fields, naming patients could be catastrophic for them.

While most healthcare professionals may use a specific workplace email account which would not be amenable to contacts import, many allied health professionals, counselors, specialists and the like run their sole practices as small businesses, and naturally some will use low cost or free cloud-based email services. Note that the substance of a doctor's communications with their patients over web mail is not at issue here. The problem of exposing associations between patients and doctors arises simply from the presence of a name in an address book, even if the email was only ever used for non-clinical purposes such as appointments or marketing.

Photo tagging and biometric facial recognition

One of Facebook's most "innovative" forms of Personal Information Collection would have to be photo tagging and the creation of biometric facial recognition templates.

Photo tagging and "face matching" has been available in social media for some years now. On photo sharing sites such as Picasa, this technology "lets you organize your photos according to the people in them" in the words of the Picasa help pages. But in more complicated OSN settings, biometrics has enormous potential to both enhance the services on offer and to breach privacy.

In thinking about facial recognition, we start once more with the Collection Principle. Importantly, nothing in the Australian Privacy Act circumscribes the manner of collection; no matter how a data custodian comes to be in possession of Personal Information (being essentially any data about a person whose identity is apparent) they may be deemed to have collected it. When one Facebook member tags another in a photo on the site, then the result is that Facebook has overtly but indirectly collected PI about the tagged person.

Facial recognition technologies are deployed within Facebook to allow its servers to automatically make tag suggestions; in our view this process constitutes a new type of Personal Information Collection, on a potentially vast scale.

Biometric facial recognition works by processing image data to extract certain distinguishing features (like the separation of the eyes, nose, ears and so on) and computing a numerical data set known as a template that is highly specific to the face, though not necessarily unique. Facebook's online help indicates that they create templates from multiple tagged photos; if a user removes a tag from one of their photo, that image is not used in the template.

Facebook subsequently makes tag suggestions when a member views photos of their friends. They explain the process thus:

"We are able to suggest that your friend tag you in a picture by scanning and comparing your friend‘s pictures to information we've put together from the other photos you've been tagged in".

So we see that Facebook must be more or less continuously checking images from members' photo albums against its store of facial recognition templates. When a match is detected, a tag suggestion is generated and logged, ready to be displayed next time the member is online.

What concerns us is that the proactive creation of biometric matches constitutes a new type of PI Collection, for Facebook must be attaching names -- even tentatively, as metadata -- to photos. This is a covert and indirect process.

Photos of anonymous strangers are not Personal Information, but metadata that identifies people in those photos most certainly is. Thus facial recognition is converting hitherto anonymous data -- uploaded in the past for personal reasons unrelated to photo tagging let alone covert identification -- into Personal Information.

Facebook limits the ability to tag photos to members who are friends of the target. This is purportedly a privacy enhancing feature, but unfortunately Facebook has nothing in its Data Use Policy to limit the use of the biometric data compiled through tagging. Restricting tagging to friends is likely to actually benefit Facebook for it reduces the number of specious or mischievous tags, and it probably enhances accuracy by having faces identified only by those who know the individuals.

A fundamental clash with the Collection Limitation Principle

In Australian privacy law, as with the OECD framework, the first and foremost privacy principle concerns Collection. Australia's National Privacy Principle NPP 1 requires that an organisation refrain from collecting Personal Information unless (a) there is a clear need to collect that information; (b) the collection is done by fair means, and (c) the individual concerned is made aware of the collection and the reasons for it.

In accordance with the Collection Principle (and others besides), a conventional privacy notice and/or privacy policy must give a full account of what Personal Information an organisation collects (including that which it creates internally) and for what purposes. And herein lies a fundamental challenge for most online social networks.

The core business model of many Online Social Networks is to take advantage of Personal Information, in many and varied ways. From the outset, Facebook founder, Mark Zuckerberg, appears to have been enthusiastic for information built up in his system to be used by others. In 2004, he told a colleague "if you ever need info about anyone at Harvard, just ask" (as reported by Business Insider). Since then, Facebook has experienced a string of privacy controversies, including the "Beacon" sharing feature in 2007, which automatically imported members' activities on external websites and re-posted the information on Facebook for others to see.

Facebook's privacy missteps are characterised by the company using the data it collects in unforeseen and barely disclosed ways. Yet this is surely what Facebook's investors expect the company to be doing: innovating in the commercial exploitation of personal information. The company's huge market valuation derives from a widespread faith in the business community that Facebook will eventually generate huge revenues. An inherent clash with privacy arises from the fact that Facebook is a pure play information company: its only significant asset is the information it holds about its members. There is a market expectation that this asset will be monetized and maximised. Logically, anything that checks the network's flux in Personal Information -- such as the restraints inherent in privacy protection, whether adopted from within or imposed from without -- must affect the company's futures.

Conclusion

Perhaps the toughest privacy dilemma for innovation in commercial Online Social Networking is that these businesses still don't know how they are going to make money from their Personal Information lode. Even if they wanted to, they cannot tell what use they will eventually make of it, and so a fundamental clash with the Collection Limitation Principle remains.

Acknowledgements

An earlier version of this article was originally published by LexisNexis in the Privacy Law Bulletin (2010).