'Personal data' in the UK, Anonymisation and Encryption

Update October 2011: the Information Tribunal in Beckles v IC, 16 Sept 2011, have confirmed, referring to the Department of Health case below, that the question is 'whether the individual(s) would be identifiable by members of the public, not armed with the further information held by the University [ie the controller], if the data were disclosed in the form proposed.... The question to be determined is therefore whether any individual would be identifiable from the data requested. That depends to a significant degree on the content of the data in question, which is dealt with briefly in the closed annex.... identifiable means identifiable, not by the requester, but by any third party who might relate such information to his or her knowledge and experience. Disclosure is to the public at large.' (paras 31-33).

That judgment has now been published, and is important regarding the interpretation of the 'personal data' definition in the UK.

This definition is critical when considering whether anonymised or encrypted personal data processed in the cloud should be treated as 'anonymous data', and therefore, eg, be transferable outside the EEA free of data protection law constraints – or whether such data would remain 'personal data' subject to the provisions of data protection legislation.

The UK legislation implementing the EU Data Protection Directive was the Data Protection Act 1998 (DPA), s.1(1) of which provides, in words that differ from the Directive's definition:

'"personal data" means data which relate to a living individual who can be identified–
(a) from those data, or
(b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller,
and includes any expression of opinion about the individual and any indication of the intentions of the data controller or any other person in respect of the individual;'

The s.1(1) interpretation question is as follows. Suppose that a data controller holds information which is personal data. It then attempts to anonymise this information, and intends to disclose the resulting anonymised information to a third party. What is the status of that anonymised information? Is it still 'personal data', or can it be treated as anonymous data?

Now, para. (b) of the definition, above, requires that, when considering whether a living individual can be identified from the 'anonymised' data, account must be taken of 'other information' held by the data controller.

A strict, 'hard-line' interpretation of that provision might suggest that 'anonymised' information can never be treated as anonymous data for so long as the controller retains the original personal data from which the anonymised data were derived, because if you put together the anonymised data ('those data') with the original personal data ('other information') still possessed by the data controller, then of course people can still be identified – by the data controller, from the original personal data.

Or, suppose that the data controller has key-coded the original personal data (changed names to code numbers, with a 'key' showing which number corresponds to which name), and destroyed the original personal data, but still possesses the key. Again, people can be identified from the key-coded data in combination with the key. So, on the 'hard-line' view, the key-coded data would remain 'personal data'.
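To make the key-coding scenario concrete, here is a minimal sketch in Python. It is an illustration only: the function names (key_code, re_identify), the record layout and the use of random hexadecimal codes are assumptions made for this example, not a description of any particular controller's practice. It shows why the holder of the key can always reverse the coding, while a recipient who receives only the coded records has no names to work from.

```python
import secrets


def key_code(records, name_field="name"):
    """Replace each distinct name with a random code.

    Returns (coded_records, key), where key maps code -> original name.
    Hypothetical helper for illustration only.
    """
    name_to_code = {}
    key = {}
    coded = []
    for record in records:
        name = record[name_field]
        if name not in name_to_code:
            # Draw a fresh random code, retrying on the (unlikely) collision.
            code = secrets.token_hex(4)
            while code in key:
                code = secrets.token_hex(4)
            name_to_code[name] = code
            key[code] = name
        coded_record = dict(record)
        coded_record[name_field] = name_to_code[name]
        coded.append(coded_record)
    return coded, key


def re_identify(coded_records, key, name_field="name"):
    """Reverse the coding. Only possible for whoever holds the key."""
    return [{**r, name_field: key[r[name_field]]} for r in coded_records]


# Invented sample data.
records = [
    {"name": "Alice Example", "clinic": "North"},
    {"name": "Bob Example", "clinic": "South"},
    {"name": "Alice Example", "clinic": "East"},
]

coded, key = key_code(records)          # coded: what might be disclosed
restored = re_identify(coded, key)      # what the key holder can recover
```

Note that the other fields (here, clinic) are left untouched, so whether the coded records are 'sufficiently' anonymised still depends on what those fields reveal.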

If encryption is applied to a data set, the whole data set would be transformed, and not just names within the data set. However, where the data controller possesses the decryption key, encrypted personal data might be viewed as similar to key-coded data, and if so would, on the hard-line view, always be considered 'personal data'.

So the recent judgment is very relevant to cloud computing as well as other areas of computing, and indeed more generally.

PLA

After the UK Department of Health changed their approach to the release of anonymised abortion statistics, the Pro Life Alliance requested from the Department, under the Freedom of Information Act 2000 (FOIA), anonymised statistics in the more detailed form in which they had previously been released.

The Information Tribunal considered that the requested information was 'personal data' in the hands of the Department of Health under s.1(1)(b) DPA. However, it held that, as it considered the possibility of identification by a third party from the requested statistics was 'extremely remote', the disclosure would not contravene the data protection principles of Sch 1 to that Act and was proportionate and justified when balanced against important legitimate public interests in disclosure.

On appeal, the judge held that the Tribunal's interpretation was wrong: the requested information was not 'personal data', and therefore the Tribunal should have held that the disclosure of the information to the public did not constitute the processing of personal data. (He went on to rule that, even if he were wrong and the information was 'personal data', the Tribunal had acted properly in its overall assessment, from the statistical evidence and its own judgement, that it was 'extremely remote' that the public to whom the statistical data was disclosed would be able to identify individuals from it, and in deciding that disclosure was justified under the DPA.)

He acknowledged that the CSA judgments were not easy to interpret, but concluded from the wording of Lord Hope's proposed order that Lord Hope had recognised that:

although the Agency [data controller] held the information as to the identities of the children to whom the requested information related, it did not follow from that that the information, sufficiently anonymised, would still be personal data when publicly disclosed. All members of the House of Lords agreed with Lord Hope's order demonstrating, in my view, their shared understanding that anonymised data which does not lead to the identification of a living individual does not constitute personal data… The status of information in the data controller's hands did not arise for decision in the CSA case. It was concerned with the implications of disclosure by the data controller… The opening sentence of paragraph 27 [of CSA] acknowledges that the Agency holds the key to identifying the children, but continues that, in his Lordship's opinion, the fact that the Agency had access to this information did not disable it from processing it in such a way consistent with recital 26 of the Directive, "that it becomes data from which a living individual can no longer be identified". That must relate to whether any living individuals can be identified by the public following the disclosure of the information. It cannot relate to whether any living individuals can be identified by the Agency, since that is addressed in the first sentence of the paragraph. Thus the order made by the House of Lords in the CSA case was concerned with the question of fact, whether barnardisation could preclude identification of the relevant individuals by the public.

(paras. 51-52)

Cranston J said that this conclusion reflected recital 26 of the Data Protection Directive, which recognises that the Directive does not apply to data rendered anonymous. He gave that recital greater force than the suggestion that the Article 29 Working Party's opinion on 'personal data' required a broader initial interpretation of 'personal data' (para. 53).

Indeed, any other conclusion seemed to him to be:

divorced from reality. The Department of Health's interpretation is that any statistical information derived from reporting forms or patient records constitutes personal data. If that were the case, any publication would amount to the processing of sensitive personal data. That would be so notwithstanding the statistical exemption in Section 33, since that exemption does not exclude the requirement to satisfy Schedule 3 of the DPA. Thus, the statistic that 100,000 women had an abortion in a particular year would constitute personal data about each of those women, provided that the body that publishes this statistic has access to information which would enable it to identify each of them. That is not a sensible result and would seriously inhibit the ability of healthcare organisations and other bodies to publish medical statistics.
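The 'barnardisation' referred to in the CSA discussion above is a way of disguising small counts in a statistical table by randomly perturbing them. The sketch below is a simplified illustration in Python. The exact rule set is an assumption for this example: zeros are left alone, a count of 1 has 0 or +1 added, counts of 2 to 4 have -1, 0 or +1 added, and larger counts are unchanged. It is not claimed to be the precise scheme considered in CSA, and the function name barnardise is hypothetical.

```python
import random


def barnardise(counts, rng=None):
    """Randomly perturb small cell counts (simplified, assumed rule set).

    - 0 stays 0
    - 1 has 0 or +1 added
    - 2 to 4 have -1, 0 or +1 added
    - counts of 5 or more are left unchanged
    """
    rng = rng or random.Random()
    result = []
    for count in counts:
        if count == 0:
            result.append(0)
        elif count == 1:
            result.append(count + rng.choice([0, 1]))
        elif 2 <= count <= 4:
            result.append(count + rng.choice([-1, 0, 1]))
        else:
            result.append(count)
    return result


# Invented example table of cell counts.
original = [0, 1, 2, 3, 4, 5, 120]
disguised = barnardise(original, rng=random.Random(2011))
```

Whether perturbation of this kind actually prevents identification by the public is, as the order in CSA recognised, a question of fact.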

APGER

The APGER asked the MoD for information on individuals detained or captured by UK soldiers operating jointly with forces of another country in Iraq or Afghanistan, including, in the case of Iraq, detentions or captures jointly with US forces, and for information on their subsequent transfer to Guantanamo Bay or other detention facilities.

The Tribunal did not think that the dates of detention and dates and locations of any transfers would enable identification of individuals and therefore constitute personal data, based on the content of the information (especially the shortness of the detention periods) and the absence of any evidence that individuals would be identifiable from the information by reason of other knowledge held in the relevant communities (para. 109).

The MoD had also argued, based on (b) of the 'personal data' definition, that information on the numbers of individuals transferred to particular detention facilities or particular kinds of detention facilities remained personal data, even when anonymised, 'because the individuals remained identifiable by the MOD from other information in the possession of the MOD (ie, the unredacted information)'.

In this context, the Tribunal considered CSA and said (para. 127) that:

'Anonymisation by redaction is itself a form of processing. If the data controller carries out such anonymisation, but also retains the unredacted data, or retains the key by which the living individuals can be identified, the anonymised data remains "personal data" within the meaning of paragraph (b) of the definition and the data controller remains under a duty to process it only in compliance with the data protection principles.'

They also said (para. 128):

'However, we remain concerned at the use of this analysis in such a way as would have the effect of treating truly anonymised information as if it required the protection of the DPA, in circumstances where that is plainly not the case and indeed would be absurd. Lord Hope's reasoning appears to lead to the result that, in a case where the data controller retains the ability to identify the individuals, the processing of the data by disseminating it in a fully anonymised form, from which no recipient can identify individuals, can only be justified by showing that it is effected in compliance with the data protection principles. Certainly the whole of the information still needs the protection of the DPA in the hands of the data controller, for as long as the data controller retains the other information which makes individuals identifiable by him. But outside the hands of the data controller the information is no longer personal data, because no individual can be identified. We therefore think, with diffidence given the difficulties of interpretation which led to such divergent reasoning among their Lordships, the best analysis is that disclosure of fully anonymised information is not a breach of the protection of the Act because at the moment of disclosure the information loses its character as personal data. It remains personal data in the hands of the data controller, because the controller holds the key, but it is not personal data in the hands of the recipients, because the public cannot identify any individual from it. That which escapes from the data controller to the outside world is only plain vanilla data. We think this was the reasoning that Baroness Hale had in mind, when she said at [92]:
"For the purpose of this particular act of processing, therefore, which is disclosure of these data in this form to these people, no living individual to whom they relate is identifiable".'

Also of interest is the MoD's further argument against the release of the requested information even with redaction of names, because it might constitute personal data of third parties within s.40(2) FOIA: 'where small numbers of persons were involved, redaction of the names was insufficient and that individuals would be identifiable from information known to the public in areas where the detainees had been located prior to their detention.' (para. 124)

There, the Tribunal noted that, while the Information Commissioner had referred in argument to whether there was an "appreciable risk" of identification, that did not appear to them to be the statutory test, which uses the phrase "can be identified". However, in considering the facts of the matter, the Tribunal then said (para. 129) that 'On the evidence that we have received, our conclusion on the balance of probabilities is that publication of the information the subject of the MOD's appeal will not render individuals identifiable.' Thus, the test of identifiability applied in practice there was 'the balance of probabilities', which supports the suggestion in our paper of 'more likely than not' (p. 40, last paragraph).
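The MoD's 'small numbers' argument reflects a familiar statistical disclosure concern: a count of one or two people in a named location may point to specific individuals even after names are removed. One common response is to suppress small cells before release. The Python sketch below illustrates the idea only. The threshold of 5, the function name suppress_small_cells and the sample figures are assumptions made for this example and are not drawn from the Tribunal's decision, and suppression of this kind does not by itself answer the statutory question of whether an individual 'can be identified'.

```python
def suppress_small_cells(table, threshold=5):
    """Return a copy of the table with counts below the threshold withheld.

    Withheld cells are represented as None. The threshold is an
    illustrative assumption, not a legal or statistical standard.
    """
    return {
        category: (count if count >= threshold else None)
        for category, count in table.items()
    }


# Invented figures for illustration.
transfers_by_facility = {"Facility A": 42, "Facility B": 3, "Facility C": 5}
released = suppress_small_cells(transfers_by_facility)
```

Here the count for "Facility B" would be withheld, while the other two counts would be released unchanged.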

Summary and comment

As mentioned in footnote 97 of our paper, a Scottish court had previously remarked that the 'hard-line' approach to s.1(1) DPA, whereby the original personal data would have to be destroyed before anonymised information could be released, seemed 'hardly consistent' with recital 26 of the Directive.

While the Tribunal in APGER based their decision on Lady Hale's judgment and Cranston J in PLA followed Lord Hope, both have now firmly rejected the hard-line interpretation. It is not yet known whether the Department of Health will be appealing PLA.

Pending the outcome of any appeal, it is at least now clear that in the UK a data controller should be able to anonymise originally-personal data and then disclose or process the anonymised data, as long as the data are sufficiently anonymised so that the public cannot identify living individuals from the anonymised data. It should not matter that the data controller itself can identify living individuals from the anonymised data and/or the original personal data.

This makes it much more likely that securely-encrypted personal data may be stored in the cloud as 'anonymous data', and should also mean sufficiently-anonymised personal data may be stored or otherwise processed in the cloud.

However, the difficulty of 'sufficiently' or 'fully' anonymising personal data still remains. How much and what anonymisation will be good enough? It seems data will not be 'personal data' if the likelihood of identification is 'extremely remote' (PLA), or perhaps if 'on the balance of probabilities' disclosure of that data will not render individuals identifiable (as applied in APGER).

Also, it's still not entirely clear how the data controller must handle the anonymised data. Consider key-coded data, on the assumption that key-coding sufficiently anonymises the data (which may itself be problematic). Under APGER, while the controller still holds the key-coded data and the key it must process that data only in compliance with the DPA – even though it may release the data without breaching the DPA because, on disclosure, the data would, in the hands of third parties, lose 'personal data' character. In contrast, Cranston J seems to consider that sufficiently anonymised personal data would not be personal data. The exact factual circumstances may well affect the position – key-coding is not the same as aggregation, and it may also make a difference whether the controller retains the original personal data and/or the key.

Note further, in relation to the status of the anonymisation or encryption process itself (discussed in our paper at 3.3.1), that the Tribunal in APGER has stated that 'Anonymisation by redaction is itself a form of processing.' (para. 127)