In numbers

2.5 million - Number of Australians affected by the Department of Health data blunder

3 billion - Lines of data from the MBS and PBS schemes generated for 10% of the population

3 - Number of Australian Privacy Principles in the Privacy Act breached by the Department of Health

The federal Department of Health "unintentionally" breached privacy laws when it published de-identified health records of 2.5 million people online, Australia's Privacy Commissioner has ruled.

About 1½ years ago, the department published de-identified health data of 10 per cent of the population from the Medicare Benefits Schedule (MBS) and the Pharmaceutical Benefits Scheme (PBS) on the government's open data website for "research purposes".

A month later, researchers at the University of Melbourne sounded the alarm that the data could be re-identified, saying they had pinpointed unique patient records matching seven well-known Australians, including three former or current MPs and an AFL footballer.

After a lengthy investigation, commissioner Timothy Pilgrim has concluded the department had failed to meet the high standard required by the Australian Privacy Principles (APPs), breaching the Privacy Act three times.

"The department breached APP 6 (only in relation to health providers) by disclosing such personal information for a purpose other than that for which it was collected," his report reads.

"It breached APPs 1 and 11 [because] the steps taken ... to confirm personal information was removed from the dataset prior to its publication were inadequate relative to the sensitivity of the information and the context of its release."

But Mr Pilgrim ruled out the notion that personal information of patients had been disclosed.

This particular finding has stumped the researchers - Dr Chris Culnane, Dr Benjamin Rubinstein and Dr Vanessa Teague from the university’s School of Computing and Information Systems - who easily re-identified records by cross-referencing the dataset with other sources such as Wikipedia, Facebook and news websites.

"The real privacy issue here is the 10 per cent of patients whose longitudinal billing records were published online, information that included 'management of second trimester labour', prescriptions for HIV patients, and a lot of other highly sensitive information," said Dr Teague.

"We showed that a person's record could be easily re-identified given a few simple facts about them, such as the dates of childbirths or surgeries."
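The kind of matching Dr Teague describes can be illustrated with a minimal sketch. It is not the researchers' actual method or data: the records, identifiers and the `matching_records` helper below are entirely hypothetical and synthetic, and the example only assumes that each de-identified record is a set of dated events that can be compared against facts known from public sources.

```python
from datetime import date

# Entirely synthetic, hypothetical records: pseudonymous ID -> set of dated events.
records = {
    "rec-001": {("childbirth", date(2001, 3, 14)), ("surgery", date(2010, 7, 2))},
    "rec-002": {("childbirth", date(2001, 3, 14)), ("childbirth", date(2004, 9, 30))},
    "rec-003": {("surgery", date(2010, 7, 2))},
}


def matching_records(records, known_facts):
    """Return the IDs of records that contain every known (event, date) fact."""
    facts = set(known_facts)
    return sorted(rid for rid, events in records.items() if facts <= events)


# One publicly known fact still leaves more than one candidate record.
one_fact = matching_records(records, [("childbirth", date(2001, 3, 14))])

# A second known fact narrows the candidates down further.
two_facts = matching_records(
    records,
    [("childbirth", date(2001, 3, 14)), ("childbirth", date(2004, 9, 30))],
)

print(one_fact)
print(two_facts)
```

In this toy example each additional known fact shrinks the set of candidate records, which is why a handful of dates drawn from public sources may be enough to single out one person in a much larger dataset.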

The department, which offered an enforceable undertaking, would not say whether the seven patients whose records were re-identified had been notified about the blunder.

"The Commissioner noted that any non-compliance was unintentional and that the department acted in good faith in the steps it took before release of the dataset to protect the information," the department said.

"It is important to note the department is not aware of any individual or provider having been identified through this release of data."

Mr Pilgrim said the enforceable undertaking, which will require the department to continue to review and enhance its data governance and release processes with oversight from the Office of the Australian Information Commissioner, was "an appropriate regulatory outcome".

The report showed the dataset was downloaded about 1500 times in the one-month period it was available online.

The dataset included details for each MBS claim between 1984 and 2014 and each PBS claim between 2003 and 2014 made by the sample group.

About 3 billion lines of data were generated for 2.5 million Australians, an average of roughly 1200 lines per person.

Dr Teague urged the department to notify the seven individuals, as well as the 10 per cent affected by the major error.

She said there was a risk of a repeat if agencies failed to understand how easy it was to re-identify data.

"This shows the surprising ease with which de-identification can fail, highlighting the risky balance between data sharing and privacy," she said.

"The lesson for the future is - don't publish or share sensitive, detailed unit-record level data about individual people without their consent 'on the basis that it is de-identified' because it is probably easily re-identifiable."