from the still-feeling-safe? dept

Privacy. Everybody talks about it. Grandstanding politicians make plenty of loud noises in the general direction of the internet, disparaging it for turning your perusal of Kim Kardashian-related articles into targeted ads for breast enhancement surgery and Kanye West tickets. Of course, while these politicians are making all this noise about your privacy, they're quietly signing off on efforts allowing them to sneak in the backdoor and raid your browser history.

Putting the government in charge of your privacy has never been a great idea. When HIPAA was enacted, its privacy requirements greatly affected the medical community. Like many regulatory acts, HIPAA both raised costs (additional paperwork and other compliance factors) and lowered quality (negatively affecting retrospective research and curtailing proactive follow-up care).

Vioxx, the non-steroidal anti-inflammatory drug once prescribed for arthritis, was on the market for over five years before it was withdrawn in 2004. Though a group of small-scale studies had found a correlation between Vioxx and increased risk of heart attack, the FDA did not have convincing evidence until it completed its own analysis of 1.4 million Kaiser Permanente HMO members. By the time Vioxx was pulled, it had caused between 88,000 and 139,000 unnecessary heart attacks, and 27,000-55,000 avoidable deaths.

Even the government's own regulators were stymied by HIPAA's privacy requirements, as was pointed out by Dr. Richard Platt, a drug risk researcher for the FDA:

The Vioxx debacle is a haunting illustration of the importance of large-scale data research. If researchers had had access to 7 million longitudinal patient records, a statistically significant relationship between Vioxx and heart attack would have been revealed in under three years. If researchers had had access to 100 million longitudinal patient records, the relationship would have been discovered in just three months. Of course, if public health researchers did post-market studies that looked for everything all the time, many of the results that look significant would be the product of random noise. But even if it took six months or one year to become confident in the results from a nation-wide health research database, tens of thousands of deaths may have been averted.

At least as troubling as the thousands of deaths that might have been prevented had HIPAA's restrictions been less limiting is the fact that the privacy stipulations rested on a faulty premise, and on the Dept. of Health and Human Services' misplaced confidence in erroneous results.

The premise, as demonstrated by Massachusetts graduate student Latanya Sweeney, was that patient reidentification was possible using only voter registration records and the Massachusetts Group Insurance Commission's (GIC) anonymized records. Sweeney was able to reidentify Governor Weld by taking voter record information, including birth date, name, address, zip code and sex, and cross-referencing it with GIC's data. But, as Info/Law points out, Sweeney made a couple of errors, not the least of which was conflating two different terms:

Latanya Sweeney used census data to estimate that 87% of the population has a unique combination of 5-digit zip code, birthdate, and gender, and implied that the same sort of attack could be carried out using voter registration records or other public files. Philippe Golle's replication corrected the figure to 63%, though that's hardly comforting. But these uniqueness statistics are rather misleading. There is an important difference between distinguishability and identifiability. Distinguishability is a necessary condition to conduct the sort of matching attack that Ohm describes, but it is not sufficient. Latanya Sweeney conflated the two when she suggested that a unique individual can be identified by linking the unique combination of attributes to public records (voter registration records, for example). But public records are never complete. We know, for example, that a significant portion of the population is not registered to vote. How was Sweeney so sure that there was not another man who shared Gov. Weld's birth date and zip code who was not registered to vote?

Not only was the data set incomplete, but the analysis built on it was overly simplistic and its population figure was off by a large margin:

Daniel Barth-Jones has recently uploaded a fascinating new article that revisits the famous Gov. Weld reidentification. To start with, Sweeney's estimate of the Cambridge population is way off. There were nearly 100,000 people living in Cambridge at the time of the William Weld attack. This should have been the first hint that Sweeney's methodology was overly simple. She reported a population of 54,000 because that is the number of Cambridge residents who were registered to vote. Sweeney used these records as if they described the entire population.

By comparing Sweeney's count of Cambridge voter registrants with U.S. Census records, Barth-Jones confirmed that many voting-age adults in Cambridge (about 35%) were not registered to vote. In William Weld's case, the census data show that approximately 174 men living in Weld's zip code were Weld's age. We don't know their precise birth dates, but we can calculate that the chance another man living in Weld's zip code shared his birthdate was about 35%. This is quite important all on its own to illustrate the difference between identifiability and distinguishability. Most of those 174 men had a unique combination of birth date, gender, and zip code, but each one of them was quite likely (35% likely) to be non-unique.

Sweeney presumably used the voter registration records to rule out the possibility that some of these 174 Cambridge men shared Gov. Weld's birth date. But even if Sweeney did indeed confirm that no other registered voter shared Weld's gender, zip, and birth date, she could not have been sure about the 50 or so Cambridge residents who were Weld's age and were not registered to vote. Thus, at best, Weld's chance of having a unique birth date, zip code, and gender combination is 87%. Put differently, the chance that Latanya Sweeney's matching attack would have been wrong using these three variables alone was 13%, much worse than the traditional 5% threshold for statistical confidence.
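The arithmetic behind the percentages quoted above can be checked with a back-of-the-envelope sketch. It assumes birthdays are spread uniformly over 365 days and are independent from person to person, which is a simplification and not necessarily the method Barth-Jones used; the function name is made up for illustration.

```python
def chance_someone_shares_birthdate(n_others: int, days_in_year: int = 365) -> float:
    """Probability that at least one of n_others has one specific birth date.

    Assumes uniform, independent birthdays (an illustrative simplification).
    """
    return 1.0 - ((days_in_year - 1) / days_in_year) ** n_others


# Roughly 50 men of Weld's age in his zip code were not registered to vote.
p_unregistered_match = chance_someone_shares_birthdate(50)

# The other 173 of the roughly 174 same-age men in the zip code.
p_any_match = chance_someone_shares_birthdate(173)

print(f"unregistered man shares the birth date: {p_unregistered_match:.0%}")
print(f"any same-age man shares the birth date: {p_any_match:.0%}")
```

Under these assumptions the first figure comes out at about 13%, in line with the 87% uniqueness figure above. The second lands a little above the quoted 35%, which suggests the original calculation used slightly different inputs.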

Despite these erroneous assumptions based on incomplete data, the Dept. of Health and Human Services stated the study had shown that "97 percent of the individuals in Cambridge whose data appeared in a database which contained only their nine digit ZIP code and birth date could be identified with certainty." This completely ignores the fact that over a third of the population wouldn't even show up on the list.

But bad data and faulty research have never stopped governmental "progress." The threat of reidentification is low and any attacks remain purely speculative. But while bad regulations have a tendency to be able to weather even the toughest criticism without making the slightest concessions, HIPAA has one thing most bad regulations don't, as Info/Law points out: "a body count."

Re:

since when has any company or politician worried about the number of deaths in their related areas? what they concentrate on is how many were copyright infringement related and, but for the rebuttal of SOPA, how many less there would have been. bottom line being, had the public kept quiet instead of sticking up for themselves, there would have been more potential victims for the entertainment industries to prosecute

Re:

I would appreciate it if Volokh or Tim would elaborate on what IS allowed by HIPAA - for example, data with DOB/zipcode/gender might be enough to theoretically identify someone, but year of birth/zipcode/gender certainly wouldn't be - and the effectiveness of using whatever data can be gathered relative to the effectiveness of what data is not allowed.

Re: Re: Re:

like you pay for roads whether you use them or not, or schools or police or fire departments. In fact you may get through your entire life without using those, but sometime in your life you will use healthcare. Plus if government has been paying for healthcare for a long time we have all already been paying for it. Where do you think tax money comes from?

Re:

since when has any company or politician worried about the number of deaths in their related areas?

For companies, simply tell them: "Dead people don't pay", as in: "Sell cigarettes and/or alcohol to get them addicted, and they'll be making you a pretty penny for decades, until they die of lung cancer or liver failure. Sell them extremely deadly products, and you've lost a customer... for life (because they're dead)."

Re: Re: Re: Re:

Yes, except that those roads are -available- to everyone too, whether they use them or not. Same for schools, police, fire dept, etc. US gov healthcare only covers about 80% of the population, and only a fraction of the time.

US health care is a big freaking mess, and no one wants to address any of the real problems with it. Instead they just sort of bury them under lesser, newer problems. Thus why we have Obamacare: people are fighting over the few trillion in relatively recent costs it added, while ignoring the tens or hundreds of trillions of future unpaid costs in Medicare and Medicaid.

Flame

This article flames the graduate student as much as the government. I find this irresponsible. The graduate student is paying to learn how to do research. Shouldn't the advisory committee have been at least partially responsible for ensuring that the student's thesis calculations were accurate?

Making mistakes is part of learning, and if you are paying to learn, shouldn't those being paid to teach help you find the mistakes you can learn from?

Finally, her research, faulty or not (which is why all research is supposed to be peer reviewed), did recognize a significant issue with government policy.

I give my health records away for free to be correlated and used somehow, I try to anonymize it knowing full well that at some point I could have made some mistake that could allow it to be correlated to near certainty.

The thing is, people need to make a choice. If you want total privacy, that will have costs: it comes down to higher operational costs and less openness and usefulness of the data. That can be alleviated by people choosing what to release or not, for better or for worse. The alternative is having everybody see your records and make decisions based on them that could have negative effects on your life.

Or we could build a society where that data is not relevant to anyone except researchers, what are the forces that are driving people to hide that information?

Insurance companies? Make it so that they can't discriminate against people. One way to do that is by legislation; a better way is to create and support competing insurance alternatives. If you have to pay for healthcare anyway, why not do it voluntarily and give all you can to community-based clinics? Donate money, time, food, clothes; learn to sew so you can make those hospital garments and give them away; volunteer to help. Don't wait for others to do your job for you, because it is not going to happen. Fight Obamacare on the point that it forces you to be insured, and fight it if it doesn't allow you to build your own options; otherwise it is irrelevant.

A great initiative was born on Patientslikeme. It does exactly what the government was unable to do, there isn't any legislation yet forbidding its existence, and it works. Nothing is stopping people from doing something about it at the moment, only ourselves.

We know the government is slow, what can we do about it ourselves?
Pay attention to research done and see which drugs raise red flags and start building our own databases to where we can point doctors to our own collection data about them?

Are companies using that information to not hire people?
That may need legislation, although people can point at and shame companies that do it, and people who don't work for that company can pressure it to change its ways or even force it out of business. Hopefully the first company that goes down because of crap like that will scare the others.

How can we correct not fight, but correct the factors that are making "privacy" so important?

I post all of my medical records online, anonymously, for researchers or for people interested in knowing what to expect from the same problems I have. I don't do it for myself. I do it because somebody else did it and it helped me greatly. It is comforting to know what could happen to you, even if it is horrible. Some people may not want to know, of course; that is their choice and I can respect that. But I do it for those who, like me, find comfort in knowing what to expect on the road ahead of them. My health problems are a map, and the collection of maps can show the probabilities of success and can be used to show what worked, why it worked and how it worked or didn't. Even if it doesn't help me directly at the moment and I can't see any immediate benefits, I have let go of my fear of being exposed. That is hard for me: I am almost pathological about privacy, which is why I will always be an anonymous coward.

Obtuse refusal to understand?

I work for a healthcare organization. Our data is routinely used in research. Of course, HIPAA prevents identifying data from being included in records used for research, which means that research is simply done on anonymized records.

The government could have had all the records it wanted for looking at Vioxx except for two things: (1) healthcare organizations want to be paid for research data (profit motive), and the government researchers couldn't get the money; and (2) Merck actively interfered with research in the Vioxx arena, because Merck knew the drug could not stand the scrutiny... profit motive again.

And as for the line about 100 million healthcare records? That is pure fantasy land: One in 60 people world-wide? One in 3 in the United States? Be serious. Even the concept is just as outrageous as the record companies that claim billions lost to piracy.

It's more complicated than that

However, the overall percentage of people not registered to vote is not what matters here. Researchers should match against the part of the overall population that is similar to the target group (in the case of the governor, men of the same age who live in the same kind of neighborhood, presumably upscale). This matters because many reports show that the percentage of non-voters is not even, but varies with e.g. age, race and social class. Does anyone know of a report that puts a lot of work into analysing this issue fully?

And Finally...

Where you live correlates strongly with your age. We are all familiar with student areas around universities, and retirement villages, and the fact that families with young children like to live near good schools. This means that a lot more people than expected will share the same birth date as other(s) in their zip code.

So identification by birthday and zip code is likely to be far less accurate than the simplistic figures suggest. And, as diseases tend to favor people of specific age groups, the chance of several victims sharing a birthdate and zip code is *even* higher than that.

I would expect a lot of cases where multiple disease victims are recorded as the same person.

Not so sure the Vioxx case is about personal privacy

Just as a note, the case cited could have been completely avoided had the company responsible not buried unfavourable results during the trial period. It has been shown that the efficacy and safety of the drug were overhyped, to say the least. Perhaps we should be focusing on the 'privacy' given to drug companies rather than ourselves... just a thought.

Re: Re:

indeed.

the results are variable depending on the nature of the government in question.

the US's setup is a 'worst of both worlds' mess, i believe.

NZ's system is brilliant for emergency stuff, and Generally works for everything else... but has serious resource issues (mostly due to every second government or so having some brilliant idea about how they can save money by spending hundreds of thousands or more to re-arrange everything and put a few dozen people or so out of work...)

though lately ACC (basically a compulsory accident/injury insurance plan) has been getting raked over the coals... but that's over the sort of stupid thing corporations pull off all the time too. fudging the numbers to make themselves look good, deliberately mis-classifying stuff (plausibly, so you couldn't prove it beyond reasonable doubt or anything, but it's wrong enough consistently enough that it's quite clear that it's happening. attributing new injuries to ongoing conditions the person in question doesn't have, for example, or claiming something is a part of aging when the person in question isn't old enough for that to be really valid and we Know it was a result of injury) so they don't have to pay out, that sort of stuff.

actually, the biggest problem is keeping trained staff in country. we apparently have a fairly high turnover. though i can't remember if that's due to low pay or the same problem the school system has.

(crazy long rant about school system which added at least half as much again to the length of this post redacted for the reader's sanity.)

Re: Re: Re:

the NZ public health system is similar.

our taxes are apparently surprisingly low, all things considered.

(which didn't stop our current moronic government lowering income taxes on the top bracket (more the '10%' than the '1%', but still) and raising the Consumer tax (which is a Regressive tax: the less you have the more of it that tax eats, unlike the Income tax, where the more you have the more of it the tax eats, but the tax rate goes up slower than your income does and has a maximum, unlike your income). this was supposedly going to leave the 'average New Zealander' better off... never mind that their 'average' assumes an income higher than 2/3rds or so of the population actually has. and of the people who Were better off, it was mostly by less than ten dollars a year or something equally pathetic. on the upside, they apparently DID cut down on the number of loopholes available for dodging taxes.)

Three thoughts

1) Government regulation of health privacy is bad, but providing the government with the fully identifiable longitudinal records of thousands of patients is okay?

2) You can still compile longitudinal records of patients for research purposes and stay within privacy restrictions. Sweeney's errors are pretty well known and are irrelevant here. You can use a one-way hash to de-identify longitudinal patient info (as many health analytics companies do) and you can also go through the IRB process. And don't public health researchers operate under looser privacy restrictions anyway? I think they do.

3) You are totally ignoring consistent research showing many patients refuse treatment for health conditions or lie to their health care providers when they feel that their privacy will be violated. No one has put together a "body count" for this behavior, but it is clearly a problem that weakens a health care system that relies on accurate information to provide good treatment.
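The one-way hash mentioned in point 2 can be sketched roughly as follows. This is a minimal illustration, not how any particular analytics company does it: the field names, the secret key, and the helper name are all hypothetical. A keyed hash (HMAC-SHA256 from Python's standard library) is swapped in for a plain hash here, because a plain hash of a short identifier can be undone by simply hashing every possible identifier and comparing.

```python
import hashlib
import hmac

# Hypothetical secret held by the data custodian; never shipped with the data.
SECRET_KEY = b"example-key-kept-by-data-custodian"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable token that is impractical to reverse without the key."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up visit records, for illustration only.
visits = [
    {"patient_id": "MRN-0001", "year": 2001, "event": "drug prescribed"},
    {"patient_id": "MRN-0002", "year": 2001, "event": "drug prescribed"},
    {"patient_id": "MRN-0001", "year": 2003, "event": "heart attack"},
]

# Replace the identifier with its token. The same patient always gets the same
# token, so records can still be linked over time (the longitudinal part).
deidentified = [
    {"patient": pseudonymize(v["patient_id"]), "year": v["year"], "event": v["event"]}
    for v in visits
]

for row in deidentified:
    print(row["patient"][:12], row["year"], row["event"])
```

This only removes the direct identifier. Quasi-identifiers such as birth date and zip code, the fields at issue in the Weld case, would still need separate handling.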

Re:

Actually it can work much better than leaving it in the hands of the private sector. The issue is there's a lot of corruption so even the most perfect system will not be much better.

See Brazil as an example. The public health care system (SUS) is one of the most advanced and elaborate in the world. In theory. However, PMDB (a political party, like the Republicans and Democrats) managed to grab enough influence (via huge representation in municipal governments) and has been rooted in the health ministry for years now, diverting shitloads of money. The result is a very problematic system that will work for you if you are lucky enough.

I think neither private nor public is a real solution when you weigh human corruption in. Socialism is there to prove it. The idea is actually good but the execution (Communism) proved to be disastrous. Animal Farm is a good book that illustrates this.

Privacy Kills

Yes, I remember when those supporting the use of electronic health records told Congress that enacting the HITECH Act would save 100,000 American lives and $77 billion annually. We are still waiting for the first life and the first dollar to be saved. Early studies out of Harvard Medical School and other researchers are showing that health IT is ADDING new medical errors and has added at least $6 billion to hospital costs annually, and that number is climbing every year. We now employ thousands of new people in the health care industry, not to look after patients, but to look after and care for the computers.

Don't blame privacy laws for bad methodology

Just because some government-agency provided bad data to researchers doesn't mean the privacy-provisions are a bad thing. Probably the agency just had no experience to apply it correctly.

Because, on the other hand, there is a BIG reason for better privacy-laws. Can you guess why Europe has a MUCH lower rate of identity-thefts than the USA? Precisely. Better privacy protection. Because if you have them, law-abiding companies will actually avoid having too much data about you -- and if their databases get compromised, you will be hurt much less.

Of course, these laws don't help against criminals and nefarious government agencies, but the 90% of the people and companies which try to follow them make a big difference in reducing fraud.

The first people to profit from data-retention are always criminals. Even if the government does it, even if it's meant to combat crime -- first, it will create new crime.

Re: Re:

Yes, I have yet to see a good capitalistic explanation of why private industry would actually care to support most people (i.e. the ones that need it) without a large government or social stick around. Sure, socialising it too much can be bad, and the execution regardless can suck - but seriously, the American system really resembles two wolves and a sheep voting for lunch.

Where 'communism' falls down (at least in the health sphere) is that it tends to get overly corrupt (only a few benefit, controlling most of the resources), leaving the masses with the dregs. Hmm, sounds just like the capitalistic version, now I think about it.

Re: Re:

Only if it was an American mule to whom 10% taxes are an 'undue burden' - unlike us Europeans who happily pay higher taxes so we - and everyone else - can get that surgery (or whatever) for as close to free as possible. Because we are humane and civilised.