I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation.

Saturday, May 30, 2015

The Panopticon Is Good For You

As Stanford staff I get a feel-good email every morning full of stuff about the wonderful things Stanford is doing. Last Thursday's linked to this article from the medical school about Stanford's annual Big Data in Biomedicine conference. It is full of gee-whiz speculation about how the human condition can be improved if massive amounts of data are collected about every human on the planet and shared freely among medical researchers. Below the fold, I give a taste of the speculation and, in my usual way, ask what could possibly go wrong?
All the following quotes are from the article:

In his keynote address, Lloyd Minor, MD, dean of the School of Medicine, defined a term, “precision health,”
as “the next generation of precision medicine.” Precision health, he
said, is the application of precision medicine to prevent or forestall
disease before it occurs. “Whereas precision medicine is inherently
reactive, precision health is prospective,” he said. “Precision medicine
focuses on diagnosing and treating people who are sick, while precision
health focuses on keeping people healthy.”

The fuel that powers precision health, Minor said, is big data: the
merging of genomics and other ways of measuring what’s going on inside
people at the molecular level, as well as the environmental, nutritional
and lifestyle factors they’re exposed to, as captured by both
electronic medical records and mobile-health devices.

This isn't just what would normally be thought of as medical data:

Precision health requires looking beyond medical data to behavioral
data, several speakers said. This is especially true in a modern society
where it is behavior, not infectious disease, that’s increasingly the
cause of disability and mortality, noted Laura Carstensen, PhD, professor of psychology and founding director of the Stanford Center on Longevity.

But not to worry, we can now collect all sorts of useful data from people's smartphones:

That’s where mobile devices for monitoring everyday behavior can be
useful in ways electronic health records can’t. Several speakers touched
on the potential for using mobile-health devices to survey behavior and
chronic disease and, perhaps, provide insights that could be used to
support better behavior.
...
By monitoring 24/7 which room of one’s home one is in at any given
minute over a 100-day period, you can detect key changes in behavior —
changes in sleep-wake rhythms, for instance — that can indicate or even
predict the onset of a health problem.

An expert in analyzing conversations, [Intel fellow Eric] Dishman recounted how he’d
learned, for example, that “understanding the opening patterns of a
phone conversation can tell you a lot,” including giving clues that a
person is entering the initial stages of Alzheimer’s disease.
Alternatively, “the structure of laughter in a couple’s conversation can
predict marital trouble months before it emerges.”

If only we could get rid of these pesky privacy requirements:

“Medical facilities won’t share DNA information, because they feel
compelled to protect patients’ privacy. There are legitimate security
and privacy issues. But sharing this information is vital. We’ll never
cure rare DNA diseases until we can compare data on large numbers of
people. And at the level of DNA, every disease is a rare disease: Every
disease from A to Z potentially has a genomic component that can be
addressed if we share our genomes.”

The potential benefits of having this data widely shared across the medical profession are speculative, but plausible. But it's not speculative at all to state that the data will also be shared with governments, police, insurance companies, lawyers, advertisers and, most of all, with criminals. Anyone who has been paying the slightest attention to the news over the last few years cannot possibly believe that these vast amounts of extremely valuable data, widely shared among researchers, will never leak or be subpoenaed. Only if you believe "it's only metadata, there's nothing to worry about" can you believe that the data, the whole point of which is that it is highly specific to an individual, can be effectively anonymized. Saying "There are legitimate security and privacy issues. But ..." is simply a way of ignoring those issues, because actually addressing them would reveal that the downsides vastly outweigh the upsides.

Once again, we have an entire conference of techno-optimists, none of whom can be bothered to ask themselves "what could possibly go wrong?" In fact, in this case what they ought to be asking themselves is "what's the worst that could happen?", because the way they're going, the worst is what is going to happen.

These ideas are potentially beneficial, and in a world where data could be perfectly anonymized and kept perfectly secure for long periods of time despite being widely shared they should certainly be pursued. But this is not that world, and to behave as if it is violates the precept "First, do no harm" which, while not strictly part of the Hippocratic Oath, I believe is part of the canon of medical ethics.

25 comments:

I've heard some feedback that this isn't a problem because people consent to share their data. For example, the article says:

"But it may be that ordinary people will step up to share personal data on a scale that enables the aggregation of huge piles of data, both behavioral and genomic."

and:

"[23andMe] has accumulated not only extensive genotypes on some 950,000 people, she added, but also a great deal of customer-supplied health and behavioral data that researchers can sift through. “About 80 percent of our customers have consented to their data being used for research,” [Hagenkord] said."

But the point of collecting and using this big data is to change people's behavior in ways that enhance their well-being:

"Precision health requires looking beyond medical data to behavioral data, several speakers said. This is especially true in a modern society where it is behavior, not infectious disease, that’s increasingly the cause of disability and mortality, noted Laura Carstensen, PhD, professor of psychology and founding director of the Stanford Center on Longevity."

For example, changing behavior by getting people to exercise more. This is ethically OK. But in order to do that, these doctors and researchers are encouraging people to engage in behaviors, such as collecting and sharing extremely detailed personal information online, that have a high probability of impairing their well-being. Surely this can only be ethical if the benefits of the behavior modification vastly outweigh the risks. Has such an assessment been made by people other than the proponents, people properly informed as to the risks to stored data?

Is a society with less disease and more exercise but which is rigidly controlled by authoritarian governments and in which everyone is at high risk of financial and other crime healthier than the one we have now? The work of Richard Wilkinson and Kate Pickett suggests that even if we look solely at health metrics such a society is likely to have worse overall health.

Another important precept in medical ethics is informed consent. Are those who supply the data actually being informed properly of the risks to them if the data leaks, and supplied realistic assessments of the probability of such a leak? Given the level of awareness of these risks demonstrated by the speakers at the meeting, I take leave to doubt it.

If you don't believe me when I write "it's not speculative at all to state that the data will also be shared with governments, police, insurance companies, lawyers, advertisers and, most of all, with criminals", you should read David Talbot's piece Cyber-Espionage Nightmare in MIT Technology Review, which concludes thus:

The best option, then, could be to get sensitive data off the Internet entirely. There are downsides to that: if e-mail is not used as freely, or a database is offline, keeping up with the latest versions of reports or other data could be more time-consuming. But as Gligor says: “We must pay the cost of security, which is inconvenience. We need to add a little inconvenience for us to make things much harder for the remote attacker. The way to do that is to—how should I put it?—occasionally go offline.”

"For a security company, one of the most difficult things is to admit falling victim to a malware attack," Kaspersky researchers wrote in their report. "At Kaspersky Lab, we strongly believe in transparency, which is why we are publishing the information herein. For us, the security of our users remains the most important thing—and we will continue to work hard to regain your trust and confidence."

A new report into U.S. consumers' attitudes to the collection of personal data has highlighted the disconnect between commercial claims that web users are happy to trade privacy in exchange for 'benefits' like discounts and how those users actually feel: it asserts that a large majority of web users are not at all happy, but rather feel powerless to stop their data being harvested and used by marketers.

"Data theft poses an indefinite threat of future harm, as birthdate, full name and social security number remain a skeleton key of identity in many systems."

And also the legal maneuvering by companies desperate to avoid liability for these harms:

"The impossibility of forecasting what will happen to stolen data has intensified legal wrangling over the rights of data breach victims. The ability of consumers to sue for future harm has, in many cases, been limited by a Supreme Court ruling that on its face had little to do with big commercial breaches. ... Corporate lawyers and some legal scholars are hoping the court follows its logic in Clapper and decides that the plaintiffs lack standing because they have not suffered any injury yet."

The researchers also speculate that the developers behind Duqu and Stuxnet have a reliable supply of additional valid certificates to meet the needs of any future malware platforms.

"The fact that they have this ability and don't reuse their certificates like other APT groups means they probably [used them only for targeted attacks]," Costin Raiu, director of Kaspersky Lab's Global Research and Analysis Team, said during a conference call with reporters. "This is extremely alarming because it undermines all the trust we have in digital certificates. It means that digital certificates are no longer an effective way of defending networks and validating the legitimacy of the packages. It's also important to point out that these guys are careful enough not to use the same digital certificates twice."

The Vietnamese operator of websites selling personal information he exfiltrated from US companies entered a guilty plea to charges that he sold information on 62% of the American population. One identity theft operation, 62% of the population compromised.

"General medical records sell for several times the amount that a stolen credit card number or a social security number alone does. The detailed level of information in medical records is valuable because it can stand up to even heightened security challenges used to verify identity; in some cases, the information is used to file false claims with insurers or even order drugs or medical equipment. Many of the biggest data breaches of late, from Anthem to the federal Office of Personnel Management, have seized health care records as the prize."

At The Intercept, Peter Maass uses openly available information sources to deanonymize "The Socrates of SIGINT" from clues in Socrates' columns in the Snowden documents. The Philosopher of Surveillance is an impressive demonstration of how much a journalist and a researcher can find out, basically just using Google, about someone whose identity is supposed to be protected.

"More than a petabyte of data lies exposed online because of weak default settings and other configuration problems involving enterprise technologies.

Swiss security firm BinaryEdge found that numerous instances of Redis cache and store archives can be accessed without authentication. Data on more than 39,000 MongoDB NoSQL databases is similarly exposed.

More than 118,000 instances of the Memcached general-purpose distributed memory caching system are also exposed to the web and leaking data, according to Binary Edge. Finally, 8,000-plus instances of Elasticsearch servers responded to probes.

BinaryEdge concludes that it found close to 1,175 terabytes (or 1.1 petabytes) of data exposed online, after looking into just four technologies as part of an online scan."
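The kind of exposure BinaryEdge describes requires no exploit at all: the services simply answer anyone who can reach the port. As a minimal sketch (not BinaryEdge's actual methodology, and the host is a placeholder), here is how one might check whether a Redis instance answers an unauthenticated PING:

```python
import socket

def ping_reply_is_open(reply: bytes) -> bool:
    # An unprotected Redis answers PING with "+PONG"; one with
    # requirepass set answers with a "-NOAUTH ..." error instead.
    return reply.startswith(b"+PONG")

def redis_answers_unauthenticated(host: str, port: int = 6379,
                                  timeout: float = 3.0) -> bool:
    """Send a bare PING over the Redis wire protocol and report
    whether the server answered without requiring AUTH."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"PING\r\n")
            return ping_reply_is_open(s.recv(64))
    except OSError:
        # Closed port, firewall, or timeout: not reachable by us.
        return False
```

That a probe this short suffices is the point: in the default configuration the database trusts the network, so "exposed to the Internet" means "readable by anyone".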

The exposed data included:

"There are also a lot of usernames and passwords and also session tokens which could be used to take over active sessions. We also have databases from pharmaceutical companies, hospitals which are named 'patient' and 'doctor-list' and to finish we have banks as well, with databases named 'coin' and 'money'"

"Eighty-one percent of healthcare executives say their organizations have been compromised by at least one malware, botnet or other kind of cyberattack during the past two years, according to a survey by KPMG.

The KPMG report also states that only half of those executives feel that they are adequately prepared to prevent future attacks. The attacks place sensitive patient data at risk of exposure, KPMG said."

Have the patients of these organizations given informed consent to this sharing of their health information with the bad guys?

"But the fact that your signing up for 23andMe or Ancestry.com means that you and all of your current and future family members could become genetic criminal suspects is not something most users probably have in mind when trying to find out where their ancestors came from."

As Doctorow writes:

"This is the story of the next decade: companies that started out amassing huge databases of compromising information will be targeted: first by cops and spies (hi there, OPM!), then by civil litigants (something like 80% of all divorce cases now involve a subpoena to Facebook), then by criminals (hello, Ashley Madison!)."

"The Health Insurance Portability and Accountability Act, a landmark 1996 patient-privacy law, only covers patient information kept by health providers, insurers and data clearinghouses, as well as their business partners. At-home paternity tests fall outside the law’s purview. For that matter, so do wearables like Fitbit that measure steps and sleep, gene testing companies like 23andMe, and online repositories where individuals can store their health records."

Again, the point is the lack of informed consent. Customers are not informed that their data will be shared by the company, nor are they informed of the risk that it will be exposed by the kinds of crass stupidity and incompetence the article describes.

"one in three consumers will have their healthcare records compromised by cyberattacks in 2016."

Health care data is much more valuable than other kinds:

"Not only do healthcare records often have Social Security and credit card numbers, but they are also used by criminals to file fraudulent medical claims and to get medications to resell.

Healthcare fraud costs the industry from $74 billion to $247 billion a year in the U.S., according to FBI statistics. Fraudulent billing represents between 3% and 10% of healthcare expenditures in the U.S. each year"

"it’s often little-noticed smaller-scale violations of medical privacy — the ones that affect only one or two people — that inflict the most harm.

Driven by personal animus, jealousy or a desire for retribution, small breaches involving sensitive health details are spurring disputes and legal battles across the country...Even when small privacy violations have real consequences, the federal Office for Civil Rights rarely punishes health care providers for them. Instead, it typically settles for pledges to fix any problems and issues reminders of what the Health Insurance Portability and Accountability Act requires. It doesn’t even tell the public which health providers have reported small breaches — or how many....HIPAA does not give people the right to sue for damages if their privacy is violated. Patients who seek legal redress must find another cause of action, which is easier in some states than others....courts in Ohio, Minnesota and other states have ruled that health providers are not liable for the actions of workers who snoop in medical records outside the scope of their jobs....Since 2009, OCR has received information about 1,400 large breaches. During the same time, more than 181,000 breaches affecting fewer than 500 individuals have been reported."

"Backchannel's package on medical data and the health-tech industry profiles three people who were able to shake loose their own data and make real improvements in their lives with it: Marie Moe, who discovered that the reason she was having terrifying cardiac episodes was out-of-date firmware on her pacemaker; Steven Keating, who created a website with exquisitely detailed data on his brain tumor, including a gene-sequence that had to be run a second time because the first scan wasn't approved for "commercial" use, which included publishing it on his own site; and Annie Kuehl, whose advocacy eventually revealed the fact that doctors had suspected all along that her sick baby had a rare genetic disorder, which she only learned about after years of agonizing victim-blaming and terrifying seizures."

"Today, according to cybersecurity specialists, criminals hoping to scoop up valuable personal data are increasingly targeting health care companies — from local doctor’s offices to major health insurers.

More than 100 million health care records were compromised in 2015 alone. Federal records show that almost all of those losses came from just three attacks on health insurance providers: Anthem Inc., Premera Blue Cross, and Excellus Health Plan Inc."