The Unknown Unknowns

A recent NPR poll indicates that many Americans would be willing to share health data, including health records, for research purposes. NPR and Truven Health Analytics interviewed roughly 3,000 people to determine whether they’d be willing to share their anonymized health data with researchers. Just over half (53%) said that they would. That is a large percentage, but still a decline from last summer, when a similar poll found that 68% of respondents would be willing to share their information for research.

Why the change? Part of it could be that when we asked the first time around, the question came after others on the use of electronic medical records by doctors, employers, insurers and hospitals. The context might have affected how people responded. It could also reflect heightened sensitivity about data security … major privacy breaches have been a hot topic in American culture, from leaked pictures of celebrities to the extensive Sony hack.

While the 15-percentage-point decline is interesting, the fraction of people who would make their private data available is still very high. I think that the majority of people are data altruists: people who believe that making their personal health data available will advance research on diseases like cancer, cardiac disease, and Alzheimer’s. We should applaud their willingness to share.

What concerns me are the “unknown unknowns.” It is unlikely that many of the respondents draw a clear distinction between more routine health data (such as blood pressure) and genomic data. Genomic data is not strictly individual: it also contains very sensitive information about relatives, including predisposition to disease, physiological conditions, etc. By making a decision to share genomic data, you are also making a decision on behalf of your family members. Furthermore, data released today may yield more and more personal information as the science improves. A decision to share needs to be made with the understanding that both uses and abuses of the data will increase over time. Finally, the combination of genomic data from one source with information from elsewhere poses a serious threat to privacy, one that worsens as more and more information is collected.

Faced with these unknown unknowns, a rational strategy would be to ensure that the use of sensitive genomic data is governed persistently, now and into the future. This is our approach with Genecloud. Allowing access to private information for one purpose today should not automatically mean that you accept all uses of your data going forward, or leave the data exposed for arbitrary data mining. We believe that providing security and privacy assurances will ultimately increase data sharing, to the benefit of all.