Science, Regulation, and the Epistemology of Big Data

The internet has been abuzz with the FDA's decision to order the personalized genomics firm 23andMe to stop selling its DNA Analysis Service. You can get a good overview of the dispute between the company and the regulatory agency here. What I'd like to do in this post is explore a fascinating epistemic question that lies at the heart of this kerfuffle.

One thing that has made 23andMe such a high-profile company is its innovative business model. It can, and I believe should, be seen as a manifestation of a much larger cultural move that is often labeled big data. The company is making a financial bet on the idea that DNA sequences are not so different from hypertext in that both can be analyzed computationally using efficient search algorithms. It is not so surprising, then, to learn that the company's co-founder and CEO, Anne Wojcicki, is married to Sergey Brin, one of the co-founders of Google. And indeed, Google ranks among 23andMe's biggest corporate backers, having invested close to $4 million into the startup.

The company's main selling point is that it has built a vast database of genome sequences, which it has annotated with metadata gathered both from customer surveys and the published literature. Its business model, then, is not so different from that of Google, except that the data it trades in are of a different sort. Rather than compiling an ever larger database of email, instant messages, and search histories to more powerfully target advertising, it collects sequences and metadata to produce ever more powerful assessments of what your DNA says about who you are, where you come from, and what health risks might lie in your future. The more people order the company's "spit test" (pictured above), the bigger the database becomes, and the more detailed and accurate its statistical assessment of your genetic risk factors will be.

In its official warning letter, the FDA made a number of interesting points, but let me just take on two of these in particular.

The first is broadly ontological, and it concerns the question of whether 23andMe falls under the agency's regulatory purview. In order to claim the authority to regulate its business practices, the FDA argues that 23andMe constitutes a medical device manufacturer. This is somewhat surprising, since the spit test merely purports to reveal the information already encoded in your genome. As such, it is rather different from interventionist machines like endoscopes and other things we normally think of as medical devices. Does the FDA's letter therefore imply that anyone offering to look at your back and check for moles, say, is also peddling a medical device?

The answer is "no." The reason, regulators claim, is that 23andMe does more than just sequence its customers' DNA. It also interprets those sequences, using its vast database to provide you with a statistical assessment of your risk of developing certain diseases.

And it is precisely these interpretations that have the FDA worried. This is because, in theory at least, these interpretations could be riddled with false positives or false negatives. Imagine, for example, mistakenly being told you have a 75% chance of developing breast cancer. This might prompt you to have a preventative and, in this case, wholly unnecessary mastectomy. Or imagine the alternative: erroneously being told you have a low risk of developing breast cancer. This might cause you to forgo routine checkups, with potentially disastrous consequences.

The real problem, according to the FDA, is that 23andMe does not have any solid evidence that its statistical analyses are, well, statistically sound. As the FDA's letter to Wojcicki states, "even after these many interactions with 23andMe, we still do not have any assurance that the firm has analytically or clinically validated the PGS for its intended uses."

There might be more at the root of this disagreement than meets the eye. Although I cannot say so with certainty, I have a strong suspicion that what the FDA is really objecting to here is the use of big data techniques in biomedicine. Traditionally, if a company wants to bring a certain pharmaceutical or medical device to market, it is expected to conduct extensive clinical trials. In such trials, there might not be an expectation that every aspect of the drug or device's mechanism of action is fully known. However, there would at least be clinical or experimental data to support its safety and efficacy.

But these are precisely the kinds of data that big data will not produce. That is, big data companies like Google and 23andMe are in the business of data analysis, not data production. They expect their users to generate the data themselves. All they do is devise techniques (algorithms) with which to draw conclusions from those data.

If I'm right, there is a fundamental epistemic rift between the FDA and 23andMe. What's at stake in this rift, moreover, is no less than the question of whether big data approaches can legitimately be exported from scientific research and internet marketing to clinical medicine!

1 comment:

Nice post, although I think you over-philosophize your conclusion. Not sure what you mean by expecting users to generate the data themselves. The users in this case are those who spit in the tube. They generate the data the way I generate data when I get a blood test.

The issue is whether the 23andMe profile/analysis package constitutes a "medical device." FDA says it does, that the test doesn't fulfill the standards for one, and that the company makes claims that imply it does. In short, this is more about hype than epistemology.