Empirical research on intellectual property is thin. It seems to me that the reason is that data on where innovations come from and how they spread are very difficult to come by: patents are a poor proxy, as a Petra Moser paper discussed earlier on this website shows. This paper by Heidi Williams, the best job market talk I saw last year, is an exception.

While data availability restricts work on IP in general, Williams considers the human genome in particular. From the early 1990s to 2003, the Human Genome Project was a publicly funded effort to map the human genome, for future use in medical testing and medicine. From 1999 to 2001, a private effort, Celera, also mapped the human genome. Celera’s data were protected by click-wrap licenses (contract law, essentially) rather than by formal IP in the traditional sense, so once the public effort sequenced a Celera gene, that gene became freely available to scientists. The public effort finished its own sequencing in 2003, two years after Celera finished. In the meantime, many research labs and private companies paid Celera a significant amount of money to access gene data as soon as Celera sequenced it, rather than waiting for the public effort and getting the data for free. Fortunately for us economists, the technique used to sequence genes is imprecise, and knowledge of which genes might be useful is also imprecise, so we can control for a selection bias whereby the public effort may have sequenced the most useful genes first.

Williams collects data on research papers about genotype-phenotype links, as well as on the existence of genetic tests for diseases associated with particular genes. She finds that Celera-held genes saw roughly 30 percent fewer scientific publications about genotype-phenotype links, and a similar decrease in the availability of genetic tests. The effect was more pronounced for Celera genes that the public effort found in 2003 than for Celera genes that the public effort found in 2002. Further, the effect is long-lasting: even today, genes once held by Celera see fewer publications per year, perhaps reflecting increasing returns to R&D on genes that were available earliest from the public effort. In what strikes me as very careful work, Williams attempts to control for a number of potential selection problems, and every specification of the model gives the same qualitative result.
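
To make the headline comparison concrete, here is a minimal sketch in Python of the raw calculation behind a statement like "roughly 30 percent fewer publications". Everything in it is an assumption for illustration: the gene records, the publication counts, and the `percent_gap` helper are invented and are not taken from Williams's data. It also computes only an uncontrolled difference in means, whereas the paper's regressions try to control for selection.

```python
from statistics import mean

# Hypothetical toy records: (gene_id, held_by_celera, publications).
# These numbers are invented for illustration and are NOT from the paper.
genes = [
    ("g1", True, 7),
    ("g2", True, 6),
    ("g3", True, 8),
    ("g4", False, 10),
    ("g5", False, 9),
    ("g6", False, 11),
]


def percent_gap(records):
    """Percent by which mean publications on Celera-held genes fall short
    of mean publications on genes that were always public."""
    celera = [pubs for _, held, pubs in records if held]
    public = [pubs for _, held, pubs in records if not held]
    return 100.0 * (mean(public) - mean(celera)) / mean(public)


gap = percent_gap(genes)

# Report to the nearest ten percent rather than to a spurious decimal place.
print(f"Celera-held genes: on the order of {round(gap, -1):.0f} percent fewer publications")
```

A raw gap like this would be biased if one effort systematically reached the more promising genes first, which is why the selection controls matter.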

Should this result generalize to other fields, I see it as a striking argument against something like the Bayh-Dole Act, a law that encouraged the patenting of publicly funded research. The law and econ argument was that patenting would encourage research to be commercialized, and further that downstream innovation might increase because such innovation would be protected by a license to the original patent. This paper suggests the opposite: increasing the price of using already discovered knowledge, at least in gene research, decreases downstream product development, and significantly so.

Beyond the results themselves, I like the presentation in this paper. Empirical estimation on most social science questions is hugely imprecise. For this reason, papers that claim results like “A 21.4% tax is optimal”, or “Research output is estimated to fall 45.3%” just give the reader a false sense of precision. Williams reports her results as “on the order of 30 percent”, leaving the exact numbers from the regressions to tables in the appendix; this is as it should be.