Context
Policy and legislative efforts to improve the biomedical innovation process must rely on a detailed and thorough analysis of drug development and industry output.

Objective
As part of our efforts to build a publicly available database on the characteristics of drug development, we present work undertaken to test methods for compiling data from public sources. These initial steps are designed to explore challenges in data extraction, completeness, and reliability. Specifically, filing dates for Investigational New Drug (IND) applications with the U.S. Food and Drug Administration (FDA) were chosen as the initial data element to be collected.

Materials and methods
FDA’s Drugs@FDA database and the Federal Register (FR) were used to collect IND dates for the 587 new molecular entities (NMEs) approved between 1994 and 2014. When available, the following data were captured: approval date, IND number, IND date, and source of information.
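The data elements listed above could be held in a simple record type. The following is only a minimal sketch: the class name `IndRecord`, the field names, and the `nme_name` identifier are hypothetical and do not come from the study, and the example values are placeholders rather than real data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IndRecord:
    """One IND date observation for an approved NME (hypothetical schema)."""

    nme_name: str                  # assumed identifier; not specified in the text
    approval_date: Optional[date]  # FDA approval date, when available
    ind_number: Optional[str]      # IND number, when available
    ind_date: Optional[date]       # reported IND date, when available
    source: str                    # "Drugs@FDA" or "Federal Register"


# Placeholder values for illustration only, not real data.
example = IndRecord(
    nme_name="example-nme",
    approval_date=date(2000, 1, 1),
    ind_number=None,
    ind_date=date(1990, 1, 1),
    source="Federal Register",
)
```

Keeping one record per reported date, rather than one per NME, would let several dates from different sources coexist for the same product.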

Results
At least one IND date was available for 445 (75.8%) of the 587 NMEs. The Drugs@FDA database provided IND dates for 303 (51.6%) NMEs and the Federal Register provided IND dates for 297 (50.6%) NMEs. Of the 445 NMEs for which an IND date was obtained, 274 (61.6%) had more than one date reported.
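The reported percentages can be rechecked from the raw counts. The sketch below does so in Python; the variable names are illustrative. The `both_sources` and `no_date` figures are not stated in the text: they are derived here under the assumption that the 445 NMEs are exactly the union of the NMEs covered by Drugs@FDA and by the Federal Register.

```python
TOTAL_NMES = 587        # NMEs approved between 1994 and 2014
any_source = 445        # NMEs with at least one IND date
drugs_at_fda = 303      # NMEs with an IND date from Drugs@FDA
federal_register = 297  # NMEs with an IND date from the Federal Register
multiple_dates = 274    # NMEs with more than one IND date reported


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


coverage = {
    "any_source": pct(any_source, TOTAL_NMES),
    "drugs_at_fda": pct(drugs_at_fda, TOTAL_NMES),
    "federal_register": pct(federal_register, TOTAL_NMES),
    # Denominator is the 445 NMEs with a date, not all 587.
    "multiple_dates": pct(multiple_dates, any_source),
}

# Derived by inclusion-exclusion; an assumption, not a reported result.
both_sources = drugs_at_fda + federal_register - any_source
no_date = TOTAL_NMES - any_source

print(coverage)
print(both_sources, no_date)
```

Note that the share of NMEs with more than one date uses 445 as its denominator, whereas the other three percentages are taken over all 587 NMEs.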

Discussion
The key finding of this paper is considerable inconsistency in the availability and reporting of data elements, in this case IND application filing dates, assembled from publicly available sources.

Conclusion
Our team will continue to focus on finding ways to collect the information needed to measure the impact of drug innovation.