The Big Data to Knowledge (BD2K) Initiative

February 4th, 2014

The topic of 'Big Data' (of all sorts) has become a hot one across the industrial, academic, and non-profit sectors. Recognizing the importance of biomedical Big Data to NIH, a Data and Informatics Working Group of the Advisory Committee to the NIH Director made a set of recommendations in 2012 that outlined programmatic ways for NIH to address the opportunities and challenges facing all biomedical researchers in accessing, managing, analyzing, and integrating increasingly large amounts of data. On the basis of that report, the Big Data to Knowledge (BD2K) Initiative was conceived.

The mission of BD2K is to enable a quantum leap in the ability of the biomedical research enterprise to maximize the value of the growing volume and complexity of biomedical data. New and powerful technologies, being used by a growing number of investigators, are generating increasingly large amounts of highly complex and diverse data. As a result, biomedical research is becoming increasingly data-intensive and data-driven. However, a shortage of relevant software tools and, frequently, a lack of expertise limit the ability of individual investigators to locate, analyze, and use these biomedical 'Big Data' to advance their research. BD2K aims to develop new approaches; useful standards; more effective methods, tools, and software; and improved competencies that will enhance the ability of all investigators to use biomedical 'Big Data' more effectively. To this end, BD2K is developing a program of research, implementation, and training in data science and other related fields, all of which are relevant to biomedical research.

BD2K is truly a trans-NIH initiative. All NIH Institutes and Centers, as well as the NIH Common Fund, are contributing to the funding for BD2K, starting with $24M in Fiscal Year 2014 and increasing to about $100M in Fiscal Year 2016. Staff members from almost every NIH Institute and Center are participating in BD2K's planning and implementation. BD2K is initially planned as a seven-year effort, through Fiscal Year 2020; toward the end of this period, there will be a rigorous review of the success of BD2K in meeting the 'Big Data' needs of the NIH-supported research community, with the results of that review determining the longer-term plans for trans-NIH data science activities.

Standing up BD2K has taken the hard work and dedication of many, and has been accomplished so rapidly that the Initiative is now poised to take off. The first set of planning workshops has been completed, more information has been obtained from the research community in the form of responses to Requests for Information, the first Funding Opportunity Announcements (FOAs) have been released (bd2k.nih.gov/funding_opportunities.html), and the first grant applications are under review. Several more funding opportunities are under development and will be released soon. All of this progress with BD2K was accomplished in the face of significant constraints resulting from sequestration and other budgetary uncertainties!

NHGRI has been heavily involved in the early stages of BD2K, and we will continue to be involved going forward. We are passionate about BD2K because of the major 'Big Data' challenges facing genomics and genomic medicine, but the fruits of BD2K will certainly benefit all of NIH and the larger biomedical research community.

For just over a year, I have served as Acting NIH Associate Director for Data Science, a newly created position to lead NIH's efforts in 'Big Data' and data science. Recently, Dr. Phil Bourne was named the first NIH Associate Director for Data Science (nih.gov/news/health/dec2013/od-09.htm). I am delighted to report that Phil will arrive in March, and he is excited to guide NIH's 'Big Data' enterprise, including leading the BD2K Initiative.

BD2K has created a positive buzz in the NIH research community. The BD2K Centers of Excellence FOA drew a very robust response, and the first awards will be made this summer. With more FOAs to be issued in the near future, we anticipate growth in NIH-supported data science research and, eventually, a large benefit to the biomedical research community. To learn more about BD2K, visit bd2k.nih.gov. In these somewhat tumultuous budgetary times for NIH, it is truly gratifying to see this exciting and important effort launched.