The National Research Council’s Board on Research Data and Information (BRDI), which also serves as the U.S. National Committee for CODATA (USNC/CODATA), proposes to conduct a study of the various kinds of barriers to cooperation in scientific data activities between the institutions and individual researchers in the United States and the People’s Republic of China (the study), and to make recommendations to help eliminate or minimize the impact of such barriers. The study will be conducted jointly by USNC/CODATA and the Chinese National Committee for CODATA, under the auspices of their respective Academies of Sciences. The study will be performed over a period of two years, beginning January 1, 2010, pursuant to the following statement of task:

1. Identify areas of joint research in the earth and environmental, and the health and biomedical areas, that are particularly data intensive and would benefit from greater exchange and sharing of data. Describe why this would be important.

2. Characterize and analyze the barriers to data sharing or exchange based on scientific and technical; institutional and management; economic and financial; legal and policy; and normative and socio-cultural aspects.

3. Provide conclusions and recommendations for consideration by both countries to overcome the barriers in light of the findings under tasks 1 and 2 above.

A written report will be published in both the United States and China at the conclusion of the study and actively disseminated within each country’s relevant research and policy communities.

Intellectual Merit of the Proposed Activity

Rapidly changing technological capabilities for creating, manipulating, disseminating, and using digital scientific data are producing many new opportunities and challenges. The opportunities arise primarily in data-intensive research and applications, in the integration of heterogeneous data for new results, and in making vast amounts of factual information available for a broad spectrum of users for collaborative research. The inherent challenges are in effectively managing these data resources for optimal access and use, and for developing rational rules and structures for such processes.

Both the United States and China have vigorous ongoing and planned scientific data collection and related research activities. In recent years, the Chinese government has rapidly modernized its publicly funded research system and produced a great deal of scientific data. It also possesses large amounts of highly valuable historical data that are now being digitized. These data collections in China are also potentially valuable for cooperative studies of global health, earth and space sciences, geospatial data applications, and various types of basic and applied research generally. Improved access to those Chinese data resources and expertise is clearly in the interest of the U.S. research and applications communities, and vice versa. The proposed study will promote these interests by jointly studying the various scientific and technical; institutional and management; economic and financial; legal and policy; and normative and socio-cultural barriers to data sharing, and to bilateral cooperation in scientific data and information, and by making recommendations for removing or reducing such barriers to the relevant institutional and governmental managers in each country.

Broader Impacts of the Proposed Activity

Promoting greater access to Chinese scientific data and information is expected to provide a range of scientific, economic, and political benefits. The Chinese government has a major policy initiative underway called the Scientific Data Sharing Program, which was developed in part through a series of bilateral CODATA meetings. The establishment of more open access policies to government and government-funded data in China can have beneficial spillover effects in encouraging greater openness with other public information. Increased openness and improved management with regard to publicly funded Chinese data can also be of substantial benefit not only to U.S. researchers engaged in bilateral or global studies that require access to such data, but to the international scientific community in those fields.

BACKGROUND

As a major producer of scientific, technical, and medical data and scientific information, and as a partner for international cooperative research, China potentially has a great deal to offer to the world’s knowledge base and contribute to comprehensive and integrated solutions to global problems. Although China’s scientific and technical research capabilities are rapidly improving, significant problems remain with regard to its digital data management, access and sharing policies. These problems pose barriers to scientific cooperation, and inhibit more rapid progress and improved bilateral and international cooperation.

Between 2000 and 2005, the U.S. National Committee for CODATA and the Chinese National Committee for CODATA held a series of bilateral meetings with senior science officials and data managers from both countries to discuss various data management and policy issues. Particularly noteworthy in this regard was the high-level Scientific Data Sharing Program announced in February 2003 by the Chinese Ministry of Science and Technology (MOST), and supported by the National People's Congress. This science policy initiative was substantially shaped by and informally attributed to the results of the bilateral CODATA meetings. Also important in this regard was the more recent liberalization of the public sector information law with the enactment of a national freedom of information act in 2008.

The two CODATA Committees held a bilateral workshop in 2004 in Beijing, focused on the preservation and sharing of scientific data and information. This meeting re-confirmed the commitment of the Chinese science policy community to promoting greater openness regarding Chinese research data (NRC 2006).

In 2006, the two CODATA Committees, under the auspices of their respective Academies of Sciences, established the U.S. - China Roundtable on Scientific Data Cooperation (the Roundtable), to convene a series of meetings over an initial three-year period pursuant to the following statement of task:

1. Provide a unique bilateral forum for government, academic, and private-sector stakeholders in the United States and China to discuss and address scientific data practices and policies, pursuant to a mutually agreed agenda.

2. Serve as a catalyst and coordinating body for bilateral cooperation on scientific data practices and policies at the Academy and national level in each country, with appropriate recognition and representation of other thematically related bilateral and international activities.

The Roundtable participants identified four thematic areas of mutual interest in which there could be joint bilateral projects on data management and sharing. The four areas identified were: a) health and biomedical data, b) environmental and geospatial data, c) advanced cyber-infrastructure data applications, and d) scientific data policy.

Three Roundtable meetings have been convened in 2006, 2007, and 2009 (in Beijing, Washington, DC, and Qingdao, respectively) to discuss and propose joint projects in these thematic areas. The participants in these Roundtable meetings from each country included researchers, data managers, and decision makers from various government agencies, particularly those who have been either actively involved in bilateral projects in China or would like to initiate joint bilateral projects.

In the first two Roundtable meetings held in Beijing and Washington, DC, the participants discussed and identified various technical, management, socio-cultural, and policy barriers to data sharing. These barriers provided a major challenge to initiation and successful completion of various bilateral projects proposed by scientists on both sides. At the end of the most recent Roundtable meeting held in Qingdao in March 2009, the U.S. as well as the Chinese participants affirmed that there is a compelling need to have data sharing activities in support of scientific cooperation between U.S. and Chinese scientists. They also recognized that this cooperation is being hampered by various barriers and that there is an urgent need to address them. The Chinese side proposed a joint, bilateral study that would make recommendations to help remove these barriers, or at least reduce their effect on scientific cooperation across the types of research fields that have been the focus of the bilateral Roundtable. Such a study will build on the experiences and network of experts engaged or affiliated with the Roundtable and accentuate the need for improved data sharing in joint cooperation to research policy-makers and to the scientific communities in each country in addressing high-priority research questions. A list of National Research Council references is included in the Bibliography at the end of the proposal.

The proposed study will identify and analyze barriers to scientific data cooperation from the perspective of several areas, as summarized briefly below.

1. Scientific and technical aspects. Sharing of data and information requires accommodating the needs and practices of different scientific disciplines, as well as encouraging the development of interdisciplinary research values and methods. Individual institutions likely will differ in their mandates and objectives for different types of data (e.g., observational vs. experimental, physical science vs. biological science, human subjects or not), and they may have disparate procedures and metrics for data quality, and differing criteria for selecting data for sharing with others. The development of databases also depends largely on discipline-independent technology and infrastructure requirements appropriate to the different goals of access and sharing. These include, among others: the development and adoption of common metadata standards and practices; flexible search and retrieval capabilities; technological and semantic interoperability; and appropriately accommodating the evolution in technology (hardware and software), as well as data and information collected in proprietary formats and commercial databases. Some of the technological issues of data collection, storage, management, and sharing are still challenging in the developed world, and they can pose especially difficult hurdles in China.

2. Institutional and management aspects. Maintaining data and information as community resources for purposes of sharing and joint scientific activities requires the implementation of effective operational procedures and practices. The institutions need to manage the collection, storage and sharing of data effectively. Other operational issues may include properly managing the volume of data, which is enormous and growing, even in the developing world; coping with the diversity of sources, formats, and documentation; and maintaining a sufficiently long time horizon for access in the face of continually changing definitions, digital media and formats, and hardware and software obsolescence. Planning and developing requirements for data management and sharing must accommodate the continual change and evolution in the practice of science; the local variability in focus, practice, and available technology and other physical and human infrastructure; and the differing mandates and objectives of various data producing institutions, as well as a diversity of potential sharing partners, including scientists, educational institutions, and policy-makers within and outside each country.

3. Economic and financial aspects. Another major barrier to the establishment of a well-managed data center geared towards data sharing is the lack of adequate funding. Data centers are not among the most pressing recognized priorities in either country, despite the importance, and considerable potential contributions, of well-managed scientific information resources to research capacity building and to social and economic development. Creative and well-planned approaches could reduce the financial burden. Moreover, the potential social and economic returns from sharing of data that is relevant to high-priority national and global problems that need collaborative solutions can more than offset the relatively small financial costs of data sharing and exchanges.

4. Legal and policy aspects. Most research databases and data centers in China are managed directly by government ministries and subject to a relatively restrictive state information regime, whereas the situation in the United States is much more decentralized and diffuse, coupled with generally much more open access policies and laws. Laws and policies can provide a major challenge or opportunity for data sharing. They tend to be founded on deeply rooted political, institutional, and cultural factors, some of which apply to the overall public information regime and some of which are exacerbated in the scientific context by perceived political or economic sensitivities of the subject matter (e.g., domestic disease statistics, biodiversity information, environmental degradation or resource exploitation, or the disclosure of many otherwise personally or nationally sensitive facts). As noted above, the recent high-level approval of the liberalization of the government's laws and policies regarding access to government-produced and government-funded academic research data and other information has made this a very propitious time to examine the various barriers to data sharing policies and practices in China and the dynamic interplay with U.S. researchers.

5. Normative and socio-cultural aspects. Various underlying scientific and national norms and socio-cultural values can have important effects on data sharing behavior similar to the legal and policy influences. At the scientific community level, these factors include traditional data sharing behavior in each scientific sub-community, the adequacy of recognition of the importance of the scientific activities by the scientists and their institutions, and the existence or absence of financial incentives or rewards. Relevant cultural and normative values that need to be understood also exist at the broader societal level in each country.

PLANNED ACTIVITIES

This study will be conducted pursuant to the following Statement of Task:

1. Identify areas of joint research in the earth and environmental, and the health and biomedical areas, that are particularly data intensive and would benefit from greater exchange and sharing of data. Describe why this would be important.

2. Characterize and analyze the barriers to data sharing or exchange based on scientific and technical; institutional and management; economic and financial; legal and policy; and normative and socio-cultural aspects.

3. Provide conclusions and recommendations for consideration by both countries to overcome the barriers in light of the findings under tasks 1 and 2 above.

The study will be conducted jointly by the U.S. National Committee for CODATA and the Board on Research Data and Information, and the Chinese National Committee for CODATA under the auspices of their respective Academies of Sciences.

A U.S. study committee will be formed under the auspices of the NAS with expertise in the scientific, technical, institutional, management, financial, legal, policy, and sociological factors associated with scientific data cooperation between the United States and China. A separate Chinese study committee will be established by the Chinese Academy of Sciences. The two committees will together constitute a joint study committee composed of an equal number of members from each nation to conduct all operational aspects of the joint study. The joint study committee will oversee all the study activities, including the conduct of the workshops, the meetings, the research activities, and drafting of the study report.

There will be two workshops and writing sessions forming the major part of the study, one in the United States and one in China, and a symposium to convey the results of the completed study in Beijing. In addition, there will be a specially designed questionnaire to cover all the areas of possible barriers across all disciplines. Additional research will be performed by the staff between the meetings.

Workshop #1: The first workshop will discuss the various scientific and national benefits of data sharing and identify the types of barriers experienced by the researchers and scientific institutions in the U.S. and China. There will be a number of invited presentations on different aspects of data sharing and the background issues by experts with past experiences in bilateral scientific projects or programs between the United States and China. Based on the deliberations of the first workshop the study committee will design a questionnaire to be sent to researchers, data managers, and policy makers in both the United States and China. It will also select some past, current, or failed bilateral research and data sharing projects that could be used as case studies to highlight the nature of the barriers and help to suggest some solutions to overcome these barriers. This workshop will be co-located with the annual meeting of the Roundtable scheduled to be held in the United States in the first half of 2010. The workshop will be followed by a drafting session of the joint study committee. The detailed outline for the report will be written following this first workshop and each chapter will be drafted by Chinese-US teams before the second workshop (some author teams will need the information that comes out of the questionnaires, but others will not).

Workshop #2: The second workshop will examine the results of the questionnaire and other fact finding, and further characterize the barriers in the categories mentioned earlier. It will distinguish between the barriers that are common to bilateral projects across all disciplines and those that arise only in a particular discipline or research context. It will also discuss the selected case studies presented at the first workshop in more detail to refine the understanding of the barriers and possible solutions. The second workshop is expected to be held in China in the second half of 2010. Another drafting session of the joint study committee will follow the workshop, focused on completing or reviewing the draft text and on developing consensus conclusions and recommendations for the study report. The report will then be finalized by email exchanges or the use of a password-protected wiki.

Symposium: The third and final meeting will be a one-day symposium to be held in conjunction with the U.S.-China Roundtable meeting in China to release the published report and discuss the study’s conclusions and recommendations with high-level invited policy and research representatives. This symposium will take place in the second half of 2011. A smaller sponsor briefing and press conference will be held in Washington at the NAS just prior to the Beijing symposium to release the report in the United States.

Outreach and Communication

The study workshops, the final meeting, and of course the report of the study itself will constitute a major effort toward outreach and communication about these barriers to bilateral scientific cooperation within the research establishment and government officials of the two countries. A major effort will be made to identify the decision makers and policy makers at the institutional and government levels and to invite them to attend the workshops and the final meeting, as well as to subsequently follow up on implementing the study recommendations. The joint study committee co-chairs, members, and the project director also will be encouraged to give presentations subsequently about the results of the study at various professional meetings and conferences, including the intergovernmental working groups, the Joint Bilateral Commission under the Science and Technology Agreement between the United States and China, and events related to bilateral scientific cooperation.

Substantial efforts will be made to communicate information about the workshops, meetings, the questionnaire, and the results of the study via the websites of the collaborating organizations and the sponsors. Various media outlets also will be targeted in both countries.

Study Report Dissemination

The study report will be available to the public and widely disseminated, without restriction, including publication on the National Academies Press Web site (www.nap.edu). Print copies of the study report will be prepared in sufficient quantity to ensure its distribution to the sponsors and other relevant parties, in accordance with Academy policy. It is also expected that the Chinese collaborators will publish and disseminate the study report in the Mandarin language.

Collaborations with Other Organizations

Bilateral understanding of the barriers to data sharing and joint actions that need to be taken to resolve them are major reasons why this study is being planned. As has already been noted, the study is a collaborative effort of the two national CODATA committees and their Academies. In addition, many governmental agencies and academic institutions in the U.S. have gained extensive knowledge and experience in bilateral projects in China and will be valuable resources for relevant information and expert participants. Some U.S. agencies, such as the NSF and the NIH, have offices in China to assist in bilateral scientific cooperation. These specialists will be valuable resources for guidance on all aspects of the study, including identification of the issues, workshop speakers and attendees. Other bilateral research projects and organizations will be consulted as well.

The study will be conducted with the participation of many Chinese governmental agencies and laboratories, quasi-governmental laboratories and institutions managed by the Chinese Academy of Sciences (CAS), and academic institutions. These organizations have already participated in the three U.S. - China Roundtable meetings held so far, and have indicated an interest in overcoming the barriers to data sharing cooperation. The study participants will include experts in earth, environmental, and geospatial data, and in health and biomedical data, in the following institutions, among others: the Ministry of Science and Technology of China (MOST), the China Earthquake Administration (CEA), the National Earthquake Response Support Services, the China Meteorological Administration (CMA), the CAS Institute of Remote Sensing Applications, the CAS Center for Earth Observation and Digital Earth (CEODE), the CAS Institute of Geographic Sciences and Natural Resources Research, the CAS Institute of Oceanology, the China Academy of Engineering, the China Academy of Medical Sciences, the China Academy of Traditional Chinese Medicine Science, the Office of State Administration of Traditional Chinese Medicine, the Chinese Center for Disease Control and Prevention (CDC), the Institute of Basic Medicine, the Neuroinformation Center of People’s Liberation Army General Hospital, the Beijing Union Medical College and Hospital, the Beijing Institute of Genomics, the Shanghai Institutes for Biological Sciences, the Shanghai Institute of Life Science, the Shanghai Institute of Technical Physics, the Qingdao Institute of Bioenergy and Bioprocess Technology, the CAS Institute of Microbiology, and the CAS Computer Network Information Center (CNIC).

Universities that have been involved so far include Peking University, Beijing Normal University, the College of Biomedical Engineering & Instrument Science of Zhejiang University, the College of Life Sciences of Nankai University, the West Environmental Health and Hygiene Institute of Northwest University for Nationalities, West China Medical School of Sichuan University, the Research Institute of Information Technology of Tsinghua University, and the University of Electronic Science and Technology of China.

Federal Advisory Committee Act (FACA)

The Academy has developed interim policies and procedures to implement Section 15 of the Federal Advisory Committee Act, 5 U.S.C. App. § 15. Section 15 includes certain requirements regarding public access and conflicts of interest that are applicable to agreements under which the Academy, using a committee, provides advice or recommendations to a Federal agency. In accordance with Section 15 of FACA, the Academy shall submit to the government sponsor(s) following delivery of each applicable report a certification that the policies and procedures of the Academy that implement Section 15 of FACA have been substantially complied with in the performance of the contract/grant/cooperative agreement with respect to the applicable report.

Public Information about the Project

In order to afford the public greater knowledge of Academy activities and an opportunity to provide comments on those activities, the Academy may post on its website (http://www.nas.edu) the following information as appropriate under its procedures: (1) notices of meetings open to the public; (2) brief descriptions of projects; (3) committee appointments, if any (including biographies of committee members); (4) report information; and (5) any other pertinent information.

Responsible Staff in the United States

Paul F. Uhlir, J.D.

Director, Board on Research Data and Information

and the U.S. National Committee for CODATA

The National Academies

Washington, DC

puhlir@nas.edu

Estimate of Costs

Funding in the amount of $_________ in partial support for the study is requested from the _____ during the period January 1, 2010 to December 31, 2011. The total estimated cost for this study is $______ for this period.

Bibliography of National Research Council Reports

The Socioeconomic Effects of Public Sector Information on Digital Networks (2009).

Ensuring the Integrity, Availability, and Stewardship of Research Data in the Digital Age (COSEPUP, 2009) [the BRDI Director worked on this project].