Data Management and Sharing

What is Data Management?

Data Management is the process of controlling the information generated during a research project, including the storage, access and preservation of data throughout the research life cycle and beyond. Any research project will involve some level of data management; the outcome of the research depends in part on how well this data is managed.

Many federal funding agencies are beginning to require formalized data management plans. Regardless of funding, a written plan should be in place for every research project and shared with all key personnel involved in the project.

Effective Data Management Practices include:

Designating the responsibilities of every individual involved in the study,

Determining how data will be stored and backed up, including long-term archiving,

Implementing the data management plan, and

Deciding how data will be handled through each modification of the study.

General Roles and Responsibilities for Data Management

Principal Investigator (PI): The primary owner of the data. The PI is responsible for identifying an information custodian, developing a written data management plan, enacting the processes necessary to confirm compliance with the plan, and ensuring that data are retained and shared according to sponsor and university requirements.

Division of Research (DOR): Responsible for developing a campus-wide data management policy and for reviewing compliance concerns related to it, particularly with regard to federal grant requirements and sponsored project agreements. A DOR Data Management policy is currently under development.

Data Management Tools and Best Practices

Developing a data management plan does not have to be complicated. The DMP Tool website has numerous templates by discipline for creating data management plans.

A Word about Metadata

The word "metadata" means "data about data." It gives context to your research data by providing descriptive detail about it. It articulates a context for objects of interest -- "resources" such as MP3 files, library books, or satellite images -- in the form of "resource descriptions." It encompasses the following:

names, labels and descriptions for variables, records and their values

explanation of codes and classification schemes used

codes of, and reasons for, missing values

derived data created after collection, with code, algorithm or command file used to create them

weighting and grossing variables created and how they should be used

data listing with descriptions for cases, individuals or items studied, for example for logging qualitative interviews
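As an illustration of the items listed above, metadata of this kind can be kept in a small machine-readable codebook stored alongside the data. The following is a minimal sketch, not a required format: the variable names, codes, and the helper functions describe and codebook_json are all hypothetical and invented for this example.

```python
import json

# Hypothetical codebook for an illustrative survey dataset; every name and
# code below is made up for this example.
codebook = {
    "age": {
        "label": "Age of respondent in years",
        "type": "integer",
        "missing_codes": {"-9": "Refused", "-8": "Not asked"},
    },
    "employment": {
        "label": "Employment status",
        "type": "categorical",
        "codes": {"1": "Employed", "2": "Unemployed", "3": "Retired"},
        "missing_codes": {"-9": "Refused"},
    },
    "age_group": {
        "label": "Age band derived from age",
        "type": "categorical",
        "derived_from": ["age"],
        "derivation": "1 if age < 30, 2 if 30 <= age < 60, 3 otherwise",
        "codes": {"1": "Under 30", "2": "30 to 59", "3": "60 and over"},
    },
}


def describe(variable, value):
    """Return a human-readable meaning for a coded value."""
    entry = codebook[variable]
    key = str(value)
    if key in entry.get("missing_codes", {}):
        return "missing: " + entry["missing_codes"][key]
    if key in entry.get("codes", {}):
        return entry["codes"][key]
    return key  # uncoded values (e.g. an actual age) are returned as text


def codebook_json():
    """Serialize the codebook so it can be stored next to the data file."""
    return json.dumps(codebook, indent=2, sort_keys=True)
```

Here the codebook records labels, classification codes, reasons for missing values, and how a derived variable was created, so that someone who receives only the data file and this description can still interpret each value.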

What Information Technology Security resources are available on campus?

Members of the IT Security Team are available before, during, and after the conduct of your research project to advise investigators and departments/colleges/centers on proper measures to protect valuable data.

The ideal time to consult with IT Security is while you are writing your proposal, so that these protections can be put in place and accounted for when building the research budget.

Best Practices for Data Storage and Archiving

Data must be archived in a controlled, secure environment in a way that safeguards the primary data, observations, or recordings. The archive must be accessible to scholars analyzing the data, and available to collaborators or others who have rights of access. Primary research data should be stored securely for a sufficient time following publication, analysis, or termination of the project. The number of years that data should be retained varies from field to field and may depend on the nature of the data and the research.

Sustainable data management is crucial to the value of research and to ensuring continued scholarship. Typically, data storage involves an access copy, for everyday use, and an archival copy, kept for preservation and back-up purposes. The importance of backing up data cannot be overemphasized, because natural disasters and breakdowns in systems and software cannot be predicted. Back up your data early and often.
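One way to picture the access copy and archival copy described above is a small script that copies a file into an archive folder and confirms the copy is intact. This is only a sketch using the Python standard library: the function names sha256_of and make_archival_copy, the choice of SHA-256, and the ".sha256" sidecar file are assumptions made for illustration, not a campus procedure. Your IT Security Team may recommend different tools.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_archival_copy(access_file, archive_dir):
    """Copy access_file into archive_dir and verify the copy by checksum."""
    access_file = Path(access_file)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    archival_file = archive_dir / access_file.name
    shutil.copy2(access_file, archival_file)  # copy2 also preserves timestamps

    original = sha256_of(access_file)
    if sha256_of(archival_file) != original:
        raise IOError(f"Checksum mismatch for {archival_file}")

    # Store the checksum beside the archival copy for later integrity checks.
    checksum_file = archival_file.with_name(archival_file.name + ".sha256")
    checksum_file.write_text(f"{original}  {archival_file.name}\n")
    return archival_file
```

Recording a checksum at the time of archiving makes it possible to detect later whether the archival copy has been altered or corrupted. Ideally the archive folder sits on a different device or location than the access copy.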

The choice of data formats and software depends mostly on the preference of the researcher, but it is often dictated by discipline-specific standards and customs. Ensuring the long-term usability and sustainability of data requires attention to standard and interchangeable software and formats; see also the Preferred Formats (from the UK Data Archive) for data creation and preservation.

For more information about selecting data formats and software with respect to sustainability, see "Sustainable Data Formats" (University of Wisconsin-Madison).

Long-Term Data Storage: Close attention to the storage, back-up, security, and sustainability of your data lessens the risk of compromising their quality and accessibility over the long term. Issues related to storage include how rapidly the data are expected to grow over the lifetime of the research project. Part of answering this question involves determining whether data will be collected in automated ways, which can greatly increase the scale of data collection, or whether project staff will gather the data themselves (e.g., by entering them into a database or a lab notebook). Options for short-term storage include hard disk drives and portable media (e.g., DVDs and CDs).
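The question of how quickly data will grow can be turned into a rough back-of-the-envelope estimate. The sketch below assumes simple linear growth and two stored copies (one access copy and one archival copy); the function name projected_storage_gb and all figures in the usage note are hypothetical, and real projects may grow unevenly.

```python
def projected_storage_gb(initial_gb, gb_per_month, months, copies=2):
    """Estimate total storage needed, assuming linear growth.

    copies=2 reflects one access copy plus one archival copy.
    """
    return (initial_gb + gb_per_month * months) * copies
```

For example, a hypothetical project that starts with 50 GB and collects 10 GB per month from an automated instrument over 36 months would call projected_storage_gb(50, 10, 36), while a manually entered dataset growing by a fraction of a gigabyte per month would need far less. Comparing the two cases is a quick way to see how the method of collection affects the storage budget.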