Why Manage and Publish Data?

Increase the visibility of your research: Making your data available to other researchers through widely searched repositories can increase your prominence and demonstrate continued use of the data and relevance of your research.

Save time: Planning for your data management needs ahead of time will save you time and resources in the long run.

Simplify your life: Enabling a repository to house and disseminate your data lets you focus on your research rather than responding to requests or worrying about data that may be housed on your web site.

Preserve your data: Only by depositing your data in a repository can you be sure that they will be available to you and other researchers over the long term. Doing so safeguards your investment of time and resources (including any work done for you by graduate students) and preserves your unique contribution to research.

Increase your research efficiency: Have you ever had a hard time understanding the data that you or your colleagues have collected? Documenting your data throughout its life cycle saves time because it ensures that in the future you and others will be able to understand and use your data.

Documentation: Managing and documenting your data throughout its life cycle ensures that the integrity and proper description of your data are maintained.

Meet grant requirements: Many funding agencies now require researchers to deposit the data they collect as part of a research project in an archive.

Facilitate new discoveries: Enabling other researchers to use your data reinforces open scientific inquiry and can lead to new and unanticipated discoveries. It also prevents duplication of effort, because others can use your data rather than trying to gather the same data themselves.

Support Open Access: Researchers are becoming increasingly aware of the need to manage their work and consider issues of scholarly communication. The Open Data movement advocates for researchers to share their data in order to foster the development of knowledge.

Evaluating Data Needs

What type of data will be produced?

Gather a clear picture of what your data will look like. Is it, for example, numerical data, image data, text sequences or modeling data? Knowing exactly what kind of data you have will inform many decisions you need to make about storage, backups and more. Image data requires a lot of storage space, so you'll want to decide which of your images, if not all, you want to retain, and where such large datasets can be housed. As for backing up your data, your research center or university may have the ability to help you. On the other hand, if you are storing images, you may quickly exceed your institution's limit for backing up individual laboratories or groups.

How much of it, and at what growth rate?

Once you know what kind of data you're producing, you'll be able to assess the growth rate. For example, are you gathering data by hand or using sophisticated instrumentation that is able to capture a lot of data at once? Will there be more data as time goes on? If so, you will need to plan for the growth. What amounts to enough storage this year may not be sufficient for next year.
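As a rough illustration of planning for growth, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch, not a sizing tool: the function names, the quota, and every number below are hypothetical, and the model simply assumes that the amount of new data collected each month grows by a fixed percentage.

```python
def project_storage(initial_gb, monthly_new_gb, growth_rate, months):
    """Return the projected total storage (GB) at the end of each month.

    Assumption: the volume of new data added per month grows by
    `growth_rate` (e.g. 0.05 for 5%) from one month to the next.
    """
    totals = []
    total = initial_gb
    new_this_month = monthly_new_gb
    for _ in range(months):
        total += new_this_month
        totals.append(total)
        new_this_month *= 1 + growth_rate
    return totals


def first_month_over_quota(totals, quota_gb):
    """Return the 1-based month in which the quota is first exceeded, or None."""
    for month, total in enumerate(totals, start=1):
        if total > quota_gb:
            return month
    return None


# Hypothetical lab: 200 GB today, 25 GB of new data per month,
# collection growing 5% per month, and a 1000 GB backup quota.
projection = project_storage(200, 25, 0.05, 36)
print(first_month_over_quota(projection, 1000))
```

Running the projection with your own figures gives an early warning of when this year's storage or backup allocation will stop being enough.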

Will it change frequently?

The answer to this question impacts how you organize the data as well as the level of versioning you will need to undertake. Keeping track of rapidly changing datasets can be a challenge, so it's imperative you begin with a plan that will carry you through the data management process.
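One simple way to keep track of a changing dataset is to save a dated, checksummed copy each time it changes. The sketch below is only one possible approach, using the Python standard library; the `snapshot` function, the file naming pattern, and the example paths are all hypothetical, and a dedicated version control or repository system may suit your project better.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(data_file, archive_dir):
    """Copy data_file into archive_dir under a versioned name.

    The copy is named <stem>_<UTC timestamp>_<first 8 checksum chars><suffix>,
    so each version records when it was saved and can be checked for
    corruption later. Returns the new path and the full checksum.
    """
    data_file = Path(data_file)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    checksum = sha256_of(data_file)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = archive_dir / f"{data_file.stem}_{stamp}_{checksum[:8]}{data_file.suffix}"
    shutil.copy2(data_file, target)  # copy2 also preserves file timestamps
    return target, checksum
```

Calling `snapshot("survey.csv", "archive")` after each round of edits (the file and folder names here are placeholders) leaves a trail of versions that can be compared or restored.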

Who is it for?

Who is your audience for the data? How will they use the data? The answer to this question will tell you how to structure the data and where to distribute it, among other things.

Who controls it (PI, student, lab, UC, funder)?

Before you spend a lot of time figuring out how you're going to store the data, name it, etc., you need to know whether you have the authority to control it.

How long should it be retained? (e.g. 3-5 years, 10-20 years, permanently)

Not all data needs to be retained indefinitely. Figure out what's important to keep and make sure your plan for those datasets is solid.

Why Practice Good Data Management?

Data Sharing and Management Snafu in 3 Short Acts

Data Planning Checklist

Managing your data before you begin your research and throughout its life cycle is essential to ensure its current usability and long-run preservation and access. To do so, begin with a planning process.

What type of data will be produced? Will it be reproducible? What would happen if it got lost or became unusable later?

How much data will it be, and at what growth rate? How often will it change?

Who will use it now, and later?

Who controls it (PI, student, lab, MIT, funder)?

How long should it be retained? e.g. 3-5 years, 10-20 years, permanently

Are there tools or software needed to create/process/visualize the data?