Other resources

The ETDplus project has published guidance briefs on Data Organization and on Version Control. These are short "how to" documents written for a student audience, designed to help students with data management issues related to their theses and dissertations.

What Should I Focus on When Organizing Data?

There are some fundamental decisions that you need to make when you start your research, and data organization should be among them. The choices that you make will vary based on the type of research that you do, but everyone must address the same issues. Consider the following things as you organize your data:

File version control (see tools at right)

Directory structure & file naming conventions (see below)

File naming conventions for specific disciplines (see right)

File structure

Use the same structure for backups

File Naming Best Practices

File names should provide context for the files that they name, and distinguish them from files that may be similar. Many files are used independently of their file or directory structure, so provide sufficient description in the file name.

When using sequential numbering, make sure to use leading zeros to allow for multi-digit versions. For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001-100 (001, 010, 100).
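The leading-zero rule can be illustrated with a short Python sketch. The helper name `padded_labels` and the `run_` prefix are hypothetical, chosen only for this example; the point is that the padding width follows from the largest number in the sequence, and that unpadded names sort out of order.

```python
def padded_labels(count: int) -> list[str]:
    """Return labels 1..count, zero-padded to the width of the largest number."""
    width = len(str(count))  # e.g. 100 -> 3 digits
    return [f"{i:0{width}d}" for i in range(1, count + 1)]


labels = padded_labels(100)
print(labels[0], labels[9], labels[-1])  # 001 010 100

# Without leading zeros, alphabetical sorting puts 10 before 2.
print(sorted(["run_1", "run_10", "run_2"]))     # ['run_1', 'run_10', 'run_2']
# With leading zeros, alphabetical order matches numeric order.
print(sorted(["run_01", "run_10", "run_02"]))   # ['run_01', 'run_02', 'run_10']
```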

No special characters: & , * % # ; ( ) ! @ $ ^ ~ ' { } [ ] ? < > -

Use only one period, and place it before the file extension (e.g. name_paper.doc NOT name.paper.doc OR name_paper..doc)

Example: Project_instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext
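As a rough illustration, the naming pattern and the two rules above (no special characters, exactly one period) could be sketched in Python as follows. The function names `build_name` and `problems`, and the sample values such as `SoilStudy` and `SiteA`, are assumptions made up for this example; they are not part of any standard tool.

```python
from datetime import datetime

# Characters the guide says to avoid in file names.
FORBIDDEN = set("&,*%#;()!@$^~'{}[]?<>-")


def build_name(project, instrument, location, when, ext, extra="", with_time=False):
    """Assemble Project_instrument_location_YYYYMMDD[hhmmss][_extra].ext."""
    stamp = when.strftime("%Y%m%d%H%M%S" if with_time else "%Y%m%d")
    parts = [project, instrument, location, stamp]
    if extra:
        parts.append(extra)
    return "_".join(parts) + "." + ext


def problems(filename):
    """Return a list of rule violations found in a file name (empty if none)."""
    issues = []
    bad = sorted(set(filename) & FORBIDDEN)
    if bad:
        issues.append("special characters: " + " ".join(bad))
    periods = filename.count(".")
    if periods != 1:
        issues.append(f"expected exactly one period, found {periods}")
    return issues


name = build_name("SoilStudy", "pHmeter", "SiteA", datetime(2024, 3, 5), "csv")
print(name)                         # SoilStudy_pHmeter_SiteA_20240305.csv
print(problems(name))               # []
print(problems("name.paper.doc"))   # ['expected exactly one period, found 2']
```

Writing the date as YYYYMMDD means that names sharing the same prefix sort chronologically when listed alphabetically.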

Directory Structure Naming Conventions

The structure of directories/folders for organizing the files should also have a clear, documented naming convention.

The top-level folder or directory should include the project title, unique identifier, and date (year).

Directories/folders within the substructure should be divided by a common theme. For example, each folder may contain a run of an experiment or a different version of each dataset.
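One way to set up such a structure is sketched below in Python. The helper `make_project_tree`, the identifier `ID0042`, and the `run_NN` subfolder names are hypothetical placeholders; substitute whatever convention you have documented for your own project. The example builds the tree in a temporary directory so that it can be run safely.

```python
import tempfile
from pathlib import Path


def make_project_tree(root, title, identifier, year, themes):
    """Create a top-level folder named title_identifier_year with one subfolder per theme."""
    top = Path(root) / f"{title}_{identifier}_{year}"
    top.mkdir(parents=True, exist_ok=True)
    for theme in themes:
        (top / theme).mkdir(exist_ok=True)
    return top


with tempfile.TemporaryDirectory() as tmp:
    top = make_project_tree(tmp, "SoilStudy", "ID0042", 2024,
                            ["run_01", "run_02", "run_03"])
    print(top.name)                               # SoilStudy_ID0042_2024
    print(sorted(p.name for p in top.iterdir()))  # ['run_01', 'run_02', 'run_03']
```

Running the same function against a backup location reproduces the identical layout there, which keeps backups in the same structure as the working copy.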