Preparing data for deposit

"Guidance on preparing and managing data"

Whether depositing large-scale survey data in our curated collection or smaller research collections via our self-deposit system, ReShare, data creators should consult our guidance below on preparing data. Ideally this should be done before fieldwork or data collection starts. In addition to the summary points noted here, we provide comprehensive best practice guidance, aimed at individual researchers and research support staff, on our Manage data web pages.

We run a programme of regular training workshops covering key areas of managing and sharing research data. Please also get in touch with us if you would like to discuss any of these issues further.

Data files

Preparing data files

Allow sufficient time during and towards the end of a project for these preparations. Build in quality control checks for your data capture and cleaning processes:

use consistent and meaningful file names that reflect the file content, avoiding spaces and special characters; if data are sensitive or restricted, indicate this in the file name

use meaningful and self-explanatory variable names, codes and abbreviations

ensure internal consistency checks are completed

ensure variable and value labels are complete and consistent for both questionnaire and derived variables

remove all your own temporary, administrative or dummy variables that were created for internal purposes and are of no use to researchers

ensure no repetition of variables, especially redundancy in derived variables

check that the level of detail included in the data is suitable for the agreed access arrangements and licensing

apply an appropriate level of anonymisation, e.g. anonymise serial numbers so that they cannot be linked to other sources, apply any top coding, remove cases

provide anonymised Primary Sampling Unit information if possible so that researchers can incorporate the sampling design into their analyses

check that any textual variables included are suitable for dissemination e.g. no disclosive information or internal comments in free-text variables

ensure consistent treatment and labelling of missing values

include weights as variables but do not apply them in the deposited data files
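Some of the checks above can be automated before deposit. The following is a minimal Python sketch, not a prescribed procedure: the function names, the allowed file-name pattern and the example ceiling are all assumptions made for illustration, and the data are represented as a plain dictionary of variable name to values.

```python
import re

# Assumption: a "safe" file name uses only letters, digits, underscores
# and hyphens, followed by a single extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")


def is_safe_file_name(name):
    """Return True if the name contains no spaces or special characters."""
    return bool(SAFE_NAME.match(name))


def top_code(values, ceiling):
    """Top coding: cap every value above `ceiling` at `ceiling`."""
    return [min(value, ceiling) for value in values]


def duplicate_variables(columns):
    """Return groups of variable names whose values are identical.

    `columns` maps each variable name to its list of values.
    """
    seen = {}
    for name, values in columns.items():
        seen.setdefault(tuple(values), []).append(name)
    return [names for names in seen.values() if len(names) > 1]


if __name__ == "__main__":
    print(is_safe_file_name("survey_2019_restricted.csv"))
    print(is_safe_file_name("survey data (final).csv"))
    print(top_code([12, 40, 95], ceiling=80))
    print(duplicate_variables({"age": [30, 41], "age_copy": [30, 41], "sex": [1, 2]}))
```

Variables flagged as duplicates still need a manual review, since two variables may legitimately share the same values in a small file.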

A selection of key data, typically from government departments, are made available through our Nesstar service. These require additional processing to make them suitable for user-friendly online browsing, including:

variable and value labels must be clear and consistent, and must not be truncated

non-compliant characters, such as &, @ and <>, should be removed

question text should be made available in as structured a format as possible, e.g. XML or spreadsheet.
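As an illustration of the character check above, the following Python sketch flags labels that contain &, @, < or > and shows one way to strip them. It assumes labels are held in a dictionary keyed by variable name; the function names are hypothetical, and the character set covers only the examples named here, not a complete list for Nesstar.

```python
import re

# Only the example characters named in the guidance; the full set may be longer.
NON_COMPLIANT = re.compile(r"[&@<>]")


def flag_labels(labels):
    """Return the variable names whose label contains a non-compliant character."""
    return [name for name, text in labels.items() if NON_COMPLIANT.search(text)]


def strip_non_compliant(text):
    """Remove the characters and collapse any whitespace left behind."""
    cleaned = NON_COMPLIANT.sub(" ", text)
    return " ".join(cleaned.split())


if __name__ == "__main__":
    labels = {"q1": "Health & safety", "q2": "Age last birthday"}
    print(flag_labels(labels))
    print(strip_non_compliant("Health & safety <draft>"))
```

Stripping a character can change what a label means, so it is usually safer to review flagged labels and reword them by hand, for example writing "and" in place of "&".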