Preserving your data

The University of Oxford policy mandates the preservation of research data and records for a minimum of 3 years after publication. Many funders specify longer time-frames. The AHRC specifies a minimum of 3 years; the EPSRC requirement is to preserve data securely for a minimum of 10 years after the expiry of any privileged access period agreed by the EPSRC. See the Funder Requirements section for more information.
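As a rough illustration of how these minimum periods translate into dates, the sketch below works out the earliest date on which a retention minimum has been met. The function name, the example dates, and the treatment of 29 February are assumptions made for illustration only; always check the actual policy or funder wording.

```python
from datetime import date


def earliest_disposal_date(start: date, retention_years: int) -> date:
    """Return the date on which a minimum retention period, counted from start, is met."""
    try:
        return start.replace(year=start.year + retention_years)
    except ValueError:
        # start is 29 February and the target year is not a leap year;
        # falling back to 28 February is an assumption, not a policy rule.
        return start.replace(year=start.year + retention_years, day=28)


# Hypothetical publication date, with the 3-year University minimum.
publication = date(2024, 6, 30)
university_minimum = earliest_disposal_date(publication, 3)

# Hypothetical expiry of a privileged access period, with the 10-year EPSRC minimum.
privileged_access_expiry = date(2025, 6, 30)
epsrc_minimum = earliest_disposal_date(privileged_access_expiry, 10)

print(university_minimum.isoformat())
print(epsrc_minimum.isoformat())
```

Where more than one requirement applies, the later of the resulting dates is the safer one to plan around.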

Preservation means the storage of a project’s digital outputs in such a way that they remain usable, understandable, and accessible – beyond the end of funding, and ideally for the long term. In practice, therefore, preservation is often achieved by depositing the digital material in an archive/repository during the project, or shortly afterwards. Often, charges made by the archive for preparing and ingesting the data can be directly costed into your grant application (NB. such charges usually need to be paid within the lifetime of the project and not after it has finished).

Think about preservation at the start of your research project – the data you will be working with, the digital outputs that will be produced, where they will be stored, in what way and for how long. Contact us early on in your project for advice.

Can I preserve my data at Oxford?

Yes. Work is in progress to enable simple deposit and long-term preservation of datasets at Oxford via the Bodleian ORA-Data archive. If you have any questions about depositing your data at Oxford, please contact us.

There are also a number of other services available which can handle the preservation of research outputs for you, including (national and international) disciplinary repositories and data centres/archives. More information on these services can be found on our How to share page.

If you deposit data in another archive, you are strongly encouraged to add a metadata record to ORA-Data. This helps make your data more findable, and will contribute to building a comprehensive catalogue of Oxford-created datasets.

Some funders require data produced in the course of research they have funded to be offered to a specific repository – for example, NERC supports a network of Data Centres, and the ESRC funds the UK Data Archive.

However, it is likely that you as the award holder will still be expected to deal with any copyright/third party issues that concern your research.

Many journals require that data underpinning an article be made available via a data archive for long-term accessibility. It is also recommended that your data have a persistent identifier (such as a DOI), to enable reliable citation.

If you are not using an established subject or national data archive, the Bodleian Libraries’ ORA-Data may be able to assign a DOI to your archived data. Contact us for help.

I am collaborating with a university overseas on a project and my data is not in the UK – what do I need to consider?

If your data is stored outside the UK, you will need to ensure that it is held somewhere where the legal safeguards are at least equivalent to the UK requirements.

If you are using a cloud storage solution, you will need to be aware of the legal jurisdiction covering your cloud storage provider.

If you are working with personal data, you should be aware that under the UK Data Protection Act, stricter legal requirements apply if personal data is stored outside the European Economic Area.

Do I have to preserve all digital data from my research – or just those data underpinning publication?

Note, however, that even if you are not required to preserve data, there may be benefits to doing so: it may have reuse value for you or for other researchers. In general, you should aim to justify any decision to discard a digital output.

What format should data be preserved in?

Ideally, data should be preserved in non-proprietary formats, to reduce the risk of the data files becoming unreadable in the future (because, for example, a particular piece of software is no longer available). It’s important to think about how future changes in software and hardware might affect access to the data, and to take steps to avert any problems.

The UK Data Archive has a useful summary of optimal file formats for long-term preservation of data.
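One way to start thinking about formats is to list the files in a project whose extensions are not on a list of formats you consider suitable for preservation. The sketch below assumes a small, purely illustrative set of extensions (it is not the UK Data Archive's list) and a hypothetical helper name, `needs_review`.

```python
from pathlib import Path

# Illustrative assumption only: replace with the formats recommended
# by your chosen archive or repository.
OPEN_EXTENSIONS = {".csv", ".txt", ".xml", ".json"}


def needs_review(filenames, open_extensions=OPEN_EXTENSIONS):
    """Return the filenames whose extension is not in the set of open formats."""
    return [
        name
        for name in filenames
        if Path(name).suffix.lower() not in open_extensions
    ]


# Hypothetical project files.
project_files = ["results.csv", "survey.SAV", "notes.txt", "model.xlsx"]
print(needs_review(project_files))
```

Files flagged in this way are candidates for conversion or for a note in the documentation explaining which software is needed to open them; an extension check alone says nothing about whether the file contents are well formed.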

What about ethical and legal issues concerning retention of data – how does this conflict with preservation?

Data may need to be anonymised so that individuals, organisations or businesses cannot be identified. Alternatively, sensitive and confidential data may be safeguarded effectively by regulating or controlling access to data or use of them.

Many repositories will allow you to submit your data at the end of a project (when it is easiest to pull the data together), but to embargo (restrict access to) its release for a few years.

In some situations, repositories may allow you to store part of a set of materials publicly and will maintain the other, more sensitive parts while keeping them hidden and inaccessible. For more information, see the UK Data Archive’s advice on Access Control.

How do I ensure my data are preserved securely?

How do I ensure my data is understandable and usable into the future?

It’s important to ensure that effective documentation is being created. This should describe not only the data itself (including explanations of abbreviations, coding, or jargon), but also the methods used and the key decisions behind them. If changes have occurred in working practices during the course of the research project, these need to be documented too. For more information, see the section on Organising Your Data.
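As a small example of documentation that describes the data itself, the sketch below builds a blank data dictionary from the column names of a CSV file, ready for descriptions and coding notes to be filled in by hand. The function name, the layout of the output, and the sample columns are all assumptions made for illustration.

```python
import csv
import io


def data_dictionary_template(csv_text: str) -> str:
    """Build a fill-in-the-blanks data dictionary from a CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields a template with no columns
    lines = ["column | description | units or coding"]
    lines += [f"{name.strip()} | TODO | TODO" for name in header]
    return "\n".join(lines)


# Hypothetical data with abbreviated column names that need explaining.
sample = "pid,age_grp,resp_q1\n001,2,4\n"
print(data_dictionary_template(sample))
```

Each TODO would then be replaced with a plain-language explanation, for example what the abbreviation `age_grp` stands for and what each of its codes means.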

How do I preserve data to be shared – are there any additional stages of data preparation needed to allow re-use by others?

How do I cover the costs of data preservation?

Most funders will cover appropriate costs of preparation and ingest of digital outputs that are incurred within the funding period. It is therefore important to address the issue of data preservation right at the start of your project (at the data management planning stage), so that costs can be included in grant applications.

Our Data Deposit Decision Tree outlines some of the questions involved in choosing which research data to preserve and identifying a suitable repository.