Monday, 5 October 2015

Digital Data in Environmental Archaeology 2 – Open Data

This is a short account of the reasons why I think that environmental archaeology data should be stored and disseminated as open data (i.e. data that is freely available in accessible formats, usually digital, under licences that allow it to be re-used). I’ve provided an outline of some of the methods that I have used below.

Research builds on earlier results. In environmental archaeology it is often useful to combine results from many different sites into one larger dataset, and to analyse this to see if new patterns and insights emerge. To move the study of environmental archaeology forward, I think it is important to ensure that results are stored and disseminated in a way that allows other researchers to re-use the data.

Making data accessible

If a researcher wants to re-use archaeobotanical data from one of my reports, no doubt they could re-type all the information that is available in printed formats or in PDFs. But it would be much better if the data was made available digitally. Much of the raw data in environmental archaeology (certainly in archaeobotany) is prepared in spreadsheets and I have spreadsheets that date back to 1998. How long will I be able to access these using more modern software packages? And is it realistic to expect me to convert and update the files each time there is a new iteration of spreadsheet software?

Fortunately, many software packages have some built-in backwards compatibility. But the best way to ensure that the data in my spreadsheets (and in databases) remains readable into the future is actually to convert it into a very old format, a .csv file. Comma Separated Value files (.csv) provide a very simple means of structuring data: each row of the table becomes one line of plain text, with commas separating the values. CSV is a de facto standard for saving tabular data and it is supported by a huge number of applications. This means that if you save your tabular data as a .csv file, most programmes will be able to access the data (and the more accessible your data, the more likely it is to be preserved into the future).

How to convert your spreadsheet to a .csv file

The easiest way to save your data in .csv format is to open the file in your preferred spreadsheet application, click on “Save as” and scroll down the list of file types until you find .csv. This file should contain all your basic data, organised simply and clearly (leave pie-charts out). It should be kept as the preservation copy of your data.
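For anyone who would rather script this step than use “Save as”, the following is a minimal sketch in Python, using only the standard csv module. It is an illustration, not the method used for the reports mentioned above: the sample names, taxa, counts and the file name are all made up, and the function names are hypothetical. It writes a small table to a .csv file and then reads it back, to show that the file is plain text that other software can open.

```python
import csv
from pathlib import Path

# Hypothetical sample data: the samples, taxa and counts are illustrative only.
HEADER = ["sample", "taxon", "count"]
ROWS = [
    ["S1", "Hordeum vulgare", 12],
    ["S1", "Corylus avellana (nutshell)", 3],
    ["S2", "Avena sp.", 7],
]


def write_csv(path, header, rows):
    """Write one header row followed by the data rows as UTF-8 CSV."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Read the file back as a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    target = Path("archaeobotany_counts.csv")  # assumed file name
    write_csv(target, HEADER, ROWS)
    for record in read_csv(target):
        print(record["sample"], record["taxon"], record["count"])
```

Note that everything comes back from a .csv file as text, including the counts, so any software reading the file has to decide for itself which columns are numbers. That simplicity is part of what makes the format so widely readable.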

N.B. Preserving text files is different. Save your report as a .pdf, as this is a relatively stable and supported format. For added accessibility it is a good idea to save text as .txt files (go to “Save as” and select the .txt option). This will preserve the text but won’t preserve any added graphs and images, and it won’t preserve formatting.

Licensing your data so that it is available for re-use

Open data is distributed so that it can be re-used. This usually means publishing your data under an open licence, such as one of the Creative Commons licences. These are licences that work alongside copyright, allowing you to give permission in advance for people to re-use your material, and allowing you to stipulate the conditions under which this re-use can take place. Creative Commons offer several different ways for you to share your material, from a completely open public domain dedication (CC0) to more restrictive licences, such as the one that stipulates that the material must be attributed to you as the original creator (CC BY).

How to assign an open licence to your work

If you use repositories such as Zenodo or Figshare the service asks you to assign a licence to your material as part of the upload process. Alternatively, you can download the appropriate text and HTML code for each licence from the Creative Commons website (http://creativecommons.org/choose/).

About the author

Penny Johnston is an archaeobotanist with an interest in digital data and preservation. She has her own blog (http://archbotarchive.blogspot.ie/) about her digital archiving practices/experiments, but this, like the archive, has languished somewhat over the past year or so because of time constraints. However, there is a lot of information there about archaeobotanical remains from Cork, and these are all disseminated online in accessible and open formats, using Creative Commons licences.

About this blog

This blog was established by environmental archaeologists working in Ireland (there are many of us, working in third level institutions, in companies and operating as sole traders). We set up the blog because we think what we do is fascinating, and we want to share it with a wider audience!

Environmental archaeology is the study of human-environment interactions through the scientific investigation of ancient remains. The remains often derive from archaeological excavations. Environmental archaeologists analyse a broad variety of material, including remains of plants, wood, animals, insects and many other types of material. These analyses reveal what people ate in the past, how they organised their economies, and how people interacted with their local environments and wider landscapes.

You have an opportunity to ask us questions via the comments section. We hope you follow this blog and enjoy it.