Extracting trends (VI) and national synthesis update

This post follows on from my previous posts on trend surface modelling (I)(II)(III)(IV)(V) and my posts on synthesis of multiple datasets using grid squares (I)(II)(III)(IV).

As our HER dataset is now nearly complete (only Merseyside is expected from now on; North Somerset and Bath & North East Somerset are unable to provide data), we are finally able to begin studying the data which we have gathered on a nationwide scale. A script calculated broad period classifications for the HER data (Prehistoric; Bronze Age; Iron Age; Roman; early medieval; uncertain; “bad date” [i.e. outside our period]), working either from the multitude of period designations applied by individual HERs or from start / end dates. The data was then converted to shapefile format and merged into a single point layer, which can be queried very coarsely to produce distributions of records of different periods.
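To give a flavour of what such a classification script might look like, here is a minimal sketch in Python. It is not the script actually used: the label lookup, the date boundaries, the study window and the choice to return a set of periods are all assumptions made purely for illustration.

```python
# Hypothetical lookup from HER period labels to broad periods (illustrative only).
LABELS = {
    "prehistoric": "Prehistoric",
    "romano-british": "Roman",
    "saxon": "early medieval",
}

# Hypothetical date boundaries in calendar years (negative = BC).
# These are NOT the project's values, just plausible placeholders.
PERIODS = [
    ("Bronze Age", -2500, -800),
    ("Iron Age", -800, 43),
    ("Roman", 43, 410),
    ("early medieval", 410, 1066),
]
STUDY_START, STUDY_END = -2500, 1066


def classify(label=None, start=None, end=None):
    """Return the set of broad periods for one record.

    The HER's own period label is tried first; if it is not recognised,
    the start / end dates are used instead.
    """
    if label and label.strip().lower() in LABELS:
        return {LABELS[label.strip().lower()]}
    if start is None or end is None:
        return {"uncertain"}
    if end < STUDY_START or start > STUDY_END:
        return {"bad date"}
    # A record belongs to every broad period that its date range overlaps.
    overlaps = {name for name, lo, hi in PERIODS if start < hi and end > lo}
    return overlaps or {"uncertain"}
```

For example, under these assumed boundaries `classify(start=100, end=200)` gives only Roman, whereas a record dated 1500 to 1600 falls outside the study window and is flagged as a “bad date”.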

As an initial method for investigating this mass of data (around 400,000 records), I experimented with the production of a few trend surfaces. First, one for all of the data received:

Trend surface for all EngLaId HER data

I think that there are two major factors at play in this trend. The first is the general bias in English archaeology towards greater density of (probably) settlement and (certainly) fieldwork in the south and east of the country. The second (possibly more dominant?) is the variation in recording methods used across the country. Even where the same software is used, different HERs catalogue their data somewhat differently: some like to split everything up into individual periods and types, others like to collate into multi-period sites; some cast their nets wide to include as much data as possible (e.g. PAS data, MORPH data), others like to only include sites of certain and clear provenance. This means that the density of data across the country is as much about modern practice as it is about activity in the ancient past.
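For readers unfamiliar with the technique, a trend surface is a low-order polynomial fitted by least squares to values observed at map coordinates. The sketch below fits the simplest (first-order, planar) case in plain Python. It is an illustration only, not the GIS tool used to make the surface above; the input format of (x, y, z) samples, where z might be a count of records per cell, is an assumption.

```python
def solve(a, b):
    """Solve the small linear system a.x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_trend_surface(samples):
    """Fit z = a + b*x + c*y by ordinary least squares.

    samples is an iterable of (x, y, z) tuples; returns [a, b, c].
    """
    rows = [(1.0, x, y) for x, y, _ in samples]
    zs = [z for _, _, z in samples]
    # Normal equations: (A^T A) coef = A^T z
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    return solve(ata, atz)
```

Higher-order surfaces work the same way with extra columns (x*x, x*y, y*y and so on) added to each row, although for real data a numerical library would be a better choice than hand-rolled elimination.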

We can then produce similar surfaces for our broad periods (all to the same numerical scale):

Trend surface for Bronze Age HER data
Trend surface for Iron Age HER data
Trend surface for Roman HER data
Trend surface for early medieval HER data

These four surfaces still reflect to some extent the differences seen in modern practice, but they are closer to the genuine distribution of past activity. The Bronze Age surface seems to be biased towards uplands and towards Wessex. The Iron Age surface has a clear bias towards the south east. The Roman surface is biased towards lowland Britain but also towards the pockets of military activity in the north of England. The early medieval surface is biased towards the eastern parts of England.

However, the distributions behind all of these trends are still heavily influenced by modern archaeological and CRM practices. This is only going to get worse when we begin to produce duplication in our dataset by building in English Heritage NRHE data and other datasets. As discussed in previous posts, one way in which to minimise these modern effects and reduce the influence of duplication is to collate data into 1 x 1 km grid cells. This requires two things: a thesaurus of simplified monument terms, and standardised period terms (a step we have already undertaken). The result is a tessellation of 1 x 1 km grid squares across England recording the presence of different types of archaeological site for each of our broad periods, which we can then query and use to produce maps.
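The collation step described above can be sketched in a few lines of Python. This is a simplified illustration rather than our actual workflow: the record format (easting and northing in metres, a broad period and a monument term) and the three-entry thesaurus fragment are assumptions, although the three terms shown do all belong to our “domestic and civil” category.

```python
# Hypothetical thesaurus fragment: monument term -> broad categories.
# A term may map to more than one category, hence the tuples.
THESAURUS = {
    "hillfort": ("domestic and civil",),
    "vicus": ("domestic and civil",),
    "midden": ("domestic and civil",),
}


def collate(records, cell_size=1000):
    """Return the set of (cell, period, category) combinations present.

    records is an iterable of (easting, northing, period, term) tuples,
    with coordinates in metres. Because the result is a set, any number
    of records of the same kind in the same cell count only once.
    """
    presence = set()
    for easting, northing, period, term in records:
        # Integer division snaps each point to its 1 x 1 km grid cell.
        cell = (int(easting // cell_size), int(northing // cell_size))
        for category in THESAURUS.get(term.lower(), ()):
            presence.add((cell, period, category))
    return presence
```

The important design point is that the output records presence only, so a site catalogued once by a HER and again in another dataset contributes a single entry for its cell.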

As an example, I constructed a few more trend surfaces, based upon the presence / absence of evidence for sites within our broad “domestic and civil” category. This category includes: town, burh, civitas capital, colonia, hamlet, village, vicus, canabae legionis, oppidum, hillfort, anything defined using the word “settlement”, midden, and timber platform (several of these sub-types belong to more than one broad category). We can then look at how the underlying trends behind this category changed over time (these trend surfaces are logistic rather than linear, as they model the probability of presence in a binary presence / absence relationship rather than density).

There is still some bias in these trend surfaces from the amount of data recorded by different modern archaeological entities (e.g. Northamptonshire is a very “completist” HER, which partially accounts for it showing up so strongly in many of the trend surfaces seen in this blog post), but the patterns are still quite interesting. The Bronze Age is heavily influenced by the very high number of records present on Dartmoor and Bodmin Moor. The Iron Age is probably mostly interesting for the low probability area across the “waist” of England from Cheshire to Lincolnshire. The Roman is pretty much how I would expect it: high likelihood in the lowland zone and around Hadrian’s Wall (this includes “native” sites [whatever that means!] of Roman period date). The early medieval is fairly flat, showing settlement across the country with greatest probability in central and eastern England (the peaks in Devon possibly need further investigation).
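For anyone curious about the difference between a logistic and a linear trend surface, the sketch below fits a first-order logistic surface to presence / absence values by simple gradient ascent. It is a toy illustration, not the tool used for the surfaces discussed above: the function names, the learning rate, the number of steps and the assumption that coordinates have been rescaled to roughly 0 to 1 are all mine.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic_surface(cells, steps=5000, rate=0.5):
    """Fit p(present) = sigmoid(a + b*x + c*y) by gradient ascent
    on the log-likelihood.

    cells is a list of (x, y, present) tuples, with x and y rescaled to
    roughly 0..1 and present equal to 0 or 1. Returns (a, b, c).
    """
    a = b = c = 0.0
    n = len(cells)
    for _ in range(steps):
        ga = gb = gc = 0.0
        for x, y, present in cells:
            err = present - sigmoid(a + b * x + c * y)
            ga += err
            gb += err * x
            gc += err * y
        a += rate * ga / n
        b += rate * gb / n
        c += rate * gc / n
    return a, b, c


def probability(coef, x, y):
    """Modelled probability of presence at location (x, y)."""
    a, b, c = coef
    return sigmoid(a + b * x + c * y)
```

Unlike the linear case, the fitted values always lie between 0 and 1, so they can be read as the likelihood of finding evidence in a cell rather than as a density of records.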

All of this is just a very preliminary, very coarse analysis of what is a very large and detailed set of data. Some interesting patterns are beginning to emerge, but these may diminish as we continue to work on our material.