Regionality & complexity

This post follows on in part from a post I wrote a couple of years ago on regionality. It will also begin with an apology: the maps presented here will be very difficult for colour blind readers to understand, for which I am sorry. Unfortunately, the technique involved is somewhat limited in terms of control of colour (as it requires three colour channels), so it is not possible (or at least very difficult) to improve the maps to make them more legible for colour blind readers. As such, I would not propose publishing these particular visualisations in any formal setting, but hopefully I can get away with it in a blog post!

Before we get to the maps themselves, I shall briefly describe the mapping technique involved, which is partly inspired by the work of a former colleague of mine at the University of Leicester, Martin Sterry (departmental webpage; academia.edu). Essentially, this method can be used to describe the relationship between three different spatial variables that can be mapped as density surfaces. First, we create a density surface (KDE here) for each variable and then combine them into an RGB image using the Composite Bands tool in ArcGIS, with the first layer forming the red channel, the second the green channel, and the third the blue channel. However, RGB images (so-called “additive colours”, which work from black by adding light in the red, green, and blue channels) can be rather dark / muddy, so I then converted the images (using “Invert” in Photoshop) to CMY images instead (so-called “subtractive colours”, where one works from white by subtracting light in the cyan, magenta, and yellow channels: this is how colour printers work). To do so cleanly, one must set up one’s map document so that anything one wishes to be white in the final image is black in the map document and vice versa. The same applies to greys, which must be set to their inverse (e.g. a 30R 30G 30B grey, as seen below for Wales / Scotland / Man, should be set to 225R 225G 225B, being 255-30 in each case). This may sound somewhat complicated, but the end result is as follows:

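Cyan tones represent high values in Channel 1, e.g. “complex farmsteads” in the first example below.

Magenta tones represent high values in Channel 2, e.g. “enclosed farmsteads” in the first example below.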

Yellow tones represent high values in Channel 3, e.g. “unenclosed farmsteads” in the first example below.

Blue tones represent high values in Channels 1 and 2.

Red tones represent high values in Channels 2 and 3.

Green tones represent high values in Channels 1 and 3.

Dark grey / black tones represent high values in all three Channels.

White or pale tones represent low values in all three Channels.
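
The colour scheme listed above can be checked with a minimal sketch in plain Python. It assumes each density surface is stretched linearly to 0-255 before being composited, which may not match the stretch ArcGIS applies on display; the function names `scale_to_byte`, `composite_cmy`, and `tone_name` are hypothetical, and the input grids are toy values rather than real data.

```python
def scale_to_byte(surface):
    """Linearly rescale a 2D grid of densities to integers 0..255.

    Assumption: a simple min-max stretch; real GIS display stretches may differ.
    """
    flat = [v for row in surface for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat surface
    return [[round(255 * (v - lo) / span) for v in row] for row in surface]


def composite_cmy(ch1, ch2, ch3):
    """Stack three surfaces as R, G, B and invert every channel (RGB -> CMY look)."""
    r, g, b = (scale_to_byte(c) for c in (ch1, ch2, ch3))
    return [
        [(255 - r[y][x], 255 - g[y][x], 255 - b[y][x]) for x in range(len(r[y]))]
        for y in range(len(r))
    ]


# After inversion, a HIGH value in a channel means that RGB channel is DARK.
TONES = {
    (255, 255, 255): "white",   # low in all three channels
    (0, 255, 255): "cyan",      # high in Channel 1
    (255, 0, 255): "magenta",   # high in Channel 2
    (255, 255, 0): "yellow",    # high in Channel 3
    (0, 0, 255): "blue",        # high in Channels 1 and 2
    (255, 0, 0): "red",         # high in Channels 2 and 3
    (0, 255, 0): "green",       # high in Channels 1 and 3
    (0, 0, 0): "black",         # high in all three channels
}


def tone_name(pixel):
    """Name the nearest pure tone by thresholding each channel at 128."""
    return TONES[tuple(0 if v < 128 else 255 for v in pixel)]


# Toy one-row surfaces: each cell switches on one more channel than the last.
image = composite_cmy([[0, 1, 1, 1]], [[0, 0, 1, 1]], [[0, 0, 0, 1]])
print([tone_name(p) for p in image[0]])  # ['white', 'cyan', 'blue', 'black']
```

The same inversion explains the grey rule: a grey of 30 in each channel in the final image has to be entered as 255 - 30 = 225 in the map document.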

Here is a close up of the colour category zones for the first two examples below:

I began by examining the three main categories of Roman farmstead defined by the Roman Rural Settlement Project (RRSP) at Reading, using their excellent data that is available online (Allen et al. 2015). As they defined only three specific categories, this is an ideal dataset to map in this way. For a first attempt, I made three KDE layers using a 10km kernel (or search window) to structure the size of the clusters in the resulting output, then combined them as described above. When plotted against the regions that the RRSP team defined from variation in their data (Smith et al. 2016: Chapter 1), we can see that there is a degree of agreement between the regions and the clustering of particular colours:

However, there is also clearly considerably more complexity to the data than a simple regional classification might suggest (as the RRSP team would certainly acknowledge, so this is not intended as a criticism in any way). If we construct a new model using a wider kernel (in this case 50km), we can get a really nice sense of regional variation in the data without the need to draw lines on a map:

There is some interesting structure in this model. For example, one can see a focus on enclosed farmsteads in the north and west, so-called complex farmsteads in parts of the southern and eastern midlands (largely alongside enclosed farmsteads), with quite a different focus on enclosed and unenclosed farmsteads in the south east. The strong peak in enclosed farmsteads in south Yorkshire / the north midlands is also quite striking. Although it relies too much on good colour vision in a reader, I think this model and technique work quite well here, so I decided to apply them to another dataset: our own.
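
To illustrate why the kernel width matters for the two farmstead models above, here is a minimal sketch of a kernel density surface in plain Python. It assumes a quartic kernel and is not a reproduction of the ArcGIS tool; the function name `quartic_kde` and the single test point are hypothetical, with all distances in kilometres.

```python
import math


def quartic_kde(points, radius, cell, ncols, nrows):
    """Density grid from (x, y) points using a quartic kernel.

    Assumption: each point contributes (3/pi) * (1 - d^2/r^2)^2 / r^2 to every
    cell centre closer than `radius`; points at or beyond it contribute nothing.
    """
    r2 = radius ** 2
    grid = [[0.0] * ncols for _ in range(nrows)]
    for row in range(nrows):
        cy = (row + 0.5) * cell  # y of the cell centre
        for col in range(ncols):
            cx = (col + 0.5) * cell  # x of the cell centre
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < r2:
                    total += (3 / math.pi) * (1 - d2 / r2) ** 2
            grid[row][col] = total / r2
    return grid


# One hypothetical site in the middle of a 3 x 3 grid of 10km cells.
site = [(15.0, 15.0)]
narrow = quartic_kde(site, radius=10.0, cell=10.0, ncols=3, nrows=3)
wide = quartic_kde(site, radius=50.0, cell=10.0, ncols=3, nrows=3)

# The 10km kernel gives a sharp local peak that does not reach the next cell;
# the 50km kernel gives a lower peak that spreads across neighbouring cells.
print(narrow[1][1] > wide[1][1], narrow[1][0] == 0.0, wide[1][0] > 0.0)
```

A narrow kernel keeps clusters tight and local, whereas a wide one trades that detail for a smoother regional picture.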

Before we get to the next stage, here is a close-up of the colour category zones for the next two maps (with RO = Roman; PR = Prehistoric; EM = early medieval):

Based on another technique which we published recently (Green et al. 2017), the following two maps are created from a measure of the “complexity” of our datasets: specifically, the number of different types of site / monument (based upon our thesaurus of types; see Portal to the Past) per 1x1km square. This measure was calculated for each square for each time period in our database, and density surfaces were then created for each time period (using a 5km kernel in this instance). A shortcoming of the mapping technique comes into play here: it can only map three categories at once. As such, we had to combine the Bronze Age and Iron Age models into a composite model for later prehistory. The three time period based complexity models were then combined into a single image as previously:

There are various nice patterns in this dataset, including the clear strength of prehistory and the early medieval in the south western peninsula, the intense focus on major river valleys (partly due to the large gravel quarry excavations in those areas), and the appearance of Roman roads highlighted in magenta. The Roman period also looks quite dominant generally, with lots of pinks, blues, and reds visible on the map. There is also a very clear difference in intensity between eastern / southern England and northern / western England.
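
The complexity measure behind this map (the number of different site / monument types per 1x1km square, per period) can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the function name `complexity_per_square`, and the sample records are all made up for the example and are not drawn from the project database.

```python
from collections import defaultdict


def complexity_per_square(records, cell=1000):
    """Count distinct site types per period per grid square.

    Assumption: each record is (easting_m, northing_m, period, site_type) and
    squares are found by integer division of the coordinates by `cell` metres.
    """
    types_in_square = defaultdict(set)
    for easting, northing, period, site_type in records:
        key = (period, int(easting // cell), int(northing // cell))
        types_in_square[key].add(site_type)  # a set ignores repeated types
    return {key: len(types) for key, types in types_in_square.items()}


# Made-up records: two villas and a field system share one Roman square.
records = [
    (450200, 210700, "RO", "villa"),
    (450500, 210500, "RO", "villa"),
    (450900, 210100, "RO", "field system"),
    (451300, 210400, "RO", "road"),
    (450100, 210900, "PR", "barrow"),
]
complexity = complexity_per_square(records)
print(complexity[("RO", 450, 210)])  # 2
```

Because the measure counts types rather than records, the second villa does not raise the score for its square.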

It is possible to lessen the effects of regional and period based variation by constructing a series of larger kernel density surfaces and using these to “correct” for regional variation in the period based models. This produces a new model which reflects complexity on a more local scale. Essentially, the first model can be thought of as a model of “globally” scaled (by which I mean the whole of the dataset, not the whole of the planet) complexity and the new model can be thought of as a model of locally scaled complexity:

This model also shows some interesting patterns. It is much less dominated by single periods in particular regions, with Roman dominance mostly along the Roman roads and Hadrian’s Wall. There are also some nice dark areas, which show high levels of local complexity across all three time periods. These cluster mostly along rivers again or around the large Roman towns, along with a cluster in southern Yorkshire / the north Midlands similar to that seen in the RRSP data.
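
One way to picture the local scaling is as a ratio between a small-kernel surface and a larger-kernel surface for the same period. The sketch below assumes exactly that (a cell-by-cell division), which is an assumption for illustration and may well differ from the correction actually used for the map above; the function name `locally_scaled` and the grid values are hypothetical.

```python
def locally_scaled(local, regional):
    """Divide a small-kernel surface by a larger-kernel one, cell by cell.

    Assumption: "correcting" means taking this ratio; cells where the regional
    surface is zero are set to 0.0 rather than dividing by zero.
    """
    return [
        [(l / r if r > 0 else 0.0) for l, r in zip(local_row, regional_row)]
        for local_row, regional_row in zip(local, regional)
    ]


# Two toy cells: the second sits in a region with four times the background
# complexity, but both stand out equally against their own surroundings.
local = [[2.0, 8.0]]
regional = [[4.0, 16.0]]
print(locally_scaled(local, regional))  # [[0.5, 0.5]]
```

Under this assumption, a modest peak in a sparsely recorded region scores the same as a large peak in a densely recorded one, which is the sense in which the result is locally rather than “globally” scaled.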

As with all models of English archaeology, the images presented here represent a very complex data history, being influenced by both where more (and more archaeologically visible) activity took place in the past and where more modern archaeological activity takes place in the present (largely driven by development). They also, as previously noted, come with considerable caveats with regard to legibility, due to the relatively large minority of people with restricted colour vision (c.8-10% of men, and maybe 1% of women). The technique is also restricted by its inability to map more than three variables, but more than three variables would probably overcomplicate matters even if it were possible. However, I hope that this post gives a sense of the variation and complexity in the English archaeological record, locally, regionally, and nationally.

EngLaId is now winding down, having officially ended at Christmas, so this will probably be the last substantive post on technique or data for a while. We will however announce here when any new publications come out, including our main books.