More on landscape morphology and synthesis of multiple datasets

During a small symposium we held last week on the subject of “scale”, I spoke briefly to Andrew Lowerre of English Heritage about my previous post on landscape morphology. He pointed me towards some measures of landscape “bumpiness” more sophisticated than simple average slope. Investigating further, I came across an apparent description of Riley’s Terrain Ruggedness Index (or TRI), which seemed like a good, relatively simple measure. It is fairly simple to calculate in ArcGIS by following the instructions on the webpage just linked. For England, using the OS OpenData 50m DEM, the result looks something like this:

“Riley’s” Terrain Ruggedness Index by 50m grid square

If you compare the values on this TRI map (between 0 and 18.9 metres) against the classification given on the webpage linked above, you would end up classifying all of England as “level”, which seems fairly ridiculous, but this is down to a few obvious factors. First, a 50m cell size is quite small (Riley et al. used 1000 by 1000 m cells), so changes in terrain will be relatively small as well: changes in elevation of more than 900 vertical metres across 50 horizontal metres are presumably only seen in landscapes like the Alps or Yosemite. Second, the DEM was interpolated by the OS from their contour dataset, which will inevitably have smoothed some landscape features. Third (and most importantly), I have since investigated further and discovered that the maths / process described there are not actually correct for Riley’s TRI: when I found the original article by Riley et al. (1999), they calculate their TRI differently (theirs being the sum of the absolute [i.e. removing any negative sign] differences between a cell and all of its eight neighbours, whereas this result is the square root of the difference between the lowest and highest neighbouring cell values). This is a good lesson to learn (that I should have learnt already) in not trusting random websites found through a Google search!

However, visual examination suggests the result is somewhat robust on its own terms, when compared against the implementation of TRI in GDAL (which looks somewhat similar on the map, but has a range of 0 to 155m). The GDAL implementation uses the mean difference between a central pixel and its surrounding cells, following Wilson et al. 2007, i.e. Riley’s sum divided by 8 to produce the mean. In other words, even if the numbers are wrong, the relationship between the values in two cells should still be approximately correct: a cell with a TRI of “8m” ought to be more rugged than a cell with a TRI of “4m”, even if both values ought to be much higher. I am not at all satisfied with this in the long term, but it will suffice for the purposes of the rest of this exercise. The Wilson et al. / GDAL implementation of TRI is probably the most sensible to use in the long run:

GDAL Terrain Ruggedness Index by 50m grid square
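To make the difference between the three calculations concrete, here is a small numpy sketch of how I understand them (my own reconstruction for illustration, not code from the project’s actual ArcGIS / GDAL workflow): `riley_tri` sums the absolute differences between each cell and its eight neighbours, `wilson_tri` divides that sum by 8, and `range_tri` reproduces the webpage’s square-root-of-range calculation.

```python
import numpy as np

def neighbour_stack(dem):
    """Stack the 8 neighbour values of every interior cell (shape: 8 x (r-2) x (c-2))."""
    r, c = dem.shape
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dy, dx) != (0, 0)]
    return np.stack([dem[1 + dy:r - 1 + dy, 1 + dx:c - 1 + dx]
                     for dy, dx in shifts])

def riley_tri(dem):
    """Riley et al. (1999): sum of |centre - neighbour| over the 8 neighbours."""
    centre = dem[1:-1, 1:-1]
    return np.abs(neighbour_stack(dem) - centre).sum(axis=0)

def wilson_tri(dem):
    """Wilson et al. (2007) / GDAL: the mean rather than the sum."""
    return riley_tri(dem) / 8.0

def range_tri(dem):
    """The webpage's version: square root of (highest - lowest neighbouring value)."""
    nb = neighbour_stack(dem)
    return np.sqrt(nb.max(axis=0) - nb.min(axis=0))

# Toy 3x3 DEM: one interior cell with value 5 and neighbours 1..9 (minus 5)
dem = np.array([[1., 2., 3.],
                [4., 5., 6.],
                [7., 8., 9.]])
```

For the toy DEM, the single interior cell gives a Riley TRI of 20, a Wilson TRI of 2.5 and a “range” TRI of √8 ≈ 2.83, which illustrates how different the magnitudes of the three measures can be on identical terrain.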

I have also been experimenting more with our synthesis methodology. This is based around the idea of applying a thesaurus to each of our input datasets to simplify their terminologies and then collating results by grid square. We have been working at length on defining a thesaurus that strikes the right balance between simplicity and complexity, so that it is interpretively useful but also computationally feasible. Our current thesaurus, based in part on groupings in the EH NMR thesaurus, is as follows:

This is not a fixed list and we will undoubtedly continue to refine it over the coming months. The result of applying the thesaurus to a dataset is a new field containing a string of values based on the list above. The values are codes defining a particular period / type of site (sites can have multiple periods / types), e.g. RO04A would be a Roman villa site or IA07A would be an Iron Age hillfort. When we then apply a tessellation of grid squares across our input datasets, we can collate results using these codes to remove duplication, resulting in a map showing the presence of each type of site by grid square across England.
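As a toy illustration of the collation step (a Python sketch with hypothetical coordinates and records, not our actual GIS workflow), the thesaurus codes for all records falling in a grid square can simply be unioned, so that a period / type is recorded once per square however many records attest it:

```python
from collections import defaultdict

CELL = 1000  # 1 km grid squares

# Hypothetical records: (easting, northing, set of thesaurus codes).
# Sites can carry multiple period / type codes, e.g. RO04A (Roman villa),
# IA07A (Iron Age hillfort).
records = [
    (412340, 287650, {"RO04A"}),
    (412900, 287100, {"RO04A", "IA07A"}),  # second villa record in the same square
    (455000, 301200, {"IA07A"}),
]

presence = defaultdict(set)
for e, n, codes in records:
    square = (e // CELL, n // CELL)  # index of the containing grid square
    presence[square] |= codes        # set union removes duplication

# presence now maps each grid square to the site types present within it
```

The set union is what removes duplication: the two villa records above collapse into a single “RO04A present” flag for their shared square.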

I have also been looking at different tessellations of grid squares. Until now, I had been using squares of 2 by 2 km, which seemed a sensible resolution for looking at patterns on the scale of England, but when applying this methodology in my latest tests, I have decided that the resolution is too coarse. This is compounded by the issue that when a data point falls on the intersection of four grid squares, it is registered as falling within all four, with the result that some sites gain undue prominence in the synthesised dataset. I tried to work around this by using two overlapping tessellations, with the origin of each cell in set B at the central point of each cell in set A (as suggested to me at CAA by an audience member), but soon realised that merging these two datasets (accepting the presence of a site type only if it appeared in both overlapping cells) would simply replicate the result of a 1 by 1 km tessellation.

Therefore, to decrease the coarseness of my grid square layer, I have now started using a tessellation of 1 by 1 km grid squares. The results are much more aesthetically pleasing. As a large number of our points fall on the origins of these squares (where data locations are only known to the nearest kilometre square), I have also offset this new 1 by 1 km grid by 500m east and 500m north, so that points falling on the 1000m intervals are only counted in a single grid square. This does mean that such a point might be slightly misplaced (by up to 500m east and 500m north) if it actually fell towards the northeastern corner of its kilometre square, but this is a very minor spatial misrepresentation on the scale of all of England.
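The effect of the 500m offset can be sketched with simple floor division (a hypothetical illustration, not the actual GIS operation): a point recorded at an exact kilometre coordinate now falls in the interior of one offset cell rather than on the corner shared by four.

```python
OFFSET = 500  # grid origin shifted 500 m east and 500 m north
CELL = 1000   # 1 km grid squares

def square_index(e, n, cell=CELL, offset=OFFSET):
    """Index of the offset grid square containing a point (easting, northing).

    Points at exact 1000 m multiples land in a cell interior, so they are
    counted in a single square instead of up to four."""
    return ((e - offset) // cell, (n - offset) // cell)

# A point known only to the nearest kilometre square sits at e.g. (412000, 287000)
# and now falls unambiguously inside one offset cell.
```

Without the offset, `(412000, 287000)` lies on the shared corner of four unshifted squares; with it, the same point is strictly inside a single cell.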

As a test, I have run this synthesis methodology on the data received to date from English Heritage. This is the National Record of the Historic Environment (NRHE), which contains the former NMR records and records of sites found through the National Mapping Programme. Once processed, it consists of point, line and polygon shapefiles with associated attribute data. We currently have data for EH’s South West, South East, Eastern, and East Midlands regions. Here are some example results, with filled squares showing the presence of each period / type of site, bearing in mind that the empty swathes of the country to the north and west for each distribution are due to the lack of data for now (click on an image to zoom in):

Distribution of Bronze Age stone circles by 1km grid square based upon EH NRHE data
Distribution of Iron Age hillforts by 1km grid square based upon EH NRHE data
Distribution of Roman villas by 1km grid square based upon EH NRHE data
Distribution of early medieval inhumation cemeteries by 1km grid square based upon EH NRHE data

Once the NRHE dataset is complete and once we also bring in the HER data that we are currently gathering for all of England, the distributions plotted will be much more complete, but I think that the result is beginning to show some promise.

Taking this further, we can then compare these distributions statistically against the measures of landscape morphology calculated for each grid square, namely the mean elevation and the mean TRI (i.e. the incorrect version originally calculated, see above). Elevation has a maximum value of 903.8m and the TRI a maximum of 9.95m (when aggregated from 50m cells to 1000m cells). Here are the results for each of the distributions shown above (bear in mind these are results for 1km by 1km squares recording the presence of one or more of each of these types):
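The aggregation from 50m cells to 1km cells is just a block mean, which can be sketched in numpy as follows (an illustration under the assumption that the fine raster’s dimensions divide evenly by the aggregation factor; real DEM tiles would need edge handling):

```python
import numpy as np

def aggregate_mean(grid, factor=20):
    """Mean-aggregate a fine raster (e.g. 50 m TRI) to coarser cells.

    A 1 km cell covers factor x factor = 20 x 20 fine cells at 50 m
    resolution. Assumes both dimensions divide evenly by `factor`."""
    r, c = grid.shape
    return grid.reshape(r // factor, factor, c // factor, factor).mean(axis=(1, 3))

# Toy demonstration with a 4x4 raster aggregated by a factor of 2:
fine = np.arange(16, dtype=float).reshape(4, 4)
coarse = aggregate_mean(fine, factor=2)
```

Each coarse cell then carries the mean TRI (or mean elevation) of the 400 fine cells beneath it, which is the value compared against site presence above.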

As these distributions are incomplete, we should not read too much into these results, but some patterns do seem obvious. Hillforts (and stone circles) tend, logically, to sit at higher elevations and on more rugged terrain, but have outliers right down to 0m elevation. These outliers tend to fall along the coast and so are more likely to be edge effects than genuine 0m OSD hillforts, e.g. where a monument polygon overlaps a grid square containing almost nothing but sea (I can partly correct for this at a later date by clipping my DEM to the coastline). One, however, is in northern Cambridgeshire (Stonea Camp): should such a site really be defined as a “hill” fort? By contrast, villas and early medieval cemeteries tend towards lower, flatter landscapes.

These are generally logical and fairly obvious results that we might expect to see without calculating any statistical measures, but it is still a useful exercise to run these analyses to try to confirm our intuitive assumptions and to attempt to discover any unusual cases that do not match what we might have expected (such as “hillforts” at sea level). When applied more extensively to more complete datasets for each of the thesaurus types defined above and for each period (where types exist in more than one period), we might discover some interesting patterns.

Obviously, this methodology remains a work in progress and I will continue to refine it over the coming months as more data comes in. This includes revising our thesaurus as our research questions and the nature of our datasets become clearer, and deciding on a better measure of terrain ruggedness (probably the GDAL version).

10 thoughts on “More on landscape morphology and synthesis of multiple datasets”

A couple of comments:
I think the suggested classification bands for TRI in Riley et al.’s original paper are intended only for their results in Montana, so they really would not be applicable in England!

Thanks for the references. I quite like TRI because it’s simple (relatively) and it seems to make intuitive sense to average it out to coarser resolutions, which I’m not sure would be the case with some of the other measures. I’ll read up on them though and have a think!

And, yes, my Riley TRI now comes out as between 0 and 1237 metres, which is closer to the Riley et al. classification, but still sufficiently lower that a bespoke English classification seems more logical than simply adopting one devised for the USA.

To understand the meaning of a difference between cells, as in TRI, or to visualise terrain, one has to take the cell size into account, which makes such measures unsuitable for comparing DEMs with different resolutions. Another issue is that many DEMs use the degree as a unit, so the N-S and E-W cell sizes differ. Slope is a measure which takes cell size into account, and microDEM allows for the differences in size in different directions. microDEM has 12 algorithms for slope and very good Help about them: http://www.usna.edu/Users/oceano/pguth/website/microdem/microdem.htm

I have worked on this a bit further and have now discovered a way to create Riley TRI and Wilson TRI maps in ArcGIS without venturing near the command line. Essentially, you first use the Shift tool to create 8 new raster grids, each shifted a distance equivalent to one pixel in one of the eight compass directions, i.e. for a 50m resolution DEM, you would create 8 grids as follows (x, y): nw (-50, +50); nc (0, +50); ne (+50, +50); wc (-50, 0); ec (+50, 0); sw (-50, -50); sc (0, -50); se (+50, -50).

To calculate the Riley TRI (i.e. the sum of the absolute differences between a cell and each of its eight neighbours), you then use the following equation in the Raster Calculator (where “dem” is your DEM layer): Abs(“dem”-“se”) + Abs(“dem”-“sc”) + Abs(“dem”-“sw”) + Abs(“dem”-“ec”) + Abs(“dem”-“wc”) + Abs(“dem”-“ne”) + Abs(“dem”-“nc”) + Abs(“dem”-“nw”). In their original paper, Riley et al. squared all of the entries and then square rooted the result (in the erratum), but this was just to remove any negative signs, so Abs does the same thing. To obtain the Wilson TRI, return to the Raster Calculator and divide the Riley TRI by 8 (i.e. the mean difference between a cell and each of its eight neighbours).

I did this using my OS OpenData DEM and the result appears to be identical (if we ignore the decimal points) to that produced by the TRI command in GDAL, proving both that I have worked out how these two TRIs should be calculated and that this ArcGIS methodology works!
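For anyone without ArcGIS, the same shift-and-sum procedure can be mirrored in numpy (a sketch of my own, assuming `np.roll` as a stand-in for the Shift tool; note that `np.roll` wraps values around the edges, whereas ArcGIS treats the shifted edges as NoData, so only interior cells are valid here):

```python
import numpy as np

dem = np.array([[ 1.,  2.,  3.,  4.],
                [ 5.,  6.,  7.,  8.],
                [ 9., 10., 11., 12.],
                [13., 14., 15., 16.]])

# Eight shifted copies, one per compass direction, so that each copy
# places one neighbour's value directly under the central cell.
shifts = {
    "nw": (1, 1),  "nc": (1, 0),  "ne": (1, -1),
    "wc": (0, 1),                 "ec": (0, -1),
    "sw": (-1, 1), "sc": (-1, 0), "se": (-1, -1),
}
shifted = {name: np.roll(dem, s, axis=(0, 1)) for name, s in shifts.items()}

# The Raster Calculator expression above, written once over all 8 grids:
riley = sum(np.abs(dem - g) for g in shifted.values())
wilson = riley / 8.0
```

For the toy DEM, the interior cell with value 6 has neighbours 1, 2, 3, 5, 7, 9, 10 and 11, giving a Riley TRI of 26 and a Wilson TRI of 3.25; the wrapped edge cells should be discarded or masked in real use.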