Processing raster NMP tiles

One of the most important datasets with which we are working on this project is the English Heritage (EH) administered National Mapping Programme (NMP), which has been working all across England to map archaeological features seen in aerial photographs. This project has been ongoing for about 20 years and, as a result, the methodologies used by those undertaking the work have evolved over time. More recent NMP projects were undertaken in CAD software, with the result that the drawings produced exist as vector data, which is very flexible for GIS mapping purposes.

However, I have been thinking over the last few days about how to deal with the older areas of the NMP that exist as scanned raster images of line drawings. This raster NMP data is less flexible than vector data in two ways. First, raster imagery does not scale well when looking at broader regional patterns. Second, it is difficult to display any background data behind raster tiles: making the white areas invisible tends to make the black lines fade, as does making the layer partly transparent. I have also been talking recently via email with Helen Wickstead about some of her ideas for examining the topological relations in field system layouts, and this type of analysis would be impossible with raster scanned maps.

Example of raster NMP tiles

As a result, I decided to try to convert the raster NMP tiles received from EH to date into vector data. After a bit of trial and error, I came up with quite a simple system to do so. First, we reclassify the data in the raster tile so that the white areas have a value of ‘NoData’ and the black areas (i.e. the drawings) retain a value of 1. For most tiles, the black areas already have a value of 1 and the white areas a value of 0, but this is reversed for most of Kent and some tiles in Lincolnshire. The next stage is to use the Raster to Polygon tool to convert this result into a polygonal vector layer. Ideally we would want to convert the drawings to lines rather than polygons, but they are too ‘thick’ for this to work (more on which later).
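As a rough illustration of the reclassification step only, here is a minimal sketch in plain Python. It assumes a tile can be represented as a small grid of 0s and 1s; the function name `reclassify`, the `ink_value` parameter and the sample tiles are hypothetical, and `None` simply stands in for ‘NoData’. It is not the ArcGIS tool itself.

```python
from typing import List, Optional

Grid = List[List[int]]


def reclassify(tile: Grid, ink_value: int = 1) -> List[List[Optional[int]]]:
    """Return a copy of `tile` where drawn cells become 1 and all others None.

    None stands in for 'NoData'. `ink_value` is the value that marks the
    drawn (black) cells in the input: 1 for most tiles, 0 for reversed ones.
    """
    return [[1 if cell == ink_value else None for cell in row] for row in tile]


# Hypothetical 3 x 4 tile in the usual encoding: 1 = black line, 0 = white.
normal_tile = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

# The same drawing in the reversed encoding (0 = black, 1 = white).
inverted_tile = [[1 - cell for cell in row] for row in normal_tile]

reclassified = reclassify(normal_tile, ink_value=1)
reclassified_inverted = reclassify(inverted_tile, ink_value=0)

for row in reclassified:
    print(row)
```

Passing the value that represents ink as a parameter means the reversed tiles can go through the same function and come out with the same result as the normal ones.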

The result includes a large number of small polygons, which add little to the overall picture other than making the layer render slowly on screen. These original drawings were done at a scale of 1:10,000, at which 0.1mm on paper corresponds to 1m on the ground. If we assume that they were undertaken with a minimum pen width of 0.1mm, then any polygon smaller than 3.14m² (the area of a circle of 1m radius) would be smaller than the head of a pen and, as such, unlikely to have been deliberately plotted. Therefore, we calculate the area of the polygons in the layer, select those with an area greater than 3m² and export that as our final result. For the datasets processed to date, these small polygons amounted to about 20% of the total, and they have been removed to no obvious visual effect.
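The arithmetic behind the 3.14m² figure can be checked with a short worked example. This is only a sketch: the function names are hypothetical, and it assumes that the 1m ground distance is treated as the radius of the pen head, since that is the reading that reproduces the figure quoted above.

```python
import math


def ground_distance_m(paper_mm: float, scale_denominator: int) -> float:
    """Convert a distance on paper (mm) to a ground distance (m) at 1:scale."""
    return paper_mm * scale_denominator / 1000.0


def circle_area_m2(radius_m: float) -> float:
    """Area of a circle of the given radius."""
    return math.pi * radius_m ** 2


# 0.1mm on a 1:10,000 drawing, expressed as metres on the ground.
pen_on_ground_m = ground_distance_m(0.1, 10_000)

# Assumption: the ground distance is used as the radius of the pen head.
pen_head_area_m2 = circle_area_m2(pen_on_ground_m)

print(round(pen_on_ground_m, 3), round(pen_head_area_m2, 2))
```

If the 1m were instead read as the diameter of the pen head, the same calculation would give roughly 0.79m², so the 3m² cut-off is the more aggressive of the two readings.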

Obviously, this is a lot of work to undertake when dealing with large sets of 5 x 5 km tiles for large swathes of England. As such, I experimented with the Model Builder in ArcGIS to see if I could automate the process. The process had to be split into two stages, due to the software getting confused by the new file names. Stage 1 iterates through a folder of raster layers, reclassifies them and then converts them to polygons:

Stage 1 of the model to convert raster NMP tiles to vector data

For some reason, the Model Builder was not attaching projection information to the output layers, despite the fact that I had specified this under the tool’s instructions. The projection needs to be defined to accurately calculate the area of extent of the polygons. As such, Stage 2 iterates through the data output by Stage 1, defines the projection for each layer as British National Grid, and then calculates the area of extent of each polygon:

Stage 2 of the model to convert raster NMP tiles to vector data
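To show why a projected coordinate system matters for the area calculation, here is a minimal sketch of a planar area computation using the shoelace formula. It assumes the polygon vertices are already eastings and northings in metres, as they would be on the British National Grid; the function name and the sample rectangle are hypothetical, and this is not how ArcGIS computes the value internally.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def ring_area_m2(ring: List[Point]) -> float:
    """Planar area of a simple polygon ring using the shoelace formula.

    Coordinates are assumed to be in metres in a projected system, so the
    result is in square metres. The ring need not repeat its first vertex.
    """
    total = 0.0
    # Pair each vertex with the next one, wrapping round to the start.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical 2m x 1.5m rectangle given as (easting, northing) pairs.
small_feature = [
    (451000.0, 210000.0),
    (451002.0, 210000.0),
    (451002.0, 210001.5),
    (451000.0, 210001.5),
]

area = ring_area_m2(small_feature)
print(area)
```

The same formula applied to unprojected longitude and latitude values would return a number in squared degrees, which is why the layers need a defined projection before the area threshold means anything.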

We then merge all of the layers output into a new composite layer, use the Select by Attribute tool to select all of those polygons with an area of more than 3m² and export the results. Here we can see the vectorised result for the area of raster NMP shown above:
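The merge and selection can be sketched in a few lines of plain Python. This is only an illustration of the logic: the `Polygon` record, the tile names and the areas are all hypothetical sample values, and a real workflow would use the GIS tools described above.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Iterable, List

MIN_AREA_M2 = 3.0  # threshold used in the text


@dataclass(frozen=True)
class Polygon:
    tile: str       # name of the tile the polygon came from (hypothetical)
    area_m2: float  # area as calculated in Stage 2


def merge_layers(layers: Iterable[List[Polygon]]) -> List[Polygon]:
    """Combine the per-tile polygon lists into one composite list."""
    return list(chain.from_iterable(layers))


def select_larger_than(polygons: List[Polygon], min_area: float) -> List[Polygon]:
    """Keep only polygons whose area is strictly greater than `min_area`."""
    return [p for p in polygons if p.area_m2 > min_area]


# Made-up sample data standing in for two processed tiles.
tile_a = [Polygon("tile_a", 0.8), Polygon("tile_a", 152.4)]
tile_b = [Polygon("tile_b", 3.0), Polygon("tile_b", 47.9), Polygon("tile_b", 1.2)]

merged = merge_layers([tile_a, tile_b])
kept = select_larger_than(merged, MIN_AREA_M2)

print(len(merged), len(kept))
```

Note that the comparison is strict, so a polygon of exactly 3m² is dropped, matching a selection of "more than 3m²".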

Vectorised version of raster NMP tiles

I am reasonably satisfied with this result, but there are a few things I would still like to do to tidy up the layer. Specifically, I would like to remove the grid marks (or grid lines in the case of some NMP projects, e.g. Dartmoor and most of Kent) and, ideally, to remove the pieces of text on some of the images (particularly Dartmoor). I cannot (yet) think of an automated way to deal with the latter, but removing the grid marks / lines can be automated, albeit with a somewhat imperfect result.

Removing grid marks: input data

We begin with a layer of grid lines that fall on each of the 1000m points east and north of the origin of the British National Grid, created using the Geospatial Modelling Environment software:

Removing grid marks: 1000m grid lines
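As a sketch of what such a grid layer contains, the following plain Python generates the 1000m line positions that fall inside a given extent. The function names and the sample extent are hypothetical; the layer itself was created in the Geospatial Modelling Environment, not with this code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]


def grid_positions(lower: float, upper: float, spacing: float = 1000.0) -> List[float]:
    """Multiples of `spacing` (measured from the grid origin) within [lower, upper]."""
    first = math.ceil(lower / spacing)
    last = math.floor(upper / spacing)
    return [i * spacing for i in range(first, last + 1)]


def grid_lines(xmin: float, ymin: float, xmax: float, ymax: float,
               spacing: float = 1000.0) -> List[Line]:
    """Vertical then horizontal grid lines clipped to the given extent."""
    vertical = [((x, ymin), (x, ymax)) for x in grid_positions(xmin, xmax, spacing)]
    horizontal = [((xmin, y), (xmax, y)) for y in grid_positions(ymin, ymax, spacing)]
    return vertical + horizontal


# Hypothetical extent of one 5 x 5 km tile, in metres east and north.
lines = grid_lines(450000.0, 210000.0, 455000.0, 215000.0)

print(len(lines))
print(lines[0])
```

Because the positions are multiples of the spacing counted from the origin, the lines stay aligned across tiles regardless of where each tile's extent begins.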

We then create buffers 5m either side of these grid lines (i.e. of 10m width). This covers most of each grid mark / line. For areas of raster NMP that include drawn grid lines, the next two steps in the process are not needed and we would skip straight to the erasing.
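The effect of the 5m buffer can be expressed as a simple test of whether a location falls inside the buffered strip around any grid line. This is a sketch under stated assumptions: it works on individual points rather than polygons, it covers full grid lines rather than cross marks, and the function names and sample coordinates are hypothetical.

```python
def distance_to_nearest_grid_line(coord: float, spacing: float = 1000.0) -> float:
    """Distance from an easting or northing to the nearest multiple of `spacing`."""
    offset = coord % spacing
    return min(offset, spacing - offset)


def in_grid_mask(x: float, y: float, half_width: float = 5.0,
                 spacing: float = 1000.0) -> bool:
    """True if (x, y) lies within `half_width` of a vertical or horizontal grid line."""
    return (distance_to_nearest_grid_line(x, spacing) <= half_width
            or distance_to_nearest_grid_line(y, spacing) <= half_width)


# Hypothetical sample points, in metres east and north.
print(in_grid_mask(451003.0, 210500.0))  # 3m east of a grid line
print(in_grid_mask(450997.0, 210500.0))  # 3m west of a grid line
print(in_grid_mask(451500.0, 210500.0))  # mid-square, far from any line
```

A half width of 5m on each side gives the 10m strip described above; widening it would catch more stray grid marks at the cost of cutting more archaeology.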

Finally, we use the Erase tool in ArcGIS to remove the areas of the vectorised raster NMP layer covered by the grid mark (or line) mask (at this stage, I also clipped the layers to the coastline for coastal areas):

Removing grid marks: the result after erasing polygons under cross mark mask

The result is clearly imperfect, but does look cleaner than the original result. For some areas, bits of grid mark or line still appear within the resulting layer, where they slipped outside the mask, but these can be cleaned up manually where necessary. The main problem with this automated approach is that it can cut out small areas of the drawings that relate to archaeological data where they overlap grid marks / lines, but these should be fairly minimal, and largely invisible at broader scales of rendering.

Vectorised version of raster NMP tiles with grid marks removed

By way of comparison, here is an area where vector and raster NMP data meet:

We can see that the vector NMP areas have more attribute data attached to them, so that we can plot this data in different colours according to the morphology of the feature (i.e. bank, ditch, etc.), but the vectorised raster NMP areas are now easier to plot alongside the vector areas, especially at these types of broader landscape scales.

I am quite pleased with this result, and it certainly helps greatly with the plotting of the areas of the NMP which exist as raster data, but it does not help greatly with the previously suggested idea of examining the topological relationships within field systems. I think that there may be a way to produce a line (rather than polygon) result by processing the original data using the Thin algorithm in GRASS GIS, but this would need care and is probably only feasible for quite restricted areas. I shall return to this idea at a later date!
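To give a feel for what thinning does to a 'thick' line, here is a minimal sketch of the Zhang-Suen thinning algorithm in plain Python. This is a swapped-in, well-known algorithm used purely for illustration; it is not the GRASS GIS implementation, which may behave differently. The function names and the sample bar are hypothetical, and the sketch assumes a binary grid (1 = ink) with an empty one-cell border.

```python
from typing import List

Grid = List[List[int]]


def _neighbours(img: Grid, r: int, c: int) -> List[int]:
    """The 8 neighbours of (r, c), clockwise from north: N, NE, E, SE, S, SW, W, NW."""
    return [
        img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
        img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1],
    ]


def _transitions(nb: List[int]) -> int:
    """Count 0 -> 1 steps when walking once round the ring of neighbours."""
    ring = nb + nb[:1]
    return sum(1 for a, b in zip(ring, ring[1:]) if a == 0 and b == 1)


def zhang_suen_thin(image: Grid) -> Grid:
    """Thin a binary image (1 = ink) whose outer border is all 0."""
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    nb = _neighbours(img, r, c)
                    n, e, s, w = nb[0], nb[2], nb[4], nb[6]
                    # Only boundary cells with exactly one 0 -> 1 transition qualify.
                    if not 2 <= sum(nb) <= 6 or _transitions(nb) != 1:
                        continue
                    if step == 0:
                        removable = n * e * s == 0 and e * s * w == 0
                    else:
                        removable = n * e * w == 0 and n * s * w == 0
                    if removable:
                        to_clear.append((r, c))
            # Cells are cleared together after each sub-step, not one by one.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Hypothetical 'thick' line: a bar 3 cells high and 7 cells long.
thick_bar = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

thinned = zhang_suen_thin(thick_bar)
for row in thinned:
    print("".join("#" if cell else "." for cell in row))
```

For this bar the sketch leaves a single row of cells along the middle, slightly shorter than the original. That shortening at the ends is one reason why a thinned result would need care before being used for topological analysis.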