Geocoding of Laboratory Data

This is a general outline of what I use and how I have geocoded laboratory results. I have a paper in press in the Annals of Clinical Biochemistry on the geocoding of vitamin D

The main computer I use is a Mac Pro with 2 x 2.8 Quad core Intel Xeon processors. Inside its expanded a little for speed and has a 256 GB solid state drive as a boot drive and then 2 x 1 TB drives in a Striped Raid setting. This isn’t strictly necessary but I’ve found that having a decent amount of computing power is recommended when working with big data sets and getting decent performance on SQL queries. I’m sure windows computers can cope but why not start a movement for more flexible computing in the NHS and ask for a Mac!

(ps I was hoping there would be a new Mac Pro in the WWDC 2012 announcements but no joy).

I also use an iMac so you don’t have to go for the “truck”.

1st Get Your Data

The 1st step in the process is to get the data from the LIMS system. Here at Heartlands we use a telepath system (original flavour!) we have two methods for extracting data from the system.

List generation routines

A full SQL map

The key point with this step is that your lims system should have a link to the HISS/PAS as this allows you to get some more details about the patient from the linked database. They key piece of information we require is the patient postcode.

We are aiming for data in a simple .csv (comma separated value) format along the lines of:

You will also need a MySQL graphical interface if you aren’t keen on using the command line. There is one freely available from MySQL itself http://www.mysql.com/downloads/workbench/ or there will be a number of 3rd party ones also available. I use Razor SQL http://www.razorsql.com largely because it also connects to other SQL databases including Cache which is the SQL map into our telepath system.

It should be noted that on occasion it is necessary to resort to the command line for some queries (especially the ones that take a long time or return big datasets as they can cause out of memory errors for the graphical systems)

In most systems importing csv data into the MySQL database is simple as there are wizards which help the process. Heres a short video of my doing it with some sample data.

Part 2 starting the Geocoding Process

Now that we have the data in the data base we need to start the geocoding process.

Georeferencing can be achieved in a number of ways, the most common of which are post-code referencing and area referencing.

Area Referencing

An area reference code is a code that identifies an observation with a defined geographical area(8). Super Output Areas (SOA) are a geography for the collection and publication of small area statistics. The SOA layers form a hierarchy based on aggregations of Output Areas (OAs). They are better for statistical comparison as they are of much more consistent size and each layer has a specified minimum population to avoid the risk of data disclosure. SOAs are not be subject to frequent boundary change and therefore allow statistical comparison over time. In addition, they build on the existing availability of data for OAs. Currently there are:

175,434 OAs in England and Wales (165,665 in England; 9,769 in Wales).

34,378 Lower Layer SOAs (32,482 in England; 1,896 in Wales).

7,193 Middle Layer SOAs (6,780 in England; 413 in Wales).

SOAs are intended as a purely statistical geography. They are based on the boundaries of the post-code based Output Area geography from which they are constructed. OAs are based on post-codes as at Census Day (29 April 2001), and are constrained by the 2003 statistical ward & parish boundaries(9) Post-codes

The post-code is a widely used and understood geo-reference and has a number of advantages because of this. It is freely available and is critical part of any patient record. There are a number of disadvantages to post-codes in that post-codes were designed to allow the efficient delivery of mail, a post-code can straddle two or more other geographies and can change or be removed. In addition, their use could have potential privacy implications as they are related to individual or groups of houses and locations.

What we need to do is convert the Postcode data we have to LSOA or similar area referencing.

Fortunately the NHS connecting for health project has a database which can convert Postcodes to this standard geography. This data is available from http://nww.connectingforhealth.nhs.uk/ods/downloads/officenatstats (Note this is an nww link and therefore you wont be able to connect to this site unless you’re connected to the N3 network i.e usually a computer connected to the NHS network). On the day I looked at this site, the file you need is:

gridall.zip, Complete Gridlink NHS Postcode File, weighing in at 39Mb

This data is supplied as a CSV file and again you will need to download and add to your MySQL database as previously. When you’ve done this, you should have something along the lines of:

The meaning of these fields is explained in the accompanying documentation but, the two we are interested in are:

UnitPostCodeVar and LSOA

I’ve found I get better matches if I use the UnitPostCodeVar field rather than the UnitPostCode field. I have no idea why but its worth trying both to see which matches most.

The other tip I would give here if you are a database expert (or know one) is to get them to create some indexes for you. These indexes speed up the queries significantly. I index both postcode fields and LSOA

Linking to two datasets together.

To link the 2 data sets together (Our lab data and the Gridall data) we have to create a join query in MySQL. My preference is to get the query to create a new table at the same time.

“The Esri shapefile or simply a shapefile is a popular geospatial vector data format for geographic information systems software. It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other software products.[1] Shapefiles spatially describe geometries: points, polylines, and polygons. These, for example, could represent water wells, rivers, and lakes, respectively. Each item may also have attributes that describe the items, such as the name or temperature”.

You can use a database system such as PostgreSQL (http://www.postgresql.org) which “understands” geography as its possible to load gospatial extensions. This is extremely powerful and allows you to do a number of calculations like how many x in a particular geographical area.

This will produce a shapefile with the geometry data and your laboratory data embedded in it.

Map Drawing

There are a number of map drawing packages on the market, both commercial and open source. I use the open source QGIS software http://www.qgis.org which is available for all platforms.

Once installed its also easy to use and extensible with a number of 3rd party plugins which allow further functionality.

This short video outlines the processes involved:

Key points are: 1. Make sure your coordinate reference system is set to do conversions on the fly 2. Make sure you use British National Grid as the CRS for the shapefile layer 3. Make sure you use the appropriate CRS for the mapping provider you use. 4. Use the Openlayers overview and openlayers plugins to get the physical maps whereby you can overlay your shape file data.

About the Author

I'm clinical lead for Biochemistry and Blood Sciences and I am responsible for R&D in the department of Biochemistry and Immunology and I have developed a number of projects utilising LC/MSMS, including, “LCMSMS methods for catecholamine metabolites”, “measurement of drugs of abuse”, “25-OH Vitamin D”, “8-Isoprostane”, “Vitamins A and E” and “Prednisolone and other steroids”

I have supervised 6 students for the research component of their MSc and have gained grants for the study of glycated proteins in diabetes, hepcidin measurements in haemochromatosis, lab to lab communication and e-learning.

Protection of Personal Information – Laboratory Medicine comply with the Trust Data Protection Policy and have procedures in place to allow the Directorate and it’s employees to comply with the Data Protection Act 1998 and associated best practice and guidance.

The Trust Laboratories at Heartlands Hospital, Good Hope Hospital and Solihull Hospital were awarded UKAS (United Kingdom Accreditation Service) accreditation to the internationally recognised ISO 15189 standard in May 2015. For a list of accredited tests and other information please visit the test database http://www.heftpathology.com/frontpage/test-database.html.Tests not appearing on this scope are either under consideration or in the process of accreditation and so currently remain outside of our scope of accreditation. However, these tests have been validated to the same high standard as accredited tests and are performed by the same trained and competent staff.