Lesson 4

Lesson 4. Spatial is not special: how to manage the locations data in a spatial database: PostGIS

A wildlife tracking data management system must include the capability to explicitly deal with the spatial component of movement data. GPS tracking data are sets of spatio-temporal objects (locations) and the spatial component must be properly handled. You will now extend the database adding spatial functionalities through the PostgreSQL spatial extension called PostGIS. PostGIS introduces the spatial data types (both vector and raster) and a large set of SQL spatial functions and tools, including spatial indexes. This possibility essentially allows you to build a GIS using the existing capabilities of relational databases. In this lesson, you will implement a system that automatically transforms the GPS coordinates generated by GPS sensors from a pair of numbers into spatial objects.

Topic 1. Transforming GPS Coordinates into a Spatial Object

Introduction

At the moment, your data are stored in the database and the GPS positions are linked to individuals. While time is properly managed, coordinates are still just two decimal numbers (longitude and latitude) and not spatial objects. It is therefore not possible to find the distance between two points, or the length of a trajectory, or the speed and angle of the step between two locations. In this chapter, you will learn how to add a spatial extension to your database and transform the coordinates into a spatial element (i.e. a point).
Until few years ago, the spatial information produced by GPS sensors was managed and analyzed using dedicated software (GIS) in file-based data formats (e.g. shapefile). Nowadays, the most advanced approaches in data management consider the spatial component of objects (e.g. a moving animal) as one of its many attributes: thus, while understanding the spatial nature of your data is essential to proper analysis, from a software perspective spatial is (less and less) not special. Spatial databases are the technical tool needed to implement this perspective. They integrate spatial data types (vector and raster) together with standard data types that store the objects' other (non-spatial) associated attributes. Spatial data types can be manipulated by SQL through additional commands and functions for the spatial domain. This possibility essentially allows you to build a GIS using the existing capabilities of relational databases. Moreover, while dedicated GIS software is usually focused on analyses and data visualization, providing a rich set of spatial operations, few are optimized for managing large spatial data sets (in particular, vector data) and complex data structures. Spatial databases, in turn, allow both advanced management and spatial operations that can be efficiently undertaken on a large set of elements. This combination of features is becoming essential, as with animal movement data sets the challenge is now on the extraction of synthetic information from very large data sets rather than on the extrapolation of new information (e.g. kernel home ranges from VHF data) from limited data sets with complex algorithms.

Example

The first step to do in order to spatially enable your database is to load the PostGIS extension, which can easily done with the following SQL command (many other extensions exist for PostgreSQL):

CREATE EXTENSION postgis;

Now you can use and exploit all the features offered by PostGIS in your database. The vector objects (points, lines, and polygons) are stored in a specific field of your tables as spatial data types. This field contains the structured list of vertexes, i.e. coordinates of the spatial object, and also includes its reference system. The PostGIS spatial (vectors) data types are not topological, although, if needed, PostGIS has a dedicated topological extension.

With PostGIS activated, you can create a field with geometry data type in your table (2D point feature with longitude/latitude WGS84 as reference system):

You can now populate it (excluding points that have no latitude/longitude):

UPDATE
main.gps_data_animals
SET
geom = ST_SetSRID(ST_MakePoint(longitude, latitude),4326)
WHERE
latitude IS NOT NULL AND longitude IS NOT NULL;

At this point, it is important to visualize the spatial content of your tables. PostgreSQL/PostGIS offers no tool for spatial data visualization, but this can be done by a number of client applications, in particular GIS desktop software like ESRI ArcGIS 10.x or QGIS. QGIS is a powerful and complete open source software. It offers all the functions needed to deal with spatial data. QGIS is the suggested GIS interface because it has many tools specifically for managing and visualizing PostGIS data. Especially remarkable is the tool DB Manager.
Now you can explore the GPS positions data set in QGIS (see figure below). The example is a view zoomed in on the study area rather than all points, because some outliers are located very far from Monte Bondone, affecting the default visualization. In the background you have the Google satellite layer loaded using the OpenLayer Plugin.

You can also use ArcGIS ESRI 10.x to visualize (but not natively edit, at least at the time of writing this text) your spatial data. Data can be accessed using “Query layers”. A query layer is a layer or stand-alone table that is defined by a SQL query. Query layers allow both spatial and non-spatial information stored in a (spatial) DBMS to be integrated into GIS projects within ArcMap. When working in ArcMap, you create query layers by defining a SQL query. The query is then run against the tables and views in a database, and the result set is added to ArcMap. Query layers behave like any other feature layer or stand-alone table, so they can be used to display data, used as input into a geoprocessing tool, or accessed using developer APIs. The query is executed every time the layer is displayed or used in ArcMap. This allows the latest information to be visible without making a copy or snapshot of the data and is especially useful when working with dynamic information that is frequently changing.

Exercise

Find the distance of all locations of animal 2 to the point (11.0620855; 45.9878812)

Topic 2. Automating the Creation of Points from GPS Coordinates

Introduction

Working with massive data sets (i.e. many sensors at the same time) in near real time requires that routinely operations are done automatically to save time and to avoid errors of manual processing. Here you create a new function to update the geometry field as soon as a new record is uploaded.

Example

You can automate the population of the geometry column so that whenever a new GPS position is updated in the table main.gps_data_animals, the spatial geometry is also created. To do so, you need a trigger and its related function. Here is the SQL code to generate the function:

COMMENT ON FUNCTION tools.new_gps_data_animals()
IS 'When called by a trigger (insert_gps_locations) this function populates the field geom using the values from longitude and latitude fields.';

And here is the SQL code to generate the trigger:

CREATE TRIGGER insert_gps_location
BEFORE INSERT
ON main.gps_data_animals
FOR EACH ROW
EXECUTE PROCEDURE tools.new_gps_data_animals();

You can see the result by deleting all the records from the main.gps_data_animals table, e.g. for animal 2, and reloading them. As you have set an automatic procedure to synchronize main.gps_data_animals table with the information contained in the table main.gps_sensors_animals (supplementary code in Lesson 3), you can drop the animal 2 record from main.gps_sensors_animals and this will affect main.gps_data_animals in a cascade effect (note that it will not affect the original data in main.gps_data):

DELETE FROM
main.gps_sensors_animals
WHERE
animals_id = 2;

There are now no rows for animal 2 in the table main.gps_data_animals. You can verify this by retrieving the number of locations per animal:

SELECT
animals_id, count(animals_id)
FROM
main.gps_data_animals
GROUP BY
animals_id
ORDER BY
animals_id;

The result should be:

animals_id

count

1

2114

3

2106

4

2869

5

2924

Note that animal 2 is not in the list. Now you reload the record in the main.gps_sensors_animals:

You can see that records have been re-added to main.gps_data_animals by reloading the original data stored in main.gps_data, with the geometry field correctly and automatically populated (when longitude and latitude are not null):

SELECT
animals_id, count(animals_id) AS num_records, count(geom) AS num_records_valid
FROM
main.gps_data_animals
GROUP BY
animals_id
ORDER BY
animals_id;

The result is

animals_id

num_records

num_records_valid

1

2114

1650

2

2624

2196

3

2106

1828

4

2869

2642

5

2924

2696

You can now play around with your spatial data set. For example, when you have a number of locations per animal, you can find the centroid of the area covered by the locations:

SELECT
animals_id,
ST_AsEWKT(
ST_Centroid(
ST_Collect(geom))) AS centroid
FROM
main.gps_data_animals
WHERE
geom IS NOT NULL
GROUP BY
animals_id
ORDER BY
animals_id;

The result is

animals_id

centroid

1

SRID=4326;POINT(11.056405072 46.0065913348485)

2

SRID=4326;POINT(11.0388902698087 46.0118316898451)

3

SRID=4326;POINT(11.062054399453 46.0229784057986)

4

SRID=4326;POINT(11.0215063307722 46.0046905791446)

5

SRID=4326;POINT(11.0287071960312 46.0085975505935)

In this case you used the SQL command ST_Collect. This function returns a GEOMETRYCOLLECTION or a MULTI object from a set of geometries. The collect function is an 'aggregate' function in the terminology of PostgreSQL. That means that it operates on rows of data, in the same way the sum() and mean() functions do. ST_Collect and ST_Union are often interchangeable. ST_Collect is in general orders of magnitude faster than ST_Union because it does not try to dissolve boundaries. It merely rolls up single geometries into MULTI and MULTI or mixed geometry types into Geometry Collections. The contrary of ST_Collect is ST_Dump, which is a set-returning function.

Exercise

Find the centroids all all the points for each month for animal 2

Topic 3. Creating Spatial Database Views

Views are queries permanently stored in the database. For users (and client applications), they work like normal tables, but their data are calculated at query time and not physically stored. Changing the data in a table alters the data shown in subsequent invocations of related views. Views are useful because they can represent a subset of the data contained in a table; can join and simplify multiple tables into a single virtual table; take very little space to store, as the database contains only the definition of a view (i.e. the SQL query), not a copy of all the data it presents; and provide extra security, limiting the degree of exposure of tables to the outer world. On the other hand, a view might take some time to return its data content. For complex computations that are often used, it is more convenient to store the information in a permanent table.

Example

You can create views where derived information is (virtually) stored. First, create a new schema where all the analysis can be accommodated and kept separated from the basic data:

Although the best way to visualize this view is in a GIS environment (in QGIS you might need to explicitly define the unique identifier of the view, i.e. gps_data_animals_id), you can query its non-spatial content with

SELECT
gps_data_animals_id AS id,
name AS animal,
time_utc,
sex,
age_class_description AS age,
species_description AS species
FROM
analysis.view_gps_locations
LIMIT 5;

The result is something similar to

id

animal

time_utc

sex

age

species

62

Agostino

2005-03-20 16:03:14

m

adult

roe deer

64

Agostino

2005-03-21 00:03:06

m

adult

roe deer

65

Agostino

2005-03-21 04:01:45

m

adult

roe deer

67

Agostino

2005-03-21 12:02:19

m

adult

roe deer

68

Agostino

2005-03-21 16:01:12

m

adult

roe deer

Now you create view with a different representation of your data sets. In this case you derive a trajectory from GPS points. You have to order locations per animal and per acquisition time; then you can group them (animal by animal) in a trajectory (stored as a view):

CREATE VIEW analysis.view_trajectories AS
SELECT
animals_id,
ST_MakeLine(geom)::geometry(LineString,4326) AS geom
FROM
(SELECT animals_id, geom, acquisition_time
FROM main.gps_data_animals
WHERE geom IS NOT NULL
ORDER BY
animals_id, acquisition_time) AS sel_subquery
GROUP BY
animals_id;

In the figure below you can see analysis.view_trajectories visualized in QGIS.

Lastly, create one more view to spatially summarize the GPS data set using convex hull polygons (or minimum convex polygons):

CREATE VIEW analysis.view_convex_hulls AS
SELECT
animals_id,
(ST_ConvexHull(ST_Collect(geom)))::geometry(Polygon,4326) AS geom
FROM
main.gps_data_animals
WHERE
geom IS NOT NULL
GROUP BY
animals_id
ORDER BY
animals_id

The result is represented in the figure below, where you can clearly see the effect of the outliers located far from the study area.

This last view is correct only if the GPS positions are located in a relatively small area (e.g. less than 50 kilometers) because the minimum convex polygon of points in geographic coordinates cannot be calculated assuming that coordinates are related to Euclidean space. At the moment the function ST_ConvexHull does not support the GEOGRAPHY data type, so the correct way to proceed would be to project the GPS locations in a proper reference system, calculate the minimum convex polygon and then convert the result back to geographic coordinates. In the example, the error is negligible.

Exercise

Create a view of all the points of female animals, visualize it in QGIS and export as shapefile

Supplementary code: UTM zone of a given point in geographic coordinates

Here you create a simple function to automatically find the UTM zone at defined coordinates:

COMMENT ON FUNCTION tools.srid_utm(double precision, double precision)
IS 'Function that returns the SRID code of the UTM zone where a point (in geographic coordinates) is located. For polygons or line, it can be used giving ST_x(ST_Centroid(the_geom)) and ST_y(ST_Centroid(the_geom)) as parameters. This function is typically used be used with ST_Transform to project elements with no prior knowledge of their position.';

Here an example to see the SRID of the UTM zone of the point at coordinates (11.001,46.001):

SELECT TOOLS.SRID_UTM(11.001,46.001) AS UTM_zone;

The result is 32632 that corresponds to UTM 32 N WGS84.

You can use this function to project points when you do not know the UTM zone. You can test this functionality with the following code: