Many of my previous posts in this series [1][2][3] have relied on PostGIS for trajectory data handling. While I love PostGIS, it feels like overkill to require a database to analyze smaller movement datasets. Wouldn’t it be great to have a pure Python solution?

If we look into moving object data literature, beyond the “trajectories are points with timestamps” perspective, which is common in GIS, we also encounter the “trajectories are time series with coordinates” perspective. I don’t know about you, but if I hear “time series” and Python, I think Pandas! In the Python Data Science Handbook, Jake VanderPlas writes:

Pandas was developed in the context of financial modeling, so as you might expect, it contains a fairly extensive set of tools for working with dates, times, and time-indexed data.

Of course, time series are one thing, but spatial data handling is another. Lucky for us, this is where GeoPandas comes in. GeoPandas has been around for a while and version 0.4 has been released in June 2018. So far, I haven’t found examples that use GeoPandas to manage movement data, so I’ve set out to give it a shot. My trajectory class uses a GeoDataFrame df for data storage. For visualization purposes, it can be converted to a LineString:

Of course, this class can be used in stand-alone Python scripts, but it can also be used in QGIS. The following script takes data from a QGIS point layer, creates a GeoDataFrame, and finally generates trajectories. These trajectories can then be added to the map as a line layer.

All we need to do to ensure that our data is ordered by time is to set the GeoDataFrame’s index to the time field. From then on, Pandas takes care of the time series aspects and we can access the index as shown in the Trajectory.get_position_at() function above.

The following screenshot shows this script applied to a sample of the Geolife datasets containing 100 trajectories with a total of 236,776 points. On my notebook, the runtime is approx. 20 seconds.

So far, GeoPandas has proven to be a convenient way to handle time series with coordinates. Trying to implement some trajectory analysis tools will show if it is indeed a promising data structure for trajectories.

Like this:

If you follow me on Twitter, you have probably already heard that the ebook of “QGIS Map Design 2nd Edition” has now been published and we are expecting the print version to be up for sale later this month. Gretchen Peterson and I – together with our editor Gary Sherman (yes, that Gary Sherman!) – have been working hard to provide you with tons of new and improved map design workflows and many many completely new maps. By Gretchen’s count, this edition contains 23 new maps, so it’s very hard to pick a favorite!

Like the 1st edition, we provide increasingly advanced recipes in three chapters, each focusing on either layer styling, labeling, or creating print layouts. If I had to pick a favorite, I’d have to go with “Mastering Rotated Maps”, one of the advanced recipes in the print layouts chapter. It looks deceptively simple but it combines a variety of great QGIS features and clever ideas to design a map that provides information on multiple levels of detail. Besides the name inspiring rotated map items, this design combines

map overviews

map themes

graduated lines and polygons

a rotated north arrow

fancy leader lines

all in one:

“QGIS Map Design 2nd Edition” provides how-to instructions, as well as data and project files for each recipe. So you can jump right into it and work with the provided materials or apply the techniques to your own data.

Today we’re going to look at how to visualize the error bounds of a GPS trace in time. The goal is to do an in-depth visual exploration using QGIS and Time Manager in order to learn more about the data we have.

The Data

We have a file that contains GPS locations of an object in time, which has been created by a GPS tracker. The tracker also keeps track of the error covariance matrix for each point in time, that is, what confidence it has in the measurements it gives. Here is what the file looks like:

Error Covariance Matrix

What are those sd* fields? According to the manual: The estimated standard deviations of the solution assuming a priori error model and error parameters by the positioning options. What it basically means is that the real GPS location will be located no further than three standard deviations across north and east from the measured location, most of (99.7%) the time. A way to represent this visually is to create an ellipse that maps this area of where the real location can be.

An ellipse can be uniquely defined from the lengths of the segments a and b and its rotation angle. For more details on how to get those ellipse parameters from the covariance matrix, please see the footnote.

Ground truth data

We also happen to have a file with the actual locations (also in longitudes and latitudes) of the object for the same time frame as the GPS (also in seconds), provided through another tracking method which is more accurate in this case.

This is because, the object was me running on a rooftop in Zürich wearing several tracking devices (not just GPS), and I knew exactly which floor tiles I was hitting.

The goal is to explore, visually, the relationship between the GPS data and the actual locations in time. I hope to get an idea of the accuracy, and what can influence it.

First look

Loading the GPS data into QGIS and Time Manager, we can indeed see the GPS locations vis-a-vis the actual locations in time.

Let’s see if the actual locations that were measured independently fall inside the ellipse coverage area. To do this, we need to use the covariance data to render ellipses.

Creating the ellipses

I considered using the ellipses marker from QGIS.

It is possible to switch from Millimeter to Map Unit and edit a data defined override for symbol width, height and rotation. Symbol width would be the a parameter of the ellipse, symbol height the b parameter and rotation simply the angle. The thing is, we haven’t computed any of these values yet, we just have the error covariance values in our dataset.

Because of the re-projections and matrix calculations inherent into extracting the a, b and angle of the error ellipse at each point in time, I decided to do this calculation offline using Python and relevant libraries, and then simply add a WKT text field with a polygon representation of the ellipse to the file I had. That way, the augmented data could be re-used outside QGIS, for example, to visualize using Leaflet or similar. I could have done a hybrid solution, where I calculated a, b and the angle offline, and then used the dynamic rendering capabilities of QGIS, as well.

I also decided to dump the csv into an sqlite database with an index on the time column, to make time range queries (which Time Manager does) run faster.

Putting it all together

The code for transforming the initial GPS data csv file into an sqlite database can be found in my github along with a small sample of the file containing the GPS data.

I created three ellipses per timestamp, to represent the three standard deviations. Opening QGIS (I used version: 2.12, Las Palmas) and going to Layer>Add Layer>Add SpatialLite Layer, we see the following dialog:

After adding the layer (say, for the second standard deviation ellipse), we can add it to Time Manager like so:

We do the process three times to add the three types of ellipses, taking care to style each ellipse differently. I used transparent fill for the second and third standard deviation ellipses.

I also added the data of my actual positions.

Here is an exported video of the trace (at a place in time where I go forward, backwards and forward again and then stay still).

Conclusions

Looking at the relationship between the actual data and the GPS data, we can see the following:

Although the actual position differs from the measured one, the actual position always lies within one or two standard deviations of the measured position (so, inside the purple and golden ellipses).

The direction of movement has greater uncertainty (the ellipse is elongated across the line I am running on).

When I am standing still, the GPS position is still moving, and unfortunately does not converge to my actual stationary position, but drifts. More research is needed regarding what happens with the GPS data when the tracker is actually still.

The GPS position doesn’t jump erratically, which can be good, however, it seems to have trouble ‘catching up’ with the actual position. This means if we’re looking to measure velocity in particular, the GPS tracker might underestimate that.

These findings are empirical, since they are extracted from a single visualization, but we have already learned some new things. We have some new ideas for what questions to ask on a large scale in the data, what additional experiments to run in the future and what limitations we may need to be aware of.

Like this:

Do you sometimes start writing an SQL query and around at line 50 you get the feeling that it might be getting out of hand? If so, it might be useful to start breaking it down into smaller chunks and wrap those up into custom functions. Never done that? Don’t despair! There’s an excellent PL/pgSQL tutorial on postgresqltutorial.com to get you started.

To get an idea of the basic structure of a PL/pgSQL function and to proof that PostGIS datatypes work just fine in this context, here’s a basic function that takes a trajectory geometry and outputs its duration, i.e. the difference between its last and first timestamp:

My end goal for this exercise was to implement a function that takes a trajectory and outputs the stops along this trajectory. Commonly, a stop is defined as a long stay within an area with a small radius. This leads us to the following definition:

Note how this function uses RETURNS TABLE to enable it to return all the stops that it finds. To add a line to the output table, we need to assign values to the sequence and geom variables and then use RETURN NEXT.

Another reason to use PL/pgSQL is that it enables us to write loops. And loops I wanted for my stop detection function! Specifically, I wanted to go through all the points in the trajectory:

FOR pt IN SELECT (ST_DumpPoints(traj)).geom LOOP
-- here comes the magic!
END LOOP;

Eventually the function should go through the trajectory and identify all segments that stay within an area with max_size diameter for at least min_duration time. To test for the area size, we can use:

While this function is not really short, it’s so much more readable than my previous attempts of doing this in pure SQL. Some of the lines for determining is_stop are not strictly necessary but they do speed up processing.

Performance still isn’t quite where I’d like it to be. I suspect that all the adding and removing points from linestring geometries is not ideal. In general, it’s quicker to find shorter stops in smaller areas than longer stop in bigger areas.

Let’s test!

Looking for a testing framework for PL/pgSQL, I found plpgunit on Github. While I did not end up using it, I did use its examples for inspiration to write a couple of tests, e.g.

Basically, each test is yet another PL/pgSQL function that doesn’t return anything (i.e. returns void) but outputs messages about the status of the test. Here I made heavy use of the PERFORM statement which executes the provided function but discards the results:

Like this:

Last week, I traveled to Salzburg to attend the 30th AGIT conference and co-located English-speaking GI_Forum. Like in previous year, there were a lot of mobility and transportation research related presentations. Here are my personal highlights:

This year’s keynotes touched on a wide range of issues, from Sandeep Singhal (Google Cloud Storage) who – when I asked about the big table queries he showed – stated that they are not using a spatial index but are rather brute-forcing their way through massive data sets, to Laxmi Ramasubramanian @nycplanner (Hunter College City University of New York) who cautioned against tech arrogance and tendency to ignore expertise from other fields such as urban planning:

One issue that Laxmi particularly highlighted was the fact that many local communities are fighting excessive traffic caused by apps like Waze that suggest shortcuts through residential neighborhoods. Just because we can do something with (mobility) data, doesn’t necessarily mean that we should!

The session Spatial Perspectives on Healthy Mobility featured multiple interesting contributions, particularly by Michelle P. Fillekes who presented a framework of mobility indicators to assess daily mobility of study participants. It considers both spatial and temporal aspects of movement, as well as the movement context:

My introduction to GeoMesa talk failed to turn up any fellow Austrian GeoMesa users. So I’ll keep on looking and spreading the word. The most common question – and certainly no easy one at that – is how to determine the point where it becomes worth it to advance from regular databases to big data systems. It’s not just about the size of the data but also about how it is intended to be used. And of course, if you are one of those db admin whizzes who manages a distributed PostGIS setup in their sleep, you might be able to push the boundaries pretty far. On the other hand, if you already have some experience with the Hadoop ecosystem, getting started with tools like GeoMesa shouldn’t be too huge a step either. But that’s a topic for another day!

Since AGIT&GI_Forum are quite a big event with over 1,000 participants, it was not limited to movement data topics. You can find the first installment of English papers in GI_Forum 2018, Volume 1. As I understand it, there will be a second volume with more papers later this year.

Like this:

If you’re are following me on Twitter, you’ve certainly already read that I’m working on PyQGIS 101 a tutorial to help GIS users to get started with Python programming for QGIS.

I've been toying around with the idea of a PyQGIS intro for non-programmers without all the boring parts of vanilla Python tutorials. Not sure how far this will go, but it's a start: https://t.co/nbO5tm2ZW9

I’ve often been asked to recommend Python tutorials for beginners and I’ve been surprised how difficult it can be to find an engaging tutorial for Python 3 that does not assume that the reader already knows all kinds of programming concepts.

It’s been a while since I started programming, but I do teach QGIS and Python programming for QGIS to university students and therefore have some ideas of which concepts are challenging. Nonetheless, it’s well possible that I overlook something that is not self explanatory. If you’re using PyQGIS 101 and find that some points could use further explanations, please leave a comment on the corresponding page.

PyQGIS 101 is a work in progress. I’d appreciate any feedback, particularly from beginners!