An open source point cloud data infrastructure: introduction, and land.

I’ve recently been exploring the Point Data Abstraction Library (PDAL) and Entwine ecosystem a lot – for work, and to prove some ideas that have been kicking around for a few years now (see my EGU poster from 2016, or my FOSS4G talk in 2017). These are not new ideas – other people have had them for a lot longer – and we’ve all evolved in parallel. Right now, a few things seem to be converging, built on the work of many visionaries.

The concept of point clouds as a foundation data service has been on some seriously large minds, resulting in infrastructure like OpenTopography, Lidar.io, and Geoscience Australia’s ELVIS. All take a really good stab at the idea in different ways, but each misses components. At the time of writing, a user cannot make a web request and have a product turn up in their GIS or other workflow without some extra wrangling. I made a stab at exactly that using the OGC Web Processing Service as an API (PyWPS, to be precise); and it worked. It wasn’t scalable the way I built it, there was no UI to search for data, and the web service backend was barely ready to prototype. But it did the one thing all the other services didn’t: you could make a web request and have data show up in QGIS (or other tools) – using tools which already exist.

There were two more drivers for using WPS as an API: first, anyone could build a user interface; second, anyone (for example OpenTopography) could federate the data – that is, add a tool for querying the metadata catalogue to their own service, but have the data products made elsewhere.

Proving the concept – on land

Let me demonstrate. We’ll grab a building from OpenStreetMap and take a look at it in QGIS, removing everything except the building we’re interested in – in this case the National Museum of Australia. (Note – if you were cleverer at OSM than I am, you could directly query for the exact polygon you’re interested in. Also, this post led directly to an OSM update for this building. Open data wins all round.)

I can then take the bounding box of the OSM subset and form a pretty simple configuration file to clip out some point cloud data from the resource we’ve just explored in our browser. Let’s save the JSON below as ‘entwine-nma-bbox.json’:
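The original pipeline file isn’t reproduced here, so this is a minimal sketch of what ‘entwine-nma-bbox.json’ might contain – the EPT endpoint, bounds, and polygon coordinates are placeholders, not the real values:

```json
{
    "pipeline": [
        {
            "type": "readers.ept",
            "filename": "https://example.com/act-2015/ept.json",
            "bounds": "([692738, 693038], [6091025, 6091325])"
        },
        {
            "type": "filters.crop",
            "polygon": "POLYGON((692800 6091100, 692990 6091100, 692990 6091280, 692800 6091280, 692800 6091100))"
        },
        {
            "type": "writers.las",
            "filename": "nma-clip.laz"
        }
    ]
}
```

Running it is a one-liner: `pdal pipeline entwine-nma-bbox.json`.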

In this example, we still transfer all the data inside the bounding box across the internet to the machine running PDAL, then discard all the points outside the polygon. We could also reverse the operation – clip first, then operate on points we’ve kept (you might want to do this if your initial bounding box is huge).
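A sketch of that ordering, again with placeholder endpoint and coordinates: filters.crop sits immediately after the reader, so any heavier stage downstream (here filters.smrf, PDAL’s ground classifier, purely as an example) only ever sees the points we kept:

```json
{
    "pipeline": [
        {
            "type": "readers.ept",
            "filename": "https://example.com/act-2015/ept.json",
            "bounds": "([692738, 693038], [6091025, 6091325])"
        },
        {
            "type": "filters.crop",
            "polygon": "POLYGON((692800 6091100, 692990 6091100, 692990 6091280, 692800 6091280, 692800 6091100))"
        },
        {
            "type": "filters.smrf"
        },
        {
            "type": "writers.las",
            "filename": "nma-clip-classified.laz"
        }
    ]
}
```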

What else can we do? Create a raster output? Sure.

Here’s an example doing just that – a rasterised surface model, displayed in QGIS, using height above ground (building height – because again, BIM is this Tuesday, and geomatics was last Wednesday):
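A sketch of a pipeline that would produce such a raster (placeholder endpoint and bounds again; it assumes the source data already has ground-classified points, and note that recent PDAL releases name the height-above-ground stage filters.hag_nn while older ones call it filters.hag):

```json
{
    "pipeline": [
        {
            "type": "readers.ept",
            "filename": "https://example.com/act-2015/ept.json",
            "bounds": "([692738, 693038], [6091025, 6091325])"
        },
        {
            "type": "filters.hag_nn"
        },
        {
            "type": "writers.gdal",
            "filename": "nma-hag.tif",
            "dimension": "HeightAboveGround",
            "output_type": "max",
            "resolution": 1.0
        }
    ]
}
```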

Rasterised National Museum building height

…and another, this time a mesh model displayed in MeshLab:

Mesh model of the National Museum and surrounding ground
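One way to build such a mesh with PDAL alone (a sketch, with placeholder endpoint and bounds) is Poisson surface reconstruction – filters.normal estimates the normals it needs, and writers.ply with faces enabled emits something MeshLab can open:

```json
{
    "pipeline": [
        {
            "type": "readers.ept",
            "filename": "https://example.com/act-2015/ept.json",
            "bounds": "([692738, 693038], [6091025, 6091325])"
        },
        {
            "type": "filters.normal",
            "knn": 8
        },
        {
            "type": "filters.poisson"
        },
        {
            "type": "writers.ply",
            "faces": true,
            "filename": "nma-mesh.ply"
        }
    ]
}
```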

In the tradition of more, what else can we do? Add some custom Python code? OK. Return only ground points? Fine. Whatever we can do with PDAL, we can do here.
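Returning only ground points, for instance, is a one-stage change – a sketch with placeholder endpoint and bounds, using filters.range to keep Classification 2 (ground); a filters.python stage running your own script could be dropped into the same spot:

```json
{
    "pipeline": [
        {
            "type": "readers.ept",
            "filename": "https://example.com/act-2015/ept.json",
            "bounds": "([692738, 693038], [6091025, 6091325])"
        },
        {
            "type": "filters.range",
            "limits": "Classification[2:2]"
        },
        {
            "type": "writers.las",
            "filename": "nma-ground.laz"
        }
    ]
}
```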

Speaking for the trees, too

BIM and smart cities might not stay hot past next Tuesday, but landscape ecology is our forever.

So let’s look at another datasource – 7610 square kilometres of pretty flat Mallee country in Australia, collected by Geoscience Australia and hosted in their ELVIS system (again, with apologies to Safari users):

As a quick aside – now we’re covering some ground, literally. We’ve jumped from 1600 to 7610 square kilometres. There’s a pattern here!

What happens here? The Python code below grids the LiDAR into a user-defined grid, counts how many points fall in each cell, counts how many of those points have Classification = 5 (tall vegetation), and computes a normalised tree percentage. It then writes out a raster of the results. By the way, feel free to suggest better/faster ways to do the same tasks, as long as imported libraries are kept minimal.
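The original listing isn’t reproduced here, but the counting logic can be sketched in a few lines of numpy (function and variable names are mine; in the real pipeline this would live inside a filters.python stage, with the raster written downstream):

```python
import numpy as np

def tree_fraction(x, y, classification, cell_size, tree_class=5):
    """Grid points into square cells and return, per cell, the fraction
    of points classified as tall vegetation (tree_class)."""
    # Integer cell index for every point, anchored at the data minimum
    ix = np.floor((x - x.min()) / cell_size).astype(int)
    iy = np.floor((y - y.min()) / cell_size).astype(int)
    shape = (iy.max() + 1, ix.max() + 1)

    total = np.zeros(shape)
    trees = np.zeros(shape)
    np.add.at(total, (iy, ix), 1)                              # all points per cell
    np.add.at(trees, (iy, ix), classification == tree_class)   # tree points per cell

    # Normalised tree fraction; empty cells become NaN
    return np.where(total > 0, trees / total, np.nan)
```

np.add.at is used rather than plain fancy indexing because it accumulates correctly when several points land in the same cell; changing cell_size is all it takes to regrid.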

Here’s what it looks like in QGIS, after applying some gdalwarp smoothing love. Dark blue means nearly no points were classified as trees, yellow means nearly all the points in a cell were classified as trees.

We also push out a DTM for good measure – that’s why the range filter and gdal writer after the Python code block exist.

The entire operation (data query, processing, writing out two rasters) took 37 seconds – with time to spare within a single synchronous HTTP request. Clearly, scaling to 100 x 100 km might slow things down a little.

Curiosity is fine – but the real application here? In a pretty quick and hacky brew of some JSON and Python, we have a tool for validating satellite-derived tree coverage against LiDAR datasets. A 10 metre cell roughly corresponds to Sentinel-2; change the cell size to 25 and we have correspondence with DEA ARD; 30, and we have USGS Landsat. I hope the picture is getting clearer.

Summary: A fundamental shift

The fundamental shift here, in my eyes, is that we now have more of the required parts in one place than I’ve ever seen for a now-and-future point cloud data infrastructure. Using a single data archive we can tick off exploratory visualisation and real data processing, all without data loss, and with access to complete metadata – required in an open science/open data landscape. We also get compression, and a spatial index in 3D.

Whether you’re a C++ creator, Pythonista extraordinaire, or a ‘scientific coder’ and command-line hack like myself, this is warp factor 10 toward real data application and utility. It’s getting close to effectively cracking a massive problem – how to manage all these awesome data we collect!

I can’t write this post without a huge shout out to Hobu Inc, and all their supporters/funders – who have really nursed an incredible set of tools into being, and made them open for us all to build with. And I am only scratching at the surface here.

Next…

Like what you see here? Also check out part 2: oceans, and stay tuned for part 3: objects – coming soon.

The sales pitch

Spatialised is a fully independent, full time consulting business. The tutorials, use cases and write-ups here are free for you to use, without ads or tracking.

If you find the content here useful to your business or research, you can support production of more words and open source geo-recipes via Paypal or Patreon; or hire me to do stuff. Enjoy!