Finding swimming pools in Australia

Information about built environments is extremely valuable
to insurance companies, tax assessors, and public agencies, as it empowers a wide range of decision-making including
urban and regional planning and management, risk estimation, and emergency response.
Extracting this information using human analysts to scour satellite imagery
is prohibitively expensive and time-consuming. Feature extraction and machine
learning algorithms are the only viable way to perform this type of attribution
at scale.

This is why the Australian company PSMA teamed up with DigitalGlobe (DG) to develop the product
Geoscape: a diverse set of building attributes,
including height, rooftop material, solar panel installation and presence of a swimming pool
on the property, covering the entire Australian continent.

We’ll demonstrate how we used deep learning on GBDX to identify swimming pools in hundreds of thousands of properties across
Adelaide, a major city on the southern coast of Australia with a population of approximately one million. The result: 31,071 of the 670,784 properties we classified contain pools, or approximately 4.6%. This is more or less consistent with the Australian Bureau of Statistics data for households with a pool in South Australia. Compare this with the corresponding figure for New South Wales, which is upwards of 10%. Given the similar arid or semi-arid climate, could we identify other reasons for the discrepancy? A little digging into the regions’ economic health might provide some clues:
New South Wales has a significantly higher average annual income than South Australia. Given the installation and maintenance costs of a swimming pool, the number of pools in a region could be an indicator of its economic health!

How

Our GBDX workflow is shown in the following figure.

Platform workflow for property classification.

Preprocessing

The workflow begins with the file properties.geojson. This file contains a collection of polygons in (longitude, latitude) coordinates, each representing a property. Each polygon has two attributes: an image_id, which determines the DG catalog id of the satellite image corresponding to that polygon, and a feature_id, which is simply a number that uniquely identifies that property. In this example, the file only contains properties from a single WV03 image over Adelaide, Australia, with catalog id 1040010014800C00.
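To make the file layout concrete, here is a minimal Python sketch of what one entry of properties.geojson might look like and how it could be read. Only the attribute names image_id and feature_id and the catalog id come from the description above; the coordinates, the feature_id value and the helper summarize are invented for illustration.

```python
import json

# One hypothetical property polygon. The coordinates and feature_id are made up;
# image_id is the catalog id of the Adelaide image mentioned above.
properties_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"image_id": "1040010014800C00", "feature_id": 1},
            "geometry": {
                "type": "Polygon",
                # GeoJSON order is (longitude, latitude); the ring is closed,
                # so the first and last vertices are identical.
                "coordinates": [[
                    [138.600, -34.920],
                    [138.601, -34.920],
                    [138.601, -34.921],
                    [138.600, -34.921],
                    [138.600, -34.920],
                ]],
            },
        }
    ],
}


def summarize(collection):
    """Return (feature_id, image_id, number of ring vertices) for each feature."""
    rows = []
    for feature in collection["features"]:
        props = feature["properties"]
        outer_ring = feature["geometry"]["coordinates"][0]
        rows.append((props["feature_id"], props["image_id"], len(outer_ring)))
    return rows


# Round-trip through JSON text, as if the collection had been read from disk.
loaded = json.loads(json.dumps(properties_geojson))
print(summarize(loaded))
```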

The main idea is to label a small percentage of the property parcels using crowdsourcing in order to create a training set train.geojson. We then use train.geojson to train a CNN-based classifier to identify the presence of a swimming pool in each of the remaining unlabeled properties of properties.geojson (referred to as target.geojson). For object classification at a continental or global scale this procedure is a must; it would be virtually impossible to label millions of properties manually in a reasonable amount of time.
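The split itself can be sketched in a few lines of Python. This is only an illustration of the idea, not the code we ran: the 5% fraction, the fixed seed and the function name split_properties are assumptions, and the labels for the selected parcels would still have to come from the crowd.

```python
import random


def split_properties(features, train_fraction=0.05, seed=0):
    """Randomly pick a fraction of features to be labeled; the rest are targets.

    train_fraction=0.05 is an assumed value standing in for "a small percentage".
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(features)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = max(1, round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical stand-ins for the features of properties.geojson
# (geometries omitted for brevity).
features = [
    {"type": "Feature",
     "properties": {"feature_id": i, "image_id": "1040010014800C00"},
     "geometry": None}
    for i in range(1000)
]

to_label, target = split_properties(features)
print(len(to_label), len(target))
```

The features in to_label would be sent out for labeling and saved as train.geojson; the features in target would be saved as target.geojson.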

Before executing the workflow, the raw image has to be ordered from the factory and processed into a format that is viewable and also usable by a machine learning algorithm. The last part is trickier than it sounds. We lovingly refer to it as UGHLi: Undifferentiated Geospatial Heavy Lifting. In this example, it involves orthorectification, atmospheric compensation, pansharpening and dynamic range adjustment; all this can be achieved with a single GBDX task, AOP_Strip_Processor. You can explore the image below or click here for a full page view.

Workflow inputs

The workflow requires three inputs.

A collection of labeled polygons in geojson format (the training data). In this example, the labeled properties are found in train.geojson.

A sample of labeled properties.

A collection of polygons which will be classified, in geojson format (the target data). In this example, the properties to be classified are found in target.geojson.

A sample of properties to be classified.

The UGHLi’d image(s) which the polygons in the training and target data overlay, in GeoTiff format. In this example, it’s the Adelaide image.

Tasks

The workflow involves two tasks.

train-cnn-classifier: Trains a CNN classifier on the polygons in train.geojson. Required inputs are train.geojson, associated image strips, and class names as a string argument. This task returns the architecture and weights of the trained model.

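deploy-cnn-classifier: Deploys the trained model on the polygons in target.geojson. This task returns the file classified.geojson, which includes all the properties in target.geojson classified as ‘Swimming pool’ or ‘No swimming pool’.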
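Once classified.geojson is available, tallying the classes is straightforward. The Python sketch below assumes that each feature stores its predicted class under a property called class_name; that key is a placeholder, not necessarily the name the task writes, and the four features are toy data.

```python
from collections import Counter


def pool_share(collection, class_key="class_name"):
    """Count features per class and return (counts, share of 'Swimming pool').

    class_key is a hypothetical property name for the predicted class.
    """
    counts = Counter(f["properties"][class_key] for f in collection["features"])
    total = sum(counts.values())
    share = counts["Swimming pool"] / total if total else 0.0
    return counts, share


# Toy stand-in for classified.geojson (geometries omitted for brevity).
classified = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"feature_id": 1, "class_name": "Swimming pool"}},
        {"type": "Feature", "geometry": None,
         "properties": {"feature_id": 2, "class_name": "No swimming pool"}},
        {"type": "Feature", "geometry": None,
         "properties": {"feature_id": 3, "class_name": "No swimming pool"}},
        {"type": "Feature", "geometry": None,
         "properties": {"feature_id": 4, "class_name": "No swimming pool"}},
    ],
}

counts, share = pool_share(classified)
print(counts, share)

# The same arithmetic applied to the Adelaide totals reported earlier.
adelaide_percent = round(100 * 31071 / 670784, 1)
print(adelaide_percent)
```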

Putting everything together

The entire workflow can be found here.
The workflow executes train-cnn-classifier and deploy-cnn-classifier in sequence, and saves the outputs in S3.

Visualizing the results

We uploaded the Adelaide image and classified.geojson to our Mapbox account, and then used the Mapbox GL JavaScript library to display the raster and vector tilesets. Here are the results (full page view here). Green polygons indicate the presence of a pool and red polygons its absence.

Discussion

This is a small example of what is possible on GBDX. Once a trained model is
obtained, it can be deployed on properties over hundreds or thousands of different images
in parallel. Continental-scale classification becomes a matter of hours.

GBDX allows large scale parallelization.
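The fan-out pattern behind this can be imitated locally. The Python sketch below groups properties by their image_id and hands each group to a separate worker; it does not use any GBDX API. classify_image is a hypothetical stand-in for one deployment of the trained model on one image, and the image ids are invented.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_image(features):
    """Group property features by the image they overlay."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature["properties"]["image_id"]].append(feature)
    return dict(groups)


def classify_image(image_id, features):
    """Hypothetical stand-in for deploying the trained model on one image.

    A real run would classify every property; here we only count them.
    """
    return image_id, len(features)


def classify_all(features, max_workers=4):
    """Process each image's properties independently and in parallel."""
    groups = group_by_image(features)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(classify_image, image_id, group)
            for image_id, group in groups.items()
        ]
        return dict(future.result() for future in futures)


# Toy input: five properties spread over two invented image ids.
features = [
    {"type": "Feature", "geometry": None,
     "properties": {"feature_id": i, "image_id": image_id}}
    for i, image_id in enumerate(
        ["image-A", "image-A", "image-B", "image-A", "image-B"])
]

processed = classify_all(features)
print(processed)
```

Because the groups share no state, the number of workers can grow with the number of images, which is what makes the per-image deployment easy to parallelize.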

In order to exploit the power of GBDX, an algorithm must be packaged into a GBDX task.
The procedure is described here in detail.
Get after it!