September 12, 2017

Intersects & R

We are increasingly performing spatial analyses in R. The replicability and efficiency of a programming language are much more appealing than user-friendly software like ArcGIS, even though you can still code your way through analyses with such software (later versions of QGIS do a fantastic job in that regard!). How well the tools available for spatial analyses in R actually perform is, however, not always clear.

In this post, we compare five different methods to perform spatial intersects between objects in R, from three different packages:

raster::intersect

rgeos::gIntersects

rgeos::gIntersection

sf::st_intersects

sf::st_intersection

More specifically, we test how these methods fare when performing binary (TRUE/FALSE) and zonal (or areal) intersects. Keep in mind that not all methods can be used for both binary and zonal intersects:

Function               Binary   Zonal
raster::intersect      X        X
rgeos::gIntersects     X
rgeos::gIntersection   X        X
sf::st_intersects      X
sf::st_intersection    X        X

Obviously, if you only need binary intersects, the binary-only functions make more sense, as they are built to perform fewer calculations. We nonetheless include all five functions in the binary tests in this post so that they can be compared side by side.
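To make the binary versus zonal distinction concrete, here is a minimal sketch using the two sf functions. The toy squares and the helper `sq()` are made up for illustration and are not part of the objects tested in this post.

```r
library(sf)

# Hypothetical helper: a square polygon with lower-left corner (x0, y0)
sq <- function(x0, y0, size = 1) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# One square "a", plus one square that overlaps it (b1) and one that does not (b2)
a <- st_sf(id = "a", geometry = st_sfc(sq(0, 0)))
b <- st_sf(id = c("b1", "b2"), geometry = st_sfc(sq(0.5, 0.5), sq(5, 5)))

# Binary intersect: TRUE/FALSE for every pair of features (1 x 2 logical matrix)
binary <- st_intersects(a, b, sparse = FALSE)

# Zonal intersect: the overlapping geometry itself, carrying attributes of both inputs
zonal <- suppressWarnings(st_intersection(a, b))
overlap_area <- st_area(zonal)
```

The binary call only answers "do these features touch?", while the zonal call has to build the new geometries, which is why it is expected to cost more.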

Generate spatial objects for testing

We start by generating random spatial objects. For the record, the area selected is within the St. Lawrence estuary in eastern Canada (see online ecology series), although the actual location really does not matter for this post!

Grid

We use a regular grid to intersect vectorized data, i.e. points and polygons for this post. This simulates the use of a grid to extract environmental data (biotic and/or abiotic) from multiple sources in order to characterize a study area.
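As a rough sketch, a grid of this kind could be built with sf::st_make_grid. The extent, the projection (EPSG:32198, NAD83 / Quebec Lambert) and the 5 km cell size below are assumptions chosen for illustration, not the values used for the tests in this post.

```r
library(sf)

# Hypothetical study extent in metres (EPSG:32198), roughly in the St. Lawrence estuary
bb <- st_bbox(
  c(xmin = -50000, ymin = 450000, xmax = 50000, ymax = 550000),
  crs = st_crs(32198)
)

# Regular grid of 5 km x 5 km square cells covering the bounding box
cells <- st_make_grid(st_as_sfc(bb), cellsize = 5000)

# Wrap the cells in an sf object with an identifier per cell
grid <- st_sf(cell_id = seq_along(cells), geometry = cells)
```

Keeping a `cell_id` column makes it easy to join extracted values back to the grid afterwards.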

Points and Polygons

Now we generate random points within the bounding box to test the intersects. This is done for 1, 10, 50, 100, 250, 500, 1000, 10000 points. Then, to get all data required to perform the tests, we also need to create polygons from the point data.
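One possible way to generate these objects is sketched below. The extent, the helper `random_points()` and the choice to build polygons as 2 km buffers around the points are all assumptions on our part; the text above only says that polygons are created from the point data.

```r
library(sf)
set.seed(42)

# Hypothetical extent in metres (EPSG:32198); same caveat as for the grid
xlim <- c(-50000, 50000)
ylim <- c(450000, 550000)

# Number of points in each test set, as listed in the text
sizes <- c(1, 10, 50, 100, 250, 500, 1000, 10000)

# Hypothetical helper: n uniformly distributed random points within the extent
random_points <- function(n) {
  xy <- data.frame(
    x = runif(n, xlim[1], xlim[2]),
    y = runif(n, ylim[1], ylim[2])
  )
  st_as_sf(xy, coords = c("x", "y"), crs = 32198)
}

# One point set per sample size
pts <- lapply(sizes, random_points)
names(pts) <- sizes

# Assumption: polygons are circular buffers of 2 km around each point
polys <- lapply(pts, st_buffer, dist = 2000)
```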

For the intersects between points and the grid, rgeos::gIntersection is clearly much less efficient than the alternative options. Using raster::intersect, rgeos::gIntersects, sf::st_intersects or sf::st_intersection significantly decreases calculation time, with sf::st_intersects proving to be the most efficient option.

We see here that rgeos::gIntersects, sf::st_intersects and sf::st_intersection are far more efficient when dealing with polygon-on-polygon intersects, with rgeos::gIntersects being the most efficient option. raster::intersect loses its previous efficiency, while the efficiency of rgeos::gIntersection decreases even further.
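For readers who want to run a comparison of this kind themselves, here is a minimal timing sketch limited to the two sf functions. It is not the benchmark behind the results described above: the objects, the sample size, the buffer radius and the helper `elapsed()` are hypothetical, and a single `system.time()` call per function gives only a rough indication.

```r
library(sf)
set.seed(1)

# Hypothetical setup: 20 x 20 grid over a 100 km x 100 km extent (EPSG:32198)
bb <- st_bbox(
  c(xmin = -50000, ymin = 450000, xmax = 50000, ymax = 550000),
  crs = st_crs(32198)
)
cells <- st_make_grid(st_as_sfc(bb), n = c(20, 20))
grid <- st_sf(cell_id = seq_along(cells), geometry = cells)

# 500 random points and their 2 km buffers (buffering is an assumption)
n <- 500
pts <- st_as_sf(
  data.frame(x = runif(n, -50000, 50000), y = runif(n, 450000, 550000)),
  coords = c("x", "y"), crs = 32198
)
polys <- st_buffer(pts, dist = 2000)

# Hypothetical helper: elapsed seconds for a single evaluation of an expression
elapsed <- function(expr) unname(system.time(expr)["elapsed"])

# One timing per function and input type
timings <- data.frame(
  input = c("points", "points", "polygons", "polygons"),
  fun = c("st_intersects", "st_intersection", "st_intersects", "st_intersection"),
  seconds = c(
    elapsed(st_intersects(grid, pts)),
    elapsed(suppressWarnings(st_intersection(grid, pts))),
    elapsed(st_intersects(grid, polys)),
    elapsed(suppressWarnings(st_intersection(grid, polys)))
  )
)
print(timings)
```

Repeating each call several times and over the full range of sample sizes would give more stable numbers than this single pass.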

Concluding remarks

Et voilà! These simulations show that the sf package provides, overall, the most efficient options to perform spatial intersects in R. rgeos is also very efficient when it comes to binary intersects, especially for polygon-on-polygon intersects, where rgeos::gIntersects beats sf::st_intersects by cutting calculation time in half.

Our recommendation: use sf::st_intersects for binary intersects and sf::st_intersection for zonal intersects. Be aware, however, that the sf package evolves very rapidly and its functions are likely to be modified, although one would hope that further development will not come at the price of efficiency.

If you wish to stick with the older packages, binary intersects can still be done quite efficiently, but if you need zonal intersects, we recommend that you start considering changing your ways!