Interact with PostGIS from R

PostGIS extends capabilities of PostgreSQL database to deal with spatial data. Using PostGIS, your database supports geographic queries to be run directly in SQL. In this blog post, we will connect and interact with a PostGIS database from R, using {DBI} and {sf}.

R package {sf} was created to supports simple features (thus, named “sf”). “Simple features” is an ISO standard defined by the Open Geospatial Consortium (OGC) to standardize storage and access to geographical datasets. PostGIS supports objects and functions specified in the OGC “Simple Features for SQL” specification.

Most {sf} functions starts with st_ to follow the denomination of functions in PostGIS. For instance, function sf::st_union has its equivalent ST_Union in Postgis. If you were wondering why, now you know !

Spatial data with {sf} are manipulated in a SQL way. Simple features in {sf} allows to manipulate vector spatial objects using {dplyr}, which syntax is similar to SQL functions names. For instance, dplyr::select, which can be applied on {sf} objects, has its equivalent SELECT in SQL syntax.

Connection to the PostGIS database from R

You can connect to any database using {DBI} package, provided that you have the correct drivers installed. If you use Rstudio as your IDE to develop in R, I would recommend using package {sqlpetr} to connect to your Postgres database. This package has a “connexion contract” allowing to explore your database directly in the Rstudio Connection Pane.

Write and read World data in PostGIS

In the following examples, we will play with a World map from package {rnaturalearth}. Equal Earth projection is used for maps representation here. Equal Earth projection is in the last version of “proj”. If it is not available to you, you can choose another one in the code below. But please avoid Mercator.

Use SQL queries before loading the spatial data

When using st_read, the dataset is completely loaded in the memory of your computer. If your spatial dataset is big, you may want to query only a part of it. Classical SQL queries can be used, for instance to extract Africa only, using st_read with parameter query.

Tips: Do you need help for your SQL query ?

PostGIS allows you to make SQL queries with geomatics functions that you can also define before reading your dataset with {sf}. However, if you are not totally comfortable with SQL but you are a ninja with {dplyr}, then you will be fine. By using dplyr::show_query, you can get the SQL syntax of your {sf} operations. Let’s take an example.With {sf}, union of countries by continent could be written as follows:

Now, let us connect to the Postgres database with {dplyr}. When connected to a database with {dplyr}, queries are executed directly in SQL in the database, not in the computer memory.Then use show_query to get the translation in SQL proposed by {dplyr}. Note that with {dbplyr} (>1.4.0), instead of tbl, you can use a lazy_frame without real connection to the database to do your tests.

{dplyr} has not been tailored to translate spatial operations in SQL, but as you can see below, difference with the correct query is very small. Only the PostGIS function is not correctly written (ST_Union).

Be careful. You can not trust operations with {dplyr}/{dbplyr} on a spatial database because the ‘geometry’ column is not imported as spatial column with tbl() and you would loose it. However, you can use {dplyr} syntax to explore your dataset and all columns out of the spatial information. All operations will be realised inside the database, which assures not to overload the memory.

What about rasters ?

If you follow this blog, you know that I like rasters. Package {sf} is only dealing with spatial vector objects. It can not manage rasters in a PostGIS database. To do so, you can have a look at package {rpostgis} with functions pgGetRast() and pgWriteRast(). This package works with Rasters from package {raster}. You will need to connect using package {RPostgreSQL}, which does not have a Rstudio “connexion contract”. Note that you can have two connections to your database, one with {sqlpetr} to keep the Rstudio Connection pane, and the other with {RPostgreSQL} to connect with Raster objects.