System libraries cannot be installed by R’s install.packages(), but can be
bundled with these packages and for private use by them. Either way, the
necessary libraries are maintained by the good people at the Open Source
Geospatial Foundation for free and easy distribution.

Vector Data

The US Census
website
distributes county polygons (and much more) that are provided with the handouts.
The sf package reads shapefiles (“.shp”) and most other vector data:

Plot Layers

Spatial objects defined by sf are compatible with the plot
function. Setting the plot parameter add = TRUE allows an existing plot to
serve as a layer underneath the new one, so long as the CRS lines up.

But note that the plot function won’t prevent you from layering up geometries
with different coordinate systems: you must safegaurd your own plots from this
mistake. The arguments col and pch, by the way, are graphical parameters
used in base R, see ?par.

Spatial Subsetting

An object created with st_read is a data.frame, which is why the dplyr
function filter used above on the non-geospatial column named “STATEFP”
worked normally. The equivalent of a filtering operation on the “geometry”
column is called a spatial “overlay”.

It can be seen as a type of subsetting based on spatial (rather than numeric or
string) matching. Matching is implemented with functions like st_within(x, y).
The output implies that the 1^st^ (and only) point in sesync is within the 5th
element of counties_md.

Question

What was the message issued by the last command all about?

Answer

It is a reminder that all geometric calculations are performed as
if the coordinates (in this case longitutde and latitude) are Cartesian x,y
coordinates.

The Census data uses unprojected (longitude, latitude) coordinates, but huc is
in an Albers equal-area projection (indicated as “+proj=aea”).

The function st_transform() converts a sfc between coordinate reference
systems, specified with the parameter crs = x. A numeric x must be a valid
EPSG code; a character x is interpretted as a PROJ.4 string.

Proj4 strings contain a reference to the type of projection, this one is another
Albers Equal Area, along with numeric parameters associated with that
projection. An additional important parameter that may differ between two
coordinate systems is the “datum”, which indicates the standard by which the
irregular surface of the Earth is approximated by an ellipsoid in the
coordinates themselves.

Use st_transform() to assign the two layers to a common projection string
(prj). This takes a few moments, as it requires re-calculating coordinates for
every vertex of every polygon in the sfc.

Geometric Operations

The data for a map of watershed boundaries within the state of MD is all here;
in the country-wide huc and in the state boundary “surrounding” all of
counties_md. To get just the huc in a MD outline:

remove the internal county boundaries within the state

clip the hydrological areas to their intersection with the state

The first step is a spatial union operation: we want the resulting object to
combine the area covered by all the multipolygons in counties_md.

state_md<-st_union(counties_md)plot(state_md)

To perform a union of all sub-geometries in a single sfc, we use the
st_union() function with a single argument. The output, state_md, is a new
sfc that is no longer a column of a data frame. Tabular data can’t safely
survive a spatial union and is discarded.

The second step is a spatial intersection, since we want to limit the
polygons to areas covered by both huc and state_md.

huc_md<-st_intersection(huc,state_md)

Warning: attribute variables are assumed to be spatially constant
throughout all geometries

plot(state_md)plot(huc_md,border='blue',col=NA,add=TRUE)

The st_intersection() function intersects its first argument with the second.
The individual hydrological units are preserved but any part of them (or any
whole polygon) lying outside the state_md polygon is cut from the output. The
attribute data remains in the corresponding records of the data.frame, but (as
warned) has not been updated. For example, the “AREA” attribute of any clipped
HUC does not reflect the new polygon.

The GEOS library provides many functions dealing with distances and areas. Many
of these are accessible through the sf package, including:

st_buffer: to create a buffer of specific width around a geometry

st_distance: to calculate the shortest distance between geometries

st_area: to calculate the area of polygons

Keep in mind that all these functions use planar geometry equations and thus
become less precise over larger distances, where the Earth’s curvature is
noticeable. To calculate geodesic distances that account for that curvature,
checkout the geosphere package.

Raster Data

Raster data is a matrix or cube with additional spatial metadata (e.g. extent,
resolution, and projection) that allow its values to be mapped onto geographical
space. The raster package provides the eponymous raster() function
for reading the many formats of such data.

The National Land Cover Database is ‘.GRD’
format data, a lot of it. The file provided in this lesson is cropped and
reduced to a lower resolution in order to speed processing.

library(raster)nlcd<-raster("data/nlcd_agg.grd")

By default, raster data is not loaded into working memory, as you can confirm
by checking the R object size with object.size(nlcd). This means that unlike
most analyses in R, you can actually process raster datasets larger than the RAM
available on your computer; the raster package automatically loads pieces of the
data and computes on each of them in sequence.

The default print method for a raster object is a summary of metadata contained
in the raster file.

The extent can be extracted from sp package objects with extent,
but must be created “from scratch” for an sfc. Here, we crop the nlcd raster
to the extent of the huc_md polygon, then display both layers on the same map.
Also note that the transformed raster is now loaded in R memory, as indicated by
the size of nlcd. We could have also saved the output to disk by specifying an
optional filename argument to crop; the same is true for many other
functions in the raster package.

A raster is fundamentally a data matrix, and individual pixel values can be
extracted by regular matrix subscripting. For example, the value of
the bottom-left corner pixel:

>nlcd[1,1]

41

The meaning of this number is not immediately clear. For this particular
dataset, the mapping of values to land cover classes is described in the data
attributes:

The Land.Cover.Class vector gives string names for the land cover type
corresponding to the matrix values. Note that we need to add 1 to the raster
value, since these go from 0 to 255, whereas the indexing in R begins at 1.

Raster Math

Mathematical functions called on a raster gets applied to each pixel. For a
single raster r, the function log(r) returns a new raster where each pixel’s
value is the log of the corresponding pixel in r.

Likewise, addition with r1 + r2 creates a raster where each pixel is the sum of the
values from r1 and r2, and so on. Naturally, spatial attributes of rasters
(e.g. extent, resolution, and projection) must match for functions that operate
pixel-wise on multiple rasters.

Logical operations work too: r1 > 5 returns a raster with pixel values TRUE
or FALSE and is often used in combination with the mask() function.

Here, fact = 25 means that we are aggregating blocks 25 x 25 pixels and fun =
modal indicates that the aggregate value is the mode of the original pixels
(averaging would not work since land cover is a categorical variable).

The creation of geospatial tools in R has been a community effort, and not
necessarilly a well-organized one. One current stumbling block is that the
raster package, which is tightly integrated with the sp
package, has not caught up to the sf package. The
still-under-development stars package aims
to remedy this problem and others.

Crossing Rasters with Vectors

The extract function allows subsetting and aggregation of raster values based
on a vector spatial object.

plot(nlcd)plot(sesync,col='green',pch=16,cex=2,add=TRUE)

When extracting by point locations (i.e. a SpatialPoints object), the result
is a vector of values corresponding to each point.

When extracting with a polygon, the output is a vector of all raster values for
pixels falling within that polygon.

county_nlcd<-extract(nlcd_agg,counties_md[1,])

>table(county_nlcd)

county_nlcd
11 21 22 23 24 41
3 1 4 5 2 1

To get a summary of raster values for each polygon in a SpatialPolygons
object, add an aggregation function to extract via the fun argument. For
example, fun = modal gives the most common land cover type for each polygon in
huc_md.

Exercises

Exercise 1

Exercise 2

Use st_buffer to create a 5km buffer around the state_md border and plot it as a dotted line (plot(..., lty = 'dotted')) over the true state border. Hint: check the layer’s units with st_crs() and express any distance in those units.

Solution 2

Solution 3

If you need to catch-up before a section of code will work, just squish it's
🍅 to copy code above it into your clipboard. Then paste into your interpreter's
console, run, and you'll be ready to start in on that section. Code copied by
both 🍅 and 📋 will also appear below, where you can edit first, and then copy,
paste, and run again.