GIS software is designed to capture, manage, analyze, and display (on a map!) all forms of geographically referenced information. Before diving deep into software, we review basic concepts for geographical data and the associated open source tools.

Geographical Data

Breaking down these uses of GIS, we find the usual components of any data-driven project. The strong emphasis on “geographical” marks the special attention that the spatial attributes of data require.

capture

record data with sufficient context to be useful

manage

work with vast amounts of data, often captured in different contexts

analyze

rely on the inherent spatial relationships between data

display

conveniently display data in its native form (on a map!)

So What?

One of the most famous mistakes, amongst academic geographers at least, was an article in the Economist magazine on the threat of missiles from North Korea.
The first map they published, shown above, did not account for the fact that the world is spherical.
It massively underestimated the reach of the North Korean missile designs.
The corrected version is shown below, using the proper interpretation of the same missile-ranges.

Vector File Formats

Geospatial Data Abstraction Library (GDAL)

GDAL translates between a vast collection of storage formats and the applications that read or write spatial data. It has a single abstract data model for rasters, and another for vector data. Because GDAL knows all about different formats, it includes many tools for translating and processing geographical data.

Translation

In addition to reading spatial data in all those formats, GDAL provides capabilities for writing data in new formats.
Translations can also occur between the same format with a step that modifies the data or metadata.

Write the spatial metadata that is missing from the world.png raster into a new raster file, geoworld.tif.

Compare the output of gdalinfo for the original world.png and new geoworld.tif raster files. What did the “corner coordinates” property of the PNG file correspond to before translation? And after translation?

The spatial data processing tools provided by GDAL are executed by giving “arguments” to a command such as gdal_translate. Arguments take the form -flag param1 param2 ..., which is how most command line utilities work.

The flag -outsize takes two parameters, the horizontal and vertical output size in pixels (or, optionally, as percentages of the input size).

gdal_translate -outsize 10% 10% geoworld.tif geoworld-small.tif

The flag -projwin takes four parameters that specify the new corners of the output file, giving a subwindow of the input file.

gdal_translate -projwin -180 90 0 0 geoworld.tif geonw.tif
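When many files need the same treatment, it can help to assemble the argument list in a script. The sketch below only builds and prints the command line; it does not run GDAL. The helper name build_gdal_translate is hypothetical, and the only flags it knows are the two described above.

```python
import shlex

def build_gdal_translate(src, dst, outsize=None, projwin=None):
    """Assemble a gdal_translate argument list (hypothetical helper)."""
    args = ["gdal_translate"]
    if outsize is not None:
        # -outsize takes two parameters: output width and height
        args += ["-outsize", str(outsize[0]), str(outsize[1])]
    if projwin is not None:
        # -projwin takes four parameters: ulx uly lrx lry
        args += ["-projwin"] + [str(v) for v in projwin]
    # the input and output file names come last
    args += [src, dst]
    return args

small = build_gdal_translate("geoworld.tif", "geoworld-small.tif",
                             outsize=("10%", "10%"))
northwest = build_gdal_translate("geoworld.tif", "geonw.tif",
                                 projwin=(-180, 90, 0, 0))
print(shlex.join(small))
print(shlex.join(northwest))
```

The printed lines reproduce the two commands shown above, and the same lists could be handed to a process runner once GDAL is installed.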

Exercise

Choose coordinates that bound your home state or country, and create a new file called natural-earth-home.tif that only includes that area of the natural-earth.tif. The order of parameters is upper left “x”, upper left “y”, lower right “x”, lower right “y”.
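One way to picture what -projwin does is to convert the four map coordinates into a pixel window. The sketch below does that arithmetic for a north-up raster. It assumes the raster covers the whole globe in degrees, and the 3600 by 1800 pixel size is a made-up example rather than the size of any file in this lesson.

```python
def projwin_to_pixel_window(ulx, uly, lrx, lry, width, height,
                            extent=(-180.0, 90.0, 180.0, -90.0)):
    """Convert map-coordinate corners to (xoff, yoff, xsize, ysize) in pixels.

    extent is (ulx, uly, lrx, lry) of the whole raster; a global extent
    in degrees is assumed by default.
    """
    ext_ulx, ext_uly, ext_lrx, ext_lry = extent
    xres = (ext_lrx - ext_ulx) / width    # map units per pixel, positive
    yres = (ext_lry - ext_uly) / height   # negative: rows run north to south
    xoff = round((ulx - ext_ulx) / xres)
    yoff = round((uly - ext_uly) / yres)
    xsize = round((lrx - ulx) / xres)
    ysize = round((lry - uly) / yres)
    return xoff, yoff, xsize, ysize

# The north-west quadrant of a hypothetical 3600 x 1800 global raster
window = projwin_to_pixel_window(-180, 90, 0, 0, width=3600, height=1800)
print(window)
```

The window depends on the raster’s resolution, while the map coordinates do not, so the same corner coordinates select the same area on any raster that carries spatial metadata.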

Question

Why bother with this approach over cropping the image with an image editor?

Conversion

Here’s a handy little utility. Ever want to quickly peek at a shapefile? Why not convert it to a raster at the command line?

Examine the metadata of urban-areas.tif using gdalinfo with the -stats flag. This provides additional information, but does it look right? gdal_rasterize is very literal: it sets polygons to 255 and everything else to 0, whereas areas outside the polygons might be more appropriately set to “Null”. Read the docs to decide how that can be done. Try to re-create the raster so that the mean value is 255.
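To see why the choice of a “Null” value changes the statistics, consider a toy raster. The sketch below is plain Python, not GDAL, and the pixel values are invented for illustration. It computes the mean once with the background zeros counted as data and once with them treated as missing.

```python
from statistics import mean

# Toy 3 x 4 "raster": 255 where a polygon was burned in, 0 elsewhere
pixels = [
    [0, 0, 255, 255],
    [0, 0, 255, 0],
    [0, 0, 0, 0],
]
values = [v for row in pixels for v in row]

def raster_mean(values, nodata=None):
    """Mean of the pixel values, skipping any equal to the nodata value."""
    valid = [v for v in values if v != nodata]
    return mean(valid)

print(raster_mean(values))            # background zeros count as data
print(raster_mean(values, nodata=0))  # background zeros are ignored
```

Only three of the twelve pixels are burned in, so the first mean is pulled far below 255, while the second equals 255 because only polygon pixels remain.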

In the SQL window we can execute PostGIS functions that rely on the GEOS library.
A simple example is the function ST_Length, where ST indicates the function accepts spatial types as arguments.

SELECT ST_Length(ST_GeomFromText('LINESTRING(0 0, 1 1)'));

st_length
-----------------
1.4142135623731
(1 row)

A very common vector operation in spatial analysis is to create a buffer around a spatial object.

SELECT ST_AsText(ST_Buffer(ST_GeomFromText('POINT(0 0)'), 1, 1));

st_astext
----------------
POLYGON((1 0, 0 -1, -1 0, 0 1, 1 0))
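The third argument to ST_Buffer is the number of segments used for each quarter circle, which is why the result above is a square. The Python sketch below imitates that output to show how the polygon approaches a circle as the segment count grows. It is an illustration of the geometry, not the algorithm GEOS actually uses, and the function names are hypothetical.

```python
import math

def buffer_point(x, y, radius, quad_segs=8):
    """Closed ring of a regular polygon approximating a circular buffer,
    with quad_segs segments per quarter circle."""
    n = 4 * quad_segs
    ring = []
    for i in range(n):
        angle = -2 * math.pi * i / n   # clockwise, starting at (x + radius, y)
        ring.append((x + radius * math.cos(angle),
                     y + radius * math.sin(angle)))
    ring.append(ring[0])               # repeat the first vertex to close the ring
    return ring

def shoelace_area(ring):
    """Planar area of a closed ring using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

square = buffer_point(0, 0, 1, quad_segs=1)
print(shoelace_area(square))                             # area of the square
print(shoelace_area(buffer_point(0, 0, 1, quad_segs=8)))  # 32-sided polygon
```

With one segment per quarter circle the area is 2, well short of the unit circle’s area. With eight segments, the PostGIS default, the polygon has 32 sides and its area is already close to that of the circle.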

Exercise

Calculate an approximation to Pi, the ratio of a circle’s circumference to its diameter, using GEOS functionality built into PostGIS. The function ST_Perimeter gives you the length of the perimeter of a polygon.

Geographic Coordinate Systems (GCS)

A geographic coordinate system (GCS) is a reference system that makes it possible to specify locations on the surface of the earth by a latitude and longitude.

An angular unit of measure

The angular unit of measure is usually a degree, or one 360th of a circle, and fractional degrees are usually specified as decimals.

A prime meridian

The location of 0° longitude, e.g. the Royal Observatory, Greenwich, UK

A datum

An approximation to the shape of the earth’s surface and where (in the GCS being defined) certain fixed positions are located. A spheroid (a flattish sphere) is used to approximate this shape, and the reference positions anchor its surface to known points.
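Coordinates are often published in degrees, minutes, and seconds, while most software expects the decimal degrees mentioned under the angular unit above. A minimal conversion sketch follows. The function name and the sample coordinates are illustrative assumptions.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus a hemisphere letter
    (N, S, E or W) to signed decimal degrees."""
    sign = -1 if hemisphere in ("S", "W") else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)

# An arbitrary example location: 38 deg 58' 30" N, 76 deg 30' 00" W
lat = dms_to_decimal(38, 58, 30, "N")
lon = dms_to_decimal(76, 30, 0, "W")
print(lat, lon)
```

South latitudes and west longitudes come out negative, which matches the sign convention used by the -projwin coordinates earlier in this lesson.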

Spheroids and Geoids

Set an ellipse on edge like a coin and spin it: the three-dimensional shape you see is a spheroid. This is a simple geometric object.
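A spheroid is fully described by two numbers. As a worked example, the sketch below derives the polar (semi-minor) axis of the WGS84 spheroid from its two published defining constants, the semi-major axis and the inverse flattening. The variable names are my own.

```python
# WGS84 defining constants
WGS84_A = 6378137.0            # semi-major (equatorial) axis in meters
WGS84_INV_F = 298.257223563    # inverse flattening, 1/f

f = 1 / WGS84_INV_F            # flattening
b = WGS84_A * (1 - f)          # semi-minor (polar) axis in meters

print(round(b, 3))             # polar radius
print(round(WGS84_A - b, 3))   # how much shorter the polar radius is
```

The polar radius comes out roughly 21 km shorter than the equatorial radius, which is the “flattish” part of the flattish sphere.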

A surface defined by the Earth’s gravity field is called the geoid. The shape of the geoid is irregular, but overall, it approximates mean sea level (or where mean sea level would be without the continents).

The geoid is related to a spheroid by its height, which can be either positive or negative, on a straight line perpendicular to the spheroid. Note that this height is not the height of the ground, or elevation.
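The three heights involved are easy to confuse. A common approximation is that the height of a point above the spheroid equals its elevation above the geoid plus the geoid height at that location. The sketch below encodes that relation with invented numbers. The function and argument names are assumptions for illustration.

```python
def ellipsoidal_height(elevation, geoid_height):
    """Height above the spheroid, approximated as elevation above the
    geoid plus the geoid height (which may be negative)."""
    return elevation + geoid_height

# Hypothetical site: 100 m above the geoid, where the geoid lies 30 m
# below the spheroid (geoid height of -30 m)
h = ellipsoidal_height(100.0, -30.0)
print(h)
```

A GPS receiver that reports height above the spheroid would therefore show 70 m at this site, even though the elevation above mean sea level is about 100 m.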

Spheroids and Geoids

Although highly exaggerated, this graphic illustrates that the earth itself (the black line) is irregularly shaped. The blue spheroid works well in two areas, but not over the entire surface of the earth. The red spheroid works well in only one area, but it may be a better fit there than the blue spheroid. With the advent of global positioning systems (GPS), new datums and ellipsoids have been developed for the entire globe.
Spheroids create a totally smooth surface across the world. Because this does not reflect reality very well, the ability of a local datum to incorporate local variations in elevation is important.

Datum

Every geographic coordinate system begins with a precisely surveyed starting point.

Provides a frame of reference for measuring relative position

Defines the origin and orientation for lines of latitude and longitude

Datum transformations

e.g. ‘WGS84’

Question:

Examine the output of gdalinfo for natural-earth.tif again (see reminder below). What is the name of the Geographic Coordinate System (GEOGCS)? What about the name of the spheroid? Are these different?

Projected Coordinate Systems (PCS)

A projected coordinate system (PCS) is defined on a flat, two-dimensional surface.

Unlike a geographic coordinate system, a projected coordinate system has constant lengths, angles, and areas across the two dimensions.

If you allow height above the flat surface, then you can locate any point in three dimensions with x, y, and z coordinates (these are not coordinates in Space!)

A projected coordinate system is defined by four components:

a geographic coordinate system

a map projection

any parameters needed by the map projection

a linear unit of measure (such as feet or meters)
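The four components listed above can be written down as a small record. The sketch below fills one in for UTM zone 10 North on WGS84, using the standard UTM parameter values. The class and field names are hypothetical and do not correspond to any GDAL data structure.

```python
from dataclasses import dataclass, field

@dataclass
class ProjectedCRS:
    """The four components of a projected coordinate system."""
    gcs: str                                         # geographic coordinate system
    projection: str                                  # map projection
    parameters: dict = field(default_factory=dict)   # projection parameters
    linear_unit: str = "meter"                       # linear unit of measure

utm_10n = ProjectedCRS(
    gcs="WGS 84",
    projection="Transverse Mercator",
    parameters={
        "latitude_of_origin": 0.0,
        "central_meridian": -123.0,
        "scale_factor": 0.9996,
        "false_easting": 500000.0,
        "false_northing": 0.0,
    },
    linear_unit="meter",
)
print(utm_10n)
```

The same four pieces of information appear, in a different notation, in the coordinate system section of gdalinfo output for a projected raster.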

Question

What GCS is used in westernfires_vir_2015231_geo.tif? What is the map projection?

Why use a projected coordinate system?

Lat/lon is a good system for storing spatial data, but areas and distances must be explicitly calculated on a curved surface.

You are making a map in which you want to preserve one or more of these properties: area, shape, distance, and direction.

You are making a small-scale map such as a national or world map. With a small-scale map, your choice of map projection determines the overall appearance of the map. For example, with some projections, lines of latitude and longitude will appear curved; with others, they will appear straight.
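To make the point about distances on a curved surface concrete, the sketch below uses the haversine formula to measure one degree of longitude at several latitudes. It assumes a spherical earth with a mean radius of 6371 km, which is a simplification of the spheroids discussed earlier.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean radius; a sphere is assumed

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Length of one degree of longitude at three latitudes
for lat in (0, 30, 60):
    print(lat, round(haversine_km(lat, 0, lat, 1), 1))
```

One degree of longitude spans about 111 km at the equator but only about half that at 60 degrees north, so a difference in degrees cannot be read as a distance without doing the calculation.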

Map Projections

A map projection is a mathematical transformation that converts the earth’s three-dimensional surface to a map’s two-dimensional surface.
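As one concrete instance of such a transformation, the sketch below implements the spherical Mercator projection and its inverse. It treats the earth as a sphere whose radius equals the WGS84 semi-major axis, and the function names are my own. Real projections, including those GDAL applies, account for the spheroid and take more parameters.

```python
import math

R = 6378137.0   # sphere radius in meters (the WGS84 semi-major axis)

def mercator_forward(lon_deg, lat_deg):
    """Longitude and latitude in degrees to x, y in meters."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def mercator_inverse(x, y):
    """x, y in meters back to longitude and latitude in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

x, y = mercator_forward(-77.0, 39.0)   # an arbitrary sample point
print(round(x), round(y))
print(mercator_inverse(x, y))
```

The forward function stretches y more and more toward the poles, which is the distortion that makes high-latitude regions look oversized on Mercator maps.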

Using GDAL’s ability to re-project raster images with gdalwarp, let’s compare two rasters.

For localized measures, use the local UTM zone. Universal Transverse Mercator (UTM) is one of the most common nearly-global projected coordinate systems. The system divides the globe into 60 zones, each spanning six degrees of longitude. These zones run north to south, covering latitudes from 80°S to 84°N. Each zone has its own projection parameters to maximize accuracy. East-west measurements are made in meters relative to the zone’s central meridian, which serves as a local origin. North-south measurements are made in meters from the equator.
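Finding the right zone for a location is simple arithmetic on its longitude. The sketch below uses the standard numbering, with zone 1 starting at 180 degrees west, and ignores the few irregular zones around Norway and Svalbard. The function names are hypothetical.

```python
import math

def utm_zone(lon_deg):
    """UTM zone number (1 to 60) for a longitude in degrees."""
    lon = ((lon_deg + 180.0) % 360.0) - 180.0   # wrap into [-180, 180)
    return int(math.floor((lon + 180.0) / 6.0)) + 1

def central_meridian(zone):
    """Longitude in degrees of the central meridian of a UTM zone."""
    return zone * 6 - 183

zone = utm_zone(-122.4)   # a longitude on the west coast of North America
print(zone, central_meridian(zone))
```

Each central meridian sits three degrees from either edge of its zone, which keeps every location within a few degrees of the line where the projection is most accurate.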

How does this impact your spatial analysis?

Unless explicitly calculating geometric attributes on a curved surface, the projection used will affect the result of any such calculations. For a “planar” calculation of the area of the USA, for example, different projections give different results.
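The size of this effect can be estimated without any GIS software. The sketch below compares the area of a one-degree cell on a sphere with the “planar” area of the same cell after a spherical Mercator projection. The sphere radius and the choice of Mercator are assumptions made for illustration; other projections distort area differently, and equal-area projections not at all.

```python
import math

R = 6371000.0   # sphere radius in meters; a sphere is assumed

def sphere_cell_area(lat_s, lat_n, lon_w, lon_e):
    """Area on the sphere of a cell bounded by parallels and meridians."""
    dlon = math.radians(lon_e - lon_w)
    return R * R * dlon * (math.sin(math.radians(lat_n))
                           - math.sin(math.radians(lat_s)))

def mercator_y(lat_deg):
    """Northing of a latitude in the spherical Mercator projection."""
    return R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))

def mercator_cell_area(lat_s, lat_n, lon_w, lon_e):
    """Planar area of the same cell measured on the Mercator map."""
    width = R * math.radians(lon_e - lon_w)
    height = mercator_y(lat_n) - mercator_y(lat_s)
    return width * height

for lat in (0, 30, 60):
    ratio = (mercator_cell_area(lat, lat + 1, 0, 1)
             / sphere_cell_area(lat, lat + 1, 0, 1))
    print(lat, round(ratio, 2))
```

Near the equator the two areas nearly agree, but at 60 degrees the Mercator cell is about four times too large, so a planar area summed over a large country depends heavily on the projection.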

Summary

The open source stack has two essential libraries at the bottom: GDAL for reading, writing, and translating raster and vector data, and GEOS for geometric calculations on vector data.

We will move up the stack to software interacting with these libraries in the next lesson, but the command line utilities offered by GDAL are available for batch processing of spatial data using a shell script.

Projections distort the world - be aware of the spatial reference system your spatial data employ. If possible, find spatial data in an appropriate PCS for your analysis or convert unprojected data with a known GCS to an appropriate PCS.
