G-FAQ – What is GIS? Part IV

Regular readers of the Geospatial Frequently Asked Question (G-FAQ), which is about two of you I am guessing, know that I am, well, a bit verbose; and this three-part series which grew into a four-parter is no different! This overly long G-FAQ series focused on the basic question, ‘What is GIS?’ In part one, we looked at the various definitions of GIS; in part two, the history of GIS; and in part three, we started to explore what can actually be done with GIS. In this, the truly final part of the What is GIS G-FAQ series, we will look at the tools available to users, walk through several examples of the problems you can solve and then conclude with my inclusive definition of GIS.

To remind our readers, this G-FAQ series has focused on these core questions:

What is GIS: is it just computer software or is it a science? How did GIS develop into an established field of study? How does GIS work and what can you use it for?

Because GIS is inherently a tool for spatial analysis, it is important for our readers to understand the concept of scale. Scale defines how large an object is on your screen versus how large it is in reality. For example, your scale might be 1:10,000, which means that an object that is one inch on your screen is 10,000 inches (about 833 feet) on the surface of the planet. When you need a very detailed view of surface features, you are working at a large scale, such as 1:10,000; and when you are working on a project covering a very large area, you are working at a small scale, such as 1:1,000,000, and you will see less detail.
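For those who like to see the arithmetic, here is a minimal sketch of the scale calculation in Python. The function names are my own and purely illustrative; they are not part of any GIS package.

```python
INCHES_PER_FOOT = 12


def ground_distance(map_distance, scale_denominator):
    """Convert a distance on the map to the same distance on the ground.

    A scale of 1:10,000 has a scale_denominator of 10,000; the result is in
    the same units as map_distance.
    """
    return map_distance * scale_denominator


def is_larger_scale(denominator_a, denominator_b):
    """Return True if scale 1:a is larger (more detailed) than scale 1:b."""
    # A larger scale is a larger fraction, so it has the smaller denominator.
    return denominator_a < denominator_b


ground_inches = ground_distance(1, 10_000)        # one inch on the screen
ground_feet = ground_inches / INCHES_PER_FOOT     # same distance in feet

print(ground_inches, round(ground_feet, 1))
print(is_larger_scale(10_000, 1_000_000))
```

The second function captures the part that trips most people up: the smaller the number after the colon, the larger the scale.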

With GIS software, you are able to zoom in and zoom out on the data you have loaded, viewing it in both small and large scale depending on the exact question you are trying to answer. Do keep in mind that there is a limit to how detailed (or large scale) your analysis can be, and this limit is set by the datasets you are working with. It becomes rather apparent that you are working beyond the scale your data can handle when: (1) you zoom in on a vector edge or a set of vectors, and the boundaries become straight and show no more detail as the scale increases; or (2) you zoom in on your raster datasets and they look like a bunch of colored squares (also called pixelation).

As has been alluded to in the previous parts of this G-FAQ series, spatial analysis in GIS hinges upon the attributes and numeric values tied to each feature of your geographic datasets. In the case of a raster, each pixel has a specific numeric value that is associated with a geographic phenomenon, such as elevation. A raster can also have multiple layers with different numeric values in each overlapping pixel, for example in the case of a color image with the red, green and blue reflectance values that the human eye sees. By having numeric values tied to each cell of a raster file, you are able to complete mathematical functions on a set of data with matching pixel sizes. For example, you might have three raster files with site suitability scores for an endangered animal species ranked from 1 to 5, with 5 the most suitable locations. Each of the three raster files has a suitability score determined by a different factor, perhaps slope, distance from a water body and dominant food type present. By summing the numeric values in each raster, you find that the cells with the highest scores are the most suitable locations for the endangered species, and those with the lowest are the least suitable.
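To make the suitability example concrete, here is a small sketch in plain Python that sums three rasters cell by cell. The 3 x 3 grids and their scores are made-up placeholder values, and nested lists stand in for real raster files; in practice you would use the raster tools in your GIS package.

```python
# Three hypothetical 3 x 3 suitability rasters, each scored from 1 to 5.
slope = [[1, 3, 5],
         [2, 4, 5],
         [1, 2, 3]]
water = [[2, 3, 4],
         [1, 5, 5],
         [1, 1, 2]]
food = [[1, 2, 5],
        [3, 4, 5],
        [2, 2, 1]]


def sum_rasters(*rasters):
    """Add rasters cell by cell; all rasters must share the same grid."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*rasters)]


total = sum_rasters(slope, water, food)

# Find the (row, column) position of every cell with the top combined score.
best_score = max(max(row) for row in total)
best_cells = [(r, c)
              for r, row in enumerate(total)
              for c, value in enumerate(row)
              if value == best_score]

print(total)
print(best_score, best_cells)
```

Note that the sum only makes sense because the three grids line up exactly, which is why matching pixel sizes matter.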

In the case of a vector file, the attributes of each feature are tied to its spatial boundaries through a relational database. This database is essentially a large spreadsheet with each column representing a different attribute and each row representing an individual spatial feature. Take for example a vector file of the world’s countries. The attributes for each country could include 2010 population, country area, average income and government type. With these attributes presented in a standardized format, you are able to rank the countries by population or perform a more complex analysis that relies on multiple attributes. One question you could answer with this vector file is, “Do countries that are democratic have a higher average income?” Or maybe you want to know, “Do countries with a larger area and population tend to be wealthier?” These are questions that can be answered with spatial analysis in GIS!
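Here is a rough sketch of what querying such an attribute table looks like, using a list of Python dictionaries in place of a real vector file. The country names, field names and every value below are invented placeholders, not real statistics.

```python
from statistics import mean

# Each dictionary is one row (one country); each key is one attribute column.
# All names and numbers are hypothetical placeholders.
countries = [
    {"name": "Country A", "pop_2010": 50_000_000, "area_km2": 300_000,
     "avg_income": 40_000, "government": "democracy"},
    {"name": "Country B", "pop_2010": 120_000_000, "area_km2": 900_000,
     "avg_income": 12_000, "government": "other"},
    {"name": "Country C", "pop_2010": 8_000_000, "area_km2": 40_000,
     "avg_income": 55_000, "government": "democracy"},
    {"name": "Country D", "pop_2010": 30_000_000, "area_km2": 500_000,
     "avg_income": 9_000, "government": "other"},
]

# Rank the countries by population, largest first.
ranked = sorted(countries, key=lambda row: row["pop_2010"], reverse=True)
ranked_names = [row["name"] for row in ranked]


def mean_income(rows, government):
    """Average the income attribute over rows with a given government type."""
    return mean(row["avg_income"] for row in rows
                if row["government"] == government)


print(ranked_names)
print(mean_income(countries, "democracy"), mean_income(countries, "other"))
```

A comparison of two averages is only a first look at the democracy question, of course; a proper answer would need a statistical test and real data.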

Spatial Analysis in ArcGIS

As in part three of this G-FAQ series, I focus here on the tools available for spatial analysis in Esri’s ArcGIS. Having the ability to perform a set of standardized and repeatable spatial functions separates GIS software from its engineering counterpart, AutoCAD. For most users of ArcGIS, the spatial functions employed on a daily basis are found in the various toolboxes. Within each toolbox is a set of operations that are grouped together by common themes. In the Desktop version of ArcGIS 10.1, there are 19 toolboxes with more than 650 functions within them. Here are the three toolboxes most commonly used by Apollo Mapping employees:

Conversion Tools – this lets you convert files between the various spatial formats ArcGIS can read.

Analysis Tools – these are functions for vectors related to buffering, clipping and merging various layers together.

Data Management – this is a huge toolbox with commonly used vector and raster functions. Some of the ones we use most frequently are the functions to change projections; to dissolve the internal boundaries of vectors; and to orthorectify rasters by applying a digital elevation model to the data.

As with most software applications, there are various levels of licenses that increase in price as more of the toolbox functions are unlocked for use.

One way to extend the functionality of ArcGIS is to make custom scripts and toolbox functions using the programming language Python. By creating your own custom scripts, you are able to automate workflows, process large datasets in bulk and even add new functions to the set of standard toolboxes. Python is a highly flexible open-source language that can work across software platforms. When used inside of ArcGIS, you are able to incorporate the prebuilt toolbox functions into the code you develop with a single line of script to further streamline common spatial analysis workflows. Python is an object-oriented language with the following basic hierarchical structure:

Object – a variable or piece of memory

Expression – these transform objects, such as multiply them by a value

Statement – performs tasks based on a set of expressions, such as assignment, import and (in Python 2, which ships with ArcGIS 10.1) print

Module – a file of statements tied together to perform a more complex task than is possible with a single statement, such as a complete spatial analysis function
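To see the four levels side by side, here is a tiny illustrative module written for Python 3, so print appears as a function rather than a statement. It uses only the standard library rather than the ArcGIS toolbox functions, and the names radius_miles and buffer_area are my own inventions for this example.

```python
# This whole file is a module: a set of statements saved together.

import math  # statement: import makes another module's names available

radius_miles = 0.5  # statement: assignment binds a name to a float object

# The right-hand side is an expression; it transforms objects into a new one.
area = math.pi * radius_miles ** 2


def buffer_area(radius):
    """Return the area of a circular buffer with the given radius."""
    return math.pi * radius ** 2


if __name__ == "__main__":
    # Runs only when the module is executed directly, not when it is imported.
    print(round(buffer_area(radius_miles), 3))
```

Saved as a file, this module could be imported into a longer script, which is the same idea that lets you call a prebuilt toolbox function from your own code.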

Another way to automate complex workflows or to process datasets in bulk is called ModelBuilder. This is a visual programming language of sorts, so for those of you without a degree in computer science, this could become your best friend in ArcGIS. With ModelBuilder, you are able to drag and drop the various standardized toolbox functions, connect them together and then add in import and export calls to handle your various spatial datasets. The models you build can even be shared with other ArcGIS users and converted to Python scripts.

What Exactly Can You Do With ArcGIS?

I recognize that the discussion above about the various tools you can use for spatial analysis is a bit obscure without some real-world examples. So in this section, I will walk you through three real-world use cases for GIS. Bear in mind that there are literally thousands, millions or even more questions that you could answer, so these three examples are only meant to give you a taste and feel for what is possible with a GIS analysis.

(1) Are there any nuclear power plants located in political hotspots around the world?

Figure 1 – a map of political stability and nuclear power plant locations as created by Robin Brehm and her students at SUNY Downstate Medical Center.

This is one of the more straightforward uses of GIS as it involves little analysis per se but lots of layering of datasets and cartographic work to make the final map as readable as possible. In order to answer this question, you need these three datasets: (1) a polygon vector file of the world’s countries; (2) a point vector file of the world’s nuclear power plants; and (3) information on the political stability of each country as determined by the CIA. Once you gather datasets 1 and 2, which can be found in ready-to-go shapefile format, you need to append the information on political stability found in the CIA World Factbook as a new attribute in the country polygon file. Now that we have the layers required to put together a map, the final steps are all cartographic in nature: (1) determine a color scheme that shows levels of political stability as a background; (2) put the point locations of the nuclear power plants on top of this stability layer in a color and shape that stands out; and (3) add a title, scale and compass rose. Check out Figure 1 for an example of a map that answers this question.
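The step of appending stability information to the country file is what GIS folks call an attribute join. Below is a minimal sketch of that join in plain Python. Everything in it is a placeholder: the country codes, plant names and stability labels are invented, and I assume each plant record already carries the code of its country, which a real workflow would work out from the point and polygon geometry.

```python
# Hypothetical attribute rows from the country polygon file.
countries = [
    {"code": "AAA", "name": "Country A"},
    {"code": "BBB", "name": "Country B"},
    {"code": "CCC", "name": "Country C"},
]

# Hypothetical stability ratings keyed by country code; CCC has no rating.
stability = {"AAA": "stable", "BBB": "unstable"}


def join_attribute(rows, lookup, key, new_field, default=None):
    """Return copies of rows with a new attribute pulled from a lookup table."""
    return [{**row, new_field: lookup.get(row[key], default)} for row in rows]


joined = join_attribute(countries, stability, "code", "stability",
                        default="no data")

# Hypothetical power plant points, each tagged with its country code.
plants = [
    {"name": "Plant 1", "country_code": "AAA"},
    {"name": "Plant 2", "country_code": "BBB"},
    {"name": "Plant 3", "country_code": "BBB"},
]

stability_by_code = {row["code"]: row["stability"] for row in joined}
plants_in_hotspots = [plant["name"] for plant in plants
                      if stability_by_code[plant["country_code"]] == "unstable"]

print(joined)
print(plants_in_hotspots)
```

Handling the country with no rating explicitly is worth the extra line, since a gap in the source table would otherwise leave a blank spot on the final map.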

(2) What schools are within a half mile of Boulder, Colorado’s police or fire stations?

Figure 2 – a map showing the locations of the 19 Boulder schools within a half mile of a police or fire station.

In order to answer this question, we will need the following vector file inputs: (1) a line file of the city limits of Boulder; (2) a point file of the center of Boulder’s schools; and (3) a point file of the center of Boulder’s police and fire stations. Thankfully, Esri makes finding this data an easy task as it is part of the software package you receive when you purchase a license to ArcGIS 10.x. Once the data is all loaded, the next step is buffering the point file for the police and fire station locations by a half mile on all sides. Now we can take the buffered police-fire stations, select the Boulder schools that fall inside of them and then color them differently than the schools outside the buffers. The final step is putting together a nice map of the analysis we just completed, which you can check out in Figure 2.
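The buffer-and-select step boils down to a distance test, which can be sketched in a few lines of Python. The coordinates and names here are invented, and I assume they are in a projected coordinate system measured in feet so that straight-line distance is meaningful; this is not the Buffer tool in ArcGIS, just the idea behind it.

```python
import math

HALF_MILE_FT = 0.5 * 5280  # buffer distance in feet

# Hypothetical point locations as (x, y) in feet.
stations = {"Station 1": (0, 0), "Station 2": (10_000, 0)}
schools = {
    "School A": (1_000, 1_000),
    "School B": (5_000, 0),
    "School C": (10_000, 2_640),
    "School D": (3_000, 3_000),
}


def within_buffer(point, centers, radius):
    """Return True if point lies within radius of at least one center."""
    px, py = point
    return any(math.hypot(px - cx, py - cy) <= radius for cx, cy in centers)


selected = [name for name, location in schools.items()
            if within_buffer(location, stations.values(), HALF_MILE_FT)]

print(selected)
```

In this sketch a school sitting exactly on the edge of the buffer counts as inside, which is a choice you would want to state on the final map.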

As a side note, this is a question where scale can come into play. Consider that neither a school nor a police-fire station is a point; in reality, each is a parcel of land that should be represented by a polygon. However, when you are working at the scale of a city-wide analysis, using a point file is an appropriate approximation of each site’s location. If you were working at the neighborhood level, it would make more sense to use a polygon to represent the actual school and police-fire station grounds.

(3) Where are the best suited areas to relocate black bears in the Great Smoky Mountains National Park?

Figure 3 – a map showing the least to most favorable areas for black bear relocation in the Great Smoky Mountains National Park.

Answering this question requires a more complex analysis than is presented in Examples 1 and 2, and it also requires the use of both vector and raster datasets. In order to answer this question on site suitability, you could incorporate a very wide range of data layers but in any GIS analysis, you are often limited to those that you can find readily available. As such, we will focus on the five layers that were used by John Lloyd of North Carolina State University to answer this question: (1) a vector polygon file of vegetation cover; (2) a vector line file of roads; (3) a vector line file of streams; (4) a vector line file of trails; and (5) a raster file of elevation. In order to complete this analysis, the vectors need to be converted to rasters with a cell size equal to our elevation layer. This is required as it is impossible to sum (also called band or raster math) any weighting values we assign to each variable unless the boundaries line up exactly. But then, as was discussed previously (see Part III of this G-FAQ series), the conversion of a ‘precise’ vector to a fixed-cell raster creates a less accurate boundary.

Once the vector files are converted to rasters, each dataset needs to be reclassified on a scale of 1 to 3, with 1 being the least favorable habitat for a black bear. So for example, the road raster layer is weighted according to a cell’s distance from a road: cells within 0.5 miles are weighted as 1; 0.5 to 1 mile as 2; greater than 1 mile as 3. The criteria used to reclassify each layer can be found in the link above to Lloyd’s analysis. In the final step, the weighting values are summed with band math. Each of the 5 layers contributes 20% of the final suitability score for each cell, and the cells with the highest final value are the most suitable for black bears. You can find a map of the results of this analysis in Figure 3.
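Here is a hedged sketch of the reclassify-and-sum steps in plain Python. Only the road distance breaks (0.5 and 1 mile) come from the text above; how cells that fall exactly on a break are treated is my own assumption, and the sample distances and class values are invented for illustration rather than taken from Lloyd's analysis.

```python
import math


def reclassify_road_distance(miles):
    """Map distance from a road to a 1-3 class (1 = least favorable).

    Assumption: a cell exactly on a break falls in the lower class.
    """
    if miles <= 0.5:
        return 1
    if miles <= 1.0:
        return 2
    return 3


def reclassify_raster(raster, rule):
    """Apply a reclassification rule to every cell of a raster."""
    return [[rule(value) for value in row] for row in raster]


# Hypothetical 2 x 2 raster of distances to the nearest road, in miles.
road_distance_miles = [[0.2, 0.7],
                       [1.5, 0.5]]
road_class = reclassify_raster(road_distance_miles, reclassify_road_distance)

LAYERS = ("vegetation", "roads", "streams", "trails", "elevation")
WEIGHT = 1 / len(LAYERS)  # five layers, so each contributes 20%


def suitability(cell_classes):
    """Combine the five reclassified values for one cell with equal weights."""
    return sum(WEIGHT * cell_classes[layer] for layer in LAYERS)


# Hypothetical reclassified values for a single cell.
example_cell = {"vegetation": 3, "roads": 1, "streams": 2,
                "trails": 3, "elevation": 2}
score = suitability(example_cell)

print(road_class)
print(round(score, 2))
assert math.isclose(sum(WEIGHT for _ in LAYERS), 1.0)
```

With equal weights the final score is simply the average of the five class values, so it always lands between 1 and 3.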

My Definition of GIS

For those who have made it all the way through this four-part G-FAQ series, you should now see that GIS is both mapping software and the study of using this software to solve spatial problems. By layering together multiple spatial datasets, users are able to make maps as well as answer questions about the world around us. No definition that I can offer for GIS will make everyone happy, nor cover every theme possible, but regardless, let’s end this G-FAQ with what I came up with after writing this series over the past four months:

GIS is a mapping software – and the field of science that studies it – which is used to answer questions on physical and/or human phenomena in a standardized and repeatable fashion.