Hack 77. Become a GRASS Ninja

As children playing Make Believe, my friends and I would sometimes imagine ourselves to be a gang of ninja warriors, entrusted with some fiendishly difficult mission of infiltration and surprise. For whatever reason, ninjahood, at least as we envisioned it, was the epitome of competence, resourcefulness, and daring. Ninjas could go anywhere, do anything, and, above all, they were unstoppable. Perhaps it's just the crowd I circulate in, but I still hear the word ninja used from time to time, to signify the presence of those qualities in one context or another. If one could say of a person that she was, for example, a culinary ninja, then, without a doubt, she was the person you wanted to have catering your son's Bar Mitzvah party.

In this hack, we'll talk about the kind of thinking it takes to become that most formidable of open source GIS users, a GRASS ninja. Mind you, I don't claim to be one by any stretch of the imagination, but in the preparation of this book, I've seen just enough of GRASS to want to gesticulate in the direction in which GRASS ninjahood might lie. In the process, I want to show why open source GIS in the Unix environment is at least as good, if not betterfor values of "better" that might be important to hackersthan any flashy, GUI-laden, commercial GIS application you can imagine.

6.16.1. Objective #1: Secure the World's Borders

The objective for this GRASS ninja-training mission is to obtain a high-quality vector map of the world's contemporary international borders,[1] with attribute values identifying which country each polygon belongs to. Ideally, the map should have no restrictions on its use and redistribution and be suitable for creating derivations, such as SVG base maps and so on. That's all. Nothing fancy. Right?

[1] We will ignore, for the moment, the fact that people often have quite heated differences of opinion on precisely whose international borders lie where.

Well, you might think so, and if you weren't me, you might be right. But my coauthors and I searched high and low on the Web for a decent vector map of the world's international borders, and we didn't find one straight off the bat. We found a bunch of old data, like VMAP0, formerly known as the Digital Chart of the World, which is also a bit large and unwieldy to work with. We found out how to get the polygons out of the Generic Mapping Tools (see [Hack #28] ), only to discover that the countries weren't identified. We found different services, such as ESRI's world base map, or Pennsylvania State University's online DCW server, that only allow you to download a bit of the world at a time. We found some vector data of dubious provenance in ESRI's undocumented Ungenerate, or E00, format, which GRASS apparently decided it didn't like at all.

We found the United Nations Environmental Programme's GEO Data Portal (http://geodata.grid.unep.ch/), which offered a map of the world's first-level administrative divisions circa 1998 from ESRI in shapefile format. At first, we shied away from downloading it, because we were concerned about redistribution rights, but over on ESRI's site, we found some verbiage to the effect the data was all drawn from public domain sourcesCIAWorld Data Bank II and so onand that they claimed no intellectual property rights in the data set, beyond the shapefile format itself. So we downloaded the 5 MB ZIP file from http://geodata.grid.unep.ch/download/admin98_po_shp.zip and unpacked it, yielding about 13 MB of ESRI Shapefile:

The next thing we did was to try to get the data into GRASS so that we could have a look at it, play with it, and see what was in it. v.in.ogr was our tool of choice, with the -o option to tell GRASS to ignore the fact that the shapefile came without machine-readable projection metadata. This was safe to do, as the human-readable metadata in the HTML file assured us that the data was given in geographic coordinates (i.e., latitude and longitude) and therefore matched our current GRASS location.

If we'd needed to create a new GRASS location for this, the thing to do would have been to create a latitude-longitude location, decline to set a geodetic datum, set the ellipsoid to something sensible like wgs84, and leave the default region alone (or set it to 90º N / -90º S / -180º W / 180º E).

Well, v.in.ogr chewed and puffed over the data for quite some time, importing it into GRASS's internal vector format, cleaning up the polygons, snapping vertices, eliminating redundant edges, and generally making sure that the topology of the vector data was correct and so forth. Finally, once it had finished and the dust had settled a bit, we found ourselves in possession of a new high-resolution vector layer of the world! Figure 6-52 shows a portion of Western Europe from the admin layer.

Figure 6-52. Part of a first attempt to get a vector map of the world into GRASS

The attribute data was even demonstrably all there. GRASS 5.7 has a terrific vector-attribute model, in which each vector layer can be associated with a table of attributes, and then each feature in the layer can be associated with a row of data in the table, by means of a numeric category ID. GRASS offers a set of commands prefixed with db, which permit SQL-like interactions with a vector layer's attribute table. You can use v.info to get a list of attribute columns for a given table, and db.select to view the actual contents of those columns:

We say "SQL-like" because, by default, the actual table is stored on disk in a DBase IV .dbf file, but a vector layer can (at least in theory) be associated with a table from an actual relational database, like MySQL or PostgreSQL. The interested reader can refer to the GRASS documentation for further details.

So far, so good. However, a single glance at Figure 6-52 immediately reveals that it depicts rather a few more boundaries than one is accustomed to seeing on a map of Europe. The international boundaries are there, to be sure, but so is every other first-level administrative boundary in the world, making individual countries a bit hard to pick out. This is well and good, but it also means we're not done yet.

As it turns out, ESRI distributes a cntry98.shp data set that avoids some of these data-creation issues. However, we've never been able to find a copy of this data set available for download on the Web.

6.16.2. Objective #2: Reunite the World's Nations

The would-be GRASS ninja is not daunted at this juncture. Surely, the ninja trainee must concur, GRASS must have some facility for merging polygons. We simply tell GRASS to merge all of the adjacent shapes belonging to the same country in a new vector layer, and we are done.

If a ninja-in-training may be permitted one flaw, perhaps that flaw is overconfidence. After a bit of poking in the GRASS documentation, we were forced to conclude that GRASS had no such tool, per se. GRASS has a tool, v.extractwhich extracts vector features into a new layer, possibly based on an SQL WHERE query against the attribute tableand even has an option to "dissolve" the borders of adjacent polygons in the process. GRASS also has a tool called v.patch, which will patch multiple vector layers into a single one. The problem is that v.extract must be run separately for each of the 251 countries or country-like territories in the admin layer and then v.patch must patch each country into a new international borders layer. Worse, the attribute data that identifies each country by name and FIPS code must somehow be preserved in the process. To attempt the task by hand would be tedious even to contemplate.

However, you may have gathered that the ninja is second only to MacGyver in sheer resourcefulness. If there is a means at hand, the ninja must find some way to exploit it. First, we need a list of the countries and assorted territories involved. The FIPS code will suffice for this purpose:

We're obliged to use the hoary Unix sort and uniq commands to extract the list of unique countries, because GRASS's rudimentary SQL parser doesn't support DISTINCT. Now, for each country in fips.list, we must v.extract it out of admin, and v.patch it into our new layer, which we will call borders. We could feed the list of countries to a bit of Perl and have the Perl script generate the necessary GRASS commands for us. That bit of Perl might look as follows:

The -d option to v.extract instructs it to dissolve internal borders between shapes. v.extract then writes the shapes for each country to a layer called tmp. In the process, the shapes are assigned a new category value equal to $., which is Perl for "the current line number of the input file." Recall that fips.list is alphabetized. As a result, all the shapes from the first country from our fips.list file (Aruba) will receive the category #1 in the new layer, the second country (Antigua and Barbuda) #2, the third (Afghanistan) #3, and so on. This will turn out to be important later.

Next, the -a option to v.patch tells it to append the input layer to the output layer rather than overwriting the output layer, which is the default. So each call to v.patch will successively add each country to the borders layer, until they're all in there. We use this wodge of Perl as follows, and then run the script it generates:

And then we sit back and wait, while GRASS extracts, merges, and patches every polygon belonging to each of the world's 251 countries or country-like territories. Fortunately, waiting comes easy to the would-be ninja. The would-be ninja is not even fazed by the irritating console beep emitted by v.extract each time it runs, warning that the tmp layer is being overwritten. (The true would-be ninja would have thought to silence his handiwork by adding a printqq/g.removevect=tmp
/ to the top of the loop in gen_extract.pl, but never mind that.)

Some minutes later, GRASS completes its labors without complaint; however d.vectborderstype=area fails to reveal the goods. Any landmass completely within a single countrysuch as the British Isleshows up, but any landmass spanning more than one countrylike, say, mainland North Americahas simply disappeared. However, d.vectborderstype=boundary shows that the vectors are all there. What gives?

It turns out that, had the ninja trainee paid close attention, as every ninja must, he would have noted the following warning emitted by each execution of v.patch:

Patch complete. 1 files patched.
Intersections at borders will have to be snapped.
Lines common between files will have to be edited.
The header information also may have to be edited.

A message like this from GRASS is a sure sign that the vector layer in question needs a run through v.clean to fix what ails it and reassure GRASS that the polygons contained therein really are polygons. The following does the trick:

v.clean needs a few minutes to work its magic (this is 13 megs of vector data, after all) and it won't overwrite the original layer, so we have to create a new layer called borders2. A quick call to d.vect reveals, as shown in Figure 6-53, that the borders2 layer does indeed contain a vector map of the world's international boundaries.

Figure 6-53. Part of a vector map of national boundaries successfully extracted from the first-level administrative boundaries layer

Having verified for ourselves that this is indeed the case, we can engage in a bit of cleanup work, as a proper ninja always covers her tracks:

The answer is that no attribute data made it through. In fact, GRASS is saying that the borders layer doesn't even have an attribute table. It is as though the ninjas have successfully infiltrated the tower, only to find that the villain has absconded with the princess. Honor demands that we give chase!

6.16.3. Objective #3: Recover the Missing Plans

Fortunately, if you'll recall, in our current set up, the vector-layer attribute tables are just DBase .dbf files stored in the filesystem. It would be awful nice if we could just pipe a CREATE TABLE command to db.execute to create the data table for the borders layer, but GRASS's DBF driver isn't nearly that cooperative. So we have to resort to some other means of creating a .dbf file.

The admin layer already has all the data we need, and, if you'll recall, we took care to assign category numbers to each country that correspond to its place in an alphabetized list of FIPS country codes. We can generate a pipe character (|) delimited text file containing all the attribute data we need for our borders layer with a single line of shell, as follows:

Now, that's one heck of a shell command, but let's parse it bit by bit. The db.select command will dump the FIPS country code and the country name of each feature, separated by a pipe, in the admin layer. When the list goes through sort and then uniq, we will get a list of unique country codes and names, sorted alphabetically by code. You may recognize our old friend $. in the subsequent perl command, which simply prints out each line of input with the current line number plus a pipe character added to the front of the line. The borders.txt file ends up looking like this:

1|AA|Aruba
2|AC|Antigua and Barbuda
3|AF|Afghanistan
4|AG|Algeria
...

All it needs is a header line, which we can add by loading it up in our favorite text editor, and making it look like this:

The upshot is that the category numbers that we gave each country's features in the extraction step are now matched up to that country's name and FIPS code in the current step. GRASS likes the column containing the category numbers to be called CAT, but that's not a problem at all. The only step that remains is to get this attribute data into an actual .dbf file and then tell GRASS about it.

There are a bunch of options out there, butremember what we said about ninjas and resourcefulnessthe easiest one turned out to be a bloody spreadsheet program, because a spreadsheet is all a .dbf file is, really, and all modern spreadsheet programs know how to read and write them. You could very easily rely on OpenOffice for this, but we just used Microsoft Excel because it happened to be on the machine we were using at the time. Load borders.txt, set the delimiter to the pipe character, import all columns, and then save in DBase IV format to borders.dbf. We had to do this twice, because DBF is a fixed-width format, and MS Excel decided to chop each column at the width that it defaulted to on import (i.e., the width of the column name), which truncated some of the country names. The fix was to select each column in Excel in turn, go to Format images/ent/U2192.GIF border=0> Column images/ent/U2192.GIF border=0> AutoFit Selection, and then save the whole thing in .dbf again. No problem. No doubt if we had used more suitable tools for dealing with DBFsuch as, say, Perl's DBD::XBasethis wouldn't have been an issue.

Next, we need to put the file where GRASS can find it, which is usually $GISDBASE/$LOCATION_NAME/$MAPSET/dbf/. Our database directory is /home/sderle/grass, our current location is called global, and our current mapset sderle, so, by default, GRASS looks for the DBF file in /home/sderle/grass/global/sderle/dbf/, although we could put it somewhere else if we really wanted to. We choose to go with the default, copy borders.dbf into that directory, and then inform GRASS using v.db.connect:

GRASS:~/hacks/world > v.db.connect borders driver=dbf table=borders
WARNING: The table is now part of vector map and may be
deleted or overwritten by GRASS modules.

Our work here is done. v.info -c borders gives us the output we expect:

Ordinarily, we might verify that the attribute data was correctly assigned by running d.what.vect and clicking around on the X display monitor. Another, somewhat more roundabout, way of testing this would be to pick any country (say, Spain), zoom in on that country, use d.vect to display only that country, and see what comes out:

If all of Spain, and only Spain, is what shows up, then we can be more or less confident that the attribute data has come through correctly. Figure 6-54 suggests that this is indeed the case. The princess is saved, the villain in ruinsin other words, success!

Figure 6-54. Spain, by way of testing the attribute data from our vector map of the world

6.16.4. The Path to Ninjahood

The path to becoming a true ninja of any kind is fraught with danger and pitfalls. Following that path demands flexibility, ingenuity, and clarity of mind. When it comes to mastering GRASS, one must always bear in mind the Unix tools philosophy: the idea that an operating system, or a GIS, is best made up of a myriad of small, interconnectable tools, each of which should do one thing very well. GRASS's integration with the Unix shell environmentand all of the tools that come with itmeans that, in the hands of a capable ninja, GRASS is the hacker's GIS, and easily the equal of any fancy, flashy, graphical GIS environment you could care to stack it up against. Remember the example of the ninja, and you too will in short order be using GRASS to make neat work of your cartographic and geographic problems.