My Home on the Internet


Or… how to spend a day banging your head against your desk.

Or… machine learning is easy, right? 🙂

For a while now I have been wanting to upgrade the video card in my desktop so I could actually use it for machine/deep learning (ML) tasks. Since I put together a Frankenstein gaming computer for my daughters out of some older parts, I finally justified getting a new card by telling myself I would give them my older NVIDIA card. After a lot of research, I decided the RTX 2060 was a good balance between how much money I felt like spending and something that would actually be useful (plus the series comes with dedicated Tensor Cores that are very fast with fp16 data types).

So after buying the card and installing it, the first thing I wanted to do was get Tensorflow 2.1 working with it. Now, I already had CUDA 10.2 and the most up-to-date version of TensorRT installed, and knew that I'd have to compile a custom Tensorflow pip package for my system. What I did not know was just how annoying this would turn out to be. My first attempt was to follow the instructions from the Tensorflow web site, including applying this patch that fixes the nccl bindings to work with CUDA 10.2.

However, all of my attempts failed. I had random compiler errors crop up during the build that I had never seen before and could not explain. Tried building with Clang. Tried different versions of GCC. Considered building with a Catholic priest present to keep the demons at bay. No dice. I never could complete a build successfully on my system. This was somewhat expected, since a lot of people online have trouble getting Tensorflow 2.1 and CUDA 10.2 to play nice together.

I finally broke down and downgraded CUDA on my system to 10.1. I also downgraded TensorRT so it would be compatible with the version of CUDA I now had. Finally I could do a pip install tensorflow-gpu inside my virtual environment, and it worked: I could at last run my training on my GPU with fantastic results.

Almost.

I kept getting CUDNN_STATUS_INTERNAL_ERROR messages every time I tried to run a Keras application on the GPU. Yay. After some Googling, I found this link; apparently there's an issue with Tensorflow and the RTX line. To fix it, you have to add this to your Python code that uses Keras/Tensorflow:
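The fix boils down to enabling GPU memory growth; a minimal sketch of the workaround, assuming Tensorflow 2.1's tf.config.experimental API:

```python
import tensorflow as tf

# Tell Tensorflow to allocate GPU memory on demand instead of
# grabbing it all at startup; this works around the cuDNN
# initialization failures (CUDNN_STATUS_INTERNAL_ERROR) seen on
# RTX cards.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```

Put this before any Keras model is built so the GPU context has not yet been created.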

FINALLY! After several days of trying to custom compile Tensorflow for my system, giving up, downgrading so I could install via pip, running into more errors, etc., I now have a working GPU-accelerated Tensorflow! As an example, running the simple addition_rnn.py example from Keras, I went from around 3.5 seconds per epoch on my Ryzen 7 2700X processor (where I had compiled Tensorflow for CPU only to take advantage of the additional CPU instructions) to under 0.5 seconds on the GPU. I'm still experimenting, and modifying some things to use fp16 so I can take advantage of the Tensor Cores in the GPU.

In Part 1 I discussed how I go about finding and downloading maps from the USGS Historic Topomap site. Now I will show how I go through the images I downloaded and determine which ones to keep to make a merged state map. I will also show some things you may run into while making such a map.

Now that the maps are all downloaded, it is time to go through and examine each one to determine what to keep and what to digitally “throw out.” When you download all the historic maps of a certain scale for a state, you will find that each geographic area may have multiple versions that cover it. You will also find that there are some specially made maps that go against the standard quadrangle area and naming convention. The easiest way for me to handle this is to load and examine all the maps inside QGIS.

Using QGIS to Check an Image

I load the maps that cover a quadrangle and overlay them on top of something like Google Maps. For my purposes, I usually try to pick maps with the following characteristics:

Oldest to cover an area

Good visual quality (easy to read, paper map is not ripped, etc.)

Good georeferencing to existing features

QGIS makes it easy to look at all the maps that cover an area. I will typically change the opacity of a layer and see how features such as rivers match existing ones. You will be hard-pressed to find an exact match as some scales are too coarse and these old maps will never match the precision of modern digital ones made from GPS. I also make sure that the map is not too dark and that the text is easily readable.

One thing you will notice with the maps is that names change over time. An example of this is below, where in one map a feature is called Bullock’s Neck and in another it is Bullitt Neck.

Feature Named Bullock’s Neck

Feature Named Bullitt Neck

Another thing you will find with these maps is that the same features are not always in the same spots. Consider the next three images here that cover the same area.

Geographic Area to Check for Registration

First Historic Map to Check Registration

Second Map to Check Registration

If you look closely, you will see that the land features of the map seem to move down between the second and third images. This happens due to how the maps were printed “back in the day.” The maps were broken down into separates, where each separate (or plate) contained features of the same color. One contained roads, text, and houses, while another had features such as forests. These separates had stud holes in them so they could be held in place during the printing process. Each separate was inked and a piece of paper was run over each one. Over time these stud holes would wear, so one or more separates could shift during printing. Additionally, maps back then were somewhat “works of art” and could differ depending on who did the engraving. Finally, depending on the scale and quality of the map, the algorithms used to georeference the scanned images can introduce rubber sheeting that further changes things.

During my processing, one of the things I use QGIS for is to check which maps register better against modern features. It takes a while using this method but in the end I am typically much happier about the final quality than if I just picked one from each batch at random.

Another thing to check with the historic maps is coverage. Sometimes the map may say it covers a part of the state when it does not.

Map Showing No Actual Virginia Coverage

Here the map showed up in the results list for Virginia, but you can see that the Virginia portion is blank and it actually only contains map information for Maryland.

Finally, you may well find that you do not have images that cover the entire state you are interested in. When you group things by scale and year, you may find that the USGS no longer has the original topomaps for some areas. It could also be that no maps were ever produced for those areas.

Once the images are all selected, the images need to be merged into a single map for an individual state. For my setup I have found that things are easier and faster if I merge them into a GeoTIFF with multiple zoom layers as opposed to storing in PostGIS.

Here I will assume there is a directory of 250K scale files that cover Virginia and that these files have been sorted and the best selected. The first part of this is to merge the individual files into a single file with the command:

gdal_merge.py -o va_250k.tif *.tif

This command may take some time depending on the number and size of the input images. Once finished, the next part is to compress and tile the image. Tiling breaks the image into parts that can be separately accessed and displayed without having to process the rest of the image. Compression can make a huge difference in file size. I did some experimenting and found that a JPEG compression quality of eighty strikes a good balance between being visually pleasing and reducing file size.

Finally, GeoTIFFs can have reduced-resolution overview layers added to them. The TIFF format supports multiple pages in a file, as one of its original uses was storing faxes. A GIS such as QGIS can recognize when a file has overviews added and will use them based on how far the user has zoomed. These overviews contain far less data than the full image and can be quickly accessed and displayed.

GDAL's gdaladdo command adds overviews that are roughly half sized, quarter sized, and so on.

In the end, with tiling and compression, my 250K scale merged map of Virginia comes in at 520 megabytes. QGIS recognizes that the multiple TIFF pages are the various overviews, and over my home network loading and zooming is nearly instantaneous. Hopefully these posts will help you create your own mosaics of historic or even more modern maps.

My first professional job during and after college was working at the US Geological Survey as a software engineer and researcher. My job required me to learn about GIS and cartography, as I would do things from writing production systems to researching distributed processing. It gave me an appreciation of cartography and of geospatial data. I especially liked topographic maps as they showed features such as caves and other interesting items on the landscape.

Recently, I had a reason to go back and recreate my mosaics of some historic USGS topomaps. I had originally put them into a PostGIS raster database, but over time realized that tools like QGIS and PostGIS raster can be extremely touchy when used together. Even after multiple iterations of trying out various overview levels and constraints, I still had issues with QGIS crashing or performing very slowly. I thought I would share my workflow for taking these maps, mosaicking them, and finally optimizing them for loading into a GIS application such as QGIS. Note that I use Linux and leave installing the prerequisite software as an exercise for the reader.

As a refresher, the USGS has been scanning in old topographic maps and has made them freely available in GeoPDF format here. These maps are available at various scales and go back to the late 1800s. Looking at them shows the progression of the early days of USGS map making to the more modern maps that served as the basis of the USGS DRG program. As some of these maps are over one-hundred years old, the quality of the maps in the GeoPDF files can vary widely. Some can be hard to make out due to the yellowing of the paper, while others have tears and pieces missing.

Historically, the topographic maps were printed using multiple techniques from offset lithographic printing to Mylar separates. People used to etch these separates over light tables back in the map factory days. Each separate would represent certain parts of the map, such as the black features, green features, and so on. While at the USGS, many of my coworkers still had their old tool kits they used before moving to digital. You can find a PDF here that talks about the separates and how they were printed. This method of printing will actually be important later on in this series when I describe why some maps look a certain way.

Process

There are a few different ways to start out downloading USGS historic maps. My preferred method is to start at the USGS Historic Topomaps site.

USGS Historic Maps Search

It is not quite as fancy a web interface as the others, but it makes it easier to load the search results into Pandas later to filter and download. In my case, I was working on the state of Virginia, so I selected Virginia with a scale of 1:250,000 and Historical in the Map Type option. I purposely left Map Name empty and will demonstrate why later.

Topo Map Search

Once you click submit, you will see your list of results. They are presented in a grid view with metadata about each map that fits the search criteria. In this example case, there are eighty-nine results for 250K scale historic maps. The reason I selected this version of the search is that you can download the search results in a CSV format by clicking in the upper-right corner of the grid.

Topo Map Search Results

After clicking Download to Excel (csv) File, your browser will download a file called topomaps.csv. You can open it and see that there is quite a bit of metadata about each map.

Topo Map CSV Results

If you scroll to the right, you will find the column we are interested in called Download GeoPDF. This column contains the download URL for each file in the search results.

Highlighted CSV Column

For the next step, I rely on Pandas. If you have not heard of it, Pandas is an awesome Python data-analysis library that, among a long list of features, lets you load and manipulate a CSV easily. I usually work in ipython and load the CSV with a couple of short commands.
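The loading and URL extraction boils down to a few lines; a sketch, where the two-row stand-in CSV and the urls.txt output name are my own choices — in practice you load the topomaps.csv downloaded from the search page:

```python
import pandas as pd

# Two-row stand-in for the topomaps.csv downloaded from the USGS
# search page; it keeps the example self-contained.
with open('topomaps.csv', 'w') as f:
    f.write('Map Name,Download GeoPDF\n')
    f.write('Roanoke,https://example.com/roanoke.pdf\n')
    f.write('Richmond,https://example.com/richmond.pdf\n')

df = pd.read_csv('topomaps.csv')

# The "Download GeoPDF" column holds one download URL per map;
# write the non-empty URLs out one per line for wget to consume.
df['Download GeoPDF'].dropna().to_csv('urls.txt', index=False, header=False)
```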

Finally, as there are usually multiple GeoPDF files that cover the same area, I download all of them so that I can go through and pick the best ones for my purposes. I try to find maps that are from around the same date, are easily viewable, are not missing sections, and so on. To do the downloading, I use wget with the text file I created as input.

Eventually wget will download all the files to the same directory as the text file. In the next installment, I will continue my workflow as I produce mosaic state maps using the historic topographic GeoPDFs.

At my day job I am working on some natural language processing and need to generate a list of place names so I can further train the excellent spacy library. I previously imported the full Planet OSM, so I went there to pull a list of places. However, the place names in OSM are typically in the language of the person who did the collection, so they can be anything from English to Arabic. I stored the OSM data using imposm3 and included a PostgreSQL hstore column to store all of the user tags so we would not lose any data. I did a search for all tags that had values like name and en in them and exported those keys and values to several CSV files based on the points, lines, and polygons tables. I thought I would write a quick post to show how easy it can be to manipulate data outside of traditional spreadsheet software.

The next thing I needed to do was some data reduction, so I turned to my go-to library, Pandas. If you have been living under a rock and have not heard of it, Pandas is an exceptional data-processing library that allows you to easily manipulate data from Python. In this case, I knew some of my data rows were empty and that I would have duplicates due to how things get named in OSM. Pandas makes cleaning data incredibly easy.

First I needed to load the files into Pandas to begin cleaning things up. My personal preference for a Python interpreter is ipython/jupyter in a console window. To do this I ran ipython and then imported Pandas by doing the following:

In [1]: import pandas as pd

Next I needed to load up the CSV into Pandas to start manipulating the data.

In [2]: df = pd.read_csv('osm_place_lines.csv', low_memory=False)

At this point, I could examine how many columns and rows I have by running:

In [3]: df.shape
Out[3]: (611092, 20)

Here we can see that I have 611,092 rows and 20 columns. My original query pulled a lot of columns because I wanted to try to capture as many pre-translated English names as I could. To see what all of the column names are, I just had to run:
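The command itself is just the DataFrame's columns attribute — on the real frame that's df.columns; a self-contained sketch with a throwaway stand-in frame:

```python
import pandas as pd

# Throwaway stand-in; the real frame was loaded from osm_place_lines.csv.
demo = pd.DataFrame({'name': ['a'], 'name:en': ['b'], 'name:ar': ['c']})

# .columns holds every column label in the frame.
print(list(demo.columns))  # prints ['name', 'name:en', 'name:ar']
```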

The first task was to drop any rows that had no values in them. In Pandas, empty cells default to the NaN value. So to drop all the empty rows, I just had to run:

In [4]: df = df.dropna(how='all')

To see how many rows fell out, I again checked the shape of the data.

In [5]: df.shape
Out[5]: (259564, 20)

Here we can see that the CSV had 351,528 empty rows where the line had no name or English name translations.

Next, I assumed that I had some duplicates in the data. Some things in OSM get generic names, so these can be filtered out since I only want the first row from each duplicate. With no options, drop_duplicates() in Pandas only keeps the first value.

In [6]: df = df.drop_duplicates()

Checking the shape again, I can see that I had 68,131 rows of duplicated data.

In [7]: df.shape
Out[7]: (191433, 20)

At this point I was interested in how many cells in each row still contained no data. The CSV was already sparse since I converted each hstore key into a separate column in my output. To do this, I ran:
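The per-column non-null counts come from DataFrame.count(); a self-contained sketch with a stand-in frame:

```python
import pandas as pd

# Stand-in sparse frame; the real one held the OSM name columns.
demo = pd.DataFrame({'name': ['a', 'b', None],
                     'name:en': [None, 'c', None]})

# count() returns the number of non-null cells per column, which
# shows how sparse each name column is.
print(demo.count())  # name -> 2, name:en -> 1
```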

Here we can see the sparseness of the data. Considering I am now down to 191,433 rows, some of the columns only have a single entry in them. We can also see that I am probably not going to have a lot of English translations to work with.

At this point I wanted to save the modified dataset so I would not lose it. This was as simple as:

In [8]: df.to_csv('osm_place_lines_nonull.csv', index=False)

The index=False option tells Pandas to not output its internal index field to the CSV.

Now I was curious what things looked like, so I decided to check out the name column. First I increased some of Pandas' display defaults because I did not want it to abbreviate rows or columns.
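The settings I bump are the display limits; a sketch (None means unlimited — these option names are from current pandas and may differ in very old versions):

```python
import pandas as pd

# Stop pandas from truncating rows, columns, and cell contents
# when printing a frame to the console.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
```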

As I’ve posted before, the download from the USGS Geonames site has some problems. The feature_id column should be unique, but it is not: the file contains some feature entries in Mexico and Canada that share ids with other rows, which breaks the unique constraint.

I just added a very quick and dirty Python program to my misc_gis_scripts repo on Github. It's run with:

python3 gnisfixer.py downloadedgnisfile newgnisfile

For the latest download, it removed over 7,000 entries that were not in the US or a US territory, and the result imported into the DB perfectly.

Since I needed it as part of my job, I finally got around to finishing up the Geonames scripts in my github repository misc_gis_scripts. In the geonames subdirectory is a bash script called dogeonames.sh. Create a PostGIS database, edit the bash file, and then run it and it will download and populate your Geonames database for you.

Note that I’m not using the alternatenamesv2 file that they’re distributing now. I checked with a hex editor and they’re not actually including all fields on each line, and Postgres will not import a file unless each column is there. I’ll probably add in a Python file to fix it at some point but not now 🙂

I just uploaded all of the 2017 US Census Tiger datasets, converted from county-based to state-based Shapefiles. You can find them in my GIS Data section. Let me know of any errors you come across. Note I haven't done all of the Census data, just the ones I regularly use.

A while back I picked up a Raspberry Pi 3 and turned it into a NAS and LAMP-style server (Apache, PostgreSQL, MySQL, PostGIS, and so on). Later I came across forums mentioning a new entry from Asus into this space called the Tinkerboard. Now, I'm not going to go into an in-depth review since you can find those all over the Internet. However, I do want to mention a few things I've found and done that are very helpful. I like the board since it supports things like OpenCL and, pound for pound, is more powerful than the Pi 3. Its two gigs of RAM versus the Pi 3's one makes it useful for more advanced processing.

One thing to keep in mind is that the board is still “new” and has a growing community. As such there are going to be some pains, such as not having as big a community as the Pi ecosystem. But things do appear to be getting better, and so far it’s proven to be more capable and, in some cases, more stable than my Pi 3.

So without much fanfare, here is my list of tips for using the Tinkerboard. You can find a lot more information online.

Community – The Tinkerboard has a growing community of developers. My favorite forums are at the site run by Currys PC World. They’re active and you can find a lot of valuable information there.

Package Management – Never, EVER, run apt-get dist-upgrade. Since it's Debian, the usual apt-get update and apt-get upgrade are available. However, running dist-upgrade can cause you to lose hardware acceleration.

OpenCL – One nice thing about the Tinkerboard is that the Mali GPU has support for hardware-accelerated OpenCL. TinkerOS ships an incorrectly named directory in /etc/OpenCL, which causes apps not to work by default. The quick fix is to change to /etc/OpenCL and run ln -s venders vendors. After doing this, tools like clinfo should properly pick up support.

Driver updates – Asus is active on Github. At their rk-rootfs-build repository you can find updated drivers as they're released. I recommend checking it from time to time and downloading updated packages as they are released.

Case – The Tinkerboard is the same size and mostly the same form-factor as the Raspberry Pi 3. I highly recommend you pick up a case with a built-in cooling fan since the board can get warm, even with the included heat sinks attached.

Tensorflow – You can follow this link and install Tensorflow for the Pi on the Tinkerboard. It's currently not up to date, but it is much less annoying than building Tensorflow from scratch.

SD Card – You would do well to follow my previous post about how to zero out an SD card before you format and install TinkerOS to it. This will save you a lot of time and pain. I will note that so far, my Tinkerboard holds up under heavy IO better than my Pi 3 does. I can do things like make -j 5 and it doesn't lock up or corrupt the card.