Extract free and open source data from OpenStreetMap

Open the Overpass Turbo website and, on the map, search for the city from which you want to extract data. (The Overpass query will be generated in such a way that it’ll only search for data in the current map view.)

Click the “Wizard” button in the top toolbar. (Alternatively you can copy the code below and paste it into the text area on the website and click the “Run” button.)

In the Wizard dialog box, type in “railway=subway” in order to find metro, subway, or rapid transit lines. (If you want to download interstate highways, or what they call motorways in the UK, use “highway=motorway”.) Then click the “build and run query” button.

In a few seconds you’ll see lines and dots (representing the metro or subway stations) on the map, and a new query in the text area. Notice that the query has looked for three kinds of objects: node (points/stations), way (the subway tracks), relation (the subway routes).

If you don’t want a particular kind of object, then delete its line from the query and click the “Run” button. (You probably don’t want relation if you just need GIS data for mapping purposes, because routes are not always well-defined by OpenStreetMap contributors.)

Download the data by clicking the “Export” button. Choose one of the first three options (GeoJSON, GPX, KML). If you’re going to use desktop GIS software, or place this data in a web map (like Leaflet), then choose GeoJSON. What happens next depends on your browser. In Chrome, clicking GeoJSON downloads a file. In Safari, clicking it opens a new tab containing the GeoJSON text; copy and paste this text into TextEdit and save the file as “mexico_city_subway.geojson”.

Screenshot 1: After searching for the city for which you want to extract data (Mexico City in this case), click the “Wizard” button and type “railway=subway” and click run.

Screenshot 2: After building and running the query from the Wizard you’ll see subway lines and stations.

Screenshot 3: Click the Export button and click GeoJSON. In Chrome, a file will download. In Safari, a new tab with the GeoJSON text will open (copy and paste this into TextEdit and save it as “mexico_city_subway.geojson”).

Convert the free and open source data into a shapefile

After you’ve downloaded (via Chrome) or re-saved (Safari) a GeoJSON file of subway data from OpenStreetMap, open QGIS, the free and open source GIS desktop application for Linux, Windows, and Mac.

In QGIS, add the GeoJSON file to the table of contents by either dragging the file in from the Finder (Mac) or Explorer (Windows), or by clicking File>Open and browsing and selecting the file.

Convert it to a shapefile by right-clicking on the layer in the table of contents and clicking “Save As…”

In the “Save As…” dialog box choose “ESRI Shapefile” from the dropdown menu. Then click “Browse” to find a place to save this file, check “Add saved file to map”, and click the “OK” button.

A new layer will appear in your table of contents. In the map this new layer will be layered directly above your GeoJSON data.
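A shapefile holds only one geometry type per file, while the Overpass Turbo export mixes points (stations) and lines (tracks). Before converting, it can help to check what the GeoJSON contains. Here is a minimal Python sketch, using only the standard library, that groups features by geometry type. The sample data is made up for illustration and the function name is my own, not part of QGIS or Overpass Turbo.

```python
from collections import defaultdict


def split_by_geometry_type(feature_collection):
    """Group the features of a GeoJSON FeatureCollection by geometry type.

    Returns a dict mapping a geometry type (for example "Point") to a new
    FeatureCollection holding only the features of that type.
    """
    groups = defaultdict(list)
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        groups[geometry.get("type", "None")].append(feature)
    return {
        gtype: {"type": "FeatureCollection", "features": features}
        for gtype, features in groups.items()
    }


# Tiny made-up sample standing in for "mexico_city_subway.geojson".
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"railway": "subway"},
         "geometry": {"type": "LineString",
                      "coordinates": [[-99.14, 19.43], [-99.13, 19.44]]}},
        {"type": "Feature", "properties": {"railway": "subway"},
         "geometry": {"type": "Point", "coordinates": [-99.14, 19.43]}},
        {"type": "Feature", "properties": {"railway": "subway"},
         "geometry": {"type": "Point", "coordinates": [-99.13, 19.44]}},
    ],
}

layers = split_by_geometry_type(sample)
for gtype, layer in layers.items():
    print(gtype, len(layer["features"]))
```

To run this on a real export, load the file with `json.load` and write each group back out with `json.dump`, giving one file per geometry type.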

Screenshot 4: The GeoJSON file exported from Overpass Turbo has now been loaded into the QGIS table of contents.

Screenshot 5: In QGIS, right-click the layer, select “Save As…” and set the dialog box to have these settings before clicking OK.

Query for finding subways in your current Overpass Turbo map view

/*
This has been generated by the overpass-turbo wizard.
The original search was:
“railway=subway”
*/
[out:json][timeout:25];
// gather results
(
// query part for: “railway=subway”
node["railway"="subway"]({{bbox}});
way["railway"="subway"]({{bbox}});
relation["railway"="subway"]({{bbox}});/*relation is for "routes", which are not always
well-defined, so I would ignore it*/
);
// print results
out body;
>;
out skel qt;
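The `{{bbox}}` placeholder is an Overpass Turbo shortcut for the current map view. If you want to reuse the query outside the website, the bounding box has to be written out as south, west, north, east. Here is a minimal Python sketch that builds the same query text for a fixed box. The function name and the Mexico City coordinates are my own rough assumptions, and the sketch only builds the query string; it does not contact any server.

```python
def build_overpass_query(tag_key, tag_value, bbox, element_types=("node", "way")):
    """Return Overpass QL text that searches for one tag inside a bounding box.

    bbox is (south, west, north, east) in decimal degrees, the order that
    Overpass expects. "relation" is left out of element_types by default
    because routes are not always well-defined.
    """
    south, west, north, east = bbox
    bbox_text = f"{south},{west},{north},{east}"
    statements = [
        f'  {element}["{tag_key}"="{tag_value}"]({bbox_text});'
        for element in element_types
    ]
    lines = (
        ["[out:json][timeout:25];", "("]
        + statements
        + [");", "out body;", ">;", "out skel qt;"]
    )
    return "\n".join(lines)


# Rough, hand-picked box around central Mexico City (an assumption).
MEXICO_CITY_BBOX = (19.30, -99.25, 19.50, -99.05)

query = build_overpass_query("railway", "subway", MEXICO_CITY_BBOX)
print(query)
```

Pass `element_types=("node", "way", "relation")` if you do want the route relations as well.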

I’m leading the development of a website for Slow Roll Chicago that shows the distribution of bike lane infrastructure in Chicago relative to key and specific demographics to demonstrate if the investment has been equitable.

All of our research and code should be public and open source so it’s clear how we made our assumptions and came to our conclusions (“show your work”).

Using git, GitHub, and version control is a desirable skill and more people should learn it; this project will help people apply that skill.

There are no emails involved. I deplore using email for group communication.*

The website focuses on using empirical research, maps, geographic analysis to tell the story of bike lane distribution and requires processing this data using GIS functions. Normally the data would be transformed in a desktop GIS software like QGIS and then converted to a format that can be used in Leaflet, an open source web mapping library.

Relying on desktop software, though, slows down development of new ways to slice and dice geographic data, which, in our map, includes bike lanes, wards, Census tracts, Divvy stations, and grocery stores (so far). One would have to generate a new dataset if our goals or needs changed.

I’ve built enough maps for images and the web that way in the past, and I wanted to move away from that method for this project. We’re using Turf.js to replicate many GIS functions – but in the browser.

Yep, Turf makes it possible to merge, buffer, contain, calculate distance, transform, dissolve, and perform dozens of other functions all within the browser, “on the fly”, without any desktop GIS software.

After dilly-dallying in Turf for several weeks, our group started making progress this month. We have now pushed to our in-progress website a map with three features made possible by Turf:

Buffering and dissolving buffers to show the Divvy station walk shed, the distance a reasonable person would walk from their home or office to check out a Divvy bike. A buffer of 0.25 miles (two Chicago blocks) is created around each of the 300 Divvy stations, hidden from display, and then merged (dissolved in traditional GIS parlance) into a single buffer. The single buffer (called a “super buffer” in our source code) is used for another feature. Currently the projection is messed up and you see ellipses instead of circles.

Counting grocery stores in the Divvy station walk shed. We use the “feature collection” function to convert the super buffer into an object that the “within” function can use to compare to a GeoJSON object of grocery stores. This process is similar to the “select by location” function in GIS software. Right now this number is printed only to the console as we look for the best way to display stats like this to the user. A future version of the map could allow the user to change the 0.25 miles distance to an arbitrary distance they prefer.

Find the nearest Divvy station from any place on the map. Using Turf’s “nearest” function and the Context Menu plugin for Leaflet, the user can right-click anywhere on the map and choose “Find nearby Divvy stations”. The “nearest” function compares the place where the user clicked against the GeoJSON object of Divvy stations to select the nearest one. The problem of locating 2+ nearby Divvy stations remains. The original issue asked to find the number of Divvy stations near the point; we’ll likely accomplish this by drawing an invisible, temporary buffer around the point and then using “within” to count the number of stations inside that buffer and then destroy the buffer.
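The three features above boil down to distance tests between points. Our site does this with Turf.js in the browser; what follows is not that code but a minimal Python sketch of the same ideas using great-circle (haversine) distance instead of buffer polygons. All coordinates are made up, points are (longitude, latitude) pairs as in GeoJSON, and every function name is hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def count_in_walk_shed(stores, stations, radius_miles=0.25):
    """Count stores within radius_miles of at least one station.

    This matches counting points inside the dissolved "super buffer".
    """
    return sum(
        1 for store in stores
        if any(haversine_miles(*store, *station) <= radius_miles
               for station in stations)
    )


def nearest_station(point, stations):
    """Return the single station closest to point."""
    return min(stations, key=lambda station: haversine_miles(*point, *station))


def stations_within(point, stations, radius_miles=0.25):
    """Return every station within radius_miles of point (the 2+ case)."""
    return [s for s in stations
            if haversine_miles(*point, *s) <= radius_miles]


# Made-up sample coordinates, roughly in Chicago.
stations = [(-87.6278, 41.8847), (-87.6500, 41.9100)]
stores = [(-87.6280, 41.8850), (-87.7500, 41.8000)]
clicked = (-87.6285, 41.8850)

print(count_in_walk_shed(stores, stations))
print(nearest_station(clicked, stations))
print(stations_within(clicked, stations))
```

Testing distance directly avoids building and destroying a temporary buffer, at the cost of only working for point data; polygons such as wards or Census tracts would still need real geometry operations.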

* I send one email to new people who join us at Open Gov Hack Night on Tuesdays at the Mart to send them a link to our GitHub repository, and to invite them to a Dropbox folder to share large files for those who don’t learn to use git for file management.

A man looks at one of the first five stations installed, at State and Randolph (but the board says State and Lake).

Navigation maps on Divvy bike sharing stations will be placed at 400 locations around the city. A map this pervasive, to be read and interpreted by hundreds of thousands of locals and visitors to Chicago (including people who will never use Divvy), should have a design that communicates good routes to ride, and important places like train stations, nearby Divvy stations, points of interest, and where to find places to eat or be entertained.

The design of the maps on the station boards needs to be improved. The first issue I noticed in June is that streets and alleys are given equal significance in their symbology, possibly confusing people on which route to take. The map should strip alleys, offering room for more info on the map, like useful destinations. It may be easier for some to locate the Art Institute of Chicago as a labeled, light-gray block instead of trying to locate its address on the map (nigh impossible). When one locates the destination, one can more easily locate the nearest Divvy station.

The map at North/Clybourn’s station (actually on Dayton Street) covers a large portion of the map with the “you are here” label and lacks the connection between North Avenue and Goose Island.

I’ve noticed that the “you are here” labels cover up train station markers/labels, and the loop elevated tracks are missing (a common reference point for Chicago). It takes a moment to realize that the white text is labeling the CTA stations and not the nearby Divvy stations. It’s unclear where “you are here” points to, until you realize that it’s at the center of the blue 5-minute walking circle. Dearborn Street is symbolized as a bike lane, but not labeled as a street. Clark Street and State Street are doubly wide, but the meaning of that is unknown. The legend is useful to distinguish bike lane types but is placed far from the map, at the bottom of the board.

Here are other areas where the boards and maps should be redesigned:

The “service area” map has low utility in its current form as it’s not labeled with streets, points of interest, or a time or distance scale. It appears as a reduced-boundary blob of Chicago. It could be improved if it communicated “this is where you can go if you take Divvy” and label streets, train stations, and points of interest at the edge of the service area.

The 5-minute bike ride map is nearly identical to the 5-minute walk map, but smaller. The 5-minute walk map should be made larger and integrate the now-eliminated 5-minute bike ride map.

Much of the text is unnecessarily large. The CTA station labels are so large in comparison to the streets that it’s not clear where on the block the stations are located. CTA stations are labeled but the train routes aren’t always shown (Loop stops are just gray); it’s not even clear that they’re CTA stops.

The purpose of the blue circle isn’t labeled or clear: the larger map, titled “5 minute walk”, shows a large map but there’s a blue circle – is the blue circle or the square map the 5-minute edge? The connection between the title and the blue circle could be tightened by using the same color for the text and the circle or by wrapping the text around the circle path.

The map, which is likely to serve as a neighborhood “get around” and discovery map for tourists, and even locals, lacks basic info: there are absolutely no destinations marked, no museums, parks, etc.

The bike lane symbology doesn’t match the Chicago Bike Map, which uses blue, purple, orange, and red to denote different bike lane types, and hasn’t used green for at least seven years. The use of green makes them look like narrow parks.

The map designers should consider placing the city’s cardinal grid numbering system to enable readers to find an address.

North/Clybourn’s Divvy station map lacks a bikeable connection from North Avenue to Goose Island via the Cherry Avenue multi-modal bridge. The maps should be reviewed for street network accuracy by people who live and ride nearby.

Photo shows the original board and map at the Milwaukee/Wood/Wolcott station, which has since moved. The station on this map marked at Marshfield/North was moved to Wood/North this week.

There are many opportunities for the map to change because they will have to be updated when stations are moved, for both the moved station and the handful of station boards that include the moved station. At least four boards needed to be updated when the station at Milwaukee/Wood/Wolcott moved from Milwaukee Avenue (next to Walgreens) to Wood Street (across from the Beachwood). The maps for Citibike in New York City don’t share these design flaws.

N.B. More trips are currently taken by tourists and people with 24-hour memberships than people with annual memberships. I question the bikeway symbology and suggest that the streets have three symbols: one representing a bike lane (of any kind), one representing sharrows (because they are legally different from bike lanes) and one representing a street with no marked bikeways. The current bikeway symbology may not be understandable by many visitors (or even understood by locals because of differing definitions) and show a jumble of green hues whose meanings are not clear or even useful. It’s not currently possible to take a route on a bicycle that uses only protected bike lanes, or uses protected bike lanes and buffered bike lanes, so the utility of this map as a route building tool is weak. One wastes their time looking at this map in the attempt to construct a route which uses the darkest green-hued streets.

I also recommend that the board and map designers give Divvy CycleFinder app messaging greater prominence. I believe that a majority of users will be searching app stores for appropriate apps. When you search for Divvy, you’ll find eight apps, including my own Chicago Bike Guide.

Updated 22:43 to clarify my critique and make more specific suggestions for changes.

All shapefiles are from the United States Department of Transportation, Bureau of Transportation Statistics’s National Transportation Atlas 2012 edition except for Illinois places, which come from the Census Bureau’s TIGER project.

At the end of this tutorial, you’ll have a good introduction on how to find geographic data, build a map with TileMill, style the map, and publish it for the public. Your map will not look like mine as this tutorial doesn’t describe how to add labels or use the hover/info feature.

Tutorial to make Amtrak Illinois map

Unzip the four ZIP files you downloaded and move their contents into a folder, like /Documents/GIS/Amtrak Illinois/shapefiles. This is your project folder.

Install TileMill and open it.

Set up a project. In the Projects pane, click “New Project”. In the filename field, title it “amtrak_illinois”. Ensure that the checkbox next to “Default data” is checked – this shows a world map and helps you get your bearings (but it’s not absolutely necessary).

Get familiar with TileMill’s layout. Your new project will open with the map on the left side and your Carto style code on the right side. There are four buttons along the left edge of your map. From top to bottom they are: Templates, Font list, Carto guide, and Layers.

Add a layer. We’re going to add the four shapefile layers you downloaded. Click the “Layers” button and then click “Add layer”. In the ID field, type in “amtrak_routes”. For Datasource, browse to your project folder and find “amtrak.shp” – this file has the Amtrak route lines. Then click “Done”. Click “Save & Style”.

Style that layer. When you click “Save & Style” after adding a layer, your attention will be called to the Carto style code on the right side of TileMill. A section of code with the “amtrak_routes” #selector will have been inserted with some default colors and styles. If you know CSS, you will be familiar with how to change the Amtrak routes line styles. Change the “line-color” to “#000”. After “line-color”, add a new line and insert “line-opacity: 0.5;”. This will add some transparency to the line. Press the “Save” button above the code.

Hide bus stations. The Amtrak stations layer shows bus and ferry stations as part of Amtrak’s Thruway connections. You probably don’t want to show these. In your Carto style code, rename the #selector from “#amtrak_stations” to “#amtrak_stations[STNTYPE='RAIL']” (use straight quotes in the code). That makes the following style code only apply to stations with the “rail” type. Since there’s no style definition for things that aren’t of that type, they won’t appear.

Screenshot of my map.

Prepare your map for uploading

TileMill has many exporting options. You can save it as MBTiles and publish the map for free using MapBox (TileMill’s parent), or you can export it as image files (but it won’t be interactive), or you can display the map using the Leaflet JavaScript map library (which I use for the Chicago Bike Map app). This tutorial will explain how to export MBTiles and upload to MapBox, the server I’m using to display the map at the top of this page.

Change project settings. To upload to MapBox, you’ll have to export your project as MBTiles, an open tile-storage format specified by MapBox. Click the “Export” button above your Carto style code and click “MBTiles”. You’ll be asked to provide a name, description, attribution, and version. Input appropriate text for all but version.

Adjust the zoom levels. Adjust the number of zoom levels you want (the more you have the longer it takes to export and upload your project, and you might exceed MapBox’s free 50 MB account limit). My map has zoom levels 8-11.

Adjust the bounds. You’ll then want to draw your bounds: how much of the map’s geographic extents you want to export. Zoom to a level where you can see the entire state of Illinois in your map. Hold down the Shift key and drag a box around the state, plus a buffer (so viewers don’t fall off your map when they pan to the edges).

Export your map. Click Export and watch the progress! On a four-year-old MacBook it took less than one minute to export the project.

Embed it in your website. Click the “Share” button in the upper left corner of your map and copy the embed code. Paste this into the HTML source code of a webpage (or in a WordPress post) and save that (I’m not going to provide instructions on how to do that).
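To get a feel for why extra zoom levels make the export bigger, you can estimate how many tiles each zoom level needs for your bounds. Here is a minimal Python sketch using the standard web map (“slippy map”) tile formula. The Illinois bounding box is a rough approximation I picked by eye, and tile count is only a proxy for file size, since tiles vary in size.

```python
import math


def tile_xy(lon, lat, zoom):
    """Return the (x, y) index of the web map tile containing lon/lat."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_in_bounds(west, south, east, north, zoom):
    """Count the tiles needed to cover a bounding box at one zoom level."""
    x_min, y_min = tile_xy(west, north, zoom)  # top-left corner
    x_max, y_max = tile_xy(east, south, zoom)  # bottom-right corner
    return (x_max - x_min + 1) * (y_max - y_min + 1)


# Approximate bounds for Illinois plus a small buffer (an assumption).
ILLINOIS = {"west": -91.5, "south": 37.0, "east": -87.5, "north": 42.5}

total = 0
for zoom in range(8, 12):  # zoom levels 8-11, as in my map
    count = tiles_in_bounds(zoom=zoom, **ILLINOIS)
    total += count
    print(f"zoom {zoom}: {count} tiles")
print(f"total: {total} tiles")
```

Each extra zoom level needs roughly four times as many tiles as the one before it, which is why the last level or two dominates the export.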

Now you know how to find geographic data, build a custom map using the TileMill application, begin to understand how to style it, and embed your map for the public on a website or blog.

N.B. I was originally going to use QGIS to build a map and then publish a static image before I realized that TileMill + MapBox (the website) can build a map and publish an interactive feature instead of a static image. I’m happy I went that route. However, I did use QGIS to verify the data and even create a new shapefile of just a few of the key train stations on the Lincoln Service (the centerpiece of my Grid Chicago article).


It is now possible to upload a shapefile (and its companion files SHX, PRJ, and DBF) to Google Fusion Tables (GFT).

Before we go any further, keep in mind that the application that does this will only process 100,000 rows. Additionally, GFT only gives each user 200 MB of storage (and they don’t tell you your current status, that I can see).

Log in to your Google account (at Gmail, or at GFT).

Prepare your data. Ensure it has fewer than 100,000 rows.

ZIP up your dataX.shp, dataX.shx, dataX.prj, and dataX.dbf. Use WinZip for Windows, or for Mac, right-click the selection of files and select “Compress 4 items”.

Visit the Shape to Fusion website. You will have to authorize the web application to “grant access” to your GFT tables. It needs this access so that after the web application processes your data, it can insert it into GFT.

If you want a Centroid Geometry column or a Simplified Geometry column added, click “Advanced Options” and check their checkboxes – see notes below for an explanation.

Choose the file to upload and click Upload.

Leave the window open until it says it has processed all of the rows. It will report “Processed Y rows and inserted Y rows”. You will be given a link to the GFT the web application created.
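The “prepare” and “ZIP” steps above can also be scripted. Here is a minimal Python sketch, using only the standard library, that reads the row count from the DBF header (bytes 4 to 7 hold the number of records as a little-endian integer), checks it against the 100,000-row limit, and zips the four files. The function names are hypothetical and “dataX” is the same placeholder name used in the steps.

```python
import struct
import zipfile
from pathlib import Path

ROW_LIMIT = 100_000  # Shape to Fusion's stated limit
REQUIRED_EXTENSIONS = (".shp", ".shx", ".prj", ".dbf")


def dbf_row_count(dbf_path):
    """Read the number of records from a DBF file header."""
    with open(dbf_path, "rb") as handle:
        header = handle.read(8)
    if len(header) < 8:
        raise ValueError(f"{dbf_path} is too short to be a DBF file")
    return struct.unpack("<I", header[4:8])[0]


def zip_shapefile(shp_path, zip_path):
    """Check the row limit, then zip the four shapefile parts.

    Returns the row count so the caller can report it.
    """
    shp = Path(shp_path)
    parts = [shp.with_suffix(ext) for ext in REQUIRED_EXTENSIONS]
    missing = [part.name for part in parts if not part.exists()]
    if missing:
        raise FileNotFoundError(f"missing companion files: {missing}")

    rows = dbf_row_count(shp.with_suffix(".dbf"))
    if rows >= ROW_LIMIT:
        raise ValueError(f"{rows} rows; needs fewer than {ROW_LIMIT}")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # Store each file at the top level of the archive.
            archive.write(part, arcname=part.name)
    return rows


# Example call (uncomment once dataX.shp and its companions exist):
# print(zip_shapefile("dataX.shp", "dataX.zip"))
```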

Sample Data

If you’re looking to give this a try and see results quickly, try some sample data from the City of Chicago data portal:

Community Areas – 77 official community areas + 3 “out” areas to make 80 polygons.

I had trouble many times while using Shape to Fusion in that after I chose the file to upload and clicked Upload, I had to grant access to the web application again and start over (choose the file and click Upload a second time).

Centroid Geometry – This creates a column with the geographic coordinates of the centroid of a polygon. It lists it in the original projection system. So if your projection is in feet, the value will be in feet. This is a function that can easily be performed in free and open source QGIS, where you can also reproject files to get latitude and longitude values (in WGS84, EPSG:4326). The centroid value is surrounded in the field by KML syntax “<Point><coordinates>X,Y</coordinates></Point>”.

Simplified Geometry – A geometry column is automatically created by the web application (or GFT, I’m not sure). This function will create a simpler version of that geometry, with fewer lines and vertices. It also creates columns to list the vertices count for the simple and regular geometry columns.
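I don’t know how Shape to Fusion computes its centroid, but the usual approach for a simple polygon is the area-weighted centroid. Here is a minimal Python sketch of that formula, with the result wrapped in the same KML syntax. It handles only a single outer ring with no holes, and the function names are my own.

```python
def polygon_centroid(ring):
    """Return the area-weighted centroid (x, y) of a simple polygon ring.

    ring is a sequence of (x, y) vertices, closed or not. The result is in
    the same units as the input, so coordinates in feet give feet back.
    """
    points = list(ring)
    if points[0] != points[-1]:
        points.append(points[0])  # close the ring

    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0:
        raise ValueError("ring has zero area")
    return cx / (3 * twice_area), cy / (3 * twice_area)


def centroid_as_kml(ring):
    """Format a ring's centroid as a KML Point string."""
    x, y = polygon_centroid(ring)
    return f"<Point><coordinates>{x},{y}</coordinates></Point>"


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(polygon_centroid(unit_square))
print(centroid_as_kml(unit_square))
```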


Let’s say you’re perusing the 309,425 crash reports for automobile crashes in Chicago from 2007 to 2009 and you want to know a few things quickly.

Like how many REAR END crashes there were in January 2007 that had more than 1 injury in the report. With Google Refine, you could do that in about 60 seconds. You just need to know which “facets” to set up.

By the way, there are 90 crash reports meeting those criteria. Look at the screenshot below for how to set that up.

Facets to choose to filter the data

Get your January facet

Add your 2007 facet

Select the collision type of “REAR END” facet

Choose to include all the reports where injury is greater than 1 (click “include” next to each number higher than 1)
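Each facet is a filter on one column, and stacking facets means a row has to pass all of them. Here is a minimal Python sketch of the same four filters on a CSV. The column names (month, year, collision_type, injuries) and the sample rows are made up for illustration; the real crash data uses its own column names.

```python
import csv
import io

# Made-up sample rows; the real file has 309,425 reports.
SAMPLE_CSV = """month,year,collision_type,injuries
1,2007,REAR END,2
1,2007,REAR END,1
1,2007,ANGLE,3
2,2007,REAR END,4
1,2008,REAR END,2
1,2007,REAR END,5
"""


def filter_crashes(rows, month, year, collision_type, more_than_injuries=1):
    """Keep rows that pass all four 'facets' at once."""
    return [
        row for row in rows
        if int(row["month"]) == month
        and int(row["year"]) == year
        and row["collision_type"] == collision_type
        and int(row["injuries"]) > more_than_injuries
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = filter_crashes(rows, month=1, year=2007, collision_type="REAR END")
print(len(matches))
```

For a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle.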

After we do this, we can quickly create a map using another Google tool, Fusion Tables.

Make a map

Click Export… and select “Comma-separated value.” The file will download. (Make sure your latitude and longitude columns are called latitude and longitude instead of XCOORD and YCOORD or sometimes Fusion Tables will choke on the location and try to geocode your records, which is redundant.)
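If the exported file still has XCOORD and YCOORD headers, the rename can be done on the CSV itself before uploading. Here is a minimal Python sketch that rewrites only the header row. It assumes XCOORD holds longitude and YCOORD holds latitude, already in decimal degrees; it does not reproject anything, so check your own data before relying on that mapping.

```python
import csv
import io


def rename_columns(csv_text, renames):
    """Return csv_text with header names replaced according to renames."""
    reader = csv.DictReader(io.StringIO(csv_text))
    original_names = reader.fieldnames or []
    new_names = [renames.get(name, name) for name in original_names]

    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(new_names)
    for row in reader:
        writer.writerow([row[name] for name in original_names])
    return output.getvalue()


# Assumed mapping: X is longitude, Y is latitude.
RENAMES = {"XCOORD": "longitude", "YCOORD": "latitude"}

sample = "id,XCOORD,YCOORD\n1,-87.63,41.88\n"
print(rename_columns(sample, RENAMES))
```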