Maps, data visualization, earth science, and the environment


I just saw a trailer for the movie San Andreas. It looks preposterous but I love geology disaster movies, so I’ll probably see it. In the film, a series of earthquakes destroys California, culminating with a giant magnitude 9.5 quake. Fortunately the Rock is on scene to help save the day.

The largest earthquake ever recorded in real life struck central Chile on May 22, 1960. With a magnitude of 9.6 (some estimates say 9.5) this was a truly massive quake, more than twice as powerful as the next largest (Alaska 1964), and 500 times more powerful than the April 2015 Nepal quake. The seismic energy released by the 1960 Chile quake was equal to about 20,000 Hiroshima atomic bombs. Thousands were killed. It also triggered a tsunami that traveled 17,000 km across the Pacific Ocean and killed hundreds in Japan.

But I think the most striking thing about this quake is that it accounts for about 30% of the total seismic energy released on earth during the last 100 years. To illustrate this, I calculated the seismic moment (a measure of the energy released by an earthquake) of all earthquakes greater than magnitude 6 and plotted the global cumulative seismic moment over the last 100 years.

Click for interactive version

This plot clearly shows how the 1960 Chile quake (and to a lesser extent the 1964 Alaska event) dominates the last 100 years in terms of total energy released. This is not always obvious as the earthquake magnitude scale is logarithmic. So a magnitude 9.6 releases twice as much energy as a 9.4 and 250 times as much as an 8.0.

Technical notes: To make this plot I downloaded from the USGS archive data on all the earthquakes greater than magnitude 6 from 1915-2015. There are about 10,500 of them.

I calculated the seismic moment for each quake relative to a magnitude 6 (the smallest in the database) using

$$\frac{M_1}{M_2} = 10^{1.5\,(m_1 - m_2)}$$

where $M_1/M_2$ is the relative seismic moment, $m_1$ is the magnitude of each quake, and $m_2 = 6$.

So a mag 9.6 is about 250,000 times more powerful than a mag 6.0. (Note that this refers to energy released, not necessarily ground shaking, which is influenced by many factors, such as earthquake depth).
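Here is a minimal sketch of that calculation in Python. It assumes the standard moment magnitude scaling, in which seismic moment grows by a factor of 10^1.5 per magnitude unit. That scaling reproduces the ratios quoted in this post. The function name and the default reference magnitude are my own choices for illustration.

```python
def relative_moment(magnitude, reference=6.0):
    """Seismic moment of a quake relative to a quake of `reference` magnitude.

    Assumes moment scales as 10 ** (1.5 * magnitude), the usual
    moment magnitude relationship.
    """
    return 10 ** (1.5 * (magnitude - reference))


# Ratios mentioned in the text: 9.6 vs 6.0, 9.6 vs 9.4, and 9.6 vs 8.0.
for ref in (6.0, 9.4, 8.0):
    print(f"M9.6 relative to M{ref}: {relative_moment(9.6, ref):,.1f}")
```

The three printed ratios should come out at roughly 250,000, 2, and 250.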

Then I summed all the relative moments, normalized to 1, and plotted the cumulative seismic moment over the time period.
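The summing and normalizing step can be sketched like this. The magnitudes below are made up for illustration (they are not the USGS catalog), and the 10^1.5 scaling per magnitude unit is the same assumption as above.

```python
from itertools import accumulate


def cumulative_share(magnitudes, reference=6.0):
    """Running fraction of the total relative moment.

    `magnitudes` must already be sorted in chronological order.
    The last value of the result is always 1.0.
    """
    moments = [10 ** (1.5 * (m - reference)) for m in magnitudes]
    total = sum(moments)
    return [running / total for running in accumulate(moments)]


# Hypothetical quakes in time order, including one M9.6 and one M9.2.
sample = [6.1, 7.0, 9.6, 6.5, 9.2, 8.0]
for magnitude, share in zip(sample, cumulative_share(sample)):
    print(f"M{magnitude}: cumulative share {share:.3f}")
```

Even in this toy list the single M9.6 event accounts for most of the total, which is the same effect the plot shows.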

A few caveats. First, the quality of the magnitude measurements has improved over time, so that the data from the earlier part of the 20th century is not as reliable as the more current data.

Second, this analysis only looks at earthquakes larger than magnitude 6.0. Of course there are many, many smaller earthquakes. However, the cumulative amount of seismic energy released by these smaller quakes is very small compared to the larger ones (again, remember the logarithmic scale).

Third, the magnitudes listed in the USGS archive are calculated in different ways. The majority are moment magnitude or weighted moment magnitude. The equation above is meant for these types of magnitude. Other magnitude measurements, such as surface wave magnitude, have slightly different ways of calculating total energy release. This may introduce some inaccuracies. However, they will be small relative to the total energy release.

If any seismologists would like to weigh in, I would be most grateful.

More information on calculating magnitude and seismic moment here and here.

Ah, Switzerland. Land of fondue, chocolate, and neutrality. If you want to learn more about this unique little country, maps are a great way to start. Not only does Switzerland have fascinating geography, but it also has a long and storied tradition of cartography and design.

1. Where in the world?

But you already knew that, right? And you also knew that the capital is Bern, not Zurich or Geneva. Switzerland is not very big. It’s the world’s 135th largest country. Four U.S. counties are larger. But what it lacks in size it makes up for in other ways. For example, The Economist ranked it the best country in the world to be born in.

2. Confoederatio Helvetica

Switzerland is made up of 26 cantons, many of which were established as sovereign states hundreds of years ago. Then, in 1848, with the establishment of the Swiss Constitution, the cantons joined together to form the Swiss Confederation, or in Latin, Confoederatio Helvetica. That’s where the abbreviation “CH” comes from.

Switzerland is a federal state and the cantons still retain strong identities and policy autonomy, in a way that’s analogous to the states in the U.S.

3. A multilingual nation

Switzerland has four official languages: German (the most prevalent), French, Italian, and Romansh. Romansh is spoken by less than 1% of the population, and only in a few places in Eastern Switzerland. I personally have never heard an utterance of Romansh. But it’s the only language unique to Switzerland, so I suppose it has a special place in the Swiss national identity.

The Swiss have a well-deserved reputation as polyglots. Almost all Swiss people I know speak at least two Swiss languages plus English, and some many more.

4. Let’s get physical

This map shows the terrain of Switzerland together with land use. You can immediately see that Switzerland is a mountainous country, with the Alps dominating the southern 2/3 and the smaller Jura Mountains along the northwest border. The bit in the middle, which is also where most of the people live, and most of the agricultural land and industrial production are located, is called the Swiss Plateau.

You’ll also notice the lakes. Switzerland has a lot of them, including some of the biggest lakes in Europe. Most Swiss lakes, including Geneva, were formed when the ice sheets of the last glacial period retreated, leaving deep basins carved by ice, and filling with water from the melting glaciers. More on the ice age below.

5. A geologist’s paradise

Ok, this one’s not a map. It’s a geologic cross section (source) showing a very simplified version of what the earth might look like if you cut out a slice 50 km deep and several hundred km long from Italy in the south, through the heart of the Swiss Alps, and north into France. The diagram gives an idea of the folding and faulting wrought by the massive tectonic collision that created the Alps.

In simplest terms, the Alps formed when two tectonic plates, the African and Eurasian plates, collided over millions of years. It all started in the Late Cretaceous, around 100 million years ago, when the ocean that separated what are now Eurasia and Africa began to close. Eventually the two continental masses themselves collided, with rocks on the African side thrust up and over the Eurasian plate. The suture where the two plates became fused is called the Insubric Line.

The Alps are tectonically active to this day, rising on the order of 1 mm per year. To geologists, the Alps are special because they were the first collisional mountain range to be studied extensively and much of the early understanding of structural geology comes from those pioneering Alpine studies.

6. The ice age

Made with the Swiss Federal Geoportal mapping tool

If the great tectonic collision provided the medium of folded, faulted and uplifting rock, the ice ages were the sculptor who fashioned the Swiss Alpine landscape into the wonder that we recognize today. The map above shows the extent of the ice cap that covered much of present-day Switzerland during the last glacial maximum, about 20,000 years ago.

The glaciers carved the spectacular U-shaped valleys and jagged peaks of the Alps. They also created the basins that would eventually be filled with water and form the Swiss lakes. Other evidence of the glaciers is often visible in Switzerland, such as great boulders carried by the ice and stranded, and gentle hilly moraines that dot the Swiss Plateau.

7. The trains run on time

Back to the present day. One of the best things about Switzerland is the passenger train network, depicted on the map above, which you’ll find in every train car and station in the country. It’s the densest passenger network in Europe. You really can get just about anywhere on the train, even high into mountain villages on the many cog wheel and narrow gauge lines. And the trains are on time. Well, 95% of them, according to the Swiss national railway company. To really appreciate the attention to detail that the Swiss give to rail travel, check out this incredible diagram.

8. Let’s hit the slopes

When you think of Switzerland, you think of skiing, and the Swiss Alps have some of the top ski resorts in the world. One thing I love about the alpine ski resorts, aside from the great slopes, is the beautiful hand-drawn piste maps. Here’s one of the Grindelwald/Wengen area in the Bernese Oberland. Just looking at it makes me want to start planning next year’s ski trip.

Switzerland is famous for its direct democracy, the process whereby voters frequently weigh in on referendums, popular initiatives, and even have veto power over laws. The Swiss vote a lot. Elections happen about four times a year and often contain several referendums at the national, cantonal, and local level, as well as ballots for elected representatives.

Immigration is a contentious issue in Switzerland (as in many other parts of the world). Relative to its population, immigration levels are quite high, compared to say, Germany or even the U.S. In some cases, xenophobia wins out in popular initiatives, such as when the Swiss voted in 2009 to prohibit the construction of minarets.

10. A rich cartographic history

With its varied geography and strong scientific and educational traditions, it’s no surprise that Switzerland has produced some stunning cartography. The first official map series to encompass all of Switzerland was produced by Guillaume-Henri Dufour and published from 1845 to 1865. The result of decades of surveying, drawing, copperplate engraving, and printing, the map achieved a high level of accuracy and detail for its time, and is also distinguished by the attractive use of shading to show topography. More information on the Dufour map, as well as the equally impressive Siegfried map, is available here.

Swiss excellence in mapping continues to this day. For example, the Federal Geoportal has a great mapping tool that allows you to access and display hundreds of data layers, from road networks to wetlands.

You’re probably familiar with the concept of population density. It’s the total population divided by the area. When talking about cities, it’s commonly understood that high population density is a necessary if not sufficient condition for urban vibrancy and efficient mass transit. But it can be difficult to compare population densities of metropolitan areas because the administrative boundaries have an arbitrary effect on measurement. For example, if the LA metro area is defined at the county level and includes all of San Bernardino County, which is mostly empty desert, you get a pretty meaningless density measurement.

Now, you can look at smaller administrative areas to get a better handle on the population density of a city. In the U.S. the census tract is the highest resolution. With the areas and populations of each census tract, you can calculate an even more interesting metric: population-weighted density, which is the average of each resident’s census tract density. That means that areas where more people live get more weight in the overall density calculation.

Another way to think about population-weighted density is the density at which the average person lives. The simple population density of the entire U.S. is 87 people per square mile. That really does not tell us much. But the population-weighted density is over 5,000 people per square mile. The average American lives in an urban area. (That example is from a U.S. Census report on metropolitan areas.)
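To make the difference between the two measures concrete, here is a small sketch in Python. The two "tracts" are invented numbers chosen to exaggerate the effect, not Census data, and the function names are mine.

```python
def simple_density(tracts):
    """Total population divided by total area."""
    total_pop = sum(pop for pop, _ in tracts)
    total_area = sum(area for _, area in tracts)
    return total_pop / total_area


def population_weighted_density(tracts):
    """Average of each resident's tract density.

    Each tract's density (pop / area) is weighted by its population.
    """
    total_pop = sum(pop for pop, _ in tracts)
    return sum(pop * (pop / area) for pop, area in tracts) / total_pop


# Hypothetical region: (population, area in square miles) per tract.
# One small dense tract and one huge, nearly empty tract.
tracts = [(9000, 1.0), (1000, 99.0)]

print(f"simple density:              {simple_density(tracts):,.0f} per sq mi")
print(f"population-weighted density: {population_weighted_density(tracts):,.0f} per sq mi")
```

The simple density is dragged down by the empty tract, while the weighted figure stays close to the density where most of the people actually live.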

An interesting (if not intuitive) insight from population-weighted density is the strong relationship between city size and density. Big cities are more dense. The plot below shows the population-weighted densities and total populations of the 100 largest U.S. cities (well, technically core-based statistical areas). Click on the image for the interactive version if you want to mouse over the dots to identify individual cities.

Click for interactive version. Note log-log scale.

The cities are categorized by region, showing the general pattern that southern cities are the least dense and northeastern and western cities the most dense. This regional difference is emphasized in the linear fits shown for each region. I was surprised by how dense on average the western cities are. Honolulu is a real outlier in terms of having a high density for its size. Unsurprisingly, the sprawling giants of Atlanta, Dallas, and Houston are low-density outliers.
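A straight line on a log-log plot corresponds to a power law, density = 10^intercept × population^slope. As a rough sketch of how such a fit could be computed for one region (this is ordinary least squares on the log10 values, which may not be exactly what plot.ly does, and the data points are invented):

```python
import math


def loglog_fit(populations, densities):
    """Least-squares line through (log10 population, log10 density).

    Returns (slope, intercept) of the line in log-log space.
    """
    xs = [math.log10(p) for p in populations]
    ys = [math.log10(d) for d in densities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical region whose cities follow density = 2 * population ** 0.5.
populations = [1e4, 1e6, 1e8]
densities = [2 * p ** 0.5 for p in populations]

slope, intercept = loglog_fit(populations, densities)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Running one fit per region and comparing slopes and intercepts is one way to put numbers on the regional differences visible in the plot.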

Incidentally, I got the idea for this graph after listening to a very interesting podcast on Streetsblog about the urban form of Milwaukee. It mentioned that Milwaukee is actually one of the most dense cities for its size, especially when looking in the Midwest. And sure enough, Milwaukee lies well above the blue trend line for Midwest cities. If you have 45 minutes and are interested in Milwaukee you should definitely listen to the podcast. Full disclosure: I was born and raised there.

Technical notes: Plot made with plot.ly using data from U.S. Census. The color palette is inspired by the film Rushmore and is from Karthik Ram’s wesanderson R package. Yes, this was all an elaborate excuse to try out the Wes Anderson color palettes.

A nice in-depth look at urban density and implications for transit can be found here.

Finally, if you are interested in extreme urban density, check this out. I can’t vouch for the accuracy of the data, but the web site name suggests it’s probably pretty legit.

I’ve been playing with an interesting dataset recently, and it got me thinking about challenges in effectively visualizing geospatial data. Specifically, how do you best display a continuous variable whose values span several orders of magnitude?

The dataset I’m working with comes from the Arctic Monitoring and Assessment Programme (AMAP). It’s an estimate of global anthropogenic emissions of mercury per 0.5 x 0.5 degree grid square. One important reason why AMAP generated these data (and how they did it is an interesting problem and the topic for another post) was to help atmospheric transport modelers who need to know where on earth emissions are coming from. But the data also allow for a nice visualization of global sources of mercury pollution that goes beyond simple maps showing emissions by country.

I’ll present two options here, and I’d love feedback on what works best. I think there are also trade-offs depending on what the purpose of the visualization is (presentation vs. exploration) and the scale. Both are made on CartoDB. You can zoom, scroll, and click on a point to see the data. Check out the full-screen option, which I think is pretty cool.

The first is perhaps the more flashy one. It uses yellow circles whose sizes are proportional to mercury emissions. There is a multiply effect so areas of overlap appear orange-red.

This one is a more traditional choropleth approach using an orange-red scale to represent the magnitude of emissions over each grid square.

Some technical notes:

The dataset contains around 45,000 grid squares (areas with no anthropogenic emissions, like oceans, are no data) with mercury emissions ranging from about 10^-5 to 12,000 kg. That’s around 9 orders of magnitude. Some quick exploration of the data revealed that almost all the mercury emissions came from less than 10 percent of the model area.

Cumulative sum of mercury emissions (normalized to 1) as a function of magnitude of emissions in each cell. Almost all emissions are from cells with greater than 10 kg emissions. Note log scale on x axis.

Most areas have very small emissions, but a few have very high emissions. The data are like this because the emissions estimates are made using point sources, “area” sources like artisanal mining, and population as a proxy for some general emissions. In any case, to facilitate visualization I removed the very-low-emissions-value grid squares. The remaining ~5000 squares comprise ~93% of total emissions. These data still have a Pareto-like distribution ranging over almost three orders of magnitude, but they are much easier to display on a map.

Cumulative sum of mercury emissions (normalized to 1) as a function of magnitude of emissions in each cell. Cells with < 50 kg Hg removed. Note log scale on x axis.
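The filtering step is simple enough to sketch. The 50 kg cutoff comes from the caption above; the emissions values in the example are invented stand-ins for the AMAP grid, and the function name is mine.

```python
def retained_share(emissions_kg, threshold_kg):
    """Drop cells below the threshold.

    Returns (number of cells kept, fraction of total emissions kept).
    """
    kept = [e for e in emissions_kg if e >= threshold_kg]
    return len(kept), sum(kept) / sum(emissions_kg)


# Hypothetical grid: many tiny cells, a few mid-sized ones, a handful of large ones.
cells = [0.001] * 90 + [5.0] * 5 + [60.0, 200.0, 1000.0, 5000.0, 12000.0]

n_kept, share = retained_share(cells, threshold_kg=50.0)
print(f"kept {n_kept} of {len(cells)} cells, {share:.1%} of total emissions")
```

With a skewed distribution like this, dropping the vast majority of cells removes only a small fraction of the total, which is what makes the filtered map a fair picture of the whole.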

Note that the maps display mercury emissions per square km for each cell, not total mercury emissions. That is because the areas of the 0.5 x 0.5 degree grid cells vary with latitude. Those closest to the equator are larger, and those closer to the poles are smaller. So it makes for a more accurate display to normalize by the cell area.
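For reference, the area of a grid cell depends on the band of latitude it spans. A minimal sketch, assuming a spherical earth with a mean radius of 6371 km (a more careful calculation would use an ellipsoid, and I am not claiming this is how AMAP computed areas):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def cell_area_km2(lat_south_deg, cell_deg=0.5, radius_km=EARTH_RADIUS_KM):
    """Area of a cell_deg x cell_deg cell whose southern edge is at lat_south_deg.

    On a sphere: area = R^2 * delta_lon * (sin(lat_north) - sin(lat_south)).
    """
    lat_south = math.radians(lat_south_deg)
    lat_north = math.radians(lat_south_deg + cell_deg)
    delta_lon = math.radians(cell_deg)
    return radius_km ** 2 * delta_lon * (math.sin(lat_north) - math.sin(lat_south))


for lat in (0, 30, 60, 80):
    print(f"cell starting at {lat:>2} deg N: {cell_area_km2(lat):8.1f} km^2")
```

Dividing each cell's emissions by its area from a function like this gives the per square km values. A cell at 60 degrees covers only about half the area of one at the equator.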

An important factor in the visual appearance of continuous data like these is where to choose the breaks separating data points into different colors or sizes. This is especially difficult with Pareto or power law distributions. CartoDB has several built-in options for binning data. After playing around with them I chose head/tail breaks, which seems to work well on this type of distribution. CartoDB also allows you to easily change the breaks manually with CartoCSS. It was a challenge to find a binning and color/size scheme that portrayed the data in the most accurate way, while also maintaining a clear and striking appearance.
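For anyone curious how head/tail breaks works, here is a rough sketch of the general idea: split the data at its mean, keep the values above the mean (the "head"), and repeat on the head for as long as it remains a minority. The 40% cutoff is a commonly used rule of thumb that I am assuming here. CartoDB's own implementation may differ in its details.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Class breaks for heavy-tailed data.

    Repeatedly splits at the mean and recurses into the values above it,
    stopping once the head is no longer a small minority of the data.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        # Stop if nothing lies above the mean or the head is too large a share.
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(mean)
        data = head
    return breaks


# Hypothetical heavy-tailed sample: many small values, few large ones.
sample = [1] * 16 + [10] * 4 + [100]
print(head_tail_breaks(sample))
```

Because each break is the mean of an ever smaller head, the classes get narrower in count but wider in value, which suits data where a few cells dominate.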

The other day I learned that wordpress.com now supports embeds of CartoDB maps. This is pretty cool, and it inspired me to finish up a little project that I’ve been tinkering with for a while, in order to try out the new feature.

By the way, CartoDB is a web mapping tool that I think is one of the best interfaces available for creating interactive maps. You can make great looking maps quickly and easily, but there is also enough functionality to do more advanced stuff, like mess around with the CSS code.

This map shows estimates of how much mercury is on site at chlor-alkali plants per country. It distinguishes between countries that ban the export of mercury and those that don’t. This is important because chlor-alkali plants often contain hundreds of tons of mercury. When the facilities close the mercury can enter the commodity market where it can be used in artisanal gold mining.

The size of the bubbles reflects how many tons of mercury are estimated to be in chlor-alkali facilities in each country. Scroll, zoom, hover, or click for more details. The data are from the UNEP Global Mercury Partnership chlor-alkali inventory.

Technical CartoDB note: In order to distinguish (by bubble color) countries with and without export bans, I made two layers from the data table. However, because each set had a different range of values, the scale for the bubble size was different for each color. To fix this I manually changed the bubble size distribution cutoffs in the CSS tab. Is there an easier solution that I am missing?

It’s been quite some time since my last post. I have been busy with a young child, new job, and an international move. But I’m hoping to get back into posting and making visualizations on a regular basis.

The reason for this post is that I came across an interesting resource called the International Environmental Agreements Database Project, hosted at the University of Oregon. The database contains information on about 1100 multilateral environmental agreements (MEAs) dating back to 1857. The data include the title, type (an original agreement or a protocol or amendment to an existing agreement), dates of signature and entry into force, and the parties. For some agreements there is even data on performance as well as coding to allow for comparison of the actual legal components.

As an initial exploration, I simply looked at how many agreements were concluded over time. The plot below shows the results for the last 100 years. Click for the interactive and shareable plot.ly version.

Click for interactive version
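The counting itself is straightforward. A minimal sketch, assuming you have already extracted a list of signature years from the database (the years below are placeholders, not real IEA records):

```python
from collections import Counter


def agreements_per_year(signature_years):
    """Number of agreements signed in each year.

    Years with no agreements are included with a count of zero so the
    resulting series has no gaps when plotted.
    """
    counts = Counter(signature_years)
    first, last = min(counts), max(counts)
    return {year: counts.get(year, 0) for year in range(first, last + 1)}


# Placeholder signature years for illustration only.
years = [1990, 1992, 1992, 1994]
for year, n in agreements_per_year(years).items():
    print(year, n)
```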

There is a pretty interesting pattern. From the early 20th century until the 1950s there are not that many MEAs. Then the pace picks up in mid-century, peaking in the early 1990s, and declining considerably after that.

The 1992 Earth Summit in Rio de Janeiro was a huge event in the global environmental community, and occurred at a high point of optimism about multilateralism. There was a flurry of MEA activity around this time. But there was also a building movement to ensure that international environmental diplomacy was benefiting the poor, and in particular, developing countries.

The Rio Declaration enshrined the principle of common but differentiated responsibilities. This is the idea that while all nations have a responsibility to protect the global environment, rich nations should shoulder a greater share of the burden.

It is a noble sentiment, and one that in my view makes a lot of sense. But it had the effect of making it more difficult to reach agreements in international environmental negotiations. Developing countries started going into the negotiations expecting more support, in the form of funding, reduced obligations, or technology transfer, from the developed world. Common but differentiated responsibilities is at the root of a major sticking point in global climate talks. Should China, India, and other rapidly developing nations have the same stringent obligations as more mature economies?

I certainly don’t think this is the only cause of the decline in new MEAs in the last 20 years. And neither can I claim to be the first to think about the Rio Declaration’s impact on MEAs. There’s an entire literature on it. For example, Richard Benedick discussed this theme at length in reference to the Montreal Protocol and its aftermath in his book Ozone Diplomacy.

As a final disclaimer, for this analysis it would be best to filter the IEA database to exclude those MEAs that only have a few parties. That way you could really focus on the rate of global or large regional MEAs over time. Perhaps I’ll do that next.

But in any case, it’s an interesting dataset and an interesting pattern. And a good excuse to step back and think about the big picture in global environmental politics.

Recently there has been a bit of buzz about a study claiming that female-named hurricanes cause more fatalities, on average, than male ones. The authors suggested that the discrepancy is attributable to gender bias. Female-named hurricanes do not seem as threatening to people, so presumably they take fewer precautions. From the start this seemed pretty far-fetched, and in fact a number of problems have been found with the study.

But it got me thinking about hurricane names. A more likely effect of a hurricane’s name would be to discourage parents from giving their children that name, if the hurricane is associated with death and destruction. Fortunately, there is readily available data with which to test this hypothesis. For hurricanes, I used the same data as the hurricane gender study described above (they may have had some problems with their methodology, but at least they released their data). It contains data on 92 Atlantic hurricanes that made landfall in the U.S. since 1950*. For baby names I turned to the Social Security Administration. There is a great R package called babynames that makes the yearly SSA data available in a readily accessible format for use in R. As an aside, the SSA baby names data is the source of all sorts of interesting visualizations and analyses, such as the baby name voyager and this article from fivethirtyeight.com on predicting a person’s age based on their name.

The tricky part of this analysis is deciding how to define a decrease in name usage after a hurricane. The simplest way would be to look at how many times a name was given in the year of a hurricane versus how many times that name was given the following year. For example, how many baby Katrinas were there in 2005 versus 2006? However, this method does not take into account that most names are either decreasing or increasing in popularity as part of a longer-term trend. So you have to look at how the popularity of a name was changing before the hurricane as well. To see why, look at this plot of the number of babies named Katrina over time.

Katrina peaked in popularity in 1980 and has been declining ever since. But from 2004-2005 the number of Katrinas actually increased about 13%. From 2005-2006, however, it decreased dramatically, by 26%. It’s a pretty good bet that this rapid decrease was due to the hurricane.

To quantify the change in a name’s usage after a hurricane, I made the assumption that the best predictor of how a name’s popularity will change in a given year is how it changed last year. To calculate the post-hurricane change in name usage I subtracted the percent change in name usage in the year before the hurricane from the percent change after the hurricane. In the Katrina example the post hurricane change would be (-26%) – (13%) = -39%. This post-hurricane percent change value is what I use in the analysis below.
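In code, the calculation looks something like the sketch below. The three counts are hypothetical, chosen only so that they reproduce the +13% and -26% changes from the Katrina example. They are not the actual SSA figures.

```python
def percent_change(before, after):
    """Percent change from `before` to `after`."""
    return 100.0 * (after - before) / before


def post_hurricane_change(count_prev, count_landfall, count_next):
    """Change after the hurricane minus the pre-existing trend.

    count_prev:     babies given the name in the year before landfall
    count_landfall: babies given the name in the landfall year
    count_next:     babies given the name in the year after landfall
    """
    trend = percent_change(count_prev, count_landfall)
    after = percent_change(count_landfall, count_next)
    return after - trend


# Hypothetical counts: up 13% into the landfall year, then down about 26%.
print(f"{post_hurricane_change(1000, 1130, 836):.1f}%")
```

A negative result means the name did worse after the hurricane than its prior trend would have predicted.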

Before we get to the results, let’s take a look at the fascinating case of Carla:

Hurricane Carla was an extremely intense storm that hit Texas in 1961, killing 43. The name “Carla” had been surging in popularity, but after 1961 it started a decline in popularity from which it never recovered. It seems a pretty good bet that the hurricane had a major role in Carla’s decline. Interestingly, the first live television broadcast of a hurricane was of Carla, with a young Dan Rather himself reporting from Galveston. Could the shock of the American TV-viewing public seeing footage of the storm in their living rooms have contributed to the demise of Carla as a name?

Back to the analysis. Indeed, the hurricane baby name effect seems real. After running the numbers, I found that names associated with a landfalling hurricane were about 15 percent less common in the year after the hurricane. Out of the 93 hurricanes in the data set, 65 were associated with a decrease in the popularity of their names, and only 21 were followed by increasing name usage. (Seven hurricane names were not found in the SSA data in their landfall year).

So far this is pretty intuitive. Of course people are less likely to name their dear infant after a natural disaster. Based on this reasoning, you’d expect that the more fatalities caused by a hurricane, the greater the baby name effect. Let’s test that.

The effect is quite small. When we take Katrina out (a massive outlier in terms of fatalities), it’s smaller still:

So the correlation between change in baby name usage and hurricane fatalities is quite weak. Finally, I had to see if the gender of the hurricane name affected this relationship. Were more deadly female-named hurricanes more or less likely than male names to affect baby name popularity? Maybe I’d even find that male baby name usage goes up with hurricane fatalities because parents associate the names with strength? I can see the Slate headline now! Alas, there is no significant difference:

By the way, there are more female names because from 1950 – 1979 all Atlantic hurricanes were given female names.

There’s an almost endless number of interesting things to glean from the baby names data. My ultimate dream is an algorithm to determine the perfect name for your baby based on a number of criteria chosen by the expectant parents. It would really take the stress out of the naming process. One of the criteria would certainly be that the name is not on the World Meteorological Organization’s list of tropical storm names!

This post is intended to illustrate the cool things you can do with plot.ly’s API for R. Plot.ly is a web-based tool for making interactive graphs. It uses the D3.js visualization library, and lets you create very attractive plots that can be easily shared or embedded in a web page. With the R API you can manipulate data in R and then send it over to plot.ly to create an interactive graph. There’s also a function that lets you create a plot in R using ggplot2, and then shoot the result directly over to plot.ly (summarized nicely here).

I have a great little free app on my iPhone called Pedometer++ that keeps track of how many steps I take each day. I exported the data, plotted up a time series with ggplot2, and used the API to make the graph in plot.ly. It worked quite nicely. The only hiccup was that plot.ly did not recognize the local regression curve, so I had to add that separately.

You can see from the plot that I’m not consistently meeting my 10,000 step goal. In fact, I averaged 7,002 steps over this period. That still comes out to a total of 1,470,463 steps. From October through February my step count was trending slightly downward, but since then it’s picked up. Maybe that had something to do with the cold winter. Hopefully as the weather (and my motivation) improves, I’ll hit my goal.

Click to see the interactive version

And here’s a bonus box plot showing steps taken by day of the week (also using the R API):

Click to see the interactive version

If there are any pedometer users out there who are interested, let me know and I can post the code.

One of the first posts on this blog was about using Tableau to visualize data on global emissions of mercury. I’ve gotten suggestions from a few people and given the graphic a bit of a face lift. Click on the image to see the interactive viz:

Click for interactive graphic

I also used the same dataset to make some static graphics using ggplot2 and the ggthemes package. I’d love any input on how to improve the look and feel of both these and the Tableau viz. I’ve always found picking good colors very challenging, so thoughts on the palettes are especially welcome.

The 8 industry sectors with the highest global mercury emissions. Data for 2010 from the 2013 UNEP Global Mercury Assessment.

Countries with the highest mercury emissions. Data for 2010 from the 2013 UNEP Global Mercury Assessment.