Elegant Figures

After nearly 20 years, it’s time for me to leave NASA and do something radically different—help Planet Labs develop world-class satellite imagery. OK, not that radically different. In any case, now is as good a time as ever to point to some of my favorite visualizations. If you’d like to get in touch with me, drop me a note on Twitter (@rsimmon) or contact the Earth Observatory team—they’ll know where I am.

Thanks to everyone on the Earth Observatory team that I’ve worked with over the years. Among them: David Herring, Kevin Ward, Mike Carlowicz, Paul Przyborski, Rebecca Lindsey, Holli Riebeek, Adam Voiland, Jesse Allen, Reto Stöckli, Goran Halusa, and John Weier.

Ten: The Blue Marble

Let me get this out of the way: this was the first NASA image of an entire hemisphere of the Earth made with full-color data since the Apollo 17 crew returned from the Moon. It’s certainly striking, but once you look carefully a number of flaws start to appear. Flaws that I may or may not have pointed out when I described my design process. (I’ll also reiterate Reto Stöckli’s invaluable work building the land and cloud textures, which were the hard parts.)

Nine: Global Net Primary Productivity

Although these two datasets show the exact same property (net primary productivity, a measure of the amount of carbon the biosphere draws out of the atmosphere) they’re measured in different ways, and deserve to be differentiated. By using palettes with different hues but an identical range of lightness and saturation, they are directly comparable but remain distinct from one another.

Eight: The Landsat Long Swath

Shortly after launch, Landsat 8 collected what was probably the single largest satellite image ever made. Roughly 12,000 pixels wide by 600,000 pixels tall, the image combined 56 individual Landsat scenes into a single strip from Siberia to South Africa.

Seven: LIDAR

Back when I used to be competent at 3D, I made this illustration showing how a pulse of laser light can measure the structure of a forest canopy.

Six: Air Quality over 16 Megacities

It may just be that I’m enamored with Alberto Cairo, but I’m growing increasingly fond of slope graphs. They occasionally tell stories more clearly than more conventional graph types.

Five: an Erupting Volcano from the International Space Station

When Sarychev Volcano blasted a column of ash high above the Kuril Islands, an astronaut captured not one, but a whole sequence of photographs of the plume. Make sure you look for the pyroclastic flows coursing down the side of the volcano. (The eruption did not blast a hole in the clouds, by the way; that’s a result of interactions between wind, clouds, and island topography.)

For some reason the color and contrast work better in this version (originally published in 2000 to accompany the story Bright Lights, Big City) than any of my attempted remakes. This includes the 2012 Black Marble, which was made with much better data.

One: Seeing Equinoxes and Solstices from Space

I’m generally skeptical of animation in data visualization, but for some things motion is the story. I think this applies to the apparent motion of the sun over the course of a year, alternately lighting the North and South Poles. (Apologies for the poor quality of the YouTube compression. Make sure you check out the HD version.)

Finally: Thanks to the NASA family, and to all of you who’ve expressed appreciation for my pictures over the years.

Palettes aren’t the only important decision when visualizing data with color: you also need to consider scaling. Not only is the choice of start and end points (the lowest and highest values) critical, but so is the way intermediate values are stretched between them.

For most data simple linear scaling is appropriate. Each step in the data is represented by an equal step in the color palette. Choice is limited to the endpoints: the maximum and minimum values to be displayed. It’s important to include as much contrast as possible, while preventing high and low values from saturating (also called clipping). There should be detail in the entire range of data, like a properly exposed photograph.
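To make this concrete, a linear stretch with clipping boils down to a couple of lines. This sketch is a hypothetical helper in Python, not part of any particular visualization package:

```python
def linear_scale(value, vmin, vmax):
    """Map a data value onto a 0-1 palette position with linear scaling.

    Values below vmin or above vmax saturate (clip) at the ends of the
    palette, so pick endpoints that bracket the detail you care about.
    """
    t = (value - vmin) / (vmax - vmin)
    return min(max(t, 0.0), 1.0)

# A sea surface temperature of 14 degrees on a 0-28 degree Celsius scale
# lands exactly halfway along the palette:
linear_scale(14, 0, 28)   # 0.5
```

Every value outside the endpoints collapses to 0 or 1—exactly the saturation (clipping) described above.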

These maps of sea surface temperature (averaged from July 2002 through January 2014) demonstrate the importance of appropriately choosing the range of data in a map. The top image varies from -5˚ to 45˚ Celsius, a few degrees wider than the bounds of the data. Overall it lacks contrast, making it hard to see patterns. The lower image ranges from 0˚ to 28˚ Celsius, eliminating details in areas with very low or very high temperatures. (NASA/MODIS.)

Ocean chlorophyll (a measure of plant life in the oceans) ranges from hundredths of a milligram per cubic meter to tens of milligrams per cubic meter, more than 3 orders of magnitude. Both of these maps use almost the same endpoints, from near 0 (it’s impossible to start a logarithmic scale at exactly 0) to 11. Plotted linearly, the data show a simple pattern: narrow bands of chlorophyll along coastlines, and none in mid-ocean. A logarithmic base-10 scale reveals complex structures throughout the oceans, in both coastal and deep water. (NASA/MODIS.)

Some visualization applications support logarithmic scaling. If not, you’ll need to apply a little math to the data (for example calculate the square root or base 10 logarithm) before plotting the transformed data.
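If your tool only scales linearly, the transformation can be applied up front. A minimal sketch of a base-10 logarithmic stretch (again, a hypothetical helper, not from any particular package):

```python
import math

def log_scale(value, vmin, vmax):
    """Map a value onto a 0-1 palette position with a base-10 log stretch.

    vmin must be greater than zero: a logarithmic scale can't start at 0.
    """
    t = (math.log10(value) - math.log10(vmin)) / (math.log10(vmax) - math.log10(vmin))
    return min(max(t, 0.0), 1.0)

# Chlorophyll from 0.01 to 10 mg per cubic meter (three orders of magnitude):
# each factor of ten gets an equal share of the palette.
log_scale(0.1, 0.01, 10)   # one third of the way along the palette
```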

Appropriate decisions while scaling data are a complement to good use of color: they will aid in interpretation and minimize misunderstanding. Choose a minimum and maximum that reveal as much detail as possible, without saturating high or low values. If the data varies over a very wide range, consider a logarithmic scale. This may help patterns remain visible over the entire range of data.

Over shorter time scales (Suomi-NPP completes an orbit every 100 minutes or so) multiple Day Night Band scenes stitched together show a snapshot of the Earth at night, like this view of South America, including the 14 Brazilian World Cup cities.

Marit Jentoft-Nilsen and I used a number of software tools to read, stitch, project, and visualize the data, starting with a handful of HDF5 files. VIIRS data is aggregated into granules, each acquired over 5 minutes. These files are archived and distributed by NOAA’s CLASS (the Comprehensive Large Array-data Stewardship System). To deal with the unique projection of VIIRS, I used ENVI’s Reproject GLT with Bowtie Correction function to import the data. (If you’re unfamiliar with VIIRS data, now’s a good time to read the Beginner’s Guide to VIIRS Imagery Data (PDF) by Curtis Seaman of CIRA/Colorado State University.)

So far so good. Of course the data is in Watts per square meter per steradian, and the useful range is something around 0.0000000005 to 0.0000000500. With several orders of magnitude of valid data, any linear scale that maintained detail in cities left dim light sources and the surrounding landscape black. And any scaling that showed faint details left cities completely blown out.

To make the data more manageable, show detail in dark and bright areas, and allow export to Photoshop I did a quick band math calculation: UINT(SQRT((b1+1.5E-9)*4E15)*(SQRT((b1+1.5E-9)*4E15) lt 65535) + (SQRT((b1+1.5E-9)*4E15) ge 65535)*65535)

It looks a bit complicated, but it’s not too bad. It adds an offset to account for some spurious negative values; multiplies by a large constant to fit the data into the 65,536 values allowed in a 2-byte integer file; calculates the square root to improve contrast; sets any values above 65,535 to 65,535; then converts from floating point to unsigned integer. This data can be saved as a 16-bit TIFF readable by just about any image processing program, while maintaining more flexibility than an 8-bit file would.
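For a single pixel, the same calculation looks like this in Python (a sketch of the ENVI band math, assuming UINT() truncates toward zero):

```python
import math

def scale_dnb(b1):
    """Rescale a VIIRS Day/Night Band radiance (W/m2/sr) into a
    16-bit unsigned integer, mirroring the band math above."""
    stretched = math.sqrt((b1 + 1.5e-9) * 4e15)  # offset, scale, square root
    return int(min(stretched, 65535))            # clamp at the 16-bit maximum

scale_dnb(5e-10)   # a dim pixel maps to 2828
scale_dnb(1e-3)    # an extremely bright pixel clamps to 65535
```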

The final steps were to bring the TIFF into Photoshop, tweak the contrast with levels and curves adjustments to bring out as much detail as possible, add coastlines and labels, and export for the web. The result: Brazil at Night published by the NASA Earth Observatory on the eve of the World Cup.

Last week I had the privilege of helping judge the 22nd Malofiej Awards. Presented by the Spanish Chapter of the Society for News Design, Malofiej 22 recognized the best news infographics of 2013. (For the uninitiated, infographics aren’t limited to those posters illustrating tenuously related facts with a handful of pie charts. A good infographic is an information-rich visual explanation; often incorporating illustration and text, but sometimes they’re as simple as a line graph.)

My fellow jurors for the online competition were an eclectic and brilliant bunch: Anatoly Bondarenko, interactive visualization designer for Texty (a Ukrainian guerrilla news site); Scott Klein, news applications editor at ProPublica; Golan Levin, artist & professor at Carnegie Mellon University; and Sérgio Lüdtke, journalist and professor of digital journalism at the International Institute of Social Sciences. (Good thing I didn’t read these bios before heading to Spain: I would have been more intimidated than I already was.)

A traditional Spanish lunch for the Malofiej jurors and student volunteers.

For four days we evaluated, discussed, and sometimes argued about 400 or so online visualizations. We first narrowed the field down to about 100, then selected 8 gold, 24 silver, and 39 bronze medals (assuming I counted right). It was interesting to me that we largely liked (and disliked) the same entries, but often for different reasons. This led to several “ah ha” moments, when I suddenly appreciated a new perspective on a figure.

Instead of writing about the golds, which have already garnered their fair share of praise, here are some of my favorite silver and bronze winners.

Bronze: Electionland (Zeit Online)
I have a few quibbles about this map of the 2013 German Bundestag elections, but it’s a brilliant bit of data crunching and abstraction. Voting districts are positioned based on the similarity of their voting patterns, rather than geography.

Since its launch in February 2013, Landsat 8 has collected about 400 scenes of the Earth’s surface per day. Each of these scenes covers an area of about 185 by 185 kilometers (115 by 115 miles)—34,200 square km (13,200 square miles)—for a total of 13,690,000 square km (5,290,000 square miles) per day. An area about 40% larger than the United States. Every day.
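The arithmetic is easy to check (the United States figure below is my rough assumption of about 9.8 million square kilometers):

```python
scenes_per_day = 400
scene_area_km2 = 185 * 185                    # roughly 34,200 square km per scene
daily_km2 = scenes_per_day * scene_area_km2   # 13,690,000 square km per day

us_area_km2 = 9_830_000                       # approximate area of the United States
daily_km2 / us_area_km2                       # about 1.39, i.e. roughly 40% larger
```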

Although it’s possible to process all that data automatically, it’s best to process each scene individually to bring out the detail in a specific region. Believe it or not, this doesn’t require any tools more specialized than Photoshop (or any other image editing program that supports 16-bit TIFF images, along with curve and level adjustments).

The first step, of course, is to download some data. Either use this sample data (960 MB ZIP archive) of the Oregon Cascades, or dip into the Landsat archive with my tutorial for Earth Explorer. The data comprise 11 separate image files, along with a quality assurance file (BQA) and a text file with metadata (date and time, corner points—that sort of thing). These arrive in a ZIPped and TARred archive. (Windows does not natively support TAR, so you may need to use something like 7-Zip to unpack the files. OS X can automatically uncompress the files (with default settings) when the archive is downloaded—double-click to unpack the TAR.) Inexplicably, the TIFFs aren’t compressed. [The patent on the LZW lossless compression algorithm (which is transparently supported by Photoshop and many other TIFF readers) expired about a decade ago.]

I prefer using the red, green, and blue bands because natural-color imagery takes advantage of what we already know about the natural world: trees are green, snow and clouds are white, water is (sorta) blue, etc. False-color images, which incorporate wavelengths of light invisible to humans, can reveal fascinating information—but they’re often misleading to people without formal training in remote sensing. The USGS also distributes imagery they call “natural color,” using shortwave infrared, near infrared, and green light. Although the USGS imagery superficially resembles an RGB picture, it is (in my opinion) inappropriate to call it natural-color. In particular, vegetation appears much greener than it actually is, and water is either black or a wholly unnatural electric blue.

Even natural-color satellite imagery can be deceptive due to the unfamiliar perspective. Eduard Imhof, a Swiss cartographer active in the mid-twentieth century, wrote*:

(A)erial photographs from great heights, even in color, are often quite misleading, the Earth’s surface relief usually appearing too flat and the vegetation mosaic either full of contrasts and everchanging (sic) complexities, or else veiled in a gray-blue haze. Colors and color elements in vertical photographs taken from high altitudes vary by a greater or lesser extent from those that we perceive as natural from day to day visual experience at ground level.

Seen from above, the Earth’s surface is made brighter and bluer by the atmosphere (compare the hue of the clouds to the hue of the panels on the International Space Station). (Astronaut Photograph ISS037-E-001180)

Because of this, satellite imagery needs to be processed with care. I try to create imagery that matches what I see in my mind’s eye, as much or more than a literal depiction of the data. Our eyes and brains are inextricably linked, after all, and what we see (or think we see) is shaped by context.

Enough theory. To build a natural-color image, open the three separate files in Photoshop:

You’ll end up with three grayscale images, which need to be combined to make a color image. From the Channels Palette (accessed through Photoshop’s Windows menu item) select the Merge Channels command (click on the small downward-pointing triangle in the upper-right corner of the palette).

Then change the mode from Multichannel to RGB Color in the dialog box that appears. Set band B4 as red, B3 as green, and B2 as blue.

This will create a single RGB image, which will be dark. Very dark:

The raw Landsat 8 scene is so dark because it’s data, not merely an image: the numbers represent the precise amount of light reflected from the Earth’s surface (or a cloud) in each wavelength, a quantity called reflectance. To encode this information in an image format, the reflectance is scaled, and the range of values found in the data is smaller than the full range of values in a 16-bit image. As a result the images have low contrast, and only the brightest areas are visible (much of the Earth is also quite dark relative to snow or clouds—I’ll explain why that matters later).
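As an illustration, here’s roughly how that scaling works, using the rescaling coefficients typically found in a Landsat 8 scene’s MTL metadata file (check your own scene’s metadata for the exact values):

```python
# Typical Level-1 rescaling coefficients from an MTL metadata file:
REFLECTANCE_MULT = 2.0e-5   # REFLECTANCE_MULT_BAND_n
REFLECTANCE_ADD = -0.1      # REFLECTANCE_ADD_BAND_n

def dn_to_reflectance(dn):
    """Convert a quantized 16-bit pixel value to top-of-atmosphere reflectance."""
    return REFLECTANCE_MULT * dn + REFLECTANCE_ADD

def reflectance_to_dn(rho):
    """Invert the scaling: where a given reflectance lands in the 0-65535 range."""
    return round((rho - REFLECTANCE_ADD) / REFLECTANCE_MULT)

# A dark forest reflecting about 3% of incoming light sits very low in the
# 16-bit range, which is why the unstretched composite looks nearly black:
reflectance_to_dn(0.03)   # 6500 out of 65535, about 10% of full scale
```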

As a result, the images need to be contrast enhanced (or stretched) with Photoshop’s levels and curves functions. I use adjustment layers, a feature in Photoshop that applies enhancements to a displayed image without modifying the underlying data. This allows a flexible, iterative approach to contrast stretching.

The first thing to do after creating the RGB composite (after saving the file, at least) is to add both a Levels adjustment layer, and a Curves adjustment layer. You add both adjustment layers from the Layer Menu (just hit OK to create the layer): Layer > New Adjustment Layer > Levels… and Layer > New Adjustment Layer > Curves…

Levels and curves adjustments both allow manipulation of the brightness and contrast of an image, just in slightly different ways. I use both together for 16-bit imagery since the combination gives maximum flexibility. Levels provides the initial stretch, expanding the data to use the full range of values. Then I use curves to fine-tune the image, matching it to our nonlinear perception of light.

This screenshot shows Photoshop’s layers palette (on the left) and a levels adjustment layer (right). From top to bottom: the adjustment menu allows you to save and load presets; the channel menu allows you to select the individual red, green, and blue bands; the histogram shows the relative distribution of light and dark pixels (the higher the bar the more pixels share that specific brightness level); the white eyedropper sets the white point of the red, green, and blue channels to the values of a selected pixel (select a pixel by clicking on it); and the black point and white point sliders adjust the contrast of the image.

Move the black point slider right to make the image darker, move the white point slider left to make the image lighter. (Important: any parts of the histogram that lie to the left of the black point slider or to the right of the white point slider will be solid black or solid white. Large areas of solid color don’t look natural, so make sure only a small number of pixels are 100% black or white). Combined black point and white point adjustments make the dark parts of an image darker, and the light parts lighter. The process is called a “contrast stretch” or “histogram stretch” (since it spreads out the histogram).
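The math behind the sliders is straightforward. Here’s a sketch of the stretch for a single 16-bit value (a simplified stand-in for what Photoshop does internally):

```python
def levels_stretch(value, black, white):
    """Apply a black point/white point (histogram) stretch to a pixel.

    Values at or below the black point become 0, values at or above the
    white point become 65535, and everything in between is spread across
    the full range.
    """
    if value <= black:
        return 0
    if value >= white:
        return 65535
    return round((value - black) / (white - black) * 65535)

# With a black point of 5,000 and a white point of 25,000, a midtone of
# 15,000 stretches to the middle of the output range:
levels_stretch(15000, 5000, 25000)   # 32768
```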

With a Landsat image, we want to do a contrast stretch to make the image brighter and reveal detail. The first step is to select the white eyedropper (the bottom of the three eyedropper icons on the left edge of the Levels Palette), and find the brightest area of the image that we know is white: a puffy cloud, or even better, pristine snow. In general the brightest side of a cloud (or mountain, if you have snow) will be facing southeast in the Northern Hemisphere, east in the tropics, and northeast in the Southern Hemisphere. (This is simply because Landsat collects imagery at about 10:30 in the morning, and the Sun is still slightly to the east of the satellite.) If there are no clouds or snow, a salt pan or even a white rooftop would be a good alternative.

Setting the white point does two things: the brightest handful of pixels in the image are now as light as possible, and the scene is partially white balanced. This means that the white clouds appear pure white, and not slightly tinted. Slight differences in the characteristics of the different color bands, combined with scattering of blue light by the atmosphere, can add an overall tint to the image. Ordinarily our brain does this for us (a white piece of paper looks white whether it’s lit by the sun at noon, a fluorescent light, or a candle), but satellite imagery needs to be adjusted. As do photographs: digital cameras have white balance settings, and film type affects the white balance in an analog camera.

It’s possible to automate white balancing (Photoshop has a handful of algorithms, accessed by option-clicking the auto button in the levels and curves adjustment layer windows) but I find hand-selecting an appropriate area to work best.

After white balancing, the histograms of the red, green, and blue channels in the Mount Shasta image look like this:

Notice that the value of white in each channel is slightly different (213, 202, and 201), and that there’s a small number of pixels above the white point in each channel. The combined RGB histogram makes it easier to see the differences between channels.

The histogram also shows that most of the image is still dark: there’s a sharp peak from about 10 to 20 percent brightness, with few values in the middle tones, and even fewer bright pixels. This is typical in a heavily forested region like the Cascades. The next step is to brighten the entire image (at least brighten it enough to make further adjustments).

In the curves adjustment layer, click near the center of the line that runs at a 45˚ angle from lower left to upper right. This will add an adjustment point with which to modify the curve. When the curve is straight, no adjustments are made. When it’s above the 45˚ line the image will be brighter, and when it’s below the line it will be darker. Steeper parts of the curve add contrast in those tones, while flatter parts compress the brightness range of the image. Dragging the curve up and to the left brightens the image overall, but dark areas are brightened more than lighter ones. This reveals details in the land surface, and makes clouds look properly white, instead of gray.
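A simple power function behaves much like this kind of curve; the sketch below is only an analogy (Photoshop fits a smooth curve through your adjustment points, not a fixed gamma):

```python
def brighten_curve(value, gamma=2.2):
    """A curves-like brightening of a normalized 0-1 pixel value.

    An exponent below 1 (here 1/gamma) lifts the curve above the
    45-degree line: shadows are brightened far more than highlights.
    """
    return value ** (1 / gamma)

brighten_curve(0.1)   # deep shadows jump to about 0.35
brighten_curve(0.8)   # highlights barely move, about 0.90
```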

Take a look at the combined histogram (open the Histogram Palette by selecting Window > Histogram from Photoshop’s menu bar): it’s easy to see that the darkest areas of the image aren’t black, and that the minimum red values are lower than the minimum green values, which are lower than the minimum blue values.

There’s no true black in a Landsat 8 image for three reasons. First, Landsat reflectance data are offset—a pixel with zero reflectance is represented by a slightly higher 16-bit integer (details: Using the USGS Landsat 8 Product). That is, if there were any pixels with zero reflectance. Very little (if any at all) of the Earth’s surface is truly black. Even if it were, light scattered from the atmosphere would likely lighten the pixel, giving it some brightness. (Landsat 8 reflectance data are “top of atmosphere”—the effects of the atmosphere are included. It’s possible, but very difficult in practice, to remove all the influence of the atmosphere.)

Shorter wavelengths (blue) are scattered more strongly than longer wavelengths (red), so the satellite images have an overall blue cast. This is especially apparent in shadowed regions, where the atmosphere (i.e. blue light from the sky) is lighting the surface, not the sun. We think of shadows as black, however, so I think it’s best to remove most of the influence of scattered light (a little bit of the atmospheric influence was already removed by choosing a white point) by adjusting the black points in each channel.

You can change the curves separately for each channel by selecting “Red”, “Green”, or “Blue” from a drop-down menu (initially labelled “RGB”) in the Curves adjustments palette. Starting with the red channel, drag the point at the lower-left towards the right, until it’s 2 or 3 pixels away from the base of the histogram. Repeat with green and blue, but leave a bit more space (an additional 1 or 2 pixels for green, and 3 or 4 for blue) between the black point and base of the histogram. This will prevent shadows from becoming pure black, with no visible details, and leave a slight blue cast, which appears natural.

Like so:

The image is now reasonably well color balanced, and the clouds look good, but the surface is (again) dark. At this point it’s a matter of iteration. Tweak the RGB curve, maybe with an additional point or three, then adjust the black points as necessary. Look at the image, repeat. I find it helpful to walk away every once in a while to let my eyes relax, and reset my innate color balance: if you stare at one image for too long it will begin to look “right”, even with a pronounced color cast.

I sometimes find it helpful to find photographs of a location to double-check my corrections, either with a generic image search or through a photography archive like flickr. It helps give me a sense of what the ground should look like, especially for places I’ve never been. Keep in mind, however, that things look different from above. In an arid, scrub-covered landscape, for example, a ground-level photograph will show what appears to be nearly continuous vegetation cover. From above, the spaces between individual bushes and trees are much more apparent.

Try to end up with an RGB curve that forms a smooth arch: any abrupt changes in slope will cause discontinuities in the contrast of the image. Don’t be surprised if the curve is almost vertical near the black point, and almost flat near the white point. That stretches the values in the shadows and in dark regions, and compresses them in the highlights. This enhances details of the surface, and makes clouds look properly white. (If there’s too much contrast in clouds they’ll look gray.)

Notice how each of the bands has two distinct peaks, and most of the histogram (select All Channels View to see separate red, green, and blue channels) is packed into the lower half of the range. The bimodal distribution is a result of the eastern Oregon landscape being split into two distinct types: the dark green forests of the Cascades, and the lighter, high-altitude desert in the mountain’s rain shadow. The skew towards dark values is a compromise. I sacrificed some detail in the forest and left the desert a little dark, to bring out features in the clouds. If I were focusing exclusively on a cloud-free region, the peaks in the histogram would be more widely distributed. Fortunately Landsat 8’s 12-bit data provides a lot of flexibility—adjustments can be quite aggressive before there’s noticeable banding in the image.

The full image has a challenging range of features: smoke, haze, water, irrigated fields (both crop-covered and fallow), clearcuts, snow, and barren lava flows, in addition to forest, desert, and clouds. The color adjustments are a compromise, my attempt to make the image as a whole look appealing.

Some areas of the image (the Three Sisters) look good, even close up (half-resolution, in this case). But others are washed-out:

I’d use a different set of corrections if I was focused on this area (and probably select a scene without smoke, which is visible in the center of this crop).

These techniques can also be applied to data from other satellites: older Landsats, high-resolution commercial satellites, Terra and Aqua, even aerial photography and false-color imagery. Each sensor is slightly (or more than slightly) different, so you’ll probably use different techniques for each one.

Complete fidelity to natural color can not be achieved in a map. Indeed, how can it be when there is no real consistency in the natural landscape, which offers endless variations of color? For example, one could consider the colors seen while looking straight down from an aircraft flying at great altitude as the model for a naturalistic map image. But aerial photographs from great heights, even in color, are often quite misleading, the Earth’s surface relief usually appearing too flat and the vegetation mosaic either full of contrasts and everchanging complexities, or else veiled in a gray-blue haze. Colors and color elements in vertical photographs taken from high altitudes vary by a greater or lesser extent from those that we perceive as natural from day to day visual experience at ground level.

The faces of nature are extremely variable, whether viewed from an aircraft or from the ground. They change with the seasons and the time of day, with the weather, the direction of views, and with the distance from which they are observed, etc. If the completely “lifelike” map were produced it would contain a class of ephemeral—even momentary—phenomena; it would have to account for seasonal variation, the time of day and those things which are influenced by changing weather conditions. Maps of this type have been produced on occasion and include excursion maps for tourists, which seek to reproduce the impression of a winter landscape by white and blue terrain and shading tones. Such seasonal maps catch a limited period of time in their colors.

This 1823 map by W. C. Woodbridge is an early example of the use of colors to represent numbers—in this case more qualitative than quantitative. The rainbow palette is effective for this map because colors in the spectrum are perceived as “cool” and “warm,” and the colors clearly segment the climate zones. Map from the Historic Maps Collection, Princeton University Library.

By the mid-1960s cartographers had already established guidelines for the appropriate use of color in map-making. Jacques Bertin pointed out shortcomings of the rainbow palette in Sémiologie Graphique (The Semiology of Graphics), and Eduard Imhof was crafting harmonious color gradients for use in topographic maps [published in Kartographische Geländedarstellung (Cartographic Relief Presentation)].

The subtle colors in this bathymetric map of Crater Lake are a direct descendant of the palettes created by Eduard Imhof. Map courtesy National Park Service Harpers Ferry Center.

According to much of this research, a color scale should vary consistently across the entire range of values, so that each step is equivalent, regardless of its position on the scale. In other words, the difference between 1 and 2 should be perceived the same as the difference between 11 and 12, or 101 and 102, preserving patterns and relationships in the data. (For data with a wide range that is better displayed logarithmically, relative proportions should be maintained: the perceived difference between 1 and 10 should be the same as 1,000 and 10,000.) Consistent relationships between numbers—like in a grayscale palette—preserve the form of the data. Palettes with abrupt or uneven shifts can exaggerate contrast in some areas, and hide it in others.

Compared to a monochromatic or grayscale palette, the rainbow palette (IDL number 35) tends to accentuate contrast in the bright cyan and yellow regions, but blends together through a wide range of greens.

A palette should also minimize errors from the color shifts introduced by nearby areas of differing color or lightness, a phenomenon known as simultaneous contrast.

Simultaneous contrast (a visual phenomenon that helps us interpret shapes through variations in brightness) shifts the appearance of colors and shades based on their surroundings. (After Ware (1988).)

Simultaneous contrast is most pronounced in monochromatic palettes, while sharp variations in hue minimize the effect. As a result, variations of the rainbow palette are good for preserving exact quantities.

How to take advantage of the strengths of both the grayscale palette (preservation of form) and rainbow palette (preservation of quantity), while minimizing their weaknesses? Combine a linear, proportional change in lightness with a simultaneous change in hue and saturation. Colin Ware describes this type of palette as “a kind of spiral in color space that cycles through a variety of hues while continuously increasing in lightness” (Information Visualization: Perception for Design, Second Edition). The continuous, smooth increase in lightness preserves patterns, the shift in hue aids reading of exact quantities, and the change in saturation enhances contrast.
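Python’s built-in colorsys module is enough to sketch the structure of such a spiral. HLS is not a perceptual color space (tools like chroma.js and Color Brewer work in perceptually uniform spaces instead), so treat this as an illustration of the idea, not a production palette:

```python
import colorsys

def spiral_palette(n, hue_start=0.6, hue_end=0.1):
    """Sketch a 'spiral' palette: lightness climbs steadily from dark to
    light while the hue rotates (here from blue toward orange)."""
    colors = []
    for i in range(n):
        t = i / (n - 1)
        hue = hue_start + (hue_end - hue_start) * t  # rotate hue
        lightness = 0.1 + 0.8 * t                    # climb steadily in lightness
        colors.append(colorsys.hls_to_rgb(hue, lightness, 0.9))
    return colors

palette = spiral_palette(7)   # seven RGB triples, dark blue through light orange
```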

A color palette that combines a continuous increase in lightness with a shift in hue is a good compromise that preserves both form and quantity. These three palettes show the smooth, even gradations that result from color scales calculated in perceptual color spaces. Color scales with varied hues and contrast are suitable for representing different datasets. (After Spence et al. (1999), chroma.js, and Color Brewer.)

Of the three components of color—hue, saturation, and lightness—lightness is the strongest. As a result, accurate, one-way changes in lightness are more important than those in hue or saturation. For example, a color scale that goes from black to color to white can still be read accurately, even though the saturation is lower at both ends of the scale than in the middle. This allows a bit of flexibility in designing palettes, especially for datasets that benefit from high-contrast color ramps. You also don’t need to worry too much about color scales that drift a little bit out of gamut (the complete range of colors displayed on a particular device) for a portion of the ramp. Just make sure lightness is still changing smoothly.

This palette differs from the ideal with saturation that increases from low-to-mid values, and decreases from mid-to-high values. It’s still readable because lightness, the component of color perceived most strongly, changes continuously. (Derived with the NASA Ames color tool).

All of these palettes are appropriate for sequential data: data that varies continuously from a low to a high value, such as temperature, elevation, or income. Different palettes are suited to other types of data, such as divergent and qualitative, which I’ll discuss next week.

Introduction
The use of color to display data is a solved problem, right? Just pick a palette from a drop-down menu (probably either a grayscale ramp or a rainbow), set start and end points, press “apply,” and you’re done. Although we all know it’s not that simple, that’s often how colors are chosen in the real world. As a result, many visualizations fail to represent the underlying data as well as they could.

The purpose of data visualization—any data visualization—is to illuminate data. To show patterns and relationships that are otherwise hidden in an impenetrable mass of numbers.

Encoding quantitative data with color is (sometimes literally) a simple matter of paint-by-numbers. In 1964 Richard Grumm and his team of engineers at NASA’s Jet Propulsion Laboratory hand-colored the first image of Mars taken from an interplanetary probe as they waited for computers to process the data.

In spatial datasets (datasets with at least two dimensions specifying position and at least one additional dimension of quantity, a category that includes not only maps but everything from individual atoms to cosmic background radiation), color is probably the most effective means of accurately conveying quantity, and certainly the most widespread. Careful use of color enhances clarity, aids storytelling, and draws a viewer into your dataset. Poor use of color can obscure data, or even mislead.

Color can be used to encode data from the atomic scale (left), to the universal (right). (Scanning tunneling microscope image originally created by IBM Corporation (left), cosmic background radiation image courtesy ESA and the Planck Collaboration (right)).

Fortunately, the principles behind the effective use of color to represent data are straightforward. They were developed over the course of more than a century of work by cartographers, and refined by researchers in perception, design, and visualization from the 1960s on.

Although the basics are straightforward, a number of issues complicate color choices in visualization. Among them:
- The relationship between the light we see and the colors we perceive is extremely complicated.
- There are multiple types of data, each suited to a different color scheme.
- A significant number of people (mostly men) are color blind.
- Arbitrary color choices can be confusing for viewers unfamiliar with a data set.
- Light colors on a dark field are perceived differently than dark colors on a bright field, which can complicate some visualization tasks, such as target detection.

(Very) Basic Color Theory
Although our eyes see color through retinal cells that detect red, green, and blue light, we don’t think in RGB. Rather, we think about color in terms of lightness (black to white), hue (red, orange, yellow, green, blue, indigo, violet), and saturation (dull to brilliant). These three variables (originally defined by Albert H. Munsell) are the foundation of any color system based on human perception. Printers and painters use other color systems to describe the mixing of ink and pigment.

Lightness, hue, and saturation (sometimes called chroma) are the building blocks of color.

Computers (and computer programmers), on the other hand, do process colors in terms of red, green, and blue. Just not the same red, green, and blue that our eyes detect. Computer screens display colors that are a combination of very narrow frequency bands, while each type of cone in our eyes detects a relatively broad spectrum. Complicating things further, computers calculate light linearly, while human perception is nonlinear: we are more sensitive to changes at low light levels than at high light levels. We’re also more sensitive to green light than red light, and even less sensitive to blue light.

Computers calculate color using three primary colors—red, green, and blue. Unfortunately, we see green as brighter than red, which itself is brighter than blue, so colors specified in terms a computer understands (RGB intensities from 0-255) don’t always translate well to how we see.
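Both of these mismatches are easy to put numbers on. Under the sRGB transfer curve a pixel value of 128 emits only about 22% of the linear light of 255, and the Rec. 709 luminance weights capture how much more strongly we respond to green than to red or blue. A quick sketch:

```python
def srgb_to_linear(v):
    """Invert the sRGB transfer curve: 8-bit pixel value -> linear light (0.0-1.0)."""
    c = v / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

# "Middle" pixel value 128 is nowhere near half of the display's linear output.
half_gray = srgb_to_linear(128)  # ~0.216, about 22% of full brightness

# Rec. 709 luminance weights: green counts roughly 10x more than blue.
R_W, G_W, B_W = 0.2126, 0.7152, 0.0722
```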

The combined result of these nonlinearities in our vision is color perception that’s, well, lumpy. For example, the range of saturation we’re capable of seeing for a single hue is highly dependent on its lightness. In other words, there’s no such thing as a dark yellow. Near the center of the lightness range, blue and red shades can be very saturated, but green tones cannot. Very light and very dark colors are always dull.

The range of colors perceived by humans is uneven. (Equiluminant colors from the NASA Ames Color Tool)

CIE Color Spaces
The unevenness of color perception was mapped by the International Commission on Illumination (Commission Internationale de l’Éclairage in French, hence “CIE”) in the 1930s. The CIE specified (and continues to refine) a series of color spaces that allow scientists, artists, and printers—anyone who works with light—to describe colors consistently, and accurately translate color between mediums. CIE L*a*b*, for example, is used internally by Adobe Photoshop to interpolate color gradients and convert images from RGB (screen) to CMYK (print).

Another of these specifications, CIE L*C*h [lightness, chroma (saturation), hue], is my preferred tool for crafting color palettes for use in visualization. Because the three components of CIE L*C*h are straightforward, it’s simple to use. Because it’s based on studies of perception, color scales developed with L*C*h help accurately represent the underlying data. I say “help” because perfect accuracy is impossible—there are too many variables in play between the data and our brains. [Another option (used in Color Brewer) is the Munsell Color System, which is accurate in lightness and hue, but not in saturation.]

Choosing and interpolating colors in a perceptual space—CIE L*C*h—helps ensure consistent change across the entire palette. In this example, which varies from pale yellow to blue, the range of green shades is expanded, and blues are compressed in the nonlinear palette relative to the linear palette. (Palettes generated via Gregor Aisch’s L*C*h color gradient picker and chroma.js.)

In short, people aren’t computers. Computer colors are linear and symmetrical, human color perception is non-linear and uneven. Yet many of the tools commonly used to create color schemes are designed more for computers than people. These include tools that calculate or specify colors in the red, green, blue (RGB) or hue, saturation, value (HSV) color spaces. A constant increase in brightness is not perceived as linear, and this response is different for red, green, and blue. Look for tools and color palettes that describe colors in a perceptual color space, like CIE L*C*h or Munsell.

In the rest of this series, I’ll outline the principles behind the “perfect” color palette, describe different types of data that require unique types of palettes, give some suggestions for mitigating color blindness, and illustrate some tricks enabled by careful use of colors.

Somehow, a recent conversation on Twitter about tweet density led to a mention of the installation of variable highway lighting in the Netherlands, by way of the 2012 NASA/NOAA city lights map. Which made me wonder—would we be able to see the effects of the new lighting from space? After all, the day night band on the VIIRS instrument is sensitive enough to see by starlight.

According to Philips, streetlights along the A7 from Purmerend (just north of Amsterdam) to Wognum are dimmed 50% after 9 pm. The system was switched on in early 2013, so any change should be visible in 2012 vs. 2013 data:

It’s subtle, but the difference is clear.

A wider view of the area reveals more interesting features:

The brightest lights in the Netherlands are from clusters of greenhouses—perhaps growing hydroponic vegetables 24 hours a day. The Netherlands and Belgium are more densely populated and brightly lit than neighboring France and Germany, including their highways. Unsurprisingly, water bodies are largely dark, but there is a scattering of boats in the North Sea.

A few notes on the image processing. These data are from the NOAA CLASS system, which is the primary archive for VIIRS data. Unfortunately they’re not in a convenient format: each file is an 80-second chunk of the satellite’s orbit (called a granule), with no preview images. The file format is HDF, which many scientists like but can be (extremely) difficult to read. (This Beginner’s Guide from Colorado State University may be helpful.)

The product is labeled “VIIRS Day Night Band SDR.” SDR means “sensor data record,” which is a calibrated measurement, in this case watts per square meter per steradian. It turns out this is a very low number, so to visualize the data we multiply everything by 1,500,000,000,000,000 (1.5E15) to make a usable 16-bit grayscale image, which looks like this:
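That scaling step is simple enough to show. A pure-Python sketch (a real granule would use array tooling rather than lists, and 1.5E15 is an empirical factor chosen for display, not a calibrated constant):

```python
def scale_to_uint16(radiances, scale=1.5e15):
    """Map tiny raw DNB radiance values into the 16-bit gray range, clipping overflow."""
    return [max(0, min(65535, round(r * scale))) for r in radiances]

# Zero stays black, typical radiances land mid-range, bright lights saturate.
scale_to_uint16([0.0, 1e-11, 1e-10])  # [0, 15000, 65535]
```

On real data the same operation would be a NumPy multiply, clip, and cast to `uint16`.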

There’s little detail aside from the very brightest lights, so we take the square root of the data to accentuate low values and compress high values—the results of which are in the top images. I usually apply additional contrast adjustments on published imagery, but here I wanted to be more conservative.
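The square-root stretch is just a gamma of 0.5 applied to the scaled values. A sketch on 16-bit gray levels (same caveat as above: real imagery would use array operations):

```python
import math

def sqrt_stretch(gray, max_val=65535):
    """Square-root (gamma 0.5) stretch: brightens shadows, compresses highlights."""
    return [round(math.sqrt(v / max_val) * max_val) for v in gray]

# Black and white are unchanged; faint values are pulled up dramatically.
sqrt_stretch([0, 100, 65535])
```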

Dan W. Williams made a GIF with the 2012 and 2013 images which makes the changes easier to see. Thanks!

Update 2:

By request, here are the full files as GeoTIFFs. You’ll need a TIFF reader that supports floating point (Photoshop does not) to read the “Raw” and “Scaled” data. Photoshop (and even Safari on a Mac) will read the two “Square Root” files.

The Landsat Data Continuity Mission is now Landsat 8, and that means images are now public (woohoo!). NASA handed control of the satellite to the USGS yesterday (May 30, 2013), and calibrated imagery is available through the Earth Explorer. Unfortunately, the Earth Explorer interface is a bit of a pain, so I’ve put together a guide to make it easier.

You can search, but not order data, without logging in—so register if you don’t have an account (don’t worry, it’s instant and free), or log in if you do.

The simplest way to select a location is simply to pick a single point on the map. You can define a box or even a polygon, but that makes it more likely you’ll get images with only partial coverage. Navigate to the location you’re interested in, and click to enter the coordinates. You can choose a date range, but right now there are only 3 or 4 scenes for a given spot, so skip it and just click “Data Sets”.

On the data sets page, you can search everything from Aerial Imagery to Vegetation Monitoring. Click the “+” symbol next to Landsat Archive, then the first check box that appears: “L8 OLI/TIRS” (which stands for Landsat 8 Operational Land Imager/Thermal Infrared Sensor (creative, no?)). Click “Results” to start a search.

After a short wait, you’ll get a list of available images. The thumbnails aren’t big enough to show much, so click on one to see a slightly larger image. Close that window, and click the download icon: a green arrow pointing down towards a hard drive …

… which doesn’t actually download the data, just provides a list of download options. “LandsatLook” are full-resolution JPEGs, and are a quick way to check image quality (I’d prefer full-resolution browse images without a separate download, but I digress). The Level 1 Product is terrain-corrected, geolocated, calibrated data—a bundle of 16 bit, single-channel GeoTIFFs. Select the “Level 1 Product” radio button, then click “Select Download Option”.

Done! Oh, wait. Not done. You need to click one more button: “Download”.

Now you’re done. The data should arrive in your browser’s designated download folder.

Drop a note in the comments section if I’ve skipped a step, or if you have any other questions. Next week I’ll explain what to do with the data once you’ve got it.

This photo of Skylab was taken by the astronauts of Skylab-2 as they left the space station and departed for Earth on June 22, 1973. More photos from all three Skylab missions are archived on NASA’s Gateway to Astronaut Photography of Earth.

To look through the rest of the Skylab collection, select Find Photos > Search > Mission-Roll-Frame from the menu in the upper-left hand corner of the Gateway to Astronaut Photography of Earth home page. Under Missions pick one or more of SL2, SL3, and SL4, then delete the “E” in the Roll field. Finally, hit Run Query at the bottom of the page. On the Database Search Results page, enable the Show thumbnails if they are available checkbox. Click the number in the Frame column to view a screen-sized image. High-res images are downloadable from each Display Record, just click the View link for the image size you want.