With Safari, you learn the way you learn best. Get unlimited access to videos, live online training,
learning paths, books, tutorials, and more.

Chapter 4. Ratios and Rates

“We are the safest large city in America, but any crime rate is too high.”

—Michael Bloomberg, former mayor of New York City

Until now, we’ve considered absolute amounts of countable and measurable nouns, like
population, tons of garbage collected, and number of rat sightings by
borough. Pleasant, I know.

But not all boroughs are created equal. As we saw in Chapter 1, they have different surface areas and
different numbers of people living in them. The fact that they produce
different amounts of garbage isn’t surprising, and probably doesn’t make for
a very interesting message to communicate to an audience.

These absolute comparisons, while helpful for showing the big picture,
aren’t generally considered “apples-to-apples” comparisons. There are
apples, and, well, there are Big Apples.

A helpful way to “normalize” comparisons is using ratios, rates, proportions, and
percentages. What’s the difference between these four?

A ratio is a comparison of two terms expressed as a quotient. For example, Manhattan
produced 0.264 tons of recycle for every ton of refuse. Ratios can be
expressed as “x to y,” “x:y,” “x/y,” or as a decimal.

A rate is a ratio in which the two terms have different units. For example, the
population density of Brooklyn is 36,136 residents per square mile.
Rates are often predictive because time can be used as the denominator
(crime rates, population growth rates, etc.).

A proportion is a ratio in which the numerator is a partial amount and the denominator is
the total amount (expressed as a number between 0 and 1). For example,
the proportion of the NYC population living in the Bronx is 0.169. A
proportion is expressed as a number between 0 and 1.

A percentage is a ratio comparing a number to 100. For example, 16.9% of NYC residents live
in the Bronx. A percentage is generally a number between 0 and 100, but
can be larger than 100 (e.g., “sales have increased by 150%
year-over-year”).

These types of normalized comparisons can make for much more
interesting messages to communicate. Luckily, they are quite easy to create
in Tableau. We’ll explore the first two in this chapter, and the last two in
the next.

Ratios

The data source itself may include a field that is a ratio, but most often
the raw data includes terms that an analyst needs to combine to create a
ratio. Tableau can handle these types of computations with Calculated Fields, in which new fields can be created by
combining existing fields using a variety of operations. We introduced the
concept in Chapter 2, and now we’ll build
on it.

Let’s return to the DSNY data set to illustrate the technique, and
we’ll see if the ratio comparison is any more interesting than the
absolute figures.

The data reported by DSNY includes three continuous Measures
(quantitative fields) for each community district:

RefuseTonsCollected

Amount of nonrecyclable garbage

PaperTonsCollected

Amount of recyclable paper

MGPTonsCollected

Amount of recyclable metal, glass, and plastic

The second two data types (paper and MGP) are both recyclable, and
so can be summed to create a new variable for each community district that
we will call RecyclableTonsCollected to
stay consistent with the provided nomenclature.

A comparison of the ratio of recyclable material to refuse would be interesting to
consider, as it would indicate which communities in New York City are
doing a better job recycling.

First, we’ll need to create a new calculated field that sums the two
recyclable fields. Right-click anywhere in the lefthand Dimensions or
Measures panels, and select Create Calculated
Field. Then, fill out the resulting dialog box, as shown in Figure 4-1.

Figure 4-1. Summing two fields with a Calculated Field

Next, let’s create a second calculated field that creates the ratio
of recyclable material to refuse. Right-click in the Dimensions or
Measures area again and fill out the dialog box as shown in Figure 4-2.

Notice that we put the numerator and denominator in brackets,
preceded by SUM. This is critical, because as we move up a level of from community
district to borough, we don’t want to add the
individual ratios of each community district; we want to create a new
ratio that represents the quotient of the aggregate amounts. The first
approach would create a larger ratio that is misleading.

Figure 4-2. Creating a ratio with a Calculated Field

Now that we’ve created the ratios, how can we best visualize them?
Let’s start with a simple bar chart of the aggregate ratios of each of the
boroughs. To create one, we’ll drag the Borough field from the Dimensions area on the
left to the Rows shelf, and we’ll drag the Recycle to Refuse Ratio calculated field to the
Columns shelf. Color wouldn’t strictly be needed, but we’ll add it anyway
for aesthetics by dragging Borough to the Color shelf. Lastly, we’ll sort the bars in descending order to put the borough
with the highest recycle ratio at the top. The screen shot in Figure 4-3 shows the resulting view.

We can easily see from this simple bar chart that Manhattan has the
highest ratio of recycle to refuse, but that it still only recycles about
one ton of material for every four tons of trash. The ratio for Manhattan
is about twice as great as the ratio for the Bronx.

Does this mean each community district in Manhattan had a higher
recycle ratio than each community district in the Bronx? Not necessarily,
as we’re dealing with the terms in the aggregate.

How would we communicate the results at a community district level?
If we just replace Borough with
CommunityDistrict in the Rows shelf,
then we’ll get a bar chart where all the community district #1s will be
aggregated, and all the community district #2s will be aggregated, and so on. Aggregating community districts
in this way isn’t very meaningful.
Instead, we’d like to see community district ratios for each
borough.

Figure 4-3. Visualizing the ratio of recycle to refuse for each
borough

To make this comparison, let’s click and drag the Borough pill from the Rows shelf up to the
Columns shelf, and add the CommunityDistrict field from the Measures panel
to the Rows shelf to create a grid of bar charts, as shown in Figure 4-4.

Figure 4-4. Ratios by community district and borough

From this view, we can now see that even within boroughs, not all
community districts are created equal. Manhattan community district #1 has
the highest ratio overall, but it’s clear that some community districts in
the Bronx have higher ratios than some in Manhattan.

But what if we wanted to know the exact ratios? The bar charts in
Figure 4-4 make it fairly
easy to get a general sense of the relative ratios for the community
districts, but precision isn’t easily obtained, is it? Can you tell the
exact ratio for Manhattan community district #1? I can’t. If knowing the
precise amounts is a task our audience will want to perform,
we’ll need to think about how to make it easy for them to accomplish that
task.

We could increase precision by dragging the Recycle to Refuse Ratio from the Measures area
to the Label shelf (or Ctrl-selecting AGG(Recycle to Refuse Ratio) and dragging it
from the Columns shelf to the Label shelf to duplicate). The
resulting bar chart grid with labels is shown in Figure 4-5.

Figure 4-5. Adding labels to the bar chart grid

That’s a lot of decimal places, isn’t it? Our audience probably
doesn’t need to know the ratio to the fourth decimal place
(ten-thousandths). We can change the number of digits shown by
right-clicking the Recycle
to Refuse Ratio in the Measures panel, selecting Default Properties, then Number Format, and choosing Number (Custom). In the resulting dialog box,
we’ll change the decimal places to three, as shown in Figure 4-6.

Figure 4-6. Specifying the number of decimal places in the Measures
panel

Notice that the labels in the bar chart grid change
accordingly.

What we really have here is a table that has been augmented by
adding bars with lengths proportional to the values in each cell. Another
way to show this precise comparison is by getting rid of the bars and
instead coloring each cell by the ratio—a highlight table.

Let’s make a highlight table by starting with a new sheet.
Ctrl-click Borough, CommunityDistrict, and Recycle to Refuse Ratio, and then open the
Show Me panel and select highlight table. Tableau creates the view shown
in Figure 4-7.

Figure 4-7. The default highlight table for Recycle to Refuse Ratio

Notice that Tableau created a highlight table that’s “flipped”
compared to our bar chart grid shown in Figure 4-5 (Borough is in Rows instead of Columns, and
CommunityDistrict is in Columns instead
of Rows). To flip it to match, we’ll just click the Swap button in the toolbar, , resulting in the taller table shown in Figure 4-8.

Figure 4-8. The highlight table with Rows and Columns swapped

Tableau has defaulted to a sequential green color palette, meaning
the gradient increases proportionately with increasing Recycle to Refuse Ratio. Our eyes can easily
pick out the darkest cell, and we also find the lightest rather easily.
Most people can distinguish between about six different levels of
intensity. If the goal were to communicate the precise values, we could
have created a simple table without the highlights. Adding the highlights
makes it easier to quickly compare cells and find the highest and lowest
values.

It’s still not immediately obvious which community district has the
second-highest ratio, though. Our eyes have to dart around to all the dark
cells and read and memorize the values. Eventually we settle on Brooklyn
community district #6 as having the second-highest ratio, but the task
gets harder as we move along. Which has the third highest? The fourth? Is
there an easier way to facilitate this assessment of rank?

What we’d like is a simple list of community districts in descending
order of the ratio. To make this list, let’s first create a new field that
combines the Borough and CommunityDistrict fields into a single
Dimension.

Ctrl-click Borough and CommunityDistrict so they are both selected,
then right-click on one of the two selected fields (either will do fine)
and select Combine Fields. A new
string field will appear in the Dimensions area called Borough & CommunityDistrict
(Combined).

Create the list shown in Figure 4-9 by dragging Borough & CommunityDistrict to the Rows
shelf, Recycle to Refuse Ratio to the
Columns shelf, and Borough to the Color shelf, sorting in
descending order.

Figure 4-9. Bar chart of community districts

Now we can quickly tell which are the top four or five districts,
and which are the bottom four or five. Using the colored bars, we also can
see some patterns, like that the districts with the lowest ratio are all
in the Bronx (blue), and most of the top ten are in Manhattan
(green).

But if I asked you to pick out the tenth or the fifteenth districts,
you’d have to count down from the top, right? Not the easiest task to
perform. How can we add rank to this list to easily pick out the order
from 1 to 59?

Two Ways of Adding Rank

Let’s create a Rank field. To
do so, right-click anywhere in the Dimensions or Measures panel
on the left and select Create a Calculated
Field once again. This time, name the field Rank and enter INDEX(). As soon
as the message appears in the bottom left indicating “the calculation is
valid,” blue text will also appear in the top right that says “Default
Table Calculation.” Click on this blue text and change the Compute using drop-down to Borough & CommunityDistrict (Combined),
as shown in Figure 4-10.

Figure 4-10. Creating a rank for each of the 59 community districts

This tells Tableau that the rank should apply to the combination
of Borough and CommunityDistrict.

After clicking OK twice, we
now see a new field in the Measures panel called Rank. Notice that this field is continuous
(the # symbol is green), but we can consider the rankings to be discrete integers,
because there is no rank between any two successive ranks. To change the
Rank data field type, right-click on
it in the Measures panel and select Convert to
Discrete. The # symbol
next to Rank will turn blue, indicating that it’s discrete. We’re now ready
to add Rank to the chart.

To add the Rank field to the
bar chart, simply drag it to the left of Borough & CommunityDistrict (Combined) on
the Rows shelf. After removing the row dividers, we have the colored, sorted, and now ranked bar chart
shown in Figure 4-11.

Figure 4-11. Adding the Rank field to the bar chart

Alternatively, since the launch of Version 8.1, users can create a
similar Rank field by making use of
the new Rank Table Calculation. To do so, drag another instance of
Recycle to Refuse Ratio to the
columns shelf, click in the down arrow of the new pill, and change it to
Discrete, and then click the down
arrow again and select Quick Table
Calculation and then Rank. Finally, click the down arrow once more
and select Compute using, then
choose Borough CommunityDistrict
(combined). (This calcuated field also lets you handle ties
in different ways.)

Now the only task this chart doesn’t facilitate is gleaning
precise values. We could add the labels to the right of the bars as we
did in Figure 4-5 by simply
dragging the Ratio field to the Label
shelf, but perhaps a better approach would be to add the labels to the
left of the bars so as not to interfere with the perception of the bars’
lengths.

Because Recycle to Refuse Ratio
is a continuous data field (the #
symbol is green), if we just drag it out onto the Rows shelf to the
right of Borough &
CommunityDistrict, Tableau will create 59 y-axes for each of
the bars. Try it and see what I mean.

We just want the ratio value itself to appear, so we’ll first
convert Recycle to Refuse Ratio to a
discrete field the same way we converted Rank, and then we’ll drag the ratio onto the
Rows shelf to the far right, leaving a blue pill and creating the
updated view shown in Figure 4-12.

Figure 4-12. Precise values added to the left of the bars

We’ve now visualized the ratio in a way that makes it easy for our
audience to perform a number of different tasks very quickly:

Get a general sense of how the 59 community districts compare
relative to one another

Determine which districts have the highest ratios, and which
have the lowest

Get a general sense of which boroughs have districts at the
top of the list, and which have districts at the bottom

Know the precise ratio for any district

Pick out any district in rank order (e.g., the
12th or the
28th)

This view is a very versatile and useful way to communicate the
ratio of recycle to refuse. But it doesn’t tell us which community
districts produced more trash overall per person.
For all we know, the districts at the top may be recycling a lot, but
they may also be producing a much higher amount per person. That’s where
rates come into play.

Rates

Recall that a rate is just a special type of ratio in which the numerator and
the denominator have different units. In the previous example, the ratio
of recycle to refuse involved terms with the same units—namely, tons. When
we consider the trash production rate per person, our numerator is still
tons, but now our denominator is the number of people.

But the data set we have been working with so far doesn’t include
community district population. In order to determine trash production per
person, we’ll need to find a way to include population data in our
analysis.

Going back to the data discovery process in Figure 1-1, we’ve gone all the way around the
horse track, and now we have a brand-new question that involves gathering
new data.

Ideally, we would find population by community district for
September 2011, which is when the DSNY collection data was taken. We can’t
find that, but we came come close: 2010 census data puts us
within a year’s time of the trash collection data, and is probably
suitable to get a reasonable approximation of the trash production
rate.

Now that we found population data, how can we create rates with it?
Luckily, Tableau allows us to connect our current workbook to this new
data set and “blend” it with the existing data in the workbook.

Blending Data Sources

From the Data menu of our current workbook, select Connect to Data and then Microsoft Excel, and browse to the population
data file, connecting to the correct sheet in the file and importing the
data as an extract. We now see the new data source appear in the Data area along
with the DSNY data, as shown in Figure 4-13.

Figure 4-13. Adding a second data source to a workbook

The newly imported data set is highlighted in the Data area, and
the fields for that data set are shown in the Dimensions and Measures
panels below. If we click on the DSNY data set, the data fields for that
data set will show below.

Notice that CommunityDistrict
is in the Dimensions panel of the DSNY set, but it’s in the Measures
area of the new population data source. Let’s match them up by dragging
CommunityDistrict from the Measures
panel of the population table into the Dimensions area.

Because both data sources now have a Dimension called Borough and a Dimension called CommunityDistrict, Tableau will link them
together for us. If the fields to link had different names, we would
have to manually link them using Data→Edit
Relationships.

Visualizing Rates

Now that we have linked these two data sources, our updated Tableau workbook
contains both terms of the refuse per person ratio: the measurement of
the amount of refuse in tons (the numerator) and the count of the number
of people in each community district (the denominator). We’ll create the
rate the same way we created the ratio in the previous section, but now
we need to create a calculated field that includes fields from two
different sources, as shown in Figure 4-14.

Figure 4-14. Creating a rate using fields from two data sources

Using the SUM aggregation
type for both terms in the rate means that we can get
meaningful rates for boroughs as well as community districts. Also,
notice we are converting from short tons (US) to pounds (lbs.) by
multiplying the numerator by 2,000—which is how many pounds are in each
ton. On a per person basis, it’s much easier for us to think in terms of
pounds, so this is a better unit to use for this comparison.

Now that we have created the rate, we can visualize it. We’ll
follow the same steps that we used to create the ratio bar chart in
Figure 4-12 to create the rate comparison bar chart shown in Figure 4-15.

Figure 4-15. Comparing rates in a bar chart

It’s clear from this view that the three community districts in
Staten Island produced the most refuse per person in September 2011, at
a rate of more than 70 pounds per person.

Warning

A word of caution about these results: often, data visualization
and data analysis are best used to propose new questions to ask. Data
is great at helping us make comparisons, but it doesn’t always answer
“why?” and “how?” It can be easy for us to jump to conclusions based
on what the data shows, but some further qualitative investigation is
frequently required.

Why are the Staten Island community districts at the top of this
chart? The chart itself doesn’t tell us. It helps us formulate the next
question. Does this mean every person who lives on Staten Island
produced 70 to 80 pounds of trash that month? No, that’s not what the
data shows. The data just shows how much trash was collected from their
communities. DSNY provided overall refuse
collected, not just for residences, so perhaps there are industries or
businesses on Staten Island that produced a lot of trash.

The point is to be careful with what we communicate, and not to
communicate more than the data gives us license to say.

Summary

In this chapter, we learned how to use Calculated Fields and blended
data sources to create ratios (quotients) and rates (quotients with mixed
units). We also learned the Index function as well as the Rank table
calculation, and created highlight tables and bar charts to compare
different dimensions—at both the borough and community district levels.
This enabled us to go beyond overall amounts (“how much”) and to consider
relative amounts (“how much per”).

In the next chapter, we’ll consider another type of normalized
comparison: proportions and percentages.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training,
learning paths, books, interactive tutorials, and more.