This course will expose learners to additional tools that can be used to perform Data Visualization. In particular, the course focuses on Tableau, a state-of-the-art visualization package. In this course, the visualization concepts from previous courses are reinforced, and the Tableau software is introduced through replication of the visualizations built in previous courses.

Meet the Instructors

Ross Maciejewski

Associate Professor at Arizona State University in the School of Computing, Informatics & Decision Systems Engineering and Director of the Center for Accelerating Operational Efficiency

Welcome to the third video on charts in Tableau for data visualization. In this video, we're going to take a look at the advanced plots that we created in Jupyter Notebook before: specifically, the mosaic plot, the parallel coordinate plot and the scatter plot.

In the Jupyter Notebook demonstration, we also created a pixel plot and a Wordle. However, there is no way to duplicate those as-is in Tableau, so if you are determined to use Tableau, you're better off working with a different representation for these two plots.

Let's get started with the mosaic plot.

The mosaic plot is a little bit more complicated because, if you remember from the Jupyter Notebook demonstration, we created what's called a contingency table, listing high and low values of protein and sugar. We can duplicate this in Tableau, but it requires a little bit more thought.

So we need to create new dimensions for high versus low protein and for high versus low sugar. We can do this in Tableau by creating what's called a calculated field.

From this drop-down in the Dimensions pane, we can create a calculated field, which I will call Protein High-Low, and then input the simple calculation needed to assign High or Low to each cereal depending on its original protein value.

You'll notice that this syntax is significantly different from the Python syntax and resembles the Excel function syntax much more closely than it does Python.

Next, I create the same calculated field for sugar instead of protein. In this case, I've chosen a threshold of greater than one gram for high protein and greater than three grams for high sugar.
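For comparison with the Python side of the course, here is a minimal sketch of what these two calculated fields compute, written as plain Python functions. The thresholds (more than 1 g of protein, more than 3 g of sugar) come from the video; the function names and the Tableau formulas quoted in the docstrings (including the field names `[Protein]` and `[Sugars]`) are assumptions for illustration, not a copy of the fields shown on screen.

```python
# Thresholds from the video: more than 1 g protein is "High", more than 3 g sugar is "High".
PROTEIN_THRESHOLD_G = 1
SUGAR_THRESHOLD_G = 3


def protein_high_low(protein_g: float) -> str:
    """Python analogue of an (assumed) Tableau calculated field:
    IF [Protein] > 1 THEN "High" ELSE "Low" END
    """
    return "High" if protein_g > PROTEIN_THRESHOLD_G else "Low"


def sugar_high_low(sugars_g: float) -> str:
    """Same idea for sugar (assumed field name):
    IF [Sugars] > 3 THEN "High" ELSE "Low" END
    """
    return "High" if sugars_g > SUGAR_THRESHOLD_G else "Low"


# A value exactly at the threshold is still "Low" because the comparison is strict (>).
print(protein_high_low(1), protein_high_low(2))  # Low High
print(sugar_high_low(3), sugar_high_low(6))      # Low High
```

Keeping each threshold in one named constant makes it easy to try a different cut-off, which is the same adjustment you would make by editing the calculated field in Tableau.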

If we'd like to, we can go back to our Data Source pane and see the results of our new calculated fields. As you can see, there are many cereals that are both high in protein and high in sugar.

If you have a different threshold for what constitutes high or low protein or sugar, you can modify these calculated fields to suit your needs.

Next, we'll take the calculated fields we just created and move them to the Columns shelf, and then we'll move Number of Records, which is what we're counting up in our mosaic plot, to the Rows shelf. This automatically gives us a bar chart, but what we really want is what Tableau calls a tree map.
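The quantity the tree map sizes its tiles by is simply a count of records per combination of the two new fields. A minimal sketch, using a few invented rows rather than the real cereal data; the column names `protein` and `sugars` and the helper `high_low` are assumptions for illustration:

```python
from collections import Counter

# Hypothetical toy rows standing in for the cereal data set (values are invented).
rows = [
    {"name": "cereal_a", "protein": 3, "sugars": 10},
    {"name": "cereal_b", "protein": 2, "sugars": 6},
    {"name": "cereal_c", "protein": 1, "sugars": 1},
    {"name": "cereal_d", "protein": 4, "sugars": 2},
    {"name": "cereal_e", "protein": 2, "sugars": 12},
]


def high_low(value: float, threshold: float) -> str:
    # Strictly greater than the threshold counts as "High", as in the video's fields.
    return "High" if value > threshold else "Low"


# Analogue of both calculated fields on Columns and Number of Records on Rows:
# one count per (protein level, sugar level) combination.
counts = Counter(
    (high_low(r["protein"], 1), high_low(r["sugars"], 3)) for r in rows
)

for (protein_level, sugar_level), n in sorted(counts.items()):
    print(f"protein={protein_level:<4} sugar={sugar_level:<4} records={n}")
```

Combinations with no matching rows (here, low protein with high sugar) simply do not appear in the counter, just as an empty combination produces no tile.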

Here, you can see the result of our mosaic plot. With these thresholds, many cereals (41) are high in both protein and sugar, but very few (only two) are low in both sugar and protein.
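Tableau's tree map sizes each tile by its count. A classic mosaic plot adds a specific layout: column widths follow the marginal share of each protein level, and tile heights within a column follow the conditional share of each sugar level, so each tile's area equals its share of all cereals. The sketch below computes that geometry for a unit square; the counts are hypothetical placeholders, not the cereal totals from the video, and `mosaic_tiles` is an illustrative helper name.

```python
# Hypothetical 2x2 counts keyed by (protein level, sugar level); not the real cereal totals.
counts = {
    ("High", "High"): 6,
    ("High", "Low"): 2,
    ("Low", "High"): 1,
    ("Low", "Low"): 1,
}


def mosaic_tiles(counts):
    """Return (protein, sugar, width, height) for each tile of a unit-square mosaic plot.

    width  = marginal share of the protein level (column width)
    height = conditional share of the sugar level within that protein column
    so width * height = count / total, the tile's share of all records.
    """
    total = sum(counts.values())
    tiles = []
    for protein in ("High", "Low"):
        column_total = sum(n for (p, _), n in counts.items() if p == protein)
        width = column_total / total
        for sugar in ("High", "Low"):
            n = counts.get((protein, sugar), 0)
            height = n / column_total if column_total else 0.0
            tiles.append((protein, sugar, width, height))
    return tiles


for protein, sugar, w, h in mosaic_tiles(counts):
    print(f"{protein}/{sugar}: width={w:.2f} height={h:.2f} area={w * h:.2f}")
```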

Just like before, we're going to title this mosaic plot and move on to the next worksheet.

In this worksheet, we're going to create a parallel coordinate plot of carbohydrates, fat and protein, just like we did in the Jupyter Notebook example.

Here, we're going to move Measure Names to Columns, since we want to use more than one measure, and Measure Values to Rows, since again we want to use more than one measure: carbohydrates, protein and fat.
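Measure Names and Measure Values effectively read the table in "long" form: one row per (cereal, measure) pair, with the measure's name in one column and its value in another. A stdlib sketch of that reshaping, with invented rows; the column names `carbo`, `fat` and `protein` and the helper `to_long` are assumptions for illustration:

```python
# Hypothetical wide-format rows: one row per cereal, one column per measure.
wide_rows = [
    {"name": "cereal_a", "carbo": 12.0, "fat": 1.0, "protein": 3.0},
    {"name": "cereal_b", "carbo": 17.0, "fat": 0.0, "protein": 2.0},
]

# The measures plotted in the video (column names are assumed).
MEASURES = ["carbo", "fat", "protein"]


def to_long(rows, id_field, measures):
    """Unpivot wide rows into (id, measure_name, measure_value) triples,
    mirroring what the Measure Names / Measure Values pair exposes."""
    return [
        (row[id_field], measure, row[measure])
        for row in rows
        for measure in measures
    ]


for record in to_long(wide_rows, "name", MEASURES):
    print(record)
```

Each measure name becomes a position on the horizontal axis and each value a height, which is exactly the layout a parallel coordinate plot needs.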

Notice that this plot shows all of our measure values, which means that the relatively high values of sodium, potassium and calories drown out the relatively low values of carbohydrates, fat and protein, which are the measures of interest.

So we're going to move over to the Measure Values card and remove all of the measures other than the ones we are interested in.
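Why removing sodium, potassium and calories helps: all of the measures share one axis, and that axis has to stretch to the largest value, so small measures are squeezed into a thin band near zero. The sketch below uses invented values for a single cereal (column names and units are assumptions) and reports what fraction of a 0-to-max axis the largest of the three measures of interest reaches, before and after dropping the large measures; `share_of_axis` is a hypothetical helper.

```python
# Hypothetical values for one cereal (sodium/potassium in mg, carbo/fat/protein in g).
row = {"calories": 110, "sodium": 200, "potass": 100, "carbo": 14.0, "fat": 1.0, "protein": 2.0}

SMALL = ["carbo", "fat", "protein"]


def share_of_axis(row, measures, shown):
    """Fraction of a shared 0..max axis reached by the largest of `measures`
    when every measure in `shown` is plotted on that same axis."""
    axis_max = max(row[m] for m in shown)
    return max(row[m] for m in measures) / axis_max


all_measures = list(row)
print(f"all measures shown:      {share_of_axis(row, SMALL, all_measures):.2f}")
print(f"only carbo/fat/protein:  {share_of_axis(row, SMALL, SMALL):.2f}")
```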

Now that we have only the measures we're interested in, we can change the marks that Tableau generated automatically from bars to lines.

This gives us line charts of the sums of carbohydrates, fat and protein, but as aggregates, whereas what we'd like to see are the individual values.

So, since we'd like to compare the values of carbohydrates, fat and protein cereal by cereal, we can disaggregate this graph by dragging the cereal dimension to Color on the Marks card.

Since this dimension has 77 members, Tableau prompts us to confirm that we really want to do this. In this case, we do.

The final result is our parallel coordinate plot, with each cereal given a different color.
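The difference between the summed line and the per-cereal lines can be sketched in a few lines of Python. Without a dimension to split on, each measure is summed over all rows, giving a single polyline; splitting by cereal (which is what putting the cereal dimension on Color does) gives one polyline per cereal. The rows and column names below are invented for illustration.

```python
# Hypothetical rows; values are invented and column names are assumptions.
rows = [
    {"name": "cereal_a", "carbo": 12.0, "fat": 1.0, "protein": 3.0},
    {"name": "cereal_b", "carbo": 17.0, "fat": 0.0, "protein": 2.0},
    {"name": "cereal_c", "carbo": 21.0, "fat": 2.0, "protein": 1.0},
]
MEASURES = ["carbo", "fat", "protein"]

# Aggregated view: SUM of each measure across all rows -> a single polyline.
summed_line = [sum(r[m] for r in rows) for m in MEASURES]

# Disaggregated view: one polyline per cereal, i.e. the parallel coordinate plot.
per_cereal_lines = {r["name"]: [r[m] for m in MEASURES] for r in rows}

print("SUM line:", summed_line)
for name, line in per_cereal_lines.items():
    print(name, line)
```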

With so many lines on the graph, this can be a little bit difficult to interpret. Rather than hovering over each individual line to try to find the right cereal based on the parallel coordinate plot, what we can do in Tableau is easily add filters.

The Filters pane allows us to add arbitrary filters on the cereal dimensions.

So let's drag the Protein High-Low field to the Filters pane and then filter on high-protein cereals. This cuts down our visualization a little bit, but it's still pretty noisy.

So let's take the Sugar High-Low dimension, add it to the Filters pane and then filter on low-sugar cereals: since we're trying to eat healthy, we want high protein and low sugar.
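With both filters in place, a cereal stays on the plot only if it passes each one, which amounts to a logical AND. A minimal sketch with invented rows, the thresholds from the video, and assumed column names; `high_low` is a hypothetical helper:

```python
# Hypothetical rows; values are invented and column names are assumptions.
rows = [
    {"name": "cereal_a", "protein": 3, "sugars": 1},
    {"name": "cereal_b", "protein": 2, "sugars": 10},
    {"name": "cereal_c", "protein": 1, "sugars": 2},
    {"name": "cereal_d", "protein": 6, "sugars": 3},
]


def high_low(value, threshold):
    # Strictly greater than the threshold counts as "High".
    return "High" if value > threshold else "Low"


# Filter 1: Protein High-Low = High (more than 1 g).
# Filter 2: Sugar High-Low = Low (3 g or less).
# A row must pass both filters to remain on the plot.
healthy = [
    r["name"]
    for r in rows
    if high_low(r["protein"], 1) == "High" and high_low(r["sugars"], 3) == "Low"
]
print(healthy)  # ['cereal_a', 'cereal_d']
```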

Now the visualization is reduced to a much more interpretable number of cereals, and we can see that the remaining cereals include Cheerios, Quaker Oatmeal and All-Bran with Extra Fiber.

This duplicates, and in fact adds functionality to, the parallel coordinate plot we created in Python in Jupyter Notebook.

While it may not be difficult for an experienced Python user to add these filter conditions and make the parallel coordinate plot more interpretable, it's hard to argue that Tableau's built-in filtering isn't much easier for an uninitiated person to use when reducing the set of plotted cereals to something more interpretable.

Next, let's move on to the scatter plot in a new worksheet.

Previously, in Jupyter Notebook, we created a scatter plot of protein and fat, so let's duplicate that here.

We drag the protein measure to columns and the fat measure to rows.

In this case, Tableau has automatically chosen to use the sum of all protein values and the sum of all fat values, which gives us exactly one data point representing the total fat and protein for the entire data set.

So instead, we're going to plot protein against fat as dimensions rather than as summed measures, which gives us the same scatter plot that we had in Jupyter Notebook.
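Why the first attempt produced a single point: summing collapses every row into one (total protein, total fat) pair, while a scatter plot needs one point per cereal. A sketch with invented values and assumed column names:

```python
# Hypothetical rows; values are invented and column names are assumptions.
rows = [
    {"name": "cereal_a", "protein": 3, "fat": 1},
    {"name": "cereal_b", "protein": 2, "fat": 0},
    {"name": "cereal_c", "protein": 4, "fat": 2},
]

# Aggregated: SUM(protein) vs SUM(fat) -> exactly one point for the whole data set.
aggregated_point = (sum(r["protein"] for r in rows), sum(r["fat"] for r in rows))

# Disaggregated: one (protein, fat) point per cereal, i.e. the scatter plot.
points = [(r["protein"], r["fat"]) for r in rows]

print("aggregated:", aggregated_point)  # aggregated: (9, 3)
print("per cereal:", points)            # per cereal: [(3, 1), (2, 0), (4, 2)]
```

Note that cereals with identical protein and fat values land on the same point, so overlapping marks can hide cereals in either tool.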

In this video, we've duplicated three of the five more complex statistical graphics that we covered in a previous Jupyter Notebook example.

The pixel plot and Wordle aren't available in Tableau, but they're relatively easy to create outside of Tableau if you'd like to follow the Python example.
