#TakeapartTuesday Chicago Taxis

I think people may have been intimidated by the prospect of wrestling with 105 million rows of data for #MakeoverMonday this week. To my eye at least, it seemed we had fewer submissions than usual, but the community dashboard will validate that in time.

My own submission was limited from an analytical angle, but it was the look and feel that I wanted to achieve, having immediately decided to implement Tim Ngwena‘s recent “Step Line Chart” above a (publicly available) view of the Chicago skyline:

At the time of writing (Monday evening in the UK), there’s one dashboard that stands out from the crowd for me, and it’s from co-founder of #MakeoverMonday – Andy Kriebel. I’ve previously written about this type of dashboard – I lazily label them as “corporate” – and Andy’s viz this week is so good, and so in keeping with the style I want to embed at work, that I’d be an idiot not to break this one apart.

As usual, I’m going to download the original workbook to use as a visual reference, and I’ll try to build the specifics of the dashboard from scratch in a new workbook. In this case, I’m going to create a duplicate of Andy’s workbook and clear out all the sheets, calcs, parameters, sets, groups – everything. I’m doing this because I deleted my own #MakeoverMonday dataset as it was so big. All I’ll retain from Andy’s sheet is the source data and the Data Source filter he applied (excluding 2017).

Starting at the top, I won’t dwell on the title, but I will look at the summary sparkline, which shows total trips per month, with a black dot on month one and a red on on the last month. It’s straightforward enough:

Dual-axis line and dot, with two relatively cunning components. First – the [First and Last] calculation:

This ensures that Circles are only drawn for the first and last values across the table. To colour the dots, I need to be able to differentiate between the first and last records:

A nice gentle introduction, and things are going swimmingly. There’s the standard clean-up of gridlines, axes etc., but that’s about it. I couldn’t find a logo to match the one Andy used, so just whacked one purporting to relate to Chicago taxis on there, to complete the top line on the dashboard:

Next we need to recreate the vertical sparklines, which appear to essentially be the same as the overall total, but for the top ten destinations in the final month of 2016. I started off by duplicating the first worksheet, and fiddling with Rows and Columns a bit:

Duplicating MONTH(Trip Start) allows me to replicate the “Monthly trips” axis title at the top, whilst retaining visibility of Year on the lower x-axis.

With hindsight, I prefer Andy’s method:

A “dummy” in-line Discrete “calculation” on Columns, with the Field Labels hidden. A neat little trick that squeezes the white space up on that uppermost x-axis.

Adding From to Rows splits the view up into all of the source locations for each taxi ride. All we need to do now is isolate the ten locations with the highest number of rides in December 2016 (the last month). “All”! Took me ages, and I had to use a load of calculations to achieve what can presumably be achieved more efficiently.

First, I found the latest date from each From:

I then created a boolean with this calculation and the [Trip Start] field:

I could then evaluate that statement in a final calculation:

This final calc is the one that could be used as a filter:

And it could be used to sort the view too:

Note that by this stage, I realised that I also needed to exclude “** masked for privacy **” from the data.

It seems like too many calcs, but if I tried to nest it, things went Pete Tong, and I generally find that breaking things down into lowest common denominators gets me there in the end.

Next in the dashboard, we have a bunch of columns of data to create, which represent certain KPIs for these Top Ten journey source locations. First one is just the number of trips in the last month, so I can leverage an existing calculation. Again, I’m just duplicating an existing sheet to get the foundation for the next. This time, I took the Top Ten Sparklines sheet and worked with that:

Same filter, same sort, so the same Dimension members are in the view. It’s just a case of getting the right Measure onto text. Note also the removal of Row Banding and the inclusion of another “dummy” field on Columns to force a nicely worded Header into place.

The next column shows the percentage change relative to the same month in the prior year. To address this, I first need to calculate that prior year date:

Which I validated here:

With that in place, I then continued my merry little game of writing small calcs to achieve the desired solution. I duplicated existing calcs and recycled them:

Here I got a bit tied up, even though the way to calculate the variance is pretty easy. It looks as if it should just be ZN(([Last Month Value] – [Last Month in Prior Year]) / [Last Month In Prior Year])

This returned a big fat zero. Lots of them:

Let’s break it down to see where it goes wrong. Surely the [Last Month Value]-[Last Month in Prior Year Value] bit is OK?

More zeroes. OK, so I can’t refer directly to both calcs. Will it work if I refer to one indirectly?

Good. No zeroes and the calc is doing the right thing. Now I just need to modify the modified one to get the percentage nailed:

That’s half an hour that I won’t get back. No real clue why the first calc doesn’t work and I can’t explore the data to see my calcs so I won’t worry too much about that. To finish this worksheet, I needed to ensure that I kept the [Last Month Value] on detail to retain the sort, and I added a simple boolean to colour the text:

The third column is called “Latest Year”, so I assume it is the 2016 total. Again, I just duplicated the last sheet, and this time I added this new calculation on top of Text:

Other than that, I just needed to remove the colour calc from the Colour shelf, and this was a pleasingly simple sheet to build. Onto the final column which compares 2016 trips to 2015. I followed the same process as the other prior period comparison sheet.

Onto the sort of bar-in-bar / bullet bit. It looks really good, and also like it’s bound to trip me up. First I tackled the red bar, which I read to be the inverse of the [2016 vs 2015] calculation depicted above. So instead of Near North Side showing a (24.2%), it will instead show 75.8% of the 2015 trips being driven. That’s just a case of this:

To get the “labels”, you need to add cell level Reference Lines:

Then just Format the alignment and colouring to get to this stage:

Now it’s on to the challenge of creating that tri-tone shading, which you can tell from Andy’s viz is denoting 50%, 75% and 100% of the prior year total. First, I assumed (hoped!) it was some sort of standard reference band effect. It wasn’t. Hovering over the shaded parts of this chart in Andy’s dashboard, I could distinctly see three separate grey sections, so I figured it was three calcs grouped and lobbed onto one axis as a sort of stacked bar.

I created a trio of calcs, based on this logic:

You can figure what the 75% and 100% equivalents were. I then fired up a new sheet to test how this would look as a stacked bar:

The colours aren’t quite right, but the concept seems sound, so it was time to bring this idea into the original sheet:

It came out quite well. I took longer choosing shades of grey than anything else. The main complications were basic formatting issues – like remembering to turn Stack Marks Off for the stacked bar, and to set the shading for the Reference Line value to 0% to ensure I didn’t have a shaded area behind the numbers, disrupting the smooth aesthete.

Time to quickly look at the dashboard so far:

Everything is tiled – with one big text box for the title, the summary sparkline and an image for the taxi icon. In the main charty bit, the 10 sparklines and the bars sandwich a Horizontal Container which houses the four columnar views. Things are fairly well aligned, which was achieved by just tweaking the lower row dividers in each chart to get them “just so”. I thought I could put the Horizontal Container inside a Vertical Container, and drop a blank tile in there to control the vertical positioning, but that didn’t work out too well.

There is just one main component to get right now, before I can consign this dashboard to the “done – but it was hard” pile. I started off by duplicating my bar-in-bar / bullet sheet, and I then just copied and pasted Andy’s title. Flicking away from Andy’s worksheet, I looked back at his dashboard to see how the structure of this part of the viz appeared to be laid out.

A useful clue was provided by this video, which Andy uploaded to demonstrate the effectiveness of aggregated extracts:

It’s useful as it showed that Andy appears to have written a calculation to combine the From and To fields, in order to give a unique string to filter and sort by. I tried initially creating a combined field, but it appears you can’t filter by combined fields. The calc was a basic one:

I now needed to filter to the top ten routes by volume in 2016, and sort them in descending order:

This taught me a few things. Firstly, I needed to use Context filters for From and To (I was excluding the ‘** marked for privacy **’ Dimension members). If I didn’t set them to Context filters, the Top 10 only returned 9 results, as the 1st result was actually routes to and from locations where the source / destination remained private. Note that the ‘#’ calculation is just INDEX() running down the table to give a view of rank.

It also showed me that Andy’s Top 10 differed to mine. I maybe am using a different top ten, because when I built a “clean” worksheet to test the top ten, it seemed to go along with my original sheet, with Andy’s Lake View to Lake View route a distant twelfth:

These differences create an impression later on, with the average fare and tip metrics skewed a lot with trips from O’Hare airport affecting the average fare and tip measures.

Average Distance was a simple one:

And as I have created similar views before, I knew I was dealing with three lots of dual-axis bar / gantt charts for the remainder of this section. The first just reused existing calculations, set to Continuous with 2016 as a bar, 2015 as a gantt, axes synchronised and a bit of formatting:

For the next section, average Fare, I had to create new calcs for 2015 and 2016 to sum up the fares for each year. 2015 looked like this, and 2016 was somewhat similar:

If I divided these calcs by the equivalent calcs which summed up the Number of Records per year, I had an average fare to plot. Similarly, the final section is average tip, so I needed to follow the same principles there.

Finally (finally!) I needed to colour the bars for each Measure to denote whether the 2016 values were higher or lower than 2015. But I have run out of steam. I tried and failed to get them to work, but have expended all my enthusiasm recreating the main bits of the viz. The final (oh-so-close!) dashboard is here.