#MakeoverMonday – Data breaches

This week, Andy Kriebel provided us with a whole new website devoted to #MakeoverMonday, as well as the usual weekly challenge. It looks pretty damn sharp, and a slideshow at the top of it reminded me that I have a lot of time for Andy Cotgreave’s visualisations.

My view? Hectic. It explores the World’s Biggest Data Breaches, but I find it hard to gauge whether the problem is worsening, or whether there is a key driver behind the trajectory of these breaches. Bubble charts have their place but, in my opinion, they’re pretty lousy at actually delivering insight. Time to look at the data.

Whenever I start anything in Tableau, I like to explore the data to get a sense of what I’m dealing with. Initially, this is a cursory glance when first connecting to the data. To get a better feel for the size of the dataset and its dimension members, I’ll then just drag some stuff onto a fresh sheet to see what’s what.
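For anyone who wants to do the same sanity check outside Tableau, here is a rough sketch of the idea in plain Python: count the distinct members of each Dimension so you can see at a glance which ones are too big to chart. The function name `dimension_cardinality` and the sample rows are my own inventions for illustration; they are not taken from the real dataset.

```python
from collections import defaultdict


def dimension_cardinality(rows):
    """Return {column name: number of distinct non-empty values}."""
    members = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            # Skip blanks so empty cells are not counted as a member.
            if value not in (None, ""):
                members[column].add(value)
    return {column: len(values) for column, values in members.items()}


# Made-up rows for illustration only (hypothetical entities and values).
sample_rows = [
    {"Entity": "Company A", "Method of Leak": "hacked", "Year": 2013},
    {"Entity": "Company B", "Method of Leak": "hacked", "Year": 2014},
    {"Entity": "Company C", "Method of Leak": "inside job", "Year": 2014},
    {"Entity": "Company D", "Method of Leak": "", "Year": 2015},
]

cardinality = dimension_cardinality(sample_rows)

# List the smallest Dimensions first: these are the easiest to visualise.
for column, count in sorted(cardinality.items(), key=lambda item: item[1]):
    print(f"{column}: {count} distinct members")
```

A Dimension with a handful of members is a candidate for colour or rows; one with hundreds is probably better left as a tooltip, if it is used at all.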

This week, that enabled me to rapidly rule out using the three Source Dimensions, as I had no real use for a bunch of URLs. Adding Alternative Name to Rows revealed 113 marks – too many to distil into a punchy visualisation:

Repeating this process for all Dimensions, I made the following assessments:

Data Sensitivity – 9 marks, so manageable and gives context as these breaches range from email address theft through to stealing credit card and health information.

Entity – 218 records. No thanks. I mean, it’s “nice” to know whose data has been the subject of a breach, but is it a critical bit of information? I’m not so sure.

Organisation – Just a way of grouping entities into sectors. A couple of standout categories, but also a lot of relative sparsity that wouldn’t look great in a chart.

Records Lost Notes – Didn’t see how this was relevant given the Measures in the dataset (Records Lost and Records Stolen).

Source Name – The media agency that reported the breach is not as relevant as the breach itself. Of no interest to me.

Story – Seems to be the headline of the article reporting the breach. 191 marks – too many to visualise effectively.

So, I had my Dimension in mind – Method of Leak. Then it was time to look at the Measures:

There didn’t seem to be enough variation between the two Measures to be unduly concerned about, so I just elected to use Records Stolen, as it sounds a little more emotive to me than Records Lost.

Next up, it was time to knock up the chart that sprang to mind when I first saw the Method of Leak data. I am one of life’s great procrastinators, but have learned that I often need to go with my gut instincts when it comes to visualising data:

OK. That sucks! The intention was to try to depict whether or not the number of Records Stolen is noticeably increasing over time, and if there is a particular Method of Leak driving that trend. Here I can see that hackers are the main cause of data breaches, but this chart just isn’t as clear as it needs to be.

Note that the Exclusions are “inside job, hacked, 2004” and “leak, 2016”. I excluded them as they were single marks and they looked weird on the original chart. I have retrospectively reincorporated them in the final visualisation.

A change of approach was needed. I decided to add a quick two-pass table calculation to show a running Percent of Total of all data breaches, by Method of Leak and Year across the table:

Why did I do this? I always felt that the “story” I wanted to tell was that hacking is the overwhelming source of data breaches, and by visualising the data this way, it would show that to be the case, whilst also showing that the other Methods of Leak were generally plateauing.
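For the curious, the logic of a two-pass calculation like this can be sketched in plain Python. This is only one plausible reading of it, not a copy of the actual Tableau settings: pass one builds a running total of Records Stolen for each Method of Leak across the years, and pass two expresses each running total as a percentage of the grand total. The function name `running_percent_of_total` and the numbers are hypothetical, made up purely to show the mechanics.

```python
from collections import defaultdict


def running_percent_of_total(records):
    """records: iterable of (year, method, records_stolen) tuples.

    Pass 1: running total of records stolen per method, across years.
    Pass 2: each running total as a percentage of the grand total.
    Returns {method: [(year, percent), ...]} with years ascending.
    """
    records = list(records)
    yearly = defaultdict(lambda: defaultdict(int))
    grand_total = 0
    for year, method, stolen in records:
        yearly[method][year] += stolen
        grand_total += stolen
    if grand_total == 0:
        return {}

    years = sorted({year for year, _, _ in records})
    result = {}
    for method, by_year in yearly.items():
        running = 0
        series = []
        for year in years:
            # A year with no breaches for this method adds nothing,
            # so the line simply stays flat (plateaus).
            running += by_year.get(year, 0)
            series.append((year, 100 * running / grand_total))
        result[method] = series
    return result


# Toy numbers for illustration only; not the real breach figures.
toy_records = [
    (2013, "hacked", 50),
    (2013, "leak", 10),
    (2014, "hacked", 100),
    (2014, "inside job", 20),
    (2015, "hacked", 20),
]

shares = running_percent_of_total(toy_records)
for method, series in shares.items():
    print(method, series)
```

With this reading, a method that keeps racking up stolen records keeps climbing, while a method with no new breaches flattens out, which is the shape I was after.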

Once the basic chart was in place, it was just a matter of formatting it to create a punchy, single-chart dashboard. I put the “shabby” in shabby-chic when it comes to design, so I took the lazy route of Googling for a cool-looking title related to hacking. I found this, and that was the foundation of the colouration of my final design:

I quite liked the font, and definitely liked the green. However, when adding it as a tiled image I found that it was off-centre and awkward when set to Fit Image. It was also impossible to find a vaguely equivalent font in Tableau, and as I’m a big believer in consistency of font across a chart / dashboard, I ditched the image but retained the colour. After all that, and after correcting a spelling cock-up (thanks Neil!), the end product was this:

Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations – Scott Berinato

Data Points: Visualization That Means Something – Nathan Yau

The Visual Display of Quantitative Information – Edward Tufte

I have enjoyed some more than others, but perhaps it is my corporate grounding that sways me towards Cole and Scott’s books. Both heavily influence me, and that is why this week’s chart features:

Punchy headline

Annotation (added as the title wasn’t overly descriptive, but was there to catch the eye)

Frugal use of colour to draw attention to key reference points

Axis labels which are clear but don’t leap out to detract from the data

Discreet gridlines

I think the chart is logical. The title is more of an attention-grabber than an informer, but the placement of the annotation resolves that by notifying the reader of the message to take away. The labelling is discreet and the green colouring of the “hacked” Method of Leak stands out and marries well with the use of the word “hacked” in the title and the mark annotation at the top of the chart.

Whilst my Tableau techniques remain rough and ready and firmly a work in progress, it’s actually the design aspect which is most intriguing for me at this stage of my development. If anyone reading this can recommend additional design-centric books for me to read, please let me know!