Interactive Hierarchies - Drowning in rats

February 01, 2019

It’s been a while since my last blog post (turns out moving both job and city requires some effort), but for my glorious return I’ve decided to dive into the world of interactive graphics. In particular, I’m going to investigate some alternative packages that allow us to visualise hierarchical data.

Interactive visualisations within R can be created using htmlwidgets. This set of packages brings a host of interactive JavaScript visualisation libraries to R, without the user needing any prior JavaScript knowledge. Notable htmlwidgets packages are:

networkD3 - a library for creating D3 network graphs.

DT - an R interface to the JavaScript library DataTables, a great way of displaying tabular data.

So let’s get started. The most obvious example that came to me when thinking about hierarchical data is the manner in which scientists classify living organisms. All organisms are placed in a ranked scheme. With the broadest category at the top and the most specific at the bottom, the scheme looks like this:

Kingdom

Phylum

Class

Order

Family

Genus

Species

After some digging around on the web I found Mammal Species of the World, 3rd edition, a database of mammalian taxonomy. You can also download the entire taxonomy as a csv file. So let’s start by loading the tidyverse and importing the data.
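The import step might look something like the sketch below. So that it runs without the download, it writes a few hand-written rows to a temporary csv first; the column names (order, family, genus, species, common_names, extinct) are illustrative assumptions and not necessarily the header of the real file.

```r
library(tidyverse)

# Stand-in for the real download: a few hand-written rows saved to a
# temporary csv. The column names are assumptions made for illustration.
sample_csv <- tempfile(fileext = ".csv")
write_lines(
  c(
    "order,family,genus,species,common_names,extinct",
    "Artiodactyla,Giraffidae,Giraffa,camelopardalis,Giraffe,FALSE",
    "Rodentia,Muridae,Rattus,rattus,Roof Rat,FALSE",
    "Rodentia,Muridae,Rattus,timorensis,Timor Forest Rat,FALSE"
  ),
  sample_csv
)

# With the real data, swap sample_csv for the path to the downloaded csv file.
mammals <- read_csv(sample_csv)

# Quick look at the column types and the first few values
glimpse(mammals)
```

With the real file, the only change should be the path passed to read_csv(), although the column names will most likely differ from the ones used here.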

Ok so we’ve got a dataframe of 13582 different species of mammals, with different columns for different levels of the hierarchy, such as Order and Family. The nomenclature consists mainly of Latin, which isn’t something that means a lot to me. Luckily there is a column of common_names, so I’m going to use this variable instead of species. I’ll filter down to those species which are not extinct and which have an entry in common_names. If I had more domain knowledge, this isn’t the route I would take, but as I want to explore the data, Giraffe conveys more meaning than camelopardalis. From previous inspection I also know there are a few non-UTF-8 characters spread throughout, so I’ll correct that too. So with that in mind, let’s take a few steps to clean our data.
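A minimal sketch of those cleaning steps, run on a small made-up data frame, is shown below. The column names and the logical extinct flag are assumptions, and dropping invalid bytes with iconv() is just one possible way of dealing with the encoding problem.

```r
library(dplyr)

# Hypothetical stand-in for the imported data. Column names are assumptions,
# and the last two rows are made up purely to exercise the filters.
mammals <- tibble(
  order        = c("Artiodactyla", "Rodentia", "Rodentia", "Rodentia"),
  family       = c("Giraffidae", "Muridae", "Muridae", "Muridae"),
  genus        = c("Giraffa", "Rattus", "Rattus", "Rattus"),
  species      = c("camelopardalis", "rattus", "exampleus", "extinctus"),
  common_names = c("Giraffe", "Roof Rat", NA, "Made-up Extinct Rat"),
  extinct      = c(FALSE, FALSE, FALSE, TRUE)
)

mammals_clean <- mammals %>%
  # Drop any bytes that are not valid UTF-8 from every character column
  mutate(across(
    where(is.character),
    ~ iconv(.x, from = "UTF-8", to = "UTF-8", sub = "")
  )) %>%
  # Keep living species that have a usable common name
  filter(!extinct, !is.na(common_names), common_names != "")

mammals_clean
```

Only the Giraffe and the Roof Rat survive here: one made-up row is dropped for having no common name and the other for being extinct.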

Now that our data is prepared, let’s jump into our first visualisation. This comes from the collapsibleTree package, which creates Reingold-Tilford tree diagrams using D3.js. We simply pass our data to the collapsibleTree function and specify the order of the levels in the hierarchy. This collapsible tree diagram is a really powerful visualisation for hierarchical data, as you can navigate your way through the entire hierarchy.
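As a rough sketch of that call, the snippet below builds a tiny stand-in data frame and passes it to collapsibleTree(). The column names and the root label "Mammalia" are assumptions, and the call is wrapped in requireNamespace() so the snippet is simply skipped if the package isn't installed.

```r
# Tiny stand-in for the cleaned data; column names are assumptions
mammals_clean <- data.frame(
  order        = c("Artiodactyla", "Rodentia", "Rodentia"),
  family       = c("Giraffidae", "Muridae", "Muridae"),
  genus        = c("Giraffa", "Rattus", "Rattus"),
  common_names = c("Giraffe", "Roof Rat", "Timor Forest Rat"),
  stringsAsFactors = FALSE
)

if (requireNamespace("collapsibleTree", quietly = TRUE)) {
  # hierarchy lists the columns from the broadest level to the most specific
  tree <- collapsibleTree::collapsibleTree(
    mammals_clean,
    hierarchy = c("order", "family", "genus", "common_names"),
    root = "Mammalia"
  )
  # Printing `tree` at the console or in an R Markdown chunk renders the widget
}
```

Each column named in hierarchy becomes one level of the tree, so reordering that vector changes how the diagram nests.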

The treemap gives a starker view, and we can instantly see that rodents are by far the biggest order of mammals. We can click through the treemap and find that, within the Rodentia order, Muridae is the largest family and, within Muridae, Rattus is the largest genus. Upon opening the Rattus genus we discover 66 different species of rats, like the Timor Forest Rat and the Roof Rat! This isn’t quite the ideal use for this visualisation, as in the final tier each species has a count of 1, but I wanted to understand which species made up each genus.
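The sizes that the treemap encodes can also be checked directly with a few counts. The sketch below uses a small made-up data frame with assumed column names; because every row is one species, counting rows at each level gives the size of the corresponding tile.

```r
library(dplyr)

# Tiny stand-in for the cleaned data; column names are assumptions
mammals_clean <- tibble(
  order        = c("Artiodactyla", "Rodentia", "Rodentia", "Rodentia"),
  family       = c("Giraffidae", "Muridae", "Muridae", "Muridae"),
  genus        = c("Giraffa", "Rattus", "Rattus", "Mus"),
  common_names = c("Giraffe", "Roof Rat", "Timor Forest Rat", "House Mouse")
)

# Species per order, largest first
order_sizes <- mammals_clean %>%
  count(order, sort = TRUE)

# Drill down one level at a time, as you would by clicking through the treemap
genus_sizes <- mammals_clean %>%
  filter(order == "Rodentia", family == "Muridae") %>%
  count(genus, sort = TRUE)

order_sizes
genus_sizes
```

On the full dataset the same two steps should put Rodentia at the top of the first table and Rattus at the top of the second.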

One final visualisation for hierarchical data is the sunburst diagram, from the sunburstR package. It isn’t really a good fit for this dataset, but let’s have a look anyway.
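A sketch of how the data could be reshaped for a sunburst is shown below. sunburst() accepts a two-column data frame in which the first column is a hyphen-separated path through the hierarchy and the second is a size. The data and column names here are stand-ins, and the call is skipped if sunburstR isn't installed.

```r
library(dplyr)

# Tiny stand-in for the cleaned data; column names are assumptions
mammals_clean <- tibble(
  order  = c("Artiodactyla", "Rodentia", "Rodentia", "Rodentia"),
  family = c("Giraffidae", "Muridae", "Muridae", "Muridae"),
  genus  = c("Giraffa", "Rattus", "Rattus", "Mus")
)

# Build one "order-family-genus" path per species, then count species per path
paths <- mammals_clean %>%
  mutate(path = paste(order, family, genus, sep = "-")) %>%
  count(path, name = "size")

if (requireNamespace("sunburstR", quietly = TRUE)) {
  # First column: hyphen-separated path. Second column: size of that segment.
  burst <- sunburstR::sunburst(as.data.frame(paths))
  # Printing `burst` renders the widget
}
```

Because the hyphen is the path separator, any name that itself contains a hyphen would need to be cleaned up before building the paths.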