Downloading data from the GitHub repo

Proofreader’s site has a CSV, but I found yoni’s pkg most useful because it had scraped the API for more data, both in rows and in columns. If you want to hit that API now, download yoni’s package. It’s supposed to have functions for connecting to the API, but I didn’t try them.

I found that download.file was a bit glitchy (it tried to download the GitHub web page instead of the raw data), so curl::curl_download was my go-to, and probably will be from now on. Make sure your curl call uses the link to the raw data (as I did) by adding ?raw=true to the end of the URL.
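As a rough sketch of that download step: the helper names raw_url and fetch_raw and the URL below are placeholders I made up for illustration, not the actual repo path. Only curl::curl_download comes from the post.

```r
# Hypothetical helper: append ?raw=true to a GitHub file URL if it is missing.
raw_url <- function(url) {
  if (grepl("[?&]raw=true$", url)) url else paste0(url, "?raw=true")
}

# Hypothetical wrapper: download the raw file and read it as a CSV.
# Needs the curl package and a network connection when called.
fetch_raw <- function(url, destfile = tempfile(fileext = ".csv")) {
  curl::curl_download(raw_url(url), destfile = destfile)
  read.csv(destfile, stringsAsFactors = FALSE)
}

# Placeholder URL (an assumption, not the real data location).
csv_url <- raw_url("https://github.com/some-user/some-repo/blob/master/data/petitions.csv")

# Usage, once csv_url points at a real file:
# petitions <- fetch_raw(csv_url)
```

Without the ?raw=true suffix, GitHub serves the HTML page for the file rather than the file itself, which matches the glitch described above.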

Cleaning / Set up

There’s been some really nice work in visualizing data with d3 and
exposing those tools through APIs for R. htmlwidgets has been at the forefront of this. (Thanks y’all!)

A fun pkg I’ve recently found for making chord diagrams is Matt Flor’s chorddiag GitHub pkg. So all my data cleaning work goes towards getting an adjacency matrix of tag counts.

With these chord diagrams, I think a healthy check against a more traditional
visualization gives you some bearings on what to expect from the chorddiag output.
It also suggests ways you might want to tweak the options, as will be demonstrated below.
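A sanity check of this kind could be as simple as a bar chart of tag counts. This is a minimal sketch on made-up tags, not the petition data. The vector tags and its values are assumptions for illustration.

```r
# Toy tag vector (hypothetical): one entry per tag attached to a petition.
tags <- c("health", "budget", "health", "budget", "energy", "energy", "health")

# Count each tag and sort so the most common comes first.
tag_counts <- sort(table(tags), decreasing = TRUE)

# Base-graphics bar chart; las = 2 rotates the labels so long tag names fit.
barplot(tag_counts, las = 2, ylab = "Number of petitions")
```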

Adjacency Matrix

Now we build the matrix by registering whether a tag shows up for a specific
petition and then creating the adjacency matrix to represent co-occurrences of tags, which
is what we need for the chord diagram.
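Here is one way that step could look, sketched on toy data. The data frame tags_long and its column names are assumptions, and the real cleaning code may well differ. The idea is to build a petition-by-tag incidence matrix of 0s and 1s and then take its cross product.

```r
# Toy data (hypothetical): one row per petition/tag pair.
tags_long <- data.frame(
  petition = c(1, 1, 2, 2, 2, 3),
  tag      = c("health", "budget", "health", "budget", "energy", "energy"),
  stringsAsFactors = FALSE
)

# Incidence matrix: 1 if the tag appears on the petition, 0 otherwise.
tab <- table(tags_long$petition, tags_long$tag)
incidence <- matrix(
  as.integer(tab > 0),
  nrow = nrow(tab),
  dimnames = list(rownames(tab), colnames(tab))
)

# Adjacency matrix: t(incidence) %*% incidence.
# Off-diagonal cells count petitions sharing both tags;
# diagonal cells count petitions carrying that tag at all.
adj_mat <- crossprod(incidence)
```

The result is symmetric, with one row and one column per tag, which is the square matrix shape a chord diagram needs.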

Chord Diagram and choices

I’ll save you the suspense regarding what choices I had to make in order to get chorddiag to work for me.

The first is that this data set has many more categories or groups
than chorddiag is set up to work with out of the box. This is mainly because its
default color selection references RColorBrewer palettes, which have fewer colors than we need.

This blogpost was really helpful in showing how you can stretch RColorBrewer palettes to many
more colors than the usual set. It builds a function that can create as many colors as you need from
variations on an original ColorBrewer set. This function shows up below in the chorddiag groupColors
option, where you pass in a palette other than the default.

# set number of colors needed
colorCount
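A minimal sketch of that palette trick, assuming the usual colorRampPalette approach: the value of colorCount, the "Dark2" palette, and the names getPalette and group_cols are my assumptions here, not necessarily what was used for the final diagram.

```r
# set number of colors needed (hypothetical value; normally the number of tags)
colorCount <- 25

# Start from a small ColorBrewer palette; fall back to base R colors
# if RColorBrewer is not installed.
base_cols <- if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  RColorBrewer::brewer.pal(8, "Dark2")
} else {
  grDevices::rainbow(8)
}

# colorRampPalette() returns a function that interpolates between the
# base colors to produce any number of colors.
getPalette <- grDevices::colorRampPalette(base_cols)
group_cols <- getPalette(colorCount)

# With chorddiag installed, the colors would then be passed along as:
# chorddiag::chorddiag(adj_mat, groupColors = group_cols)
```

Interpolated colors sitting next to each other can look similar, so it may be worth checking that the largest categories end up with clearly distinct colors.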

The second choice is that the order in which the chords are drawn can really obscure
the main takeaway from your pre-attentive processing. Thus, I used a function to order the
adjacency matrix by the categories with the largest counts. This ordering allows the smaller categories to show up
in the background, leaving the more prominent categories unobscured. One can also change the chordedgeColor option to something fainter to lessen their influence further.
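The reordering could be sketched like this. I am assuming row totals as the measure of category size; the original function may have sorted on something else, such as the diagonal. The toy matrix is made up.

```r
# Toy symmetric adjacency matrix (hypothetical counts).
tags <- c("budget", "energy", "health")
adj_mat <- matrix(
  c(2, 1, 2,
    1, 5, 1,
    2, 1, 3),
  nrow = 3, dimnames = list(tags, tags)
)

# Order categories by their total counts, largest first.
ord <- order(rowSums(adj_mat), decreasing = TRUE)

# Apply the same permutation to rows and columns so the matrix stays
# symmetric and the labels stay aligned.
adj_sorted <- adj_mat[ord, ord]
```

Using the same index for rows and columns matters: permuting only one of them would scramble which tag each cell refers to.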

# manage use of diagonal cells in adj_mat
remove_diags

The third choice was whether you want the diagonal of the matrix to show up.
In the diagonal, the counts represent a tag’s association with itself. So visually the chord in the
diagram will return to itself, forming a hump of sorts, for that category. The humps
work sort of like the bars in the bar chart above in showing the relative weight each category has on its own,
independent of how it is connected to other tags.
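One simple way to toggle the humps is a flag that zeroes out the diagonal before plotting. This is a sketch on a toy matrix. The name remove_diags comes from the code above, but how it was actually applied is my assumption, as is the name adj_plot.

```r
# Toy symmetric adjacency matrix (hypothetical counts).
tags <- c("budget", "energy", "health")
adj_mat <- matrix(
  c(2, 1, 2,
    1, 5, 1,
    2, 1, 3),
  nrow = 3, dimnames = list(tags, tags)
)

# manage use of diagonal cells in adj_mat
remove_diags <- TRUE

# Work on a copy so the original counts stay available.
adj_plot <- adj_mat
if (remove_diags) {
  diag(adj_plot) <- 0  # drop self-associations, so no humps are drawn
}
```

With remove_diags set to FALSE the diagonal is left alone and each category keeps its hump.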