
digital humanities blog

Letters of 1916: Social Network with VisJS

Over the last couple of days, I’ve been investigating various libraries for graph visualisations. One thing that came up in the last Letters of 1916 Twitter Chat, where we looked at some of Roman’s visualisations of the letters, was how difficult it is to understand and explain outliers (like the love letters) when the data represented by each node cannot be accessed.

As a result, I decided to look into interactive graph tools (particularly web-based ones: they’re maybe a bit slower, but I speak passable JavaScript, and almost every modern browser can show them without much difficulty). The most straightforward seemed to be vis.js, which is astonishingly easy to use (you just pass it some JSON for the nodes and edges). So I redirected the output of the Python script I used to generate the graph in the last post into an HTML page with the vis.js library included, and hit refresh… It takes an awfully long time to render.
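For anyone wanting to try the same thing, here is a minimal sketch of the "pass it some JSON" step, under a few assumptions. The letters arrive as (sender, recipient) pairs, and the three sample letters are made up. Each distinct name becomes a node and each letter becomes its own edge. The CDN URL points at the standalone vis-network build, which is what the vis.js network module has since become, and the helper names `build_vis_data` and `render_html` are hypothetical, not the author's actual script.

```python
import json

# Hypothetical sample of (sender, recipient) pairs; in practice these would
# be read from the Letters of 1916 spreadsheet.
letters = [
    ("James Finn", "May Fay"),
    ("May Fay", "James Finn"),
    ("James Finn", "May Fay"),
]

# Assumed CDN location of the standalone vis-network build.
HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://unpkg.com/vis-network/standalone/umd/vis-network.min.js"></script>
</head>
<body>
  <div id="network" style="width: 100%; height: 90vh;"></div>
  <script>
    var nodes = new vis.DataSet(__NODES__);
    var edges = new vis.DataSet(__EDGES__);
    var network = new vis.Network(document.getElementById("network"),
                                  {nodes: nodes, edges: edges}, {});
  </script>
</body>
</html>
"""


def build_vis_data(pairs):
    """Turn (sender, recipient) pairs into vis.js node and edge lists.

    Each distinct name becomes one node; each letter becomes one edge.
    """
    ids = {}
    for sender, recipient in pairs:
        for name in (sender, recipient):
            ids.setdefault(name, len(ids))  # first sighting gets the next id
    nodes = [{"id": node_id, "label": name} for name, node_id in ids.items()]
    edges = [{"from": ids[s], "to": ids[r]} for s, r in pairs]
    return nodes, edges


def render_html(nodes, edges):
    """Inline the node and edge lists into the page as JSON literals."""
    return (HTML_TEMPLATE
            .replace("__NODES__", json.dumps(nodes))
            .replace("__EDGES__", json.dumps(edges)))


if __name__ == "__main__":
    # Print to stdout so the page can be redirected into a file.
    print(render_html(*build_vis_data(letters)))
```

Running something like `python make_graph.py > letters.html` and opening the file in a browser should then show the network. Inlining the JSON keeps everything in one page, at the cost of a larger file.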

One potential use for this web-based approach is to incorporate it into a digital (online) edition of some kind. vis.js draws the graph on an HTML canvas, and each node is a JavaScript object, so it’s possible to attach any amount of data (names, pop-ups, canonical links) to each one. I also like being able to zoom in easily (by scrolling) and to drag nodes around to spot the links between larger clusters. (It looks like there are two relatively tightly linked clusters that are joined only by a chain of about four people.)
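Attaching data to nodes could look roughly like the sketch below. It relies on two things vis.js does support: a node's `title` is shown as a pop-up when you hover over it, and a `DataSet` keeps any extra keys you give an item. The `person_info` lookup, its letter counts, the `url` key and the example.org links are all hypothetical placeholders. In a real edition, a click handler could read `url` to open a canonical link.

```python
# Hypothetical per-person metadata; the URLs are placeholders, not real
# Letters of 1916 links.
person_info = {
    "May Fay": {"url": "https://example.org/person/may-fay", "letters": 2},
}


def enrich_nodes(nodes, info):
    """Return copies of vis.js nodes with a hover pop-up and a custom link."""
    enriched = []
    for node in nodes:
        new_node = dict(node)  # copy so the input list is left untouched
        extra = info.get(node["label"])
        if extra:
            # vis.js shows a node's "title" as a pop-up on hover.
            new_node["title"] = f'{node["label"]}: {extra.get("letters", "?")} letters'
            # Custom keys stay on the DataSet item, so page scripts can read them.
            new_node["url"] = extra.get("url")
        enriched.append(new_node)
    return enriched
```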

The other thing to note is that this graph tool does not overlay identical edges, so each line now represents a single letter. This creates a sort of tightly knit bundle between two people who wrote to each other a lot: ‘James Finn’ and ‘May Fay’ are connected by so many edges that the graph library still hasn’t managed to stabilise, and they dance around each other and, when zoomed out, seem to flash like pulsars. (I feel this is quite romantic, somehow.)
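If the one-line-per-letter view ever becomes too restless, one alternative is to collapse parallel letters into a single weighted edge. vis.js scales an edge's width from its `value` and shows its `title` on hover. This is a swapped-in technique rather than what the post does. It loses the "each line is a letter" reading, and it may (untested here) help a pair like Finn and Fay settle down. The function name `aggregate_edges` is hypothetical.

```python
from collections import Counter


def aggregate_edges(edges):
    """Collapse parallel letters into one edge per ordered (from, to) pair.

    "value" lets vis.js scale the line width by the number of letters, and
    "title" shows the count on hover. Direction is preserved, so A->B and
    B->A stay separate edges.
    """
    counts = Counter((e["from"], e["to"]) for e in edges)
    return [
        {"from": src, "to": dst, "value": n, "title": f"{n} letter(s)"}
        for (src, dst), n in counts.items()
    ]
```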

All the data in this document comes to about 150 KB, but factor in loading the JavaScript library and rendering the page in the browser, and you might be there a while. (Chrome claims the page is not responding: it is.)
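The slow part is mostly layout and drawing rather than the data itself. The vis-network options object (the third argument to `vis.Network`) has a few settings that usually make large graphs load faster. The values chosen below are guesses worth benchmarking, not tuned settings, and `FAST_OPTIONS`, `options_js` and `payload_kb` are hypothetical helper names. The size check simply measures the JSON that would be inlined into the page.

```python
import json

# Hypothetical vis-network options aimed at cutting rendering time on a
# large graph; the numbers are untested guesses.
FAST_OPTIONS = {
    # Straight edges are cheaper to draw than the default curved ones.
    "edges": {"smooth": False},
    # Cap the number of physics iterations run before the first draw.
    "physics": {"stabilization": {"iterations": 200}},
    # Skip the Kamada-Kawai initial layout pass.
    "layout": {"improvedLayout": False},
}


def options_js(options):
    """Serialise the options so they can be pasted into the page's script."""
    return json.dumps(options)


def payload_kb(nodes, edges):
    """Size in kilobytes of the node/edge JSON that the page has to parse."""
    return len(json.dumps({"nodes": nodes, "edges": edges}).encode("utf-8")) / 1024
```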

Unidentified persons

Since posting the graph in the previous post, with its big cluster of ‘unknowns’ in the middle, I’ve been trying to think of ways around this, or at least of ways to make it less disruptive to the graph as a whole. Once the corpus is more fully developed, hopefully a lot more of these people will be identified, but in the meantime I just wanted a way to ‘unbundle’ the unknowns into discrete individuals. The question then becomes, “How many unknown people are there in this bundle?”

At one end of the spectrum, we can assume that every single unknown sender or recipient was a distinct person. But this is probably not the case: a quick skim through the Excel file of data shows that some writers just didn’t write the recipient’s full name on the letter, which is quite understandable in the case of family members, for instance. Under this assumption, a great number of additional people are magicked into existence, which makes the graph quite a lot more complicated.

At the other extreme, there is the situation we saw in the last graph, where we assume all the unknowns are one and the same person. This makes Mr. Unknown the most popular person in Dublin by a wide margin, and wildly distorts the graph.
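Both extremes fit in one small relabelling function, sketched below. It assumes a missing name shows up in the spreadsheet as an empty cell, "unknown" or "n/a"; those markers are hypothetical. The function names and the "Unknown 1", "Unknown 2" labels are also made up for illustration. Counting distinct people afterwards shows how far apart the two assumptions land.

```python
import itertools

# Hypothetical spellings of a missing name in the spreadsheet.
UNKNOWN_MARKERS = {"", "unknown", "n/a"}


def is_unknown(name):
    return name is None or name.strip().lower() in UNKNOWN_MARKERS


def label_unknowns(pairs, strategy):
    """Relabel unknown senders/recipients under one of two extreme assumptions.

    strategy="distinct": every unknown is a different person.
    strategy="single":   every unknown is the same person ("Mr. Unknown").
    """
    if strategy not in ("distinct", "single"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    counter = itertools.count(1)

    def relabel(name):
        if not is_unknown(name):
            return name
        if strategy == "single":
            return "Unknown"
        return f"Unknown {next(counter)}"  # a brand-new person each time

    return [(relabel(s), relabel(r)) for s, r in pairs]


def count_people(pairs):
    return len({name for pair in pairs for name in pair})
```

With three letters such as ("Unknown", "May Fay"), ("James Finn", "unknown") and ("", "May Fay"), the "distinct" strategy yields five people and the "single" strategy yields three.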

In the end, I settled on something in between, and assumed that each sender or recipient had precisely one unknown correspondent. These nodes have ‘ANONC’ (for unknown creator) or ‘ANONR’ (for unknown recipient) appended to their labels. Somewhere in the graph there will be a pair of nodes called just ‘ANONC’ and ‘ANONR’, standing for letters where both sender and recipient were unknown.
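Here is one way to read that rule as code. Each known person gets a single placeholder correspondent, labelled with that person's name plus ‘ANONC’ or ‘ANONR’, and letters where both ends are unknown fall back to the bare labels. The exact label format (name, a space, then the suffix), the unknown markers and the function names are assumptions, not taken from the author's script.

```python
# Hypothetical spellings of a missing name in the spreadsheet.
UNKNOWN_MARKERS = {"", "unknown", "n/a"}


def is_unknown(name):
    return name is None or name.strip().lower() in UNKNOWN_MARKERS


def resolve_unknowns(pairs):
    """Give each known person exactly one unknown correspondent per direction.

    Unknown senders become "<recipient> ANONC" (unknown creator), unknown
    recipients become "<sender> ANONR" (unknown recipient); if both ends
    are unknown, the letter goes from "ANONC" to "ANONR".
    """
    resolved = []
    for sender, recipient in pairs:
        s_unknown, r_unknown = is_unknown(sender), is_unknown(recipient)
        if s_unknown and r_unknown:
            resolved.append(("ANONC", "ANONR"))
        elif s_unknown:
            resolved.append((f"{recipient} ANONC", recipient))
        elif r_unknown:
            resolved.append((sender, f"{sender} ANONR"))
        else:
            resolved.append((sender, recipient))
    return resolved
```

Because the label depends only on the known correspondent, every unknown who wrote to May Fay collapses into the same ‘May Fay ANONC’ node. That is exactly the "precisely one" assumption.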

(The other option, which has just occurred to me, would be to remove all letters involving an unknown person. The catch is that anyone whose letters all went to or came from unknowns would then disappear from the graph entirely, even though they were real people.)
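That side effect is easy to check before committing to the idea. The sketch below drops every letter with an unknown end and then lists the known people who vanish as a result. The unknown markers, the function names and the person ‘Kathleen Doe’ in the usage example are all hypothetical.

```python
# Hypothetical spellings of a missing name in the spreadsheet.
UNKNOWN_MARKERS = {"", "unknown", "n/a"}


def is_unknown(name):
    return name is None or name.strip().lower() in UNKNOWN_MARKERS


def drop_unknown_letters(pairs):
    """Keep only letters whose sender and recipient are both known."""
    return [(s, r) for s, r in pairs if not (is_unknown(s) or is_unknown(r))]


def people_lost(pairs):
    """Known people who vanish because all their letters involve an unknown."""
    def known(ps):
        return {name for pair in ps for name in pair if not is_unknown(name)}

    return sorted(known(pairs) - known(drop_unknown_letters(pairs)))
```

For example, `people_lost([("James Finn", "May Fay"), ("Kathleen Doe", "Unknown")])` returns `['Kathleen Doe']`.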

The many-names-of-Lady-Clonbrock problem

Another problem, which I identified in the previous post, is the lack of normalisation of individuals’ names. I’m working on ways around this. Compiling a dictionary of aliases and having my Python script normalise the names seems the most obvious approach, though of course this means knowing who all the people are in the first place.
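The alias-dictionary idea could start out as small as the sketch below. Names are first tidied (collapsing whitespace and ignoring case) and then looked up in a hand-built table that maps each variant to one canonical form. The variant spellings of Lady Clonbrock here are illustrative formatting variants, not ones taken from the corpus, and `ALIASES` and `normalise_name` are hypothetical names.

```python
# Hypothetical alias table: lower-cased, whitespace-collapsed variants mapped
# to one canonical name. Real entries would come from the corpus.
ALIASES = {
    "clonbrock, lady": "Lady Clonbrock",
    "l. clonbrock": "Lady Clonbrock",
    "lady clonbrock": "Lady Clonbrock",
}


def normalise_name(name, aliases=ALIASES):
    """Return the canonical form of a name, or the tidied name if unlisted."""
    tidy = " ".join(name.split())  # collapse runs of whitespace
    return aliases.get(tidy.lower(), tidy)
```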

I’m going to continue working on this. Introducing some kind of fuzzy matching into the search, using addresses instead of names, or both, might also help.
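Python's standard library already has a simple form of fuzzy matching in `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` similarity. A sketch of using it to suggest a canonical name for a stray spelling is shown below. The 0.8 cutoff is an arbitrary starting point and `suggest_canonical` is a hypothetical name. Any suggestion would still want a human check, since two different people can easily have similar names.

```python
import difflib


def suggest_canonical(name, canonical_names, cutoff=0.8):
    """Suggest the closest known canonical name, or None if nothing is close.

    Matching is case-insensitive; the cutoff is an untuned guess.
    """
    by_lower = {c.lower(): c for c in canonical_names}
    matches = difflib.get_close_matches(name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None
```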

And finally, it would be nice to know the direction of each letter on the graph: something for another post.
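As a pointer for that future post: vis.js edges already record a `from` and a `to`, so showing direction may only need arrowheads. The `arrows` edge option accepts the string "to". It can be set on each edge or once as a default in the network options. The helper name below is hypothetical.

```python
def with_direction(edges):
    """Return copies of vis.js edges drawn with an arrowhead at the recipient end."""
    return [dict(edge, arrows="to") for edge in edges]


# Alternatively, a global default passed in the vis.Network options object.
DIRECTED_OPTIONS = {"edges": {"arrows": "to"}}
```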