Category Archives: visualizations

One of my colleagues at the Center for Research Computing wanted to find a way to visualize the connections made to a few of our IPs, down to the port level. Here’s my first crack at it, using data from yesterday. Ports are in gray, clustered around their IP addresses; IPs are in red. This only shows each connection during the minute it was opened. We’re thinking this might be useful to give non-technical users a sense of how our network is used.

The graph has been updated to include all tweets before 10:00pm (ish) on Saturday, March 22. It includes Friday’s tweets as well.

In addition, it now includes #ACLA14, since there has been some

I’m sharing what I think is a useful tool for navigating the Twitter activity on #ACLA2014 (& #ACLA14) (though this is more of a potential utility, as there’s not yet enough activity to require this kind of map). There is now enough activity (734 tweets by 225 users, and 196 connections) to produce a navigable map.

The nodes in this graph are people tweeting on #ACLA2014 & #ACLA14, or mentioned by people using that hashtag. If you click on them, you’ll see a user icon, a list of their tweets with links to that content, and below this a list of their connections to other people. Connections represented here: retweets, replies, and in-line mentions (“Loved the panel with @SoAndSo”).
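As a rough illustration of how connections like these can be pulled out of tweet text, here is a minimal Python sketch. It is not the script behind the map: the `(author, text)` input shape, the `interaction_edges` name, and the sample tweets are all assumptions made up for the example. It also treats retweets, replies, and in-line mentions identically, since all three show up in the text as an @-handle.

```python
import re
from collections import Counter

# Twitter handles: 1-15 letters, digits, or underscores after an "@".
# The lookbehind skips things like email addresses (name@example.com).
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")

def interaction_edges(tweets):
    """Count directed (author, mentioned_user) pairs across tweets.

    `tweets` is an iterable of (author, text) pairs; this shape is an
    assumption for illustration, not the format of any API response.
    """
    edges = Counter()
    for author, text in tweets:
        # A set, so that a user named twice in one tweet counts once.
        for target in {m.lower() for m in MENTION.findall(text)}:
            if target != author.lower():  # skip self-mentions
                edges[(author.lower(), target)] += 1
    return edges

# Hypothetical sample data.
sample = [
    ("alice", "Loved the panel with @SoAndSo #ACLA2014"),
    ("bob", "RT @alice: Loved the panel with @SoAndSo #ACLA2014"),
    ("carol", "@bob see you at 4:40? #ACLA14"),
]
edges = interaction_edges(sample)
users = {name for pair in edges for name in pair}
```

Someone who is only mentioned (here, `soandso`) still becomes a node, which matches the description above: the graph includes people tweeting on the hashtag and people mentioned by them.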

I will update it over the course of the day, and make another for Saturday’s activity (unless people would prefer a multi-day map). I welcome feedback!

My own panel can be found on pages 274-5 of your program. It’s Friday & Saturday, 4:40-6:30, at 25 West 4th C-16. I present on Saturday, and will be talking about Thomas De Quincey and the Netflix & Amazon recommendation algorithms.

One of the things I learned from Scott’s post was that I had been drawing co-citational, rather than citational, graphs, which made a lot of sense of the structures I’d been seeing. Basically, a line on the graph between A and B doesn’t represent work A citing work B, but instead that A and B are both cited by some third work, not necessarily represented on the graph. All the nodes you have seen in the graphs I have posted are works that have been cited two or more times, and each edge indicates that the two works it connects have been cited together by two or more separate articles.
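To make the distinction concrete, here is a small Python sketch of co-citation counting with the two-or-more thresholds described above. It is an illustration only, not Caren’s parser: the `cocitation_graph` function, the dictionary input, and the sample bibliographies are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_graph(bibliographies, min_count=2):
    """Build co-citation nodes and edges.

    `bibliographies` maps each citing article to the list of works it
    cites (an assumed input shape). A node is a cited work; an edge joins
    two works that appear together in the same bibliography.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for cited in bibliographies.values():
        works = sorted(set(cited))  # de-duplicate, fix the pair order
        node_counts.update(works)
        edge_counts.update(combinations(works, 2))
    # Keep works cited at least `min_count` times, and pairs cited
    # together by at least `min_count` separate articles.
    nodes = {w: n for w, n in node_counts.items() if n >= min_count}
    edges = {pair: n for pair, n in edge_counts.items() if n >= min_count}
    return nodes, edges

# Hypothetical sample data.
articles = {
    "article_1": ["Prelude", "Biographia Literaria", "Paradise Lost"],
    "article_2": ["Prelude", "Biographia Literaria"],
    "article_3": ["Prelude", "Frankenstein"],
}
nodes, edges = cocitation_graph(articles)
```

Note that the citing articles themselves never become nodes here unless something else cites them, which is the sense in which the "third work" need not be on the graph.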

What was missing from these exports was the temporal dimension: co-citational network graphs allow us to think visually about how fields organize knowledge, and their own production of it. However, the interactive graphs I published before were static, and so did not allow us to think about how these internal structures developed over time.

I therefore reworked the code to export dynamic graphs (.gexf format only). These graphs register changes in influence and connectedness, over time, of the works cited by a journal.

I believe I wrote this code properly, but it is producing small variances in graph sizes compared to Caren’s original, so if anyone is interested in helping to unpack that, definitely email me. I also considered the usefulness of making modularity classes dynamic,

Back to the graph. My test case is again Studies in Romanticism. Over time, you will see individual nodes and edges changing size based on (respectively) the number of times a given work has been cited, and the number of times two works have been cited together. You will also see clusters develop, and separate from one another. I have not added any decay function, so once works are linked, or once a work has a specific node size, it either keeps that size or grows; no works or links diminish in absolute terms simply because they haven’t been cited in a while.
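The no-decay behavior can be sketched as running totals that only ever grow. The Python below is a minimal illustration under assumed inputs (a list of `(year, cited_works)` pairs, one per citing article); the function name and sample records are hypothetical, and this is not the code that produces the .gexf export.

```python
from collections import Counter
from itertools import combinations

def cumulative_snapshots(dated_bibliographies):
    """Yield (year, node_sizes, edge_weights) with running totals.

    `dated_bibliographies` is a list of (year, cited_works) pairs, one per
    citing article; the shape is assumed for illustration.
    """
    node_sizes = Counter()
    edge_weights = Counter()
    by_year = {}
    for year, cited in dated_bibliographies:
        by_year.setdefault(year, []).append(sorted(set(cited)))
    for year in sorted(by_year):
        for works in by_year[year]:
            node_sizes.update(works)                     # times cited so far
            edge_weights.update(combinations(works, 2))  # times co-cited so far
        # No decay: totals are never reduced. Copy them so that each
        # snapshot is independent of later updates.
        yield year, dict(node_sizes), dict(edge_weights)

# Hypothetical sample data.
records = [
    (1961, ["Prelude", "Paradise Lost"]),
    (1975, ["Prelude", "Biographia Literaria"]),
    (1975, ["Prelude", "Paradise Lost"]),
    (1990, ["Biographia Literaria", "Prelude"]),
]
snapshots = list(cumulative_snapshots(records))
```

In this toy data, Paradise Lost stops being cited after 1975 but keeps its size in the 1990 snapshot; it only shrinks relative to the Prelude, which keeps growing.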

In relative terms, however, they may fail to keep up with the growing influence of Wordsworth’s Prelude, Coleridge’s Biographia Literaria, or even Milton’s Paradise Lost. I have identified clusters around the big six poets, plus three around Mary Shelley, William Godwin, and Walter Scott. I have also linked the development of two of these sub-areas of Romantic interest to the publication of major critical works (all shown on graph).

I really think this kind of visualization could be an incredible research aid, if the raw data were cleaned up. But other commitments are likely going to keep me from working on this project for a while. In the meantime, please consider developing the code on GitHub and/or using the tool to create a map of your own field’s evolution. If you do, please email me a link to give me a few minutes’ break 🙂

I’ll be making one more of these graphs (Victorian Studies) before I give it a rest for a while, but I thought I would present a nice coda to the MLA interactions graphs; I have two network graphs (using slightly different scripting and visualization) of Twitter interactions on the hashtag #mla14, in previous blog posts.

To round off this thinking about academic networks in all senses — though I have to say I haven’t been doing much thinking at all on the blog about this as I just try to make the data legible — I thought I would publish a citational network graph for PMLA. For the details on how to navigate these graphs, go to my earlier post on Studies in Romanticism. On this graph, though, metadata doesn’t seem to be doing the job on identifying communities, and the database had a good number of orphan nodes that were causing problems with the graph and had to be removed.

This is my second graph of tweets on this blog. I’m using an old script of mine to capture #mla14 tweets (using Twitter’s REST API). The graph below was built from 9785 tweets, posted between 6pm Friday and 10am Sunday. It shows 1736 users and 3666 interactions.

I built my version with Gephi and my own script. It will not update with new data, but it loads and runs a bit faster as a result. It would be interesting to hold up the two networks and to see how differences in interpreting mentions create different groupings. This visualization is made using a sigma.js plugin (see below).

For simplicity’s sake and to see how well the community-detection algorithms work across journals. I’m actually quite surprised at how well this seems to have worked (and how coherent the detected communities seem to be). I have a little bit of training in (19th-c.) Americanism, so I’ve gone ahead and identified some of the communities:

Credits:

As usual, some credits: the JavaScript visualization, which allows this complex graph to be presented in your browser, was written by Alexis Jacomy. The raw data comes from Thomson Reuters’ Web of Science. The parser/analyzer that turns the raw data into a network was written by Neal Caren. And I wrote a patch that allows Caren’s code to talk to Gephi. It occurs to me that these credits might lend themselves to a network graph…

This blog is quickly becoming a library of network graphs, but I really couldn’t resist this.

I dusted off an old script of mine for pulling Twitter data, and have built a network graph of all the interactions between people using the hashtag “mla14”. This graph is current through about 11:15pm Eastern.

This graph did not readily partition into distinct communities. Perhaps someone else will have better luck with the metadata, which I’ve exported here.

But the full graph, which I’ve published using a Gephi sigma.js export plugin, is available below:

As promised, Nancy Armstrong’s Desire and Domestic Fiction has more indexed citations in the Web of Science database than does Thomas Hobbes’s most famous work. Bigger than Leviathan, as it were. I’m interested in hearing feedback from people about the algorithmically-detected “communities” color-coded in the above graph, and any noteworthy connections that can be found.

I also suggested yesterday that I had some ideas for how best to use this kind of visualization. Here are a couple:

1. As a discovery tool

My medium-term goal (end of summer?) is to turn these graphs into graphical interfaces. Essentially, one would be able to click on a node, and see the article’s related metadata (e.g., abstract, publication information, cited and citing articles), including a link to the JSTOR resource (or Amazon page), displayed in a sidebar. I’ve already written some of the back-end code for this. This would allow users to move seamlessly from a basic exploration of connections to research. If the strength of these graphs is that they show us important connections we weren’t yet aware of, they can only be helped in this area by assisting scholars in turning parts of the network into ad-hoc bibliographies.

2. As a minoritarian discovery tool

A major downside of these graphs for scholarly discovery is that they tend to reinforce our prejudices towards major/great works (both primary and secondary). It would be all too easy to use this tool to create mere checklists of scholars to cite. Moreover, the citational database we are using here bakes this bias in, to a certain degree: 1) the “orphan” articles that I’ve removed may very well be cited by works that the Web of Science simply didn’t parse correctly, and 2) these articles certainly cite other works, but whoever they’re citing, the Web of Science didn’t recognize the reference.

A longer-term project would involve fleshing out this graph fully, probably by writing a bibliographic parser and workflow for a human to do error correction on the parsed data. With that full graph, one could create a devil’s dictionary of dead-end citations and pseudo-communities. By looking in the aggregate at small groups of scholars who withered on the vine, one could tell a history of whose work went nowhere, and attempt to explain why. These analytic tools need to be used to actively counteract the confirmation bias that they inherently favor.

3. As dynamic visualizations of the development of fields and subfields

As noted above, the publication dates correspond roughly with the pagination of articles within journal issues and volumes. The use of this data goes beyond a temporal filter on a static graph, though. By time-coding the different works’ appearances and references to one another, and applying a force-based layout as these elements are introduced, we should be able to see the historical emergence of scholarly communities over time, and so to assist attempts to narrativize this information.

But…

I’ve got a (non-digital English Lit) dissertation to write, and for the time being my work on this project will be limited to cleaning up these data sets (probably using JSTOR information with code I’ve been writing). The Web of Science database is fairly solid on citations (though there are plenty of duplicate entries and probably missed connections), but its other bibliographic metadata (title, author, date, &c.) are dismally bad. It’s a rare node on the above graph whose title is complete.

Credits:

Again, some credits: the JavaScript visualization, which allows this complex graph to be presented in your browser, was written by Alexis Jacomy. The raw data comes from Thomson Reuters’ Web of Science. The parser/analyzer that turns the raw data into a network was written by Neal Caren. And I wrote a patch that allows Caren’s code to talk to Gephi. It occurs to me that these credits might lend themselves to a network graph…

But the reason I learned to code in the first place is that I was interested in building a map of citations between scholars in the humanities, using journal articles as the natural unit for scholarly communities. Over this holiday break, I was able to make some progress on my long-desired project by contributing to Neal Caren’s great citational parser/analyzer (you may recently have seen some of the results of Kieran Healy’s use of the tool).

What you are looking at, then, is a citational network for Studies in Romanticism, 1961-present. It is a set of nodes (the dots) representing every journal article in SiR AND the works that these articles cite in their footnotes, if those citations are recognized by the Thomson Reuters Web of Science database. For instance, canonical works like Burke’s Reflections on the Revolution in France are recognized by the database. The lines (edges) that connect these nodes show when two recognized works are cited in the same work (or works) [updated 1/18/14 (significance)]. I have removed all the “orphan” nodes (those that have no edges connecting them). (That’s actually really interesting, because someone should probably figure out what that list of irrelevant citations is: what is it that people write about, who aren’t in an intellectual community?) Naturally, when articles from the same journal reference each other (SiR -> SiR), most of those references are going to be recognized by the database, and so you’ll get a rich interlinking between the journal articles. That interlinking is really the matrix on which the graph grows.
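The orphan-removal step amounts to keeping only the nodes that appear in at least one edge. Here is a minimal Python sketch, with a hypothetical `drop_orphans` helper and made-up sample data; it returns the orphans as well as the kept nodes, since that discarded list is the one that might deserve a closer look.

```python
def drop_orphans(nodes, edges):
    """Split nodes into those that touch at least one edge and the rest.

    `nodes` is a list of work identifiers and `edges` a list of
    (work_a, work_b) pairs; both shapes are assumed for illustration.
    """
    connected = {name for pair in edges for name in pair}
    kept = [n for n in nodes if n in connected]
    orphans = [n for n in nodes if n not in connected]
    return kept, orphans

# Hypothetical sample data.
nodes = ["Reflections on the Revolution in France", "Prelude", "uncited_article"]
edges = [("Reflections on the Revolution in France", "Prelude")]
kept, orphans = drop_orphans(nodes, edges)
```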

I have not yet had time to clean this data up (removing duplicate entries, cleaning up the text, thinking more critically about the algorithmically-defined “communities” in the above graph, using the publication dates and even page ranges to make the connections and nodes time-coded and the graph dynamic), but it’s already quite interesting:

You can see distinct communities around the big 5 + Blake

Mary Wollstonecraft Shelley (with Mary Poovey) has her own corner of the graph

An end-run of anticolonial sentiment, touching on Percy Shelley, Walter Scott, Saree Makdisi and William Blake to William Godwin, skirts around the edge of the graph to make its connections

Milton is as much the heart of this community as Coleridge

I have also generated a citational network graph for Eighteenth-Century Studies, which I will post in the next day or two. Spoiler alert: Nancy Armstrong is bigger than Hobbes’s Leviathan.

If you’ve noticed some interesting connections, or you want to propose a way that this data could be leveraged for research purposes (I, too, have some ideas…), drop it in the comments below or email me at john_mulligan@brown.edu.