PLOS Biology-Inspired PLOS Biology Articles

This past week I had my first encounter with graph databases,
which lend themselves perfectly to modeling and capturing linked data.

I started reading the free and brilliant book Graph Databases by Robinson, Webber, and Eifrem and
began playing around with Python bulbs by James Thornton.

I then took the data set of 1754 PLOS Biology articles that I have examined on this blog multiple times and created a
Rexster-based graph database from it.
Apart from the obvious authors, DOIs, and titles I also extracted references to other PLOS Biology articles.

In this blog post I will examine these links between PLOS Biology articles.

Let us first take a look at my database to get an idea of what this looks like.

Python bulbs allows us to define classes for our data model, which is what I did when creating this graph database in the first place.
These are the node (vertex) types and edge (relationship) types I defined:

Usually we would use the built-in Rexster/Bulbs functions that rely on an internal index, but since that index seems to be broken for me right now
I will simply collect all nodes and edges by hand and create Python dictionaries as indices.

This is fine here because our database is very small, but it would likely be prohibitive for anything much bigger.
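A minimal sketch of what such hand-built indices could look like, in plain Python rather than Bulbs. The tuples, the keys, and the "authorship" label are made up for illustration; only the idea of replacing the broken index with dictionaries comes from the text above.

```python
from collections import defaultdict

# Hypothetical stand-ins for the vertices and edges collected from the graph.
# Each node is (node_id, element_type, key); each edge is (out_id, label, in_id),
# meaning the edge points from out_id to in_id.
nodes = [
    (1, "author", "Author One"),
    (2, "author", "Author Two"),
    (3, "article", "doi-A"),
    (4, "article", "doi-B"),
]
edges = [
    (1, "authorship", 3),
    (2, "authorship", 3),
    (2, "authorship", 4),
    (4, "citation", 3),
]

# Index nodes by (type, key) so a lookup is a dictionary access, not a scan.
node_index = {(etype, key): node_id for node_id, etype, key in nodes}

# Index edges by the vertex they point at: in_index[(in_id, label)] -> [out_id, ...]
in_index = defaultdict(list)
for out_id, label, in_id in edges:
    in_index[(in_id, label)].append(out_id)

# Look up an article by its key, then the authors pointing at it.
article_id = node_index[("article", "doi-A")]
author_ids = in_index[(article_id, "authorship")]
```

Building the dictionaries takes one pass over all nodes and edges, which is why this only works comfortably for a small database.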

To get the node at the base of a directed edge we can either query article.inE().inV() (i.e. the in-node of this edge)
or simply ask for the in-node of the article node straight away - this should be equivalent!

for author in article.inV():
    print(author.name)

Yury Goltsev
Michael Levine
Dmitri Papatsenko

A quick check online confirms that these are indeed the authors of this article.
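The equivalence claimed above (going through the incoming edges versus asking for the neighboring nodes directly) can be illustrated on a toy graph. This is a plain-Python sketch, not Bulbs code: the helper names are mine and deliberately do not mirror the Bulbs methods, and the node names and the "authorship" label are placeholders.

```python
# Edges as (out_vertex, label, in_vertex): each edge points from out_vertex to in_vertex.
edges = [
    ("Author One", "authorship", "Article A"),
    ("Author Two", "authorship", "Article A"),
    ("Article B", "citation", "Article A"),
]


def in_edges(vertex, label):
    """Return the edges with the given label that point at `vertex`."""
    return [e for e in edges if e[2] == vertex and e[1] == label]


def in_neighbors(vertex, label):
    """Return the vertices that have an edge with the given label pointing at `vertex`."""
    return [out_v for out_v, lab, in_v in edges if in_v == vertex and lab == label]


# Route 1: collect the incoming edges, then take the vertex each edge starts from.
via_edges = [edge[0] for edge in in_edges("Article A", "authorship")]

# Route 2: ask for the neighboring vertices straight away.
direct = in_neighbors("Article A", "authorship")
```

Both routes yield the same list of authors; the second simply skips the intermediate edge objects.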

As I mentioned above, I also collected all references to other PLOS Biology articles in my data set and modeled those as Citation relationships (edges) between articles.

The article we are currently looking at has one such out-edge to another PLOS Biology article:

I think it is sensible to postulate that the more often one article cites another one, the more heavily the work presented in the citer was influenced by the citee.

There is certainly some cut-off at which importance stops increasing - my point is simply that citing another article multiple times in your manuscript probably means that you are basing your work at least partially on the article you cite.
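To make the idea concrete, here is a small sketch of how repeated citations could be turned into a capped weight. The citation events and the cut-off value are invented for illustration; I am not claiming any particular cut-off is the right one.

```python
from collections import Counter

# Hypothetical in-text citation events as (citing article, cited article) pairs.
mentions = [
    ("B", "A"), ("B", "A"), ("B", "A"),
    ("C", "A"),
    ("C", "B"),
]

# How often does each article cite each other article?
counts = Counter(mentions)

# Arbitrary cut-off: beyond this many mentions the weight stops increasing.
CAP = 2
weights = {pair: min(n, CAP) for pair, n in counts.items()}
```

Here article B mentions article A three times, but its weight is capped at 2, while the single mentions keep a weight of 1.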

In the above list we can already see that one article, titled "A sex-ratio Meiotic Drive System in Drosophila simulans. II: An X-linked Distorter", is a clear follow-up to the article titled "A sex-ratio Meiotic Drive System in Drosophila simulans. I: An Autosomal Suppressor".

One question I am interested in is: How inspired are authors by their own work (generally very inspired I would presume), and how inspiring are articles to a completely different group of authors?

In my opinion, if one group of authors inspires a completely different group of authors to carry out scientific work (be it to follow up, refute, or whatever), then that defines knowledge transfer, and it marks the point at which scientific knowledge really becomes worth the time and resources it cost to produce in the first place.

(I am certain this statement can be refined further, but roughly speaking this is what I think.)

Let us redo the above histogram but exclude all cited PLOS Biology articles that have one or more authors in common with the citing article.

(One more bracketed caveat: when constructing my database I assumed that every author name is unique, i.e. that each name refers to exactly one person. This is a heuristic that breaks easily.)
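The filter described above, dropping every citation where the citing and the cited article share at least one author, can be sketched as follows. The article labels, author names, and citation pairs are placeholders, and authors are compared by exact name string, which is exactly the fragile assumption from the caveat.

```python
# Hypothetical author sets per article, keyed by an article label.
authors = {
    "A": {"Author One", "Author Two"},
    "B": {"Author Two", "Author Three"},
    "C": {"Author Four"},
}

# Hypothetical citations as (citing article, cited article) pairs.
citations = [("B", "A"), ("C", "A"), ("C", "B")]


def is_independent(citing, cited):
    """True if the two articles have no author name in common.

    Names are compared as exact strings, so two spellings of one person
    count as different authors and two people sharing a name count as one.
    """
    return authors[citing].isdisjoint(authors[cited])


# Keep only citations between articles with completely different author lists.
independent = [(citing, cited) for citing, cited in citations
               if is_independent(citing, cited)]
```

In this toy data the citation from B to A is dropped because both articles list "Author Two", while both citations made by C survive.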