Findings increasingly novel, scientists say…

…was the tongue-in-cheek title of an image that I posted to Twitpic this week. It shows the usage of the word “novel” in PubMed article titles over time. As someone correctly pointed out at FriendFeed, it needs to be corrected for total publications per year.

I’m sure that quite recently, I’ve read a letter to a journal which analysed the use of phrases such as “novel insights” in articles over time, but it’s currently eluding my search skills. So here’s my simple roll-your-own approach, using a little Ruby and R.
Initially, I entered “novel[Title]” at the PubMed website, downloaded all 143 031 results in Medline format and parsed the “DP” (publication date) field. Useful, in that I learned the year of the earliest title (1845); inefficient, in that the resulting download is ~ 397 MB.
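For the curious, parsing the “DP” field amounts to something like the sketch below. This is not the code I ran at the time, just a minimal illustration: the method name tally_years and the two sample records are made up, and it assumes the usual Medline layout in which each record has a line such as “DP  - 2009 Jan 15”.

```ruby
# Tally publication years from MEDLINE-format lines.
# `lines` is any enumerable of strings, e.g. File.foreach("results.txt")
# (hypothetical filename), so a large download need not be read into memory at once.
def tally_years(lines)
  counts = Hash.new(0)
  lines.each do |line|
    # A DP line looks like "DP  - 2009 Jan 15"; capture only the four-digit year.
    match = line.match(/\ADP\s*-\s*(\d{4})/)
    counts[match[1].to_i] += 1 if match
  end
  counts
end

# Two invented records, purely for illustration.
sample = <<~MEDLINE
  PMID- 1
  DP  - 1845
  TI  - A novel case.

  PMID- 2
  DP  - 2009 Jan 15
  TI  - Novel insights.
MEDLINE

counts = tally_years(sample.each_line)
puts "earliest year: #{counts.keys.min}"
counts.sort.each { |year, n| puts "#{year}\t#{n}" }
```

Passing File.foreach(path) instead of a string keeps memory use flat, which matters for a file of several hundred MB.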

Fortunately, BioRuby comes with a nice set of methods for search and retrieval from the NCBI Entrez databases, including esearch_count() – as the name suggests, it simply counts returned results for a query.

So, searching PubMed for (1) all articles published from 1845 to 2009 and (2) those articles with the word “novel” in the title or abstract is as simple as this:
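The original listing is not reproduced here, so what follows is a minimal sketch of the idea rather than the script itself. The helper names (queries_for, count_table), the output column names and the --live switch are my own inventions; the [dp] and [tiab] tags are the standard PubMed field tags for publication date and title/abstract. The counting step is passed in as a callable so that the query-building logic can be tried without touching the network.

```ruby
# Build the two PubMed queries for a single year:
# all articles, and those with the term in title or abstract.
def queries_for(year, term = "novel")
  total = "#{year}[dp]"
  { total: total, term: "#{total} AND #{term}[tiab]" }
end

# Return tab-delimited text, one line per year, preceded by a header.
# `counter` is any callable mapping a query string to a hit count.
def count_table(years, counter, term = "novel")
  rows = years.map do |year|
    q = queries_for(year, term)
    [year, counter.call(q[:total]), counter.call(q[:term])].join("\t")
  end
  (["year\ttotal\tnovel"] + rows).join("\n")
end

if ARGV.first == "--live"
  # Live mode: needs the bio gem and network access.
  require "bio"
  # Assumption: recent BioRuby versions expect a contact email for NCBI.
  Bio::NCBI.default_email = "you@example.org"
  ncbi = Bio::NCBI::REST.new
  counter = ->(query) { ncbi.esearch_count(query, { "db" => "pubmed" }) }
  puts count_table(1845..2009, counter)
else
  # Dry run: print the queries that would be sent for a couple of years.
  [1845, 2009].each { |y| puts queries_for(y).values.join("\t") }
end
```

Run without arguments it only prints example queries; with --live it makes two esearch_count() calls per year, so 330 requests for the full range.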

Save that as pmnovel.rb and run it with the output redirected to a file: ruby pmnovel.rb > pmdata.txt. Obviously, we’re having a bit of fun here. You could search for any terms that you like and, in a real script, you’d probably want to specify the terms and date range as command-line options.

Next, load the tab-delimited output file into R for some simple plotting.

And here’s the result.
There you have it. We see a steady post-WWII increase in total publications (top panel), increasing more sharply around 1995. The exponential increase in “novel” findings (middle panel) looks like it begins in the early 1980s. And the fraction of total publications that are “novel” (bottom panel) also begins to increase in the 1980s and is now at an all-time high. Last year, ~ 6.1% of findings were “novel”, compared with the all-time proportion, sum(pmdata$novel)/sum(pmdata$total), of ~ 2.3%.
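If you would rather check the arithmetic outside R, the two proportions are easy to compute from the tab-delimited file. The sketch below is only illustrative: it assumes the file has a header row with columns year, total and novel (the latter two names match the R expression above), the method names are hypothetical, and the sample numbers are invented rather than real PubMed counts.

```ruby
# Parse tab-delimited text with a header row into an array of hashes,
# e.g. { year: 2000, total: 100, novel: 2 }.
def parse_counts(text)
  header, *rows = text.lines.map(&:chomp).reject(&:empty?)
  keys = header.split("\t").map(&:to_sym)
  rows.map { |row| keys.zip(row.split("\t").map(&:to_i)).to_h }
end

# Fraction of "novel" articles per year (0.0 for years with no articles).
def yearly_fraction(rows)
  rows.map do |r|
    [r[:year], r[:total].zero? ? 0.0 : r[:novel].to_f / r[:total]]
  end.to_h
end

# All-time proportion: the equivalent of sum(novel) / sum(total).
def overall_fraction(rows)
  rows.sum { |r| r[:novel] }.to_f / rows.sum { |r| r[:total] }
end

# Invented numbers, purely to exercise the functions.
sample = "year\ttotal\tnovel\n2000\t100\t2\n2001\t300\t10\n"
rows = parse_counts(sample)
yearly_fraction(rows).each { |year, f| puts format("%d\t%.4f", year, f) }
puts format("overall\t%.4f", overall_fraction(rows))
```

To use it on real output, replace the sample string with File.read("pmdata.txt"). Note that the all-time figure is a ratio of sums, not a mean of the yearly fractions, so years with few publications carry little weight.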