In the distant past, before the Internet, doing original research merely meant carefully scanning the one or two topical journals in your specialty, attending one or several topical conferences each year, and maintaining contact with a small group of scientists who worked in your niche area. I did all of these when I was active in research on magnetic materials. Today, however, things are more difficult. There's a plethora of online material to scan, most of which turns out to be, at best, not very relevant, or, at worst, completely wrong. There's also Google, which gives far too many leads, and commercial databases.

While every scientist wants to do research in exciting topic areas, "hot" research areas come and go. The first decade of my tenure in corporate research involved magnetic bubble memory materials, a hot research area at that time. Attendance was huge at conferences at which magnetic bubble research was presented; but, alas, semiconductor technology superseded any advantage that magnetic bubble memory offered, and I needed to migrate to other research areas. If I learned anything from my undergraduate economics course, it's that there's a point at which buggy whip manufacturers need to quit.

The Ngram Viewer team assembled a corpus of about 4% of all books ever printed, and they developed software for easy analysis of this database. The Ngram Viewer is essentially a concordance of a random selection of 4% of every published word, and the project has its own website, www.culturomics.org. Since this is a concordance, this database of 500 billion words, collected from 5,195,769 books, is copyright-free. It should be noted that the words come only from books, since the dating of periodicals in Google Books was very poor. The database extends only to the year 2000.

As a cited inventor on many patents in my corporate research career, I know from experience that few patents make enough money to even cover the cost of the patenting process, and fewer still describe world-changing technology. Just a small fraction of patents represent important technological advances.[3] The objective of this study was to find a metric that allows early identification of technologies, such as the smartphone, that lead to radical social changes. The trial metrics were tested using a list of historically significant patents.[3]

While it's known that, on average, important patents tend to receive more citations, the correlation is noisy and not very predictive.[3] Statistical analysis demands a large number of citations for the best estimates, but accumulating citations takes time. Citation statistics might be fine for historical studies of the importance of older patents, but they're not that useful for detecting recent important patents.[3]

As they say, "You're known by the company you keep," and that's the idea behind the PageRank concept used by Google in Internet search.[5] In the patent context, the idea is that important patents will be cited by other important patents.[3] It was found that highly-cited patents tend to cite other highly-cited patents. One problem with both PageRank and simple citation counting, however, is that they are biased towards old patents, and we're really interested in recent patents.[3] To remedy this, the research team used an "age-rescaling" approach that removes the age bias for simple citation counting and simple PageRank (see graph).[3]
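
To make the PageRank and age-rescaling ideas concrete, here's a minimal Python sketch on a made-up citation network of eight patents. It is not the research team's code. The toy data, the function names, the damping factor of 0.5, and the window size are all my assumptions, and the rescaling shown (a z-score of each patent's score against the few patents closest to it in age) is just one plausible way to remove an age bias.

```python
from statistics import mean, pstdev


def pagerank(cites, n, damping=0.5, iters=100):
    """Simple PageRank by power iteration.

    cites[i] lists the patents that patent i cites, so score flows
    from the citing patent to the cited patents.
    The damping value is an assumption chosen for this illustration.
    """
    score = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            targets = cites.get(i, [])
            if targets:
                share = damping * score[i] / len(targets)
                for j in targets:
                    new[j] += share
            else:
                # A patent that cites nothing spreads its score evenly.
                share = damping * score[i] / n
                for j in range(n):
                    new[j] += share
        score = new
    return score


def age_rescale(scores, window=4):
    """Z-score of each score against the `window` patents closest in age.

    Assumes `scores` is ordered from oldest to newest patent.
    """
    n = len(scores)
    rescaled = []
    for i in range(n):
        # Slide a fixed-size window, clamped at both ends of the list.
        lo = max(0, min(i - window // 2, n - window))
        peers = scores[lo:lo + window]
        mu = mean(peers)
        sigma = pstdev(peers)
        rescaled.append((scores[i] - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


# Hypothetical toy network: patents are numbered 0 (oldest) to 7 (newest),
# and each patent can only cite older patents.
toy_cites = {1: [0], 2: [0], 3: [0, 1], 4: [0, 2], 5: [3], 6: [3, 5], 7: [5, 6]}

raw = pagerank(toy_cites, n=8)
rescaled = age_rescale(raw, window=4)

for patent, (r, z) in enumerate(zip(raw, rescaled)):
    print(f"patent {patent}: PageRank = {r:.4f}, age-rescaled = {z:+.2f}")
```

The raw scores favor the oldest patent, which has had the most time to collect citations, while the rescaled scores ask only whether a patent does better than its contemporaries.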