
Thousands of hands writing, just two eyes to read it all

World knowledge production, comparing 2000-2002 (inner circle) with 2007-2009 (outer circle). The number of papers is increasing everywhere, with higher growth in Asia, although the US still leads the way. Credit: Amin Mazloumian and others

Have you ever felt overwhelmed by the amount of information available? Just Google anything and you will see that there are millions of pages containing some reference to what you are looking for.

IBM researchers, working with colleagues from the University of Texas and Baylor College of Medicine, have created a search engine that can look inside a document and provide a synthesis of its content in relation to your search. It is an attempt to manage the deluge of information created by the publication of a medical paper every 30 seconds! That is about 2,880 papers a day, so just to read what is published in a single day you would need roughly four years (assuming you can read, and understand, 2 papers per day... every day, weekends included).

At IBM they have developed what they call the Knowledge Integration Toolkit, KnIT, a tool that can help researchers make sense of the more than 50 million papers publicly available. It is targeted specifically at medical research, but it could have more general applications.

The amount of information is overwhelming. As an example, take the work that has been done on a single protein, p53, which is involved in suppressing tumours: a researcher would need 38 years to read all the papers available today (over 70,000), assuming she can read 5 papers a day.
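The 38-year figure is easy to check. Here is a minimal sketch of the arithmetic, assuming a constant reading rate with no days off; the function name `years_to_read` and the 365.25-day year are my own choices, not something taken from the IBM work.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length, leap years included


def years_to_read(papers: int, papers_per_day: float) -> float:
    """Years needed to read `papers` at a constant daily rate, with no days off."""
    if papers_per_day <= 0:
        raise ValueError("papers_per_day must be positive")
    return papers / papers_per_day / DAYS_PER_YEAR


# Figures quoted above: 70,000 papers on p53, read at 5 papers a day.
p53_years = years_to_read(70_000, 5)
print(f"{p53_years:.1f} years")
```

With the numbers above this comes out at a little over 38 years, and that is without counting the new papers published in the meantime.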

Remember Watson, the IBM computer that managed to beat humans on the television quiz show Jeopardy!? Well, IBM scientists have been using its cognitive capabilities to peruse the 70,000 papers related to the p53 protein and build a knowledge network that relates the information contained in the various papers, giving researchers a bird's-eye view of the accumulated (but dispersed) knowledge.

According to Lichtarge, the researcher holding the Cullen Foundation Endowed Chair at Baylor:

"In the first test using KnIT, the team sought to identify new protein kinases that phosphorylate (or turn on) the protein tumor suppressor p53. There are over 500 known human kinases and 10s of thousands of possible proteins they can target. Thirty-three are currently known to modify p53.

In the study, the team used KnIT to mine the medical literature up to 2003 when only half of the 33 phosphorylating protein kinases had been discovered.

Using KnIT, 74 kinases were extracted as potential modifiers. Of these, prior to 2003, 10 were known to phosphorylate p53, nine were discovered at a later date. Of the 10 already known, KnIT accounted for them in reasoning as well as ranking the likelihood that the other 64 kinases targeted p53. Of the nine found nearly a decade later, KnIT accurately predicted seven.

This study showed that in a very narrow field of study regarding p53, we can, in fact, suggest new relationships and new functions associated with p53, which can later be directly validated in the laboratory."
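The numbers in the quote can be put side by side with a few lines of arithmetic. This is only a sketch of the bookkeeping: the variable names are mine, and "hit rate" is my own label for the share of later discoveries that KnIT anticipated, not a metric reported by the team.

```python
# Figures taken from the quote above.
extracted = 74           # kinases KnIT extracted as potential p53 modifiers
known_before_2003 = 10   # of those, already known to phosphorylate p53 by 2003
discovered_later = 9     # of those, found to phosphorylate p53 after 2003
predicted_correctly = 7  # later discoveries that KnIT predicted

# Candidates that KnIT had to rank without prior evidence in the literature.
ranked_candidates = extracted - known_before_2003

# Share of the later discoveries that KnIT anticipated (my label: "hit rate").
hit_rate = predicted_correctly / discovered_later

print(f"{ranked_candidates} candidates ranked, hit rate {hit_rate:.0%}")
```

So from literature that stopped in 2003, KnIT ranked 64 candidates and anticipated roughly three out of four of the kinases that laboratories would confirm in the following years.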

What I find fascinating is that the reasoning performed by a computer can work hand in hand with the reasoning of researchers. Notice that this is no longer about number crunching: we have moved into cognitive reasoning, something that just a few years ago we would have called a hallmark of human capability.