Integrating Knowledge from Different Sources

Posted by Rudi Pillich

June 8, 2016

The NDEx Project is collaborating with Peter Sorger's lab as part of two DARPA-funded programs: Big Mechanism and Communicating with Computers. One output of these collaborations is The RAS Machine network, which is available on the NDEx Public Server and is automatically updated daily.

The RAS Machine integrates diverse inputs, ranging from curated BEL documents and BioPAX data from Pathway Commons to papers processed with natural language processing (NLP); this automated process is one use of the INDRA system. INDRA grew out of a plan to extend the Sorger lab's prior work on PySB by incorporating BEL as one input type and building on the concept of knowledge assembly. The intent was to create executable models automatically from multiple sources of knowledge, combined with criteria for selecting model boundaries. INDRA does produce executable models, but it became clear that the rich intermediate data structures in the assembly process could be used to create many other kinds of output models, such as the network model currently being written to NDEx.
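To make the idea of knowledge assembly more concrete, here is a minimal sketch of how statements drawn from different sources might be merged into a single network while keeping track of which sources support each interaction. It is an illustration under simplifying assumptions only: the `Statement` class, the `assemble_network` function, and the edge fields are hypothetical and are not INDRA's actual API or data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    # Hypothetical, simplified stand-in for an assembled mechanism statement;
    # this is NOT INDRA's actual Statement class.
    subj: str
    relation: str
    obj: str
    source: str  # e.g. "BEL", "BioPAX", "NLP"


def assemble_network(statements):
    """Merge statements that describe the same interaction into one edge,
    recording every source that supports it (hypothetical helper)."""
    edges = defaultdict(set)
    for st in statements:
        edges[(st.subj, st.relation, st.obj)].add(st.source)
    # Sort edges and sources so the output is deterministic.
    return [
        {"source": s, "interaction": r, "target": t, "evidence": sorted(srcs)}
        for (s, r, t), srcs in sorted(edges.items())
    ]


statements = [
    Statement("KRAS", "activates", "BRAF", "BEL"),
    Statement("KRAS", "activates", "BRAF", "NLP"),
    Statement("BRAF", "phosphorylates", "MAP2K1", "BioPAX"),
]

for edge in assemble_network(statements):
    print(edge)
```

Because the KRAS-to-BRAF statement appears in both a curated BEL document and an NLP-processed paper, it collapses into one edge carrying both sources as evidence. A real assembly system also has to reconcile entity naming, resolve conflicting or more and less specific statements, and apply model-boundary criteria, none of which this sketch attempts.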

One interesting aspect of The RAS Machine is that it demonstrates a methodology for building focused communities around networks, which supports the NDEx mission to enable scientific discourse by sharing, reviewing, and publishing networks.

RAS biology is the focus of the Big Mechanism project, selected because of its therapeutic relevance. RAS is arguably a hard topic for automated knowledge assembly precisely because it is so well researched: any output model must compete with a broad selection of existing curated models, review articles, and even textbooks. But many other mechanisms and diseases have much smaller research communities and less funding, even though the underlying biology may be just as complex. High-quality manual curation of networks is difficult and expensive, and keeping curated networks up to date with the latest research findings is not sustainable.

Instead, combining knowledge extracted by natural language processing from selected recent articles with knowledge from pathways, interaction data, and other sources has the potential to bootstrap such communities. It can provide sets of interesting networks that are useful and current, engaging the researchers in that community and giving them a reason to correct and extend the automatically produced knowledge.