The wisdom of crowds

Uncovering and modeling gene regulatory networks is one of the longstanding challenges in computational biology. While many different methods exist for analyzing and reconstructing gene regulatory networks, it is often difficult to determine when these techniques will work well, and which method is best suited to a given dataset.

Each year, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project brings together researchers from around the world to tackle different challenges in cellular network inference and quantitative model building in systems biology. These challenges range from assessing computational models for predicting breast cancer survival to predicting disease phenotypes from systems genetic data.

In 2010, participants were asked to focus on the reconstruction of regulatory networks for microorganisms, performing a blind assessment of more than 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. Participants made predictions about the different networks and then submitted their results, and information on the different inference methods they used, to DREAM challenge organizers. Each submission was evaluated to see which techniques for gene network analysis were the most successful.

Daniel Marbach, a postdoctoral fellow in Associate Professor Manolis Kellis’ research group at the MIT Computer Science and Artificial Intelligence Lab (CSAIL), analyzed the results in a paper that appeared this month on the cover of Nature Methods. The work was completed in collaboration with Gustavo Stolovitzky at IBM, Jim Collins and James Costello at Boston University, and Robert Küffner at Ludwig-Maximilians University in Munich.

The results were surprising: no single method performed best across all datasets. Instead, the researchers found that different methods were strongly favored for different networks, suggesting that no single method could be uniformly recommended.

Moreover, by grouping methods according to the type of methodology used, they found that similar approaches led to similar performance patterns across different datasets.

“This project engaged many key players in the network inference community, and taught us a great deal about the state of the art in the field,” said Kellis. “It suggests that some general principles underlie the performance of different prediction methods, and that they capture different aspects of the underlying networks.”

The team then set out to integrate these community predictions into a new predictor that combines the strengths of individual methods. The results upheld a longstanding belief in the wisdom of the crowd, showing that the optimal way to analyze datasets is frequently a middle ground that combines several different methods.

“We tried to leverage the wisdom of crowds to construct a method that builds on the strength of complementary approaches,” said Marbach. “We realized that when you combine the predictions of all the teams you get even more powerful prediction methods that consistently outperform individual approaches over a large range of problems.”

In the study, Marbach and his colleagues compared 35 individual methods for inferring gene regulatory networks, 29 of which were submitted through the DREAM project and six of which were common network inference techniques. By combining different inference methods, they were able to construct high-confidence consensus networks for Escherichia coli and Staphylococcus aureus, and to test 53 novel interactions in E. coli, of which 43% were supported, displaying the power of community-based methods for network inference.

“The novelty that we saw in this study is that you can get this improvement of accuracy when combining different methods for network inference. While this has been observed in other fields, this is a new result for network biology,” said Marbach. “We found that this community approach performs consistently across very different settings. Therefore, for a new dataset, the best strategy for network inference may be to apply a set of diverse methods and then combine the resulting predictions.”
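
As a rough illustration of what combining the predictions of several methods can look like, the Python sketch below merges ranked lists of candidate regulator-target interactions by averaging the rank each method assigns to an interaction. The article does not describe the combination rule the team used, so rank averaging, the function name combine_by_average_rank, the handling of missing edges, and the gene names are all assumptions made for illustration.

```python
from collections import defaultdict


def combine_by_average_rank(rankings):
    """Merge ranked edge lists from several methods by average rank.

    rankings: a list of lists. Each inner list holds (regulator, target)
    edges ordered from most to least confident by one hypothetical method.
    Assumption for this sketch: an edge that a method does not list gets
    a rank one past the end of that method's list.
    """
    # Collect every edge that any method predicted.
    all_edges = set()
    for ranking in rankings:
        all_edges.update(ranking)

    # Sum the rank of each edge over all methods.
    total_rank = defaultdict(float)
    for ranking in rankings:
        position = {edge: i + 1 for i, edge in enumerate(ranking)}
        missing_rank = len(ranking) + 1
        for edge in all_edges:
            total_rank[edge] += position.get(edge, missing_rank)

    n_methods = len(rankings)
    average = {edge: total / n_methods for edge, total in total_rank.items()}

    # Lower average rank means higher consensus confidence.
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(average, key=lambda edge: (average[edge], edge))


# Hypothetical predictions from three methods (gene names are made up).
method_a = [("geneA", "geneB"), ("geneA", "geneC"), ("geneB", "geneC")]
method_b = [("geneA", "geneC"), ("geneA", "geneB"), ("geneC", "geneD")]
method_c = [("geneA", "geneB"), ("geneC", "geneD"), ("geneA", "geneC")]

consensus = combine_by_average_rank([method_a, method_b, method_c])
for edge in consensus:
    print(edge)
```

In this toy example, the interaction ranked near the top by all three methods leads the consensus list, while an interaction proposed by only one method falls to the bottom. That is the intuition behind a community prediction: agreement among diverse methods is treated as evidence.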

While most research groups work independently to solve complex challenges, the DREAM project engaged many different groups of researchers to solve the same problem. Each group applied their methods to reconstruct the regulatory network for the same three microorganisms using the same datasets, and then submitted their results and methods of analysis for evaluation.

“The DREAM project enabled rapid sharing of results, direct comparison of the methods, and led to many new insights on the state of the art that wouldn’t have been possible with the traditional approach of waiting for each publication,” said Kellis. “The community really came together to participate in this study, and everything we learned relied on the participation and energy of dozens of teams across the world coming together.”

The findings, “Wisdom of crowds for robust gene network inference,” were published in the August 2012 edition of Nature Methods. For more information on the DREAM project, please visit: http://www.the-dream-project.org/. For more information on the MIT Computational Biology group led by Kellis, please visit: http://compbio.mit.edu/.

The DREAM meeting is organized each year by Dr. Gustavo Stolovitzky and colleagues in conjunction with the RECOMB satellites on Regulatory Genomics and Systems Biology, co-organized by Prof. Manolis Kellis at MIT and Dr. Andrea Califano at Columbia University. The joint meeting will be held November 12-15, 2012 in San Francisco. For more information, please visit: http://recomb-2012.c2b2.columbia.edu/.