The Microsoft Research Connections blog shares stories of collaborations with computer scientists at academic and scientific institutions to advance technical innovations in computing, as well as related events, scholarships, and fellowships.

Can scientists predict what happens when they introduce a change into a living system, for example by changing the structure of a gene or administering a drug? Just as changing one letter can completely change the meaning of a word, changing a single letter of the genetic code (a change referred to as a single nucleotide polymorphism, or SNP) can subtly affect the meaning of a gene’s instructions or alter them completely, which makes the effect of any such change extremely hard to predict. Such changes are thought to be responsible for much of the variation between members of a single species, for example in susceptibility to different diseases. The ability to successfully predict the effect of such changes would accelerate drug discovery and provide a deeper understanding of the processes of life.
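To make the letter-and-word analogy concrete, here is a minimal Python sketch. It is not drawn from the researchers' work. It translates a short, made-up DNA sequence using a hand-picked subset of the standard genetic code. The table, the example sequence, and the helper names `translate`, `substitute`, and `classify_snp` are illustrative assumptions. The sketch shows how a single-letter substitution can leave the resulting protein unchanged (silent), swap one amino acid for another (missense), or cut the protein short with a stop codon (nonsense).

```python
# Illustrative only: a tiny subset of the standard genetic code
# (DNA codons -> amino acids), just large enough for the examples below.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",  # glutamic acid (two synonymous codons)
    "GTG": "Val",                # valine
    "CAG": "Gln",                # glutamine
    "TAG": "Stop",               # a stop codon
}


def translate(dna: str) -> list[str]:
    """Translate a DNA coding sequence codon by codon, halting at a stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


def substitute(dna: str, position: int, base: str) -> str:
    """Return a copy of dna with a single-nucleotide substitution at position."""
    return dna[:position] + base + dna[position + 1:]


def classify_snp(dna: str, position: int, base: str) -> str:
    """Label a single-letter substitution by its effect on the protein."""
    before = translate(dna)
    after = translate(substitute(dna, position, base))
    if after == before:
        return "silent"      # same protein despite the changed letter
    if len(after) < len(before):
        return "nonsense"    # a new stop codon truncates the protein
    return "missense"        # one amino acid replaced by another
```

For example, with `seq = "GAGCAGGAA"` (Glu-Gln-Glu), changing the A at position 8 to G gives `"silent"`, changing position 1 to T gives `"missense"` (Glu becomes Val), and changing position 3 to T gives `"nonsense"`. Real biological systems are far harder to reason about. The sketch only covers the protein-sequence level, not how a changed protein affects the network it belongs to.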

One of the researchers’ first tasks was to determine whether it is possible to predict how a complex network of biochemical interactions will change when a SNP (pronounced “snip”) alters the function of one of the network’s components. In an August 2012 paper entitled “Static Network Structure Can Be Used to Model the Phenotypic Effects of Perturbations in Regulatory Networks” (available in Bioinformatics with a paid subscription), the authors describe how they analyzed static models of biological networks and correctly predicted the response to changes more than 80 percent of the time. Such predictions make it possible to deduce the functions of the network, which lays the foundation for building a more expressive dynamic model.

Building static networks is a challenge in itself; before beginning this work, the researchers needed to understand which genes are active in a particular cell and what they do. In their latest publication, entitled “Assessing the Relationship between Conservation of Function and Conservation of Sequence Using Photosynthetic Proteins” (available in Bioinformatics with a paid subscription), the Ofran lab has shown that, although sets of related genes with similar sequences diverge in function more quickly than previously thought, selected smaller pieces of each gene may still be useful for predicting function.

Many challenges remain unresolved on the way to the eventual goal of predicting the effect of a SNP; understanding which genes are switched on in which cells and how drugs interact with proteins are just two active areas of investigation. Once that goal is reached, however, an understanding of the functions of all genes and of how changes affect biological systems could lead to computational models that help predict, and ultimately cure, many diseases.

—Simon Mercer, Director of Health and Wellbeing, Microsoft Research Connections

Over the past two years, I have watched eScience take root in China. The movement advanced in the first and second Chinese eScience forums and in various eScience projects that were developed by the Computer Network Information Center (CNIC) of the Chinese Academy of Sciences (CAS). During this time, Microsoft Research collaborated closely with the CAS, exchanging ideas through joint workshops, student contests, and lectures such as the keynote that Tony Hey, vice president of Microsoft Research Connections, delivered at the CAS meetings in 2010.

Through these channels, the Chinese eScience community was introduced to a foundational concept of eScience: that we are entering a new, fourth paradigm of science, in which discovery advances through data-intensive computing. The concept attracted the attention of the CAS. In late 2010, Xiaolin Zhang, the executive director of the National Science Library of the CAS, proposed a Chinese translation of The Fourth Paradigm, a seminal collection of essays that describe the practice and promise of data-intensive science. I am happy to report that through the efforts of the CAS and the support of Microsoft Research, the Chinese edition of The Fourth Paradigm premiered in Beijing on October 23.

Tony Hey and Stewart Tansley, two of the book’s co-editors, joined Lolan Song, Steve Yamashiro, and me at the launch event. On behalf of Microsoft Research, Tony donated copies of the book to more than 80 Chinese university libraries, observing, “The advance of science depends on how well researchers collaborate with one another, and marry science with technology.” I, for one, am confident that the publication of the Chinese edition of The Fourth Paradigm will foster just such endeavors.

Jiaofeng Pan, the deputy secretary-general of the CAS and one of the book’s Chinese translators, spoke highly of the Chinese edition. “Building on the studies from the field of eScience, the book proposes the fourth paradigm for scientific research: data-intensive science as well as academic exchange based on big data. This book opens the door to a new paradigm of scientific research, greatly enhancing awareness of the huge impact of the digital revolution in the research and information network.”

With the release of the Chinese edition, we sincerely hope to help Chinese researchers in a variety of fields understand and use this revolutionary development in research methodology. To further speed the adoption of data-intensive approaches to research, Microsoft Research has agreed to donate 2 million hours of access to Windows Azure cloud resources, as well as 15 terabytes of Windows Azure storage space, to research projects at the CNIC over the next two years. These resources will enable Chinese researchers to apply the concepts of the fourth paradigm on the Windows Azure platform.

In 2013, the IEEE International Conference on e-Science and the Microsoft eScience Workshop will be held jointly in Beijing. As we look forward to those events, we anticipate even more progress in eScience research in China.

Antibiotics, antivirals, NSAIDs—the list of modern “wonder drugs” goes on and on. And yet many diseases remain resistant to drug therapy, and in other instances, the side effects of drug treatment are as bad as or worse than the disorder. Why, the public wonders, aren’t more new and better drugs coming to market?

The answer, in a word, is cost. Modern drug discovery involves identifying likely candidates and then screening them for biological efficacy and potential toxicity. This process is enormously, often prohibitively, expensive.

Toxicity prediction in particular remains one of the great challenges of drug discovery. Even after decades of unprecedented funding, scientists still struggle to predict the toxic side effects of any given compound. Traditional statistical models based on empirical data, while wonderful in theory, have one key shortcoming: unless researchers have access to either a state-of-the-art corporate datacenter or one of the world’s few supercomputers, there is simply too much data to analyze efficiently. Identifying compounds that will cause a desired biological effect requires a huge investment in technical infrastructure.

At least it did until recently. Now, the power of cloud computing offers a relatively inexpensive alternative to the huge up-front costs of building out a high-powered computing infrastructure. Researchers from Molplex, a small drug-discovery company; Newcastle University; and Microsoft Research Connections are working together to use cloud computing to help scientists across the globe deliver new medicines faster and at lower cost. This collaborative partnership has helped Molplex develop Clouds Against Disease, an offering of high-quality drug discovery services based on a new molecular discovery platform that draws its power from Windows Azure.

The Clouds Against Disease computational platform runs algorithms that rapidly calculate the numerical properties of molecules. As a result, Molplex has been able to produce drug discovery results on a much larger scale than ever before.

The Molplex method enables researchers to address practical issues when screening compounds. Will the compound be toxic? Will it pass safely through the human intestine? Will it stay in the body long enough? The Molplex process features extreme front loading that identifies viable drug candidates early in the research process. Contrast this with the traditional approach, which involves a great deal of up-front experimental work that is wasted when the researchers later learn that the hoped-for drug is toxic.

Access to Windows Azure, Microsoft’s cloud platform, was critical to the success of Clouds Against Disease. Molplex can use 100 or more Windows Azure nodes, which are in effect virtual servers, to process data rapidly. The physical-world alternative would be to source, purchase, provision, and then manage 100 or more physical servers, which represents a significant financial investment. Scientists taking this traditional approach would have to raise hundreds of thousands or even millions of dollars before they could begin drug research. That is a huge barrier for scientists around the world who want to engage in drug discovery. Windows Azure helps to eliminate these start-up costs by allowing new companies to pay only for the computing resources they use.

One of the biggest potential impacts of Clouds Against Disease lies in its ability to make drug discovery affordable for tropical diseases and niche disorders, categories that drug companies have long given low priority because of their limited commercial payoff. Requiring a multimillion-dollar investment before a drug candidate even reaches the clinic is not workable for scientists who study such diseases. Radically reducing the cost of drug discovery makes it feasible for scientists to tackle these scourges and bring hope to countless sufferers around the world.