BENCH-SIDE COMPUTATION: New Tools to Accelerate Experimental Research

By Katharine Miller

Many experimental researchers rely on computational tools to push the pace and productivity of laboratory research. It’s impossible to predict what the hottest new tools will be, but this column describes a few gems that have recently caught our attention: a new computer vision tool to analyze mouse behavior captured on video; a new protein simulation video database; an Excel program that makes it easier for the average bench scientist to do his or her own bioinformatics work; and a data-mining algorithm that explores complex temporal interactions among genes.

In future issues of Biomedical Computation Review we plan to describe other interesting new tools in this column. So send us ideas for a tool you’d like to have, a tool you’re using, or a tool you’ve developed that you think merits coverage here. We’ll stack it up against other ideas and write about those that catch our eye.

Evaluating Mouse Behavior

To conduct experiments that track mice as they eat, rest, play, and sleep, graduate students spend mind-numbing hours viewing video footage and categorizing actions. But a new computer model of the visual system can do that job just as well as humans. In addition to freeing researchers to do more engaging activities, it should provide more objective data and lead to more reproducible results.

“One real prospect is that you can use it over long periods of time to track the time course of disease,” says Thomas Serre, PhD, assistant professor of cognitive and linguistic sciences at Brown University, who developed this system in collaboration with a team of colleagues at the McGovern Institute for Brain Research at Massachusetts Institute of Technology and the California Institute of Technology. “Addition of a second camera would allow computer systems to do better than a single human observer,” he notes.

Serre’s model of the visual system simulates what scientists know about the receptivity of neurons in various parts of the brain. “Neurons in some areas are sensitive to movement. Others are tuned to edges and boundaries,” Serre says. “This program is trying to mimic that.” The model looks at shapes and directions of motion and learns what combinations constitute certain behaviors. Using video sequences (essentially pixels changing in intensity over time), it performed as well as people in identifying eight standard mouse behaviors. His team is now extending the system to watch social behaviors of animals housed in groups. The work was published in Nature Communications in September 2010 and is available online at http://serre-lab.clps.brown.edu/projects/mouse_behavior/index.html.

“Vision is far from a solved problem,” Serre says. “But little by little we are getting there.”

Videos of Protein Dynamics

Proteins are machines that move. Nevertheless, the pharmaceutical industry uses static protein structures when doing rational drug design. “Everybody knows that is a huge limitation,” says Modesto Orozco, PhD, professor of biochemistry and molecular biology at the Institute for Research in Biomedicine in Barcelona, Spain. So he and his colleagues assembled a large database of proteins in motion—including proteins that are pharmaceutical targets, such as kinases and membrane proteins.

Traditionally, the pharmaceutical industry virtually screens millions of compounds against a single static structure to identify perhaps 100 compounds for testing while missing 1000 that might bind to the structure in a different configuration, Orozco says. “They live with that.” But when people access the simulation results in Orozco’s Molecular Dynamics Extended Library (MoDEL), they can dock compounds with 10,000 structures instead of one. “This increases the possibility of detecting potential ligands,” Orozco says.
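The difference between screening against one structure and screening against many can be shown with a small Python sketch. It is only an illustration: the compound names, scores, and cutoff are invented, the function names are hypothetical, and nothing here reflects MoDEL's actual interface or any particular docking program. It assumes each compound has already been given one docking score per simulation snapshot, with lower scores meaning tighter predicted binding.

```python
# Illustrative only: all compounds, scores, and the cutoff are made up.
# Each compound has one docking score per snapshot (lower = better binding).

def static_hits(scores, cutoff, snapshot=0):
    """Compounds that pass the cutoff against a single, fixed structure."""
    return {name for name, s in scores.items() if s[snapshot] <= cutoff}

def ensemble_hits(scores, cutoff):
    """Compounds that pass the cutoff against at least one snapshot."""
    return {name for name, s in scores.items() if min(s) <= cutoff}

scores = {
    "cmpd_1": [-9.0, -8.5, -9.2],  # binds well in every conformation
    "cmpd_2": [-5.0, -9.5, -6.0],  # binds well only in the second snapshot
    "cmpd_3": [-4.0, -4.5, -5.0],  # never binds well
}
cutoff = -8.0

single = static_hits(scores, cutoff)
ensemble = ensemble_hits(scores, cutoff)
missed_by_static = ensemble - single
print(sorted(single), sorted(ensemble), sorted(missed_by_static))
```

In this toy case the second compound is invisible to a screen against the first snapshot alone, which is the kind of miss Orozco describes.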

Running the simulations to build MoDEL took almost four years and several hundred years of CPU time, with jobs running in parallel. “It was very computationally intensive,” Orozco says.

Orozco estimates that MoDEL’s simulations will only increase the probability of finding a successful drug by 5 to 10 percent (since drug development is complicated by many factors beyond ligand binding, such as toxicology and patent issues). Still, given that MoDEL is open to the community (at http://mmb.pcb.ub.es/MoDEL/), why not make use of that potentially valuable 5 to 10 percent?

Excel Anyone?

For some biologists, hiring a bioinformatician to do analyses can be prohibitively expensive. Even the software can be out of reach, as well as hard to learn. That’s why Robin Hallett, a third-year PhD student at McMaster University in Hamilton, Canada, decided to create his gene expression analysis tool in a ubiquitous program: Excel.

“There are lots of biologists sitting on data and they don’t have the knowledge or means to extract useful information from it,” Hallett says. “My goal was to make something for myself using programs everyone knows how to use.”

Hallett’s tool takes gene expression data and uses basic statistics to identify predictive gene signatures for diseases such as cancer. To test his algorithm, Hallett used data on 295 breast cancer patients. From half the data (the training set) he identified genes whose expression levels correlated with survival. When tested on the remaining patients’ data (the test set), the highly ranked genes properly segregated the breast cancer patients by prognosis. The work was published in the Journal of Experimental & Clinical Cancer Research in September 2010.
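The train-then-test idea described above can be sketched in a few lines of Python. This is not Hallett's Excel workbook or his published method: the choice of Pearson correlation, the sign-weighted score, the gene names, and all the numbers are assumptions made purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_genes(expression, survival):
    """Rank genes by the strength of their correlation with survival time."""
    scored = [(gene, pearson(values, survival)) for gene, values in expression.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

def signature_score(patient, signature):
    """Add expression of genes tied to longer survival, subtract the others."""
    return sum(math.copysign(1.0, r) * patient[gene] for gene, r in signature)

# Hypothetical training set: four patients, three genes, survival in years.
train_survival = [1.0, 2.0, 3.0, 4.0]
train_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],  # rises with survival
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # falls with survival
    "GENE_C": [2.0, 1.0, 2.0, 1.0],  # weakly related
}
signature = rank_genes(train_expression, train_survival)[:2]

# Hypothetical test patients who were not used to choose the signature.
good_prognosis = {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 1.0}
poor_prognosis = {"GENE_A": 1.0, "GENE_B": 5.0, "GENE_C": 2.0}
print(signature_score(good_prognosis, signature),
      signature_score(poor_prognosis, signature))
```

The point of holding back the test patients is that a signature chosen and judged on the same data will almost always look better than it really is.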

Hallett notes that bioinformaticians haven’t been terribly impressed by his paper (it was rejected by a computer science journal), but they’re not his target audience. The tool was designed for graduate students with the least access to advanced computing resources. And it’s meant for the learning/discovery phase of research rather than for the clinical end, when researchers might want something more powerful.

“It works well, not excellently,” he says. “But more advanced machine learning algorithms are not universally accessible like Excel. To someone unfamiliar with those algorithms, it’s fine.” And perhaps his work will lead to other Excel-based bioinformatics algorithms, making computational biology truly available on any desktop.

Data-Mining for Transcriptional Changes Over Time

Biologists frequently have to grapple with changes in gene expression over time. Some algorithms track how gene expression peaks and what the shape of the peak is. Others try to find oscillatory patterns (such as circadian rhythms). Still others look at gene expression over an entire life cycle to understand different stages of the organism’s life. But all these options only consider changes at the level of single genes.

Naren Ramakrishnan, PhD, professor of computer science at Virginia Tech, and colleagues have taken a new approach, creating a software tool that focuses on groups of genes. Different groups of active genes organize, break up, and coalesce when the cell transitions from one stage to the next, he says. By identifying the timepoints when groups of genes dynamically reorganize, his algorithm automatically determines the stages the cell goes through. “It’s very unsupervised,” he says. “The algorithm is not given any information about where transitions might happen.”
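A toy Python sketch can convey the general idea of finding a timepoint at which gene groups reorganize. It is a simplified stand-in, not the GOALIE algorithm: the grouping rule (mean expression above or below zero), the pair-counting score, the window width, and the data are all assumptions chosen to keep the example short.

```python
from itertools import combinations

def group_genes(series, start, end, threshold=0.0):
    """Label each gene True (high) or False (low) by its mean over [start, end)."""
    return {gene: sum(values[start:end]) / (end - start) > threshold
            for gene, values in series.items()}

def reorganization(labels_a, labels_b):
    """Fraction of gene pairs grouped together in one window but apart in the other."""
    pairs = list(combinations(sorted(labels_a), 2))
    changed = sum(
        (labels_a[g] == labels_a[h]) != (labels_b[g] == labels_b[h])
        for g, h in pairs
    )
    return changed / len(pairs)

def best_boundary(series, n_timepoints, width=2):
    """Score each candidate boundary by comparing the windows on either side."""
    scores = {}
    for t in range(width, n_timepoints - width + 1):
        before = group_genes(series, t - width, t)
        after = group_genes(series, t, t + width)
        scores[t] = reorganization(before, after)
    return max(scores, key=scores.get), scores

# Hypothetical time course: four genes measured at six timepoints.
series = {
    "g1": [1, 1, 1, -1, -1, -1],    # switches off at timepoint 3
    "g2": [1, 1, 1, 1, 1, 1],       # always on
    "g3": [-1, -1, -1, 1, 1, 1],    # switches on at timepoint 3
    "g4": [-1, -1, -1, -1, -1, -1], # always off
}
boundary, scores = best_boundary(series, n_timepoints=6)
print(boundary, scores)
```

As in the description above, the search is given no hint about where the transition lies; it simply picks the boundary at which the most gene pairs change partners.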

The software, called GOALIE, is available for use by any biologist with time course data. “They can load time series data into our software and explore transitional boundaries,” Ramakrishnan says.