Columbia University Medical Center

2016 News

An example of tumor oncotecture. Transcription factors involved in the activation of the mesenchymal glioblastoma subtype are shown in purple. Together, they comprise a tightly knit tumor checkpoint, controlling 74% of the genes in the mesenchymal signature of high-grade glioma. C/EBP (both β and δ) and STAT3 regulate the other three transcription factors in the tumor checkpoint, synergistically controlling the state of mesenchymal GBM cells. (Image: Nature Reviews Cancer)

In a detailed Perspective article published in Nature Reviews Cancer, Department of Systems Biology chair Andrea Califano and research scientist Mariano Alvarez (DarwinHealth) summarize more than a decade of work to propose the existence of a universal, tumor-independent “oncotecture” that consistently defines cancer at the molecular level. Their findings, they argue, indicate that identifying and targeting highly conserved, essential proteins called master regulators — instead of the widely diverse genetic and epigenetic alterations that initiate cancer and have been the focus of much cancer research — could offer an effective way to classify and treat disease.

One of the most important medical insights of recent decades is that cancers are triggered by genetic mutations. Cashing that insight in clinically, to improve treatments, has, however, been hard. A recent study of 2,600 patients at the M.D. Anderson Cancer Centre in Houston, Texas, showed that genetic analysis permitted only 6.4% of those suffering to be paired with a drug aimed specifically at the mutation deemed responsible. The reason is that there are only a few common cancer-triggering mutations, and drugs to deal with them. Other triggering mutations are numerous, but rare—so rare that no treatment is known nor, given the economics of drug discovery, is one likely to be sought.

Facts such as these have led many cancer biologists to question how useful the gene-led approach to understanding and treating cancer actually is. And some have gone further than mere questioning. One such is Andrea Califano of Columbia University, in New York. He observes that, regardless of the triggering mutation, the pattern of gene expression—and associated protein activity—that sustains a tumour is, for a given type of cancer, almost identical from patient to patient. That insight provides the starting-point for a different approach to looking for targets for drug development. In principle, it should be simpler to interfere with the small number of proteins that direct a cancer cell’s behaviour than with the myriad ways in which that cancer can be triggered in the first place. (Read full article.)

Department of Systems Biology bioengineer Harris Wang describes the goals of the Human Genome Project-Write (HGP-write), an international initiative to develop new technologies for synthesizing very large genomes from scratch.

In June 2016, a consortium of synthetic biologists, industry leaders, ethicists, and others published a proposal in Science calling for a coordinated effort to synthesize large genomes, including a complete human genome in cell lines. The organizers of the project, called GP-write (for work in model organisms and plants) or sometimes HGP-write (for work in human cell lines), envision it as a successor to the Human Genome Project (retroactively termed HGP-read), which 25 years ago promoted rapid advances in DNA sequencing technology. As the ability to read the genome became more efficient and less expensive, it in turn enabled a revolution in how we study biology and attempt to improve human health. Now, by coordinating the development of new technologies for writing DNA on a whole-genome scale, GP-write aims to have a similarly transformative impact.

Among the paper’s authors were Virginia Cornish and Harris Wang, two members of the Columbia University Department of Systems Biology whose contributions to the field of engineering biology have in part made the idea of writing large-scale DNA sequences imaginable. We spoke with them to learn more about what GP-write hopes to accomplish, its potential benefits, and how the effort is evolving.

PrePPI predicts the likelihood that two proteins A and B are capable of interacting based on their similarities to other proteins that are known to interact. This requires integrating structural data (green) as well as other kinds of information (blue), such as evidence of protein co-activity in other species as well as involvement in similar cellular functions. PrePPI now offers a searchable database of unprecedented scope, constituting a virtual interactome of all proteins in human cells. (Image courtesy of eLife.)

The molecular machinery within every living cell includes enormous numbers of components functioning at many different levels. Features like genome sequence, gene expression, proteomic profiles, and chromatin state are all critical in this complex system, but studying a single level is often not enough to explain why cells behave the way they do. For this reason, systems biology strives to integrate different types of data, developing holistic models that more comprehensively describe networks of interactions that give rise to biological traits.

Although the concept of an interaction network can seem abstract, at its foundation each interaction is a physical event that takes place when two proteins encounter one another, bind, and cause a change that affects a cell’s activity. In order for this to take place, however, they need to have compatible shapes and physical properties. Being able to predict the entire universe of possible pairwise protein-protein interactions could therefore be immensely valuable to systems biology, as it could both offer a framework for interpreting the feasibility of interactions proposed by other methods and potentially reveal unique features of networks that other approaches might miss.

In a 2012 paper in Nature, scientists in the laboratory of Barry Honig first presented a landmark algorithm and database they call PrePPI (Predicting Protein-Protein Interactions). At the time, PrePPI used a novel computational strategy that deployed concepts from structural biology to predict approximately 300,000 protein-protein interactions, a dramatic increase over the number available in experimentally generated resources.

Since then, the Honig Lab has been working hard to improve PrePPI’s scope and usefulness. In a paper recently published in eLife, they now report on some impressive developments. With enhancements to their algorithm and the incorporation of several new types of data into its analysis, the PrePPI database now contains more than 1.35 million predictions of protein-protein interactions, covering about 85% of the entire human proteome. This makes it the largest resource of its kind. In parallel with these improvements, the investigators have also begun to apply PrePPI in new ways, using the information it contains to provide new kinds of insights into the organization and function of protein interaction networks.
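The paper's scoring details are beyond this summary, but the evidence-integration idea behind PrePPI can be sketched as a naive-Bayes combination of likelihood ratios, one per evidence source. Everything below (the function names, the likelihood-ratio values, and the prior odds) is hypothetical and for illustration only, not PrePPI's actual model:

```python
from math import prod

def combined_likelihood_ratio(evidence_lrs):
    """Naive-Bayes combination: treat evidence sources as independent,
    so their likelihood ratios multiply."""
    return prod(evidence_lrs)

def interaction_probability(evidence_lrs, prior_odds=1/600):
    """Turn the combined likelihood ratio and the prior odds that a
    random protein pair interacts into a posterior probability."""
    posterior_odds = combined_likelihood_ratio(evidence_lrs) * prior_odds
    return posterior_odds / (1 + posterior_odds)

# Invented example: structural-modeling LR of 50, co-expression LR of 5,
# shared-function LR of 8 for a candidate pair A-B (combined LR = 2000).
p = interaction_probability([50, 5, 8])
```

The appeal of this kind of scheme is that each evidence source (structural similarity, co-expression, shared function) can be weighed on a common scale, so weak individual signals can still add up to a confident prediction.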

By inventing a new computational pipeline called DAMAGES, Chaolin Zhang and Yufeng Shen showed that brain cell types on the left of the plot are more prone to harbor rare autism risk mutations than those on the right. Narrowing the focus to these types of cells also helped to identify a molecular signature of the disorder that involves haploinsufficiency. Figure: Human Mutation.

Autism, a spectrum of neurodevelopmental disorders typically identified during early childhood, is widely thought to be the result of genetic alterations that change how the growing brain is wired. Yet despite substantial effort in the field of autism genetics, the specific alterations that place one child at greater risk than another remain elusive. Although the list of alterations associated with autism is growing, it has been difficult to conclusively distinguish those that truly increase disease risk from those that are merely coincident with it. One troubling reason for this is that research so far seems to indicate that specific genetic abnormalities associated with autism risk are extremely rare, with many observed in only a single patient. This has made it hard to reproduce findings conclusively.

In a paper recently published in the journal Human Mutation, Department of Systems Biology faculty members Chaolin Zhang and Yufeng Shen describe a method and some new findings that could help to more precisely identify rare autism-driving alterations. A new analytical pipeline they call DAMAGES (Disease Associated Mutation Analysis using Gene Expression Signatures) uses a unique approach to identifying autism risk genes, looking at differences in gene expression among different cell types in the brain in order to focus more specifically on mechanisms that are likely to be relevant for autism. Using this approach, they identified a pronounced molecular signature that is shared by disease risk genes due to haploinsufficiency, a type of genetic alteration that causes a dramatic drop in the expression of a particular protein.

The new course will provide a Master’s level overview of how systems biology is helping to address today’s grand challenges in biomedical research, what it can realistically be expected to achieve, and where it promises to have the most significant impact. Combining critical readings, discussions, tutorials, presentations, projects, and other activities, the course is designed for anyone interested in understanding the implications of systems biology across the sciences — including how it is affecting such fields as precision medicine, vaccine and antibiotic development, agriculture, science policy, and regulation.

RNA sequencing (RNA-Seq) has become a workhorse technology for research in systems biology. Unlike genome sequencing, which reveals a sample’s DNA blueprint, RNA-Seq catalogs the constantly changing transcriptome; that is, it itemizes and quantifies the complete set of messenger RNA transcripts that are present in cells at a specific time and under specific conditions. In this way, RNA-Seq makes it possible to investigate how the information encoded in the genome is functionally transformed into observable traits, and provides valuable data for defining and comparing different biological states.

Conventional RNA-Seq generates an average summary of mRNA abundance across all of the cells in a sample. Recent research, however, has created a demand for higher resolution technologies capable of generating mRNA profiles at the level of single cells. In cancer biology, for example, there is an increasingly acute awareness that gene expression in the cells that make up malignant tumors is highly heterogeneous. This suggests that in order to understand how the cells work together to drive a tumor’s cancerous behavior, scientists need better methods for characterizing the entire ecology of cells of which it is made. Being able to quantify differences in gene expression cell by cell could be one valuable way to explore such complex environments and understand how they sustain malignancy.

Although several single-cell RNA-Seq technologies have been unveiled in the past two years, they are expensive to operate and are not optimized to produce data on the scale that is required for systems biology research, particularly in tissue specimens with limited numbers of cells. In a new paper just published in the journal Scientific Reports, however, researchers in the laboratory of Department of Systems Biology Assistant Professor Peter Sims describe a novel approach that offers several important advantages over other existing methods.

The new, automated platform builds on previous innovations in the Sims Lab to offer a cheap, efficient, and reliable way to simultaneously measure gene expression in thousands of individual cells from a single tissue sample. Using custom-designed microwell plates, microfluidics, temperature control systems, and software, the technology captures, tags, and generates a readout of the complete transcriptome in each cell, providing robust data that can then be analyzed to distinguish functional diversity among the cells in the sample. Already, the technology is playing a key role in several research projects being conducted in the Department of Systems Biology and promises to become even more powerful as the field of single cell genomics continues to evolve.

In a recent paper published in Molecular Systems Biology, Kam Leong describes a two-compartment microfluidic device that consists of a chamber within which is embedded a "microbial swarmbot" that is isolated by a permeable hydrogel shell. In collaboration with Lingchong You (Duke University), Leong used the device to regulate the dynamics of a population of bacteria containing a genetically engineered switch that reacts to population size. The scale bar in panel 1 represents a length of 250 micrometers.

With a restless curiosity, Kam Leong always seems to be on the lookout for new problems to solve. A versatile biomedical engineer originally trained in chemical engineering, he has developed an impressive array of innovative nanotechnologies that have opened up new opportunities in biomedical research and drug delivery.

The most widely known of his designs resulted from his work as a postdoc in the laboratory of MIT’s Robert Langer. While there, Leong played a critical role in the development of Gliadel, a controlled-release therapy that uses biodegradable polymer particles to deliver an anticancer drug to a brain tumor site following surgery. Since then his name has appeared on more than 70 patents covering a wide range of inventions — from microfluidics technologies, to scaffolds for growing organic tissues, to nanoscale fluorescent probes, to a method that uses nanoparticles instead of viruses for the oral delivery of gene therapies. These achievements have gained him widespread respect within the engineering community, as evidenced by his 2013 election to both the National Academy of Engineering and the National Academy of Inventors.

Dr. Leong joined Columbia University in 2014. Although his primary affiliation is with the Department of Biomedical Engineering, he was also attracted by the chance to assume an interdisciplinary faculty appointment in the Department of Systems Biology. Since his arrival he has been developing collaborations with several Systems Biology faculty members as well as other scientists at Columbia University Medical Center, and plans are underway for his lab to move into the Lasker Biomedical Research Building to better facilitate interactions with systems biology and clinical investigators. In the following interview, Leong describes why opportunities to interact with scientists in other disciplines are so important to his work, and how the kinds of technologies he has developed could be relevant for systems biology research, as well as for improving treatment of human diseases.

At this year's retreat Alexander Hsieh, Rotem Rubinstein, Jinzhou Yuan, and Jiguang Wang (clockwise from top left) were named winners in the Best Poster Competition.

On September 15, 2016, members of the Columbia University Department of Systems Biology gathered in Tarrytown, New York for the Department’s annual retreat. Although the tranquil setting overlooking the Hudson River was familiar, the event’s timing was new, taking place for the first time at the beginning of the academic year to enable first-year graduate students to become acquainted with the Department as they begin their studies. With a full day of scientific talks, a poster session, and ample time for informal conversation, the retreat provided an up-to-date survey of the diverse research taking place in the Department's laboratories.

An essay coauthored by Andrea Califano (Chair, Department of Systems Biology) and Gideon Bosker and published in the Wall Street Journal asks whether quantitative modeling could reveal the keys for turning cancer off. They write:

Disappointed with the slow pace of discovery and inclined to look for elegant, universal explanations for nature’s conundrums, many cancer researchers have increasingly been asking: Is there some sort of “Da Vinci Code” for cancer? And can we crack it using mathematics?

Quantitative modeling has been extremely successful in disciplines as diverse as astronomy, physics, economics and computer science. Can “cancer quants”—scientists applying quantitative analyses to the landscape of cancer biology—find the answers we seek? And, if so, what would the new paradigm look like?

The essay goes on to describe how computational methods developed in the Califano Lab are being tested in personalized N-of-1 clinical trials to identify essential checkpoints in the molecular regulatory networks that sustain individual patients' tumors — as well as drugs capable of targeting them.

On the surface, birth defects and cancer might not seem to have much in common. For some time, however, scientists have observed increased cancer risk among patients with certain developmental syndromes. One well-known example is seen in children with Noonan syndrome, who have an eightfold increased risk of developing leukemia. Recently, researchers studying the genetics of autism also observed mutations in PTEN, an important tumor suppressor gene. Although such findings have been largely isolated and anecdotal, they raise the tantalizing question of whether cancer and developmental disorders might be fundamentally linked.

According to a paper recently published in the journal Human Mutation, many of these similarities might not be just coincidental, but the result of shared genetic mutations. The study, led by Yufeng Shen, an Assistant Professor in the Columbia University Departments of Systems Biology and Biomedical Informatics, together with Wendy Chung, Kennedy Family Associate Professor of Pediatrics at Columbia University Medical Center, found that cancer-driving genes also make up more than a third of the risk genes for developmental disorders. Moreover, many of these genes appear to function through similar modes of action. The scientists suggest that this could make tumors “natural laboratories” for pinpointing and predicting the damaging effects of rare genetic alterations that cause developmental disorders.

“In comparison with cancer, there are relatively few patients with developmental disorders,” Shen explains. “For geneticists, this makes it hard to identify the risk genes solely based on statistical evidence of mutations from these patients. This study indicates that we should be able to use what we learn from cancer genetics — where much more data are available — to help in the interpretation of genetic data in developmental disorders.”

The Columbia University Department of Systems Biology has been named one of four inaugural centers in the National Cancer Institute’s (NCI) new Cancer Systems Biology Consortium. This five-year grant will support the creation of the Center for Cancer Systems Therapeutics (CaST), a collaborative research center that will investigate the general principles and functional mechanisms that enable malignant tumors to grow, evade treatment, induce disease progression, and develop drug resistance. Using this knowledge, the Center aims to identify new cancer treatments that target master regulators of tumor homeostasis.

CaST will build on previous accomplishments in the Department of Systems Biology and its Center for Multiscale Analysis of Genomic and Cellular Networks (MAGNet), which developed several key systems biology methods for characterizing the complex molecular machinery underlying cancer. At the same time, however, the new center constitutes a step forward, as it aims to move beyond a static understanding of cancer biology toward a time-dependent framework that can account for the dynamic, ever-changing nature of the disease. This more nuanced understanding could eventually enable scientists to better predict how individual tumors will change over time and in response to treatment.

Following gene transcription and translation, a protein can undergo a variety of modifications that affect its activity. By analyzing downstream gene expression patterns in single tumors, VIPER can account for these changes to identify proteins that are critical to cancer cell survival.

Developed by Mariano Alvarez as a research scientist in the Califano laboratory, VIPER has become one of the cornerstones of Columbia University’s precision medicine initiative. Its effectiveness in cancer diagnosis and treatment planning is currently being tested in a series of N-of-1 clinical trials, which analyze the unique molecular characteristics of individual patients’ tumors to identify drugs and drug combinations that will be most effective for them. If successful, it could soon become an important component of cancer care at Columbia University Medical Center.

According to Dr. Califano, “VIPER makes it possible to find actionable proteins in 100% of cancer patients, independent of their genetic mutations. It also enables us to track tumors as they progress or relapse to determine the most appropriate therapeutic approach at different points in the evolution of disease. So far, this method is looking extremely promising, and we are excited about its potential benefits in finding novel therapeutic strategies to treat cancer patients.”
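The core inference described above, in which a protein's activity is read out from the expression of its downstream targets, can be illustrated with a drastically simplified sketch. The gene names, values, and scoring rule below are invented for illustration; the real VIPER algorithm uses a rank-based enrichment test over regulons inferred from large expression datasets:

```python
def regulator_activity(signature, regulon):
    """Toy VIPER-like score: average a tumor's gene-expression z-scores
    over the regulator's known targets, flipping the sign for targets
    the regulator represses.

    `signature` maps gene -> z-score in the tumor of interest;
    `regulon` maps target gene -> +1 (activated) or -1 (repressed).
    A large positive score suggests the regulator is highly active."""
    scores = [signature[g] * mode for g, mode in regulon.items()
              if g in signature]
    return sum(scores) / len(scores)

# Hypothetical signature and regulon (all names and numbers invented).
signature = {"G1": 2.1, "G2": 1.8, "G3": -1.5, "G4": 0.1}
regulon = {"G1": +1, "G2": +1, "G3": -1}  # G3 is a repressed target
activity = regulator_activity(signature, regulon)
```

The key design idea is that target genes act as a multiplexed reporter: even when the regulator's own mRNA level or post-translational state is unobservable, the coordinated shift of its regulon betrays its activity.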

The researchers' model of tumor evolution indicates that different clonal lineages branch from a common ancestral cell and then diversify, independently causing aggressive tumor behavior at different stages of disease.

Glioblastoma multiforme (GBM) is the most common and most aggressive type of primary brain tumor in adults. Existing treatments against the disease are very limited in their effectiveness, meaning that in most patients tumors recur within a year. Once GBM returns, no beneficial therapeutics currently exist and prognosis is generally very poor.

MD/PhD students Andrew Anzalone and Sakellarios Zairis combined approaches based in chemical biology, synthetic biology, and computational biology to develop a new method for protein engineering.

The ribosome is a reliable machine in the cell, precisely translating the nucleotide code carried by messenger RNAs (mRNAs) into the polypeptide chains that form proteins. But although the ribosome typically reads this code with uncanny accuracy, translation has some unusual quirks. One is a phenomenon called -1 programmed ribosomal frameshifting (-1 PRF), in which the ribosome begins reading an mRNA one nucleotide before it should. This hiccup bumps translation “out of frame,” creating a different sequence of three-nucleotide-long codons. In essence, -1 PRF thus gives a single gene the unexpected ability to code for two completely different proteins.
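The effect of a -1 frameshift on the codon sequence is easy to see in a short sketch. The sequence, slip position, and helper names below are invented for illustration:

```python
def codons(mrna, start=0):
    """Read successive 3-nucleotide codons beginning at `start`."""
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]

def minus_one_prf(mrna, slip_after):
    """Toy model of -1 PRF: translate normally for `slip_after` codons,
    then slip back one nucleotide and continue in the new frame."""
    upstream = codons(mrna[:slip_after * 3])
    downstream = codons(mrna, start=slip_after * 3 - 1)
    return upstream + downstream

seq = "AUGGCAUUUAGGCAA"
normal = codons(seq)                         # AUG GCA UUU AGG CAA
shifted = minus_one_prf(seq, slip_after=3)   # AUG GCA UUU UAG ...
```

In this invented example the downstream codons are completely different after the slip; the shifted frame even runs into a UAG stop codon, which is one common consequence of a frameshift.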

Cofactors work with transcription factors (TFs) to enable efficient transcription of a TF's target gene. The Bussemaker Lab showed that genetic alterations in the cofactor gene (cQTLs) change the nature of this interaction, affecting the connectivity between the TF and its target gene. This, combined with other factors called aQTLs that affect the availability of the TF in the nucleus, can lead to downstream changes in gene expression.

When different people receive the same drug, they often respond to it in different ways — what is highly effective in one patient can often have no benefit or even cause dangerous side effects in another. From the perspective of systems biology, this is because variants in a person’s genetic code lead to differences in the networks of genes, RNA, transcription factors (TFs), and other proteins that implement the drug’s effects inside the cell. These multilayered networks are much too complex to observe directly, and so systems biologists have been developing computational methods to infer how subtle differences in the genome sequence produce these effects. Ultimately, the hope is that this knowledge could improve scientists’ ability to identify drugs that would be most effective in specific patients, an approach called precision medicine.

In a paper published in the Proceedings of the National Academy of Sciences, a team of Columbia University researchers led by Harmen Bussemaker proposes a novel approach for discovering some critical components of this molecular machinery. Using statistical methods to analyze biological data in a new way, the researchers identified genetic alterations they call connectivity quantitative trait loci (cQTLs), a class of variants in transcription cofactors that affect the connections between specific TFs and their gene targets.
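The statistical machinery in the paper is more sophisticated, but the intuition behind a cQTL (a variant that changes the connectivity between a TF and its target) can be captured by a genotype-by-TF interaction term in a linear model: if the cofactor genotype alters the slope relating TF activity to target expression, the interaction coefficient is nonzero. The sketch below uses synthetic data and is an illustration of the idea, not the authors' method:

```python
import numpy as np

def cqtl_interaction_fit(tf_activity, genotype, target_expr):
    """Fit target ~ b0 + b1*TF + b2*G + b3*(TF*G) by least squares.
    A large |b3| means the genotype modulates TF-target connectivity."""
    X = np.column_stack([
        np.ones_like(tf_activity),   # intercept
        tf_activity,                 # TF activity
        genotype,                    # cofactor genotype (allele count)
        tf_activity * genotype,      # interaction: connectivity change
    ])
    beta, *_ = np.linalg.lstsq(X, target_expr, rcond=None)
    return beta

# Synthetic example: carrying the variant doubles the TF->target slope.
rng = np.random.default_rng(0)
tf = rng.normal(size=200)
g = rng.integers(0, 2, size=200).astype(float)
y = 1.0 + 0.5 * tf + 0.5 * tf * g + 0.05 * rng.normal(size=200)
beta = cqtl_interaction_fit(tf, g, y)  # beta[3] recovers ~0.5
```

Scanning such interaction tests across many cofactor variants and TF-target pairs, with appropriate multiple-testing control, is one generic way to hunt for connectivity-modulating loci.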

Launched in 2014 by investigators in the Mailman School of Public Health, the CUMC Microbiome Working Group brings together basic, clinical, and population scientists interested in understanding how the human microbiome—the ecosystems of bacteria that inhabit and interact with our tissues and organs—affects our health. Computational biologists in the Department of Systems Biology have become increasingly involved in this interdepartmental community, contributing expertise in analytical approaches that make it possible to make sense of the large data sets that microbiome studies generate.

The International Society for Computational Biology has elected Professor Barry Honig to its 2016 ISCB Class of Fellows. The award recognizes distinguished ISCB members who have shown excellence in research and/or service to the computational biology community. Dr. Honig’s award acknowledges his “seminal contributions to protein structure prediction and molecular electrostatics, and his more recent work on protein function prediction, protein-DNA recognition, and cell-cell adhesion.”

The International Society for Computational Biology is the largest professional society for scientists working in the fields of computational biology and bioinformatics. The 2016 Class of Fellows will be presented at its annual Intelligent Systems for Molecular Biology (ISMB) conference, to be held July 8-12, 2016 in Orlando, Florida.

Nicholas Tatonetti is an assistant professor in the Department of Biomedical Informatics and Department of Systems Biology.

A team of Columbia University Medical Center (CUMC) scientists led by Nicholas Tatonetti has identified several drug combinations that may lead to a potentially fatal type of heart arrhythmia known as torsades de pointes (TdP). The key to the discovery was a new bioinformatics pipeline called DIPULSE (Drug Interaction Prediction Using Latent Signals and EHRs), which builds on previous methods Tatonetti developed for identifying drug-drug interactions (DDIs) in observational data sets. The results are reported in a new paper in the journal Drug Safety and are covered in a detailed multimedia feature published by the Chicago Tribune.

The algorithm mined data contained in the US FDA Adverse Event Reporting System (FAERS) to identify latent signals of DDIs that cause QT interval prolongation, a disturbance in the electrical cycle that coordinates the heartbeat. It then validated these predictions by looking for their signatures in electrocardiogram results contained in a large collection of electronic health records at Columbia. Interestingly, the drugs the investigators identified do not cause the condition on their own, but only when taken in specific combinations.
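DIPULSE's own scoring is not reproduced here, but signal detection in spontaneous-report databases like FAERS is commonly summarized with a reporting odds ratio (ROR) computed from a 2x2 contingency table. The sketch below is a generic illustration of that statistic, not the DIPULSE pipeline, and all counts are invented:

```python
def reporting_odds_ratio(pair_event, pair_no_event,
                         other_event, other_no_event):
    """ROR from a 2x2 table of spontaneous reports: the odds of the
    adverse event among reports mentioning the drug pair, divided by
    the odds among all other reports. ROR > 1 flags a disproportionate
    signal worth follow-up; it is not proof of causation."""
    return (pair_event * other_no_event) / (pair_no_event * other_event)

# Invented counts: 30 of 1,000 reports mentioning drugs A+B note QT
# prolongation, versus 200 of 50,000 reports without the pair.
ror = reporting_odds_ratio(30, 970, 200, 49800)
```

A pipeline built on this idea would compute such a statistic for every candidate drug pair and then, as the study did with electrocardiogram records, seek independent confirmation of the flagged pairs in clinical data.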

Previously, no reliable methods existed for identifying these kinds of combinations. Although the findings are preliminary, the retrospective confirmation of many of DIPULSE’s predictions in actual patient data suggests its effectiveness, and the investigators plan to test them experimentally in the near future.

Students participating in a new course gain experience using the Department of Systems Biology's computing cluster, a Top500 supercomputer dedicated to biological research.

As more and more biological research moves to a “big data” model, the ability to use high-performance computing platforms for analysis is rapidly becoming an essential skill set. To prepare students to work with these new tools more successfully, the Columbia University Department of Systems Biology recently partnered with the Mailman School of Public Health in launching a new graduate level class focused on providing a strong grounding in the fundamental concepts behind the technology.

Much as countries make and trade goods, microbial cells within bacterial communities exchange metabolites to promote cell growth. This analogy suggests a way of studying microbial communities through the lens of economics.

An article in the Wall Street Journal reports on a recent collaboration involving Columbia University Department of Systems Biology Assistant Professor Harris Wang and Claremont Graduate University economist Joshua Tasoff that identified some intriguing similarities between economic markets and the exchange of resources among microbes within bacterial communities.

In an unusual marriage, biology and economics appear to be a match made in heaven.

Four years ago, two former roommates reunited at a friend’s wedding had time to catch up. The first, an economist, asked: “What are you working on?” The second, a biologist, answered: “How microbial communities interact. It’s kind of like in economics.”

By using statistical methods to compare genomic data across species, such as chimpanzees and humans, the Przeworski Lab is gaining insights into the origins of genetic variation and adaptation. (Photo: Common chimpanzee at the Leipzig Zoo. Thomas Lersch, Wikimedia Commons.)

Launched approximately 100 years ago, population genetics is a subfield within evolutionary biology that seeks to explain how processes such as mutation, natural selection, and random genetic drift lead to genetic variation within and between species. Population genetics was originally born from the convergence of Mendelian genetics and biostatistics, but with the recent availability of genome sequencing data and high-performance computing technologies, it has bloomed into a mature computational science that is providing increasingly high-resolution models of the processes that drive evolution.

Molly Przeworski, a professor in the Columbia University Departments of Biological Sciences and Systems Biology, majored in mathematics at Princeton before beginning her PhD in evolutionary biology at the University of Chicago in the mid-1990s. While there, she realized that the availability of increasingly large data sets was changing population genetics, and has since been interested in using statistical approaches to investigate questions such as how genetic variation drives adaptation and why mutation rate and recombination rate differ among species. In the following interview, she describes how population genetics is itself evolving, as well as some of her laboratory’s contributions to the field.