In this talk, we will introduce Affymetrix's exon arrays, which query the expression levels of known and speculative exons. In particular, we will discuss the design of the exon arrays and other special features of the arrays that are important for analyzing exon-level expression. We will also give a brief introduction to microarray technology along the way.

Biological networks play a central role in the biology of cancer. Yet there remains much uncertainty regarding the connectivity of biological networks, and modifications to such connectivity in various cancers. In recent years there has been increasing interest in data-driven methods for characterizing networks of genes, proteins and metabolites. However, the complexity of these networks means that such analyses must address a number of challenging statistical issues. This tutorial will provide an introduction to the use of Markov chain Monte Carlo (MCMC) methods for learning biological networks, focusing on a class of multivariate statistical models called probabilistic graphical models. I will discuss how MCMC may be used to draw samples from posterior distributions over network structures and how such samples may be used to address questions pertaining to features of biological networks such as edges, classes of edges or paths.
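
The abstract does not specify a model or score, so the following is only a minimal sketch of structure MCMC under stated assumptions: graphs are undirected and encoded as tuples of 0/1 edge indicators, the proposal toggles one edge chosen uniformly at random (a symmetric proposal, so the Metropolis-Hastings ratio reduces to the posterior ratio), and `log_score` is a hypothetical stand-in for the log posterior. A real analysis would use a graphical-model marginal likelihood plus a structure prior, and directed models would also need an acyclicity check. The fraction of post-burn-in samples that contain an edge estimates that edge's posterior probability.

```python
import math
import random

def structure_mcmc(log_score, n_edges, n_iter=50000, burn_in=5000, seed=0):
    """Metropolis-Hastings over graphs encoded as tuples of 0/1 edge indicators.

    log_score: hypothetical function giving the unnormalised log posterior of a graph.
    Returns the Monte Carlo estimate of each edge's posterior probability.
    """
    rng = random.Random(seed)
    graph = [0] * n_edges              # start from the empty graph
    current = log_score(tuple(graph))
    counts = [0] * n_edges
    kept = 0
    for it in range(n_iter):
        # Symmetric proposal: toggle one edge chosen uniformly at random.
        e = rng.randrange(n_edges)
        graph[e] ^= 1
        proposed = log_score(tuple(graph))
        # Accept with probability min(1, exp(proposed - current)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            current = proposed
        else:
            graph[e] ^= 1              # reject: undo the toggle
        if it >= burn_in:              # tally edge indicators after burn-in
            kept += 1
            for i, present in enumerate(graph):
                counts[i] += present
    return [c / kept for c in counts]

# Toy check: with independent edge log-scores w, each edge's exact posterior
# probability is 1 / (1 + exp(-w)), so the estimates can be compared with it.
weights = [2.0, 0.0, -1.5]
edge_probs = structure_mcmc(lambda g: sum(w * x for w, x in zip(weights, g)), 3)
```

The independent-edge score is chosen only because its exact answer is known; with a realistic score the same loop yields estimates for edges, and, by tallying other graph features instead of single indicators, for classes of edges or paths.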

A typical microarray experiment compares two or more conditions
(for example, cancer vs. normal cells) and produces a list of genes
that are differentially expressed. The standard approach is to score
each gene individually. However, real genes do not act alone: they
interact in various ways, and cellular processes involve pathways that
contain multiple genes. Many of those pathways and interactions are
known.
Hence an analysis that combines the microarray data with prior
biological knowledge has the potential to be both more biologically
relevant and statistically powerful. We survey some of the methods
that have been proposed to address this problem, with a focus on
algorithms based on interaction graphs.
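
As one concrete instance of the interaction-graph idea (chosen for illustration; the abstract does not commit to a particular method), the sketch below scores a connected set of genes by the aggregate z-score sum(z)/sqrt(k) and grows a module greedily from a seed gene. The graph, the z-scores and the greedy search are all hypothetical; published methods often use more thorough searches (for example simulated annealing) and calibrate module scores against random gene sets.

```python
import math

def subnetwork_score(nodes, z):
    """Aggregate z-score of a gene set: sum of per-gene z-scores over sqrt(size)."""
    return sum(z[g] for g in nodes) / math.sqrt(len(nodes))

def greedy_module(graph, z, seed):
    """Grow a connected module from `seed`, adding the neighbouring gene that most
    increases the aggregate score; stop when no neighbour improves it."""
    module = {seed}
    best = subnetwork_score(module, z)
    while True:
        frontier = {nb for g in module for nb in graph[g]} - module
        candidates = [(subnetwork_score(module | {nb}, z), nb) for nb in frontier]
        if not candidates:
            break
        score, nb = max(candidates)
        if score <= best:
            break
        module.add(nb)
        best = score
    return module, best

# Hypothetical interaction graph (undirected adjacency sets) and per-gene z-scores.
graph = {"A": {"B", "E"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": {"A"}}
z = {"A": 2.5, "B": 2.0, "C": 1.8, "D": -1.0, "E": 0.1}
module, score = greedy_module(graph, z, "A")
```

Here the module grows from A to {A, B, C}; adding D or E would lower the score, which is how a pathway-level analysis can pool several moderately changed, interacting genes into one stronger signal.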

The genomes of many cancers exhibit extensive chromosomal rearrangements, including translocations, inversions, duplications and deletions. I will introduce computational methods for studying these rearrangements using data from paired-end sequencing of cancer genomes. In the paired-end approach, short “tag” sequences are obtained from the ends of fragments of cancer DNA, and these tags are aligned to the reference human genome sequence to identify the locations of rearrangements. I will discuss two topics in the analysis of these data. The first is the question of how much sequencing is required to detect rearrangement breakpoints and to localize them precisely; I will show how extensions of the Lander-Waterman statistics address this question. Second, I will examine the possibility of deriving a parsimonious sequence of rearrangements that transforms the normal human genome into a cancer genome. For this problem, I will introduce the Hannenhalli-Pevzner theory, which yields a sequence of inversions and translocations that produces one genome from another, and describe how duplications in cancer genomes can also be analyzed.
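
The simplest Lander-Waterman-style calculation behind the first question can be sketched as follows. Assuming N clones of length L placed uniformly at random on a genome of length G, the number of clones spanning a fixed breakpoint is approximately Poisson with mean c = NL/G (the clone coverage), so the chance that at least k clones span it is 1 - sum over i < k of e^(-c) c^i / i!. This ignores unmappable ends, variable clone lengths and breakpoint localization, and the design numbers below are hypothetical.

```python
import math

def prob_breakpoint_spanned(n_pairs, clone_len, genome_len, min_support=1):
    """Poisson approximation to P(at least min_support clones span a fixed breakpoint)."""
    c = n_pairs * clone_len / genome_len     # clone coverage
    p_fewer = sum(math.exp(-c) * c**i / math.factorial(i) for i in range(min_support))
    return 1.0 - p_fewer

# Hypothetical design: 150,000 clones of 40 kb on a 3 Gb genome gives c = 2.
p_one = prob_breakpoint_spanned(150_000, 40_000, 3_000_000_000)                  # about 0.865
p_two = prob_breakpoint_spanned(150_000, 40_000, 3_000_000_000, min_support=2)   # about 0.594
```

Requiring two supporting clones, a common guard against chimeric clones and mapping errors, noticeably lowers the detection probability at the same coverage.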

New in situ data for the analysis of interactions between cancer and the immune system in axillary lymph nodes
Susan Holmes (Stanford University)

Abstract

Recent work (Kohrt et al., 2005) has shown that T cell profiles in lymph nodes are much more accurate predictors of survival in breast cancer than tumor size. To extend this preliminary study of 76 patients to a much larger cohort, we needed to collect a large amount of data from stained images. We have built GemIdent (http://www.gemident.com), a novel object identification algorithm written in Java, to locate immune and cancer cells in images of immunohistochemically stained lymph node tissue from the Kohrt study; it also shows promise in other domains. Our method relies heavily on color and on the relative homogeneity of object appearance. As is often the case in segmentation, an algorithm tailored specifically to the application works better than broader methods that work passably well on any problem. Our main innovation is interactive feature extraction from color images. We also let the user improve the classification through an interactive visualization system. This is coupled with statistical learning algorithms and intensive interaction with the user over many classification-correction iterations, resulting in a highly accurate and user-friendly solution. On the web site we have made available a detailed manual, movies showing the various stages of analysis, and the documented source code, which is available to academics.
Kohrt, H. E., N. Nouri, K. Nowels, D. Johnson, S. Holmes, and P. P. Lee, 2005: Profile of immune cells in axillary lymph nodes predicts disease-free survival in breast cancer. PLoS Med, 2(9), e284.
Kapelner, A., P. Lee, and S. Holmes, 2007: An interactive statistical image segmentation and visualization system. Medivis, 00, 81-86.

Analysis of the regulators of caspase activation by death receptor ligands
Suzanne Gaudet

Abstract

In multicellular organisms, appropriate control of apoptosis, or programmed cell death, is critical to homeostasis and health. Death receptor ligands are a family of cytokines that can induce the activation of caspases, the proteases that are the effectors of cell death. To characterize the cell death decision process in human cancer cells, we are combining quantitative measurements from single cells and populations of cells with analysis of ODE-based models of the relevant protein signaling networks. We seek a quantitative understanding of the factors that control caspase activation dynamics: in particular, whether or not caspases are activated, how long the delay before their activation lasts, and how quickly the activation process is completed.
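
The abstract does not give its models, so here is a deliberately small, hypothetical toy that shows how the three quantities named above can be read off a simulated ODE trajectory. An initiator caspase (active fraction `c8`) is driven by the ligand and by feedback from an effector caspase (`c3`), and `c3` is removed at rate `d` as a stand-in for inhibition. All rate constants and the 10%/90% definitions of delay and switching time are assumptions, integration is plain forward Euler, and this is not the authors' model.

```python
def simulate_caspase(ligand, k1=0.01, k2=1.0, k3=1.0, d=0.1, dt=0.01, t_end=200.0):
    """Forward-Euler integration of a hypothetical two-caspase toy model.

    dc8/dt = k1*ligand*(1 - c8) + k3*c3*(1 - c8)   (ligand drive + feedback)
    dc3/dt = k2*c8*(1 - c3) - d*c3                 (activation - removal)
    Returns sampled times and the c3 trajectory."""
    c8 = c3 = 0.0
    times, c3_trace = [0.0], [0.0]
    for step in range(1, int(round(t_end / dt)) + 1):
        dc8 = k1 * ligand * (1.0 - c8) + k3 * c3 * (1.0 - c8)
        dc3 = k2 * c8 * (1.0 - c3) - d * c3
        c8 += dt * dc8
        c3 += dt * dc3
        times.append(step * dt)
        c3_trace.append(c3)
    return times, c3_trace

def first_crossing(times, values, level):
    """Earliest sampled time at which values reach level (None if never)."""
    for t, v in zip(times, values):
        if v >= level:
            return t
    return None

def activation_metrics(times, c3_trace):
    """Delay = time to 10% of the maximum; switching time = 10% -> 90% rise time."""
    top = max(c3_trace)
    if top <= 1e-6:
        return None, None              # caspase never activated
    t10 = first_crossing(times, c3_trace, 0.1 * top)
    t90 = first_crossing(times, c3_trace, 0.9 * top)
    return t10, t90 - t10

low_dose = activation_metrics(*simulate_caspase(ligand=0.1))
high_dose = activation_metrics(*simulate_caspase(ligand=10.0))
```

In this toy the delay shortens as the ligand dose rises, and without ligand nothing is activated; fitting such metrics to single-cell measurements is where a realistic network model would be needed.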

Published reports suggest that DNA microarrays identify clinically meaningful subtypes of lung adenocarcinoma not recognizable by other routine tests. I will describe work done in my lab to validate the reproducibility of the reported tumor subtypes. I will focus primarily on three independent cohorts of patients with lung cancer that were evaluated using a variety of DNA microarray assays. Using the integrative correlations method, we selected a subset of genes whose reliability was acceptable across the different DNA microarray platforms. Tumor subtypes were defined using consensus clustering, and genes distinguishing the subtypes were identified using the weighted difference statistic. Gene lists were compared across cohorts using centroids and gene set enrichment analysis (GSEA). Particular attention will be paid to the importance of defining cohorts of similar composition before undertaking unsupervised analyses. Having defined reproducible subtypes, I will then discuss extending the classification to the clinical setting.
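
The integrative correlation idea can be sketched as follows, assuming it works as we understand it: within each study, compute the gene-by-gene correlation matrix; then, for each gene, correlate its row of correlations with all other genes across the two studies. A gene whose correlation pattern is preserved across platforms scores near 1, and a gene measured unreliably on one platform scores near 0. The function name and the simulated data are hypothetical.

```python
import numpy as np

def integrative_correlation(x_a, x_b):
    """x_a, x_b: (genes x samples) matrices for the SAME genes in the same row order,
    measured in two studies; the samples need not match."""
    r_a = np.corrcoef(x_a)                 # gene-gene correlations within study A
    r_b = np.corrcoef(x_b)                 # gene-gene correlations within study B
    n = r_a.shape[0]
    scores = np.empty(n)
    for g in range(n):
        others = np.arange(n) != g         # drop the trivial self-correlation
        scores[g] = np.corrcoef(r_a[g, others], r_b[g, others])[0, 1]
    return scores

# Hypothetical data: 60 genes driven by 3 shared latent factors in two studies
# with different samples; gene 0 is replaced by pure noise in study B.
rng = np.random.default_rng(0)
loadings = rng.standard_normal((60, 3))
study_a = loadings @ rng.standard_normal((3, 50)) + 0.3 * rng.standard_normal((60, 50))
study_b = loadings @ rng.standard_normal((3, 50)) + 0.3 * rng.standard_normal((60, 50))
study_b[0] = rng.standard_normal(50)
scores = integrative_correlation(study_a, study_b)
```

Genes would then be retained when their score exceeds a chosen reliability threshold before clustering; the threshold itself is an analysis choice not specified here.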

A Bistable Rb/E2F Switch: A Model for Mammalian Cell Cycle Entry
Lingchong You

Abstract

The restriction point (R-point) marks the critical event when a mammalian cell commits to proliferation and becomes independent of growth stimulation. It is fundamental to normal differentiation and tissue homeostasis and appears dysregulated in virtually all cancers. Although the R-point has been linked to various activities involved in G1/S control, the underlying mechanism remains elusive. Using single-cell measurements, here we show that the Rb/E2F pathway functions as a bistable switch to convert graded serum inputs into all-or-none E2F responses. Once turned ON by sufficient serum stimulation, E2F can memorize and maintain this ON state, independent of continuous serum stimulation. We further show that, in both critical dose and timing responses, bistable E2F activation directly correlates with a cell's ability to traverse the R-point.
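
The memory property described above can be illustrated with a generic one-variable positive-feedback model, dx/dt = s + x^2/(1 + x^2) - r x, where x stands for E2F activity, s for the serum input and r for a turnover rate. This is a textbook bistable toy, not the Rb/E2F network used in the study: with r = 0.4 and s = 0 it has stable states at x = 0 and x = 2 separated by an unstable state at x = 0.5, and inputs above roughly s = 0.04 remove the low state. The function name and the serum schedules are hypothetical.

```python
def simulate_e2f(serum_schedule, x0=0.0, dt=0.01, r=0.4):
    """Forward-Euler integration of dx/dt = s + x^2/(1 + x^2) - r*x.

    serum_schedule: list of (serum_level, duration) phases applied in order.
    Returns the activity x at the end of each phase."""
    x = x0
    levels = []
    for s, duration in serum_schedule:
        for _ in range(int(round(duration / dt))):
            x += dt * (s + x * x / (1.0 + x * x) - r * x)
        levels.append(x)
    return levels

# A strong serum pulse followed by serum withdrawal, and a weak pulse for comparison.
strong = simulate_e2f([(0.1, 200.0), (0.0, 200.0)])
weak = simulate_e2f([(0.02, 200.0), (0.0, 200.0)])
```

After withdrawal, the strong pulse leaves x near the upper state (x = 2), while the weak pulse relaxes back toward 0: a graded input produces an all-or-none outcome that persists without continued stimulation.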

Our objective is to reveal how kinetochore proteins regulate kinetochore microtubule (kMT) dynamics, and along which pathways kinetochore proteins convert chemical and mechanical signals into kMT regulation, using the budding yeast S. cerevisiae as our model system. We aim to achieve this by building a model of the network of functional interactions between kinetochore proteins and by devising a biophysical description of how different kinetochore network states affect kMT dynamics. Model parameters will be estimated from large sets of wildtype and mutant kMT trajectories measured by live-cell light microscopy. Data from wildtype and mutant strains will be used together for model calibration in order to limit the space of possible kinetochore states. Because kMT dynamics are stochastic, simulated and experimental kMT trajectories cannot be compared and matched time point by time point. One method for calibrating probabilistic models against stochastic data is indirect inference. In this approach, model predictions and experimental observations are matched at the level of a set of intermediate statistics, or descriptors, that represent the essential features of the data. We have established autoregressive moving average (ARMA) model parameters as a unique and complete set of descriptors of S. cerevisiae kMT dynamics, making them ideal intermediate statistics for model calibration. ARMA models capture the dependence of kMT length on its own history and on a series of random state fluctuations, which embodies the stochastic nature of kMT dynamics. ARMA descriptors are compared within a statistical hypothesis testing framework, using the variance-covariance matrices of the descriptors. The p-values from these tests can be used to construct an objective function for matching model-generated and experimental kMT trajectories.
They can also be used to detect differences between kMT dynamics under different experimental conditions on a continuous scale. This has allowed us to cluster kMT dynamics in different mutants and to identify functional groups among kinetochore proteins. Using ARMA analysis, we found that the regulation of kMT dynamics varies with temperature and cell cycle, but not with the chromosome to which an MT is attached. We also found that the essential proteins Okp1p, Ipl1p and Dam1p and the nonessential motor Kip3p play a role in the regulation of kMT dynamics. In particular, the mutants ipl1-321, dam1-1 and kip3Δ exhibit the same misregulation of kMT dynamics, suggesting that the corresponding three proteins form a functional group. These results show that ARMA descriptors are sensitive enough to detect the subtle changes in kMT dynamics resulting from kinetochore protein mutation, and that clustering kMT dynamics in different kinetochore mutants based on ARMA descriptors reveals functional groups among kinetochore proteins, assisting in the construction of a mechanistic model of kinetochore-kMT interactions.
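
A minimal sketch of the descriptor comparison, under simplifying assumptions: it fits only an AR(2) model by least squares (no moving-average terms, so it is a reduced stand-in for the ARMA descriptors described above), takes the coefficient covariance from the usual regression formula, and compares two independent series with a Wald statistic W = d^T (V_a + V_b)^(-1) d, where d is the coefficient difference and V_a, V_b are the two covariance matrices. With two degrees of freedom the chi-square tail probability is exactly exp(-W/2). Function names and the simulated series are hypothetical; measured kMT length series would replace them.

```python
import numpy as np

def fit_ar2(series):
    """Least-squares AR(2) fit of a mean-centred series.
    Returns coefficients (phi1, phi2) and their 2x2 covariance matrix."""
    y = np.asarray(series, dtype=float)
    y = y - y.mean()
    X = np.column_stack([y[1:-1], y[:-2]])      # lag-1 and lag-2 regressors
    target = y[2:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (len(target) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, cov

def compare_descriptors(series_a, series_b):
    """Wald test that two independent series share the same AR(2) coefficients.
    Returns the p-value, using the chi-square(2) tail exp(-W/2)."""
    coef_a, cov_a = fit_ar2(series_a)
    coef_b, cov_b = fit_ar2(series_b)
    diff = coef_a - coef_b
    w = diff @ np.linalg.solve(cov_a + cov_b, diff)
    return float(np.exp(-w / 2.0))

def simulate_ar2(phi1, phi2, n, rng):
    """Simulate a stationary AR(2) series with unit-variance Gaussian noise."""
    y = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(2, n):
        y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]
    return y

rng = np.random.default_rng(1)
wildtype = simulate_ar2(0.5, 0.2, 2000, rng)     # hypothetical "wildtype" dynamics
mutant = simulate_ar2(0.2, -0.3, 2000, rng)      # hypothetical "mutant" dynamics
p_value = compare_descriptors(wildtype, mutant)
```

Pairwise p-values of this kind give a continuous similarity between conditions, which is what makes clustering mutants by their dynamics possible.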

Affymetrix has created a new generation of expression chips based on placing probes across the entire length of a transcript; these include the Exon arrays and the newer Gene arrays. In this talk, I will discuss a performance comparison of gene-level expression summaries using publicly available data from a tissue experiment and a designed mixture experiment. We compare the three platforms in terms of coverage of expressed content, reproducibility and detection rates.

The Huang lab has pioneered high-throughput, global detection of CpG island methylation: the differential methylation hybridization (DMH) methodology allows one to simultaneously observe the methylation signatures of thousands of DNA CpG islands. As with all microarray data, the power of these global assays can be greatly reduced by large amounts of biologically irrelevant signal (a.k.a. noise). To remove signal related to probe composition (i.e., nucleotide-related hybridization bias) and to DNA characteristics (i.e., over- or under-abundance of methylation-sensitive restriction cut sites), we have developed a regression model approach to signal preprocessing tailored to the DMH experimental protocol. Concurrently, we are developing a hidden Markov model (HMM) for analyzing the differential methylation signature: the methylation status of the CpG dinucleotides is modeled as the hidden signal and the probe intensities as the observations. In this manner we are able to assign statistical significance to the methylation status of a given DNA region. To test both the preprocessing methodology and the HMM signature detection technique, we have developed data simulation strategies that model all aspects of the DMH protocol as well as the DNA-probe interactions.
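
The HMM idea can be sketched with a generic two-state model: hidden states "unmethylated" and "methylated", Gaussian emission densities for (preprocessed) probe intensities, and a transition matrix that encourages neighbouring probes to share a state. Scaled forward-backward recursions then give a per-probe posterior probability of methylation, one way to attach a measure of confidence to a region's status. The state labels, emission parameters and transition matrix below are hypothetical; the lab's model, which works at the level of CpG dinucleotides with DMH-specific preprocessing, is not reproduced here.

```python
import numpy as np

def posterior_methylated(intensities, means, sds, trans, init):
    """Scaled forward-backward posteriors for a two-state Gaussian HMM.

    State 0 = unmethylated, state 1 = methylated (hypothetical labelling).
    trans[i][j] is P(next state j | current state i).
    Returns P(state 1 | all intensities) for each probe in order."""
    x = np.asarray(intensities, dtype=float)
    means, sds = np.asarray(means, dtype=float), np.asarray(sds, dtype=float)
    trans, init = np.asarray(trans, dtype=float), np.asarray(init, dtype=float)
    # Gaussian emission densities: one row per probe, one column per state.
    # (Intensities far from every mean can underflow to zero; log-space
    # arithmetic would be more robust.)
    emit = np.exp(-0.5 * ((x[:, None] - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))
    n, k = emit.shape
    alpha = np.zeros((n, k))
    scale = np.zeros(n)
    alpha[0] = init * emit[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n):                       # forward pass, rescaled each step
        alpha[t] = (alpha[t - 1] @ trans) * emit[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta = np.ones((n, k))
    for t in range(n - 2, -1, -1):              # backward pass, same scale factors
        beta[t] = trans @ (emit[t + 1] * beta[t + 1]) / scale[t + 1]
    post = alpha * beta
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]

# Hypothetical preprocessed intensities along a region: a high run in the middle.
probes = [0.1, -0.2, 0.3, 2.1, 1.8, 2.2, 0.0]
post = posterior_methylated(probes, means=[0.0, 2.0], sds=[0.5, 0.5],
                            trans=[[0.9, 0.1], [0.1, 0.9]], init=[0.5, 0.5])
```

Simulated data with known methylation status, as described above, is a natural way to check whether such posteriors are well calibrated.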

Over the past few years, microarray experiments have supplied much information about the dysregulation of biological pathways associated with various types of cancer. Many studies focus on identifying subgroups of patients with particularly aggressive forms of disease, so that we know whom to treat. A corresponding question is how to treat them. Given the treatment options available today, this means trying to predict which chemotherapeutic regimens will be most effective.
We can try to predict response to chemotherapy with microarrays by defining signatures of drug sensitivity. In establishing such signatures, we would really like to use samples from cell lines, as these can be (a) grown in abundance, (b) tested with the agents under controlled conditions, and (c) assayed without poisoning patients. Recent studies have suggested how this approach might work, using a widely used panel of cell lines, the NCI60, to assemble response signatures for several drugs. Unfortunately, ambiguities in analyzing the data have made these results difficult to reproduce.
In this talk, we will discuss the steps involved in attacking response prediction, and describe how we have analyzed the data. We will cover some specific ambiguities we have encountered, and in some cases how these can be resolved. Finally, we will describe methods for making such analyses more reproducible, so that progress can be made more steadily.

As part of the LBNL Integrated Cancer Biology Program, copy number changes in 51 breast cancer cell lines were measured using array Comparative Genomic Hybridization (aCGH). However, other genomic alterations in these cell lines are invisible to aCGH, including copy-neutral rearrangements such as inversions and translocations. These rearrangements are also common in cancer and can produce novel fusion genes or alter gene regulation. A technique called End Sequence Profiling (ESP) can identify these rearrangements by mapping paired-end sequences of cancer genome fragments to the reference human genome sequence. We have performed ESP for three of the cell lines in the LBNL ICBP. I will describe the ESP data and its integration with other genomic data. In particular, I will show how to identify candidate fusion genes using ESP data, and I will describe how to combine data from both ESP and CGH to derive the organization of the cancer genome. Finally, I will demonstrate potential consequences of cancer genome organization by examining Chromatin Immunoprecipitation (ChIP) data in the context of the rearranged cancer genome.
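
The step from ESP data to candidate fusion genes can be sketched as follows, under assumptions that are ours rather than the speaker's: an intact clone maps as a "+" end followed by a "-" end within a known size range; any pair on different chromosomes, in the wrong orientation or at the wrong distance is discordant; and each end of a discordant pair bounds a window, at most one maximum clone length long, that must contain the breakpoint. Genes overlapping the two windows become candidate fusion partners. All names, coordinates and size limits are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EndPair:
    """Mapped positions and strands of the two end sequences of one clone."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str

def is_discordant(pair, min_size, max_size):
    """True unless the ends look like an intact clone: same chromosome,
    '+' end upstream of '-' end (assumed convention), span within [min_size, max_size]."""
    if pair.chrom1 != pair.chrom2:
        return True
    (p1, s1), (p2, s2) = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if (s1, s2) != ("+", "-"):
        return True                     # unexpected orientation, e.g. an inversion
    return not (min_size <= p2 - p1 <= max_size)

def breakpoint_window(chrom, pos, strand, max_size):
    """The breakpoint lies downstream of a '+' end and upstream of a '-' end,
    within one maximum clone length."""
    if strand == "+":
        return chrom, pos, pos + max_size
    return chrom, pos - max_size, pos

def genes_overlapping(window, genes):
    chrom, lo, hi = window
    return {name for name, (g_chrom, start, end) in genes.items()
            if g_chrom == chrom and start <= hi and end >= lo}

def candidate_fusions(pairs, genes, min_size, max_size):
    """Gene pairs overlapping the two breakpoint windows of some discordant pair."""
    found = set()
    for p in pairs:
        if not is_discordant(p, min_size, max_size):
            continue
        left = genes_overlapping(breakpoint_window(p.chrom1, p.pos1, p.strand1, max_size), genes)
        right = genes_overlapping(breakpoint_window(p.chrom2, p.pos2, p.strand2, max_size), genes)
        found |= {(a, b) for a in left for b in right if a != b}
    return found

# Hypothetical gene annotations (chromosome, start, end) and clone size range.
genes = {"GENE_A": ("chr1", 1_000_000, 1_100_000),
         "GENE_B": ("chr8", 5_000_000, 5_050_000),
         "GENE_C": ("chr1", 9_000_000, 9_010_000)}
pairs = [EndPair("chr1", 1_050_000, "+", "chr8", 5_020_000, "-"),   # inter-chromosomal
         EndPair("chr1", 9_000_000, "+", "chr1", 9_150_000, "-")]   # looks intact
fusions = candidate_fusions(pairs, genes, min_size=100_000, max_size=200_000)
```

A real pipeline would also require several discordant clones supporting the same breakpoint and would check transcriptional orientation before calling a fusion.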

Cancer progression often involves alterations in DNA sequence copy number. Multiple microarray platforms now facilitate high-resolution copy number assessment of entire genomes in single experiments. This technology is generally referred to as array comparative genomic hybridization (array CGH). I will discuss our technique for identifying regions of abnormal copy number in array CGH data, which is called circular binary segmentation (CBS). The first published version of CBS was criticized for being slow. I will present our methods for greatly speeding up the procedure. I will also show our approaches to recent copy number applications including allele-specific copy number, clonality, and copy number variation.
This is joint work with E.S. Venkatraman.
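
A brute-force sketch of the statistic at the heart of circular binary segmentation, as we understand it: the data are viewed as lying on a circle, and for every arc (i, j] the mean inside the arc is compared with the mean of the remaining points, scaled by a standard-deviation estimate. The arc with the largest statistic is the candidate segment. This O(n^2) loop is exactly the kind of exhaustive search that the speed-ups mentioned above target; permutation-based significance testing and the recursive splitting of segments are not shown, and using the overall sample standard deviation is a simplifying assumption.

```python
import numpy as np

def cbs_max_t(x):
    """Maximal circular binary segmentation statistic over all arcs (i, j].

    Compares the mean of x[i:j] with the mean of the remaining points (which wrap
    around the ends when the data are viewed as a circle).
    Returns (t_max, i, j)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.concatenate([[0.0], np.cumsum(x)])   # s[k] = sum of x[:k]
    total = s[n]
    sd = x.std(ddof=1)
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue                         # the complement would be empty
            inside = s[j] - s[i]
            diff = inside / k - (total - inside) / (n - k)
            t = abs(diff) / (sd * np.sqrt(1.0 / k + 1.0 / (n - k)))
            if t > best[0]:
                best = (t, i, j)
    return best

# Hypothetical log-ratio profile with a single gained segment at positions 20-34.
x = np.concatenate([np.zeros(20), np.ones(15), np.zeros(25)])
t_max, start, end = cbs_max_t(x)
```

Because an arc's complement is itself an arc on the circle, one search finds both a segment in the middle of the profile and changes at the two ends.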