6.874/6.807/7.90 Computational functional genomics, lecture 6 (Jaakkola)

Expression arrays, normalization, and error models

There are a number of different array technologies available for measuring mRNA transcript levels in cell populations, from spotted cDNA arrays to in-situ synthesized oligoarrays and other variants. Our goal here is to illustrate the basic computational methods and ideas involved in teasing out the relevant signal from array measurements. For simplicity we will focus exclusively on spotted cDNA arrays.

Spotted cDNA arrays are geared towards measuring relative changes in mRNA levels across two populations of cells, e.g., cells under normal conditions and cells undergoing a specific treatment (nutrient starvation, chemical exposure, temperature, gene deletion, and so on). The mRNA extracted from the cells in each population is reverse transcribed into cDNA and labeled with a fluorescent dye (Cy3 or Cy5) specific to the population. The resulting populations of differently labeled cDNAs are then jointly hybridized to the matrix of immobilized probes, which are complements of the cDNA targets we expect to measure. Each array location, or spot, contains a number of probes specific to the corresponding target to ensure efficient hybridization. We won't consider here the question of how the probes are or should be chosen, for example, to minimize potential cross-hybridization (a target hybridizing to a probe other than the intended one).

By exciting the fluorescent dyes of the hybridized targets on the array, we can read off the amount of each cDNA target (hybridized to a specific location on the array) corresponding to each population of interest. By jointly hybridizing the two populations we can more directly gauge any changes in the mRNA levels across the two populations, without necessarily being able to capture the actual transcript levels in each.
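To make the last point concrete, here is a minimal sketch of why the two-channel readout gives relative rather than absolute levels. It assumes, purely for illustration, that each spot scales both channels by the same unknown factor (for example, the amount of probe deposited); the abundances, the factor values, the assignment of Cy3 to the control population, and the helper name log2_ratio are all hypothetical and not taken from the lecture.

```python
import math


def log2_ratio(treated, control):
    """Return log2(treated / control) for the two channel intensities of one spot."""
    return math.log2(treated / control)


# Hypothetical true transcript abundances (arbitrary units) for a single gene.
control_level = 100.0
treated_level = 400.0

# Hypothetical spot-specific factor, ASSUMED to scale both channels equally.
for spot_factor in (0.5, 1.0, 3.0):
    cy3 = spot_factor * control_level  # control channel intensity (assumed Cy3)
    cy5 = spot_factor * treated_level  # treated channel intensity (assumed Cy5)
    # The raw intensities change with spot_factor, but their ratio does not.
    print(spot_factor, cy3, cy5, log2_ratio(cy5, cy3))
```

Under this assumption the individual intensities vary from spot to spot, so neither channel alone reveals the actual transcript level, while the ratio between the channels stays fixed. A positive log ratio indicates a higher level in the treated population and a negative one a lower level.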
Joint hybridization thus acts as an internal control, which helps determine whether a gene is up- or down-regulated relative to the control population.

Array measurements are limited by the fact that we have to use a large number of cells (10,000 or more) to get a reasonable signal. When the cell population of interest is relatively uniform this typically doesn't matter. However, when there are two or more distinct cell types in the population, we might draw false inferences from the aggregate measurements. Suppose, for example, that gene A is active and gene B is inactive in cell type 1, and that the converse holds for cell type 2. We would see both genes active in the array measurement, but this conclusion matches neither of the two underlying cell types. We will return to this issue later on in the course.

Normalization

To use the arrays we first have to normalize the signal so as to make two different array measurements, or the two channels (specified by the fluorescent dyes) within a single array, mutually comparable. By normalizing the signal we aim to remove any systematic...
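The mixed-population caveat discussed earlier (gene A and gene B in cell types 1 and 2) can be illustrated with a small sketch. It assumes, as a simplification, that the array signal for a gene is proportional to the population-weighted average of its per-cell expression; the 0/1 expression values, the 50/50 mixture, and the function name aggregate_expression are hypothetical choices for illustration only.

```python
def aggregate_expression(cell_types, fractions):
    """Population-weighted average expression per gene over a mixture of cell types."""
    genes = cell_types[0].keys()
    return {
        g: sum(f * ct[g] for ct, f in zip(cell_types, fractions))
        for g in genes
    }


# Hypothetical per-cell expression: 1.0 = active, 0.0 = inactive.
type1 = {"A": 1.0, "B": 0.0}  # gene A active, gene B inactive
type2 = {"A": 0.0, "B": 1.0}  # the converse

# ASSUMED equal mixture of the two cell types in the sample.
mixed = aggregate_expression([type1, type2], [0.5, 0.5])
print(mixed)
```

In this sketch both genes show a nonzero aggregate signal, so both appear active, even though no individual cell in the sample has that profile.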