Completion of genome sequencing efforts of humans and model organisms123 has provided exciting opportunities for vision research. Questions that were hard to envisage, let alone resolve, suddenly appear within reach. We are on the verge of untangling gene networks that regulate development and cell fate and cellular processes that determine adaptation or apoptosis. Patterns of gene expression specific for a particular cell at a given time can now be determined at the genome level. Microarray and related technologies are revolutionizing vision research by making possible simultaneous analysis of thousands of genes, and hopefully soon, the entire transcriptome.4 The challenge is to understand how relatively simple variations in DNA sequence and gene expression contribute to complex biological phenomena, including normal development, aging, disease pathogenesis, and differences among individuals. We will briefly review strategies for designing microarray experiments for expression profiling, image processing including normalization, and data analysis including statistics, clustering, and pathway construction.

Types of Microarrays

Microarray experiments include two components: (1) DNA immobilized on a solid surface, called “probes,” and (2) labeled cDNAs (or cRNAs) derived from the RNA samples being analyzed, called “targets.” We will discuss commonly used cDNA arrays on glass slides5 and short oligonucleotide arrays on quartz chips (Affymetrix, Santa Clara, CA), with their unique advantages and shortcomings.

Study Design

The goal of microarray experiments is to measure the concentration of a given mRNA species in cells or tissues of interest. The measurements are typically relative, although we would often prefer to measure absolute concentrations. Figure 1 illustrates a generic microarray design. In general, mRNA is reverse transcribed into cDNA, labeled with appropriate dye, and hybridized against the array. Most microarray gene-profiling studies involve single-factor comparisons and/or time-series analysis. In a single-factor comparison, expression profiles from two or more groups are compared to each other, such as wild-type versus knockout, or normal versus diseased tissue. In time-series analysis, expression changes are compared across a group of samples at different time points, for example, cultures exposed to trophins or toxins for different periods, or samples from different age groups. In cDNA arrays, two experimental options are available. In direct comparisons, RNA from one sample is labeled with Cy-3, and RNA from a different sample with Cy-5. In indirect comparisons, a common reference RNA is labeled with Cy-3, and RNA from each of the experimental samples is labeled with Cy-5 and hybridized separately against the labeled reference RNA. The choice of the approach depends on the biological question and available resources. Direct comparison can provide a more precise measurement of the difference in expression between two samples in single-factor experiments. In contrast, indirect comparison allows assessment across several targets and makes possible comparison across multiple groups and different experiments. Overall, the indirect method requires less RNA and fewer slides when comparing numerous samples, and though it has slightly higher noise, it provides equivalent results.678 There are more complex experimental paradigms, such as 2 × 2 factorial and loop designs, but these are discussed elsewhere.6910
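As a minimal sketch of how an indirect comparison is evaluated, the log ratio between two samples can be derived from their separate hybridizations against the common reference. The function names and spot intensities below are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def log_ratio(sample_intensity, reference_intensity):
    """log2 ratio for one spot: sample channel (Cy-5) over reference channel (Cy-3)."""
    return math.log2(sample_intensity / reference_intensity)

def indirect_log_ratio(a_vs_reference, b_vs_reference):
    """log2(A/B) from two slides that share a reference: log2(A/R) - log2(B/R)."""
    return a_vs_reference - b_vs_reference

# Hypothetical intensities for one gene on two slides with the same reference RNA
m_a = log_ratio(1600.0, 400.0)  # sample A versus reference
m_b = log_ratio(800.0, 400.0)   # sample B versus reference
print(indirect_log_ratio(m_a, m_b))  # 1.0, i.e., A is twofold higher than B
```

Because two measured ratios are subtracted, the measurement errors of both slides enter the estimate, which is consistent with the slightly higher noise of indirect comparisons.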

An essential aspect of good microarray study design is replication, both to increase the likelihood of identifying true positives and to decrease the risk of false positives. There are two types of replication: (1) technical replication, in which the same RNA sample (or aliquots from the same source) is applied to different arrays, and (2) biological replication, in which RNA samples from independent sources are used. Technical replicates are essential in quantifying the experimental variation due to “technical” aspects of the experiment, such as those arising from differences in RNA preparation, degradation, labeling, array irregularities, hybridization, washing, and image analysis. Biological replicates help quantify the variation from the actual biological system being studied, such as differences between experimental samples caused by litter effects, pathogen load, or diet. Both types of replicates are essential to allow appropriate statistical tests to deduce meaningful conclusions about the sampled population. In cDNA arrays, technical replicates often include a dye swap in which the same RNA sample (labeled once with one dye and then with the other) is hybridized against two separate arrays. These experiments provide a measure of bias in the labeling and hybridization of each dye. The spots on cDNA arrays are often duplicated on each slide. Although not independent replicates, the duplicate spots can assist in determining the quality of the hybridization. In general, multiple independent biological replicates for each condition should be included.
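The way a dye swap separates dye bias from the biological effect can be sketched as follows. The sketch assumes that dye bias is additive on the log2 scale and identical on both slides, which is a simplification; the function name and numbers are hypothetical.

```python
def dye_swap_estimates(m_forward, m_swapped):
    """Combine log2(Cy5/Cy3) ratios from a dye-swap pair.

    m_forward: sample A labeled with Cy-5, sample B with Cy-3
    m_swapped: sample A labeled with Cy-3, sample B with Cy-5
    Returns (estimated log2(A/B), estimated dye bias).
    """
    effect = (m_forward - m_swapped) / 2.0    # bias cancels in the difference
    dye_bias = (m_forward + m_swapped) / 2.0  # effect cancels in the sum
    return effect, dye_bias

# Hypothetical gene with a true log2(A/B) of 1.0 and an additive dye bias of 0.25
effect, bias = dye_swap_estimates(1.0 + 0.25, -1.0 + 0.25)
print(effect, bias)  # 1.0 0.25
```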

A common dilemma is how many replicates are appropriate to detect true biological change. To answer this, we need to know the variance in signal intensity, the magnitude of the effect to be detected (i.e., level of fold change), the rate of false positives, and the desired power of the study (i.e., the probability of detecting the specified fold change). In microarrays, it is difficult to estimate the variance in signal intensity before performing experiments, because variance for each probe differs, depending on sequence, expression level, tissue source, and complexity (cross hybridization).11 In general, at least three to five replicates are needed.1213 Increasing the sample size naturally reduces the error term. Assuming that the within-group error term is approximately normal, the critical values of the t-test may be used to estimate the probabilities of both type I and type II errors.
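A rough calculation of this kind can be sketched with the normal approximation to a two-sample comparison. The function below is an illustration under stated assumptions, not a recommended procedure: it takes a single hypothetical standard deviation of log2 intensity for one probe, treats the two groups as equal in size and variance, and ignores the t-distribution correction, so it tends to underestimate the number of replicates when that number is small.

```python
import math
from statistics import NormalDist

def replicates_per_group(sd_log2, fold_change, alpha=0.001, power=0.9):
    """Approximate replicates per group needed to detect a given fold change.

    sd_log2: assumed within-group standard deviation of log2 intensity
    alpha:   two-sided false-positive rate for a single probe
    power:   desired probability of detecting the specified fold change
    """
    z = NormalDist()
    delta = abs(math.log2(fold_change))   # effect size on the log2 scale
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    n = 2.0 * (sd_log2 * (z_alpha + z_power) / delta) ** 2
    return math.ceil(n)

# Hypothetical probes with low and high variability, twofold change
print(replicates_per_group(0.25, 2.0))  # 3
print(replicates_per_group(0.50, 2.0))  # 11
```

Doubling the standard deviation roughly quadruples the required number of replicates, which illustrates why probe-specific variance matters so much for study design.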

To maximize the extraction of meaningful data, it is critical to reduce sample variation that is not due to the effect or treatment being tested. When human tissues are used, the variables include obvious differences between samples (sex, ethnicity, age, and health status), premortem conditions (such as drugs, cause of death, body temperature, and number of days on ventilators), postmortem interval, dissection, and tissue preservation methods. Although animal experiments are easier to standardize than human studies, even comparisons between seemingly identical samples from isogenic mice can reveal significant variability.14 Experiments involving tightly controlled populations of animals must still consider the effect of hormones, immune system status, light level, and time of day at tissue collection.

Batch effects can also introduce spurious results. For example, if one performs all studies of male retinas in the first phase of a study and female retinas in the second phase a few months later, one has unintentionally introduced a serious design confound between sex and batch. An interleaved experimental design (each group represented equally in each batch) with technical replications across batches is almost essential. Technical sources of variation are also a major source of false-positive results and batch effects. One should use the same RNA extraction protocol performed, if possible, by the same person for all samples in any given experiment. The quality of RNA used as starting point for microarrays is an essential variable, as degraded RNA samples can yield false positive results.1516 Additional problematic batch effects can be introduced by reagents used to transcribe, amplify, and label mRNA. Batch effects can be detected by including a common “spiked” technical replicate in each batch. However, batch corrections are usually transcript-specific, and with sufficient biological and technical overlap between batches, one can compute batch corrections for each transcript. This is not possible with small batches, and large batches are often necessary. When using oligonucleotide arrays, it is preferable to use arrays from the same lot for all samples, since arrays from different lots can exhibit different background levels. Similarly, if using cDNA arrays, it is better to use slides from the same printing, though the use of a reference target in a two-color system offers obvious advantages. It is also important to treat control and experimental samples evenly rather than using one set of reagents/arrays for control samples and another for experimental samples.
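An interleaved design can be sketched as a simple allocation in which each group is shuffled and then dealt evenly across batches. The function name, sample labels, and fixed random seed are hypothetical choices made for illustration.

```python
import random
from collections import Counter

def interleaved_batches(samples_by_group, n_batches, seed=0):
    """Assign samples to batches so that each group is spread evenly over batches."""
    rng = random.Random(seed)  # fixed seed only to make the example reproducible
    batches = [[] for _ in range(n_batches)]
    for group, samples in samples_by_group.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)  # randomize order within the group
        for index, sample in enumerate(shuffled):
            batches[index % n_batches].append((group, sample))
    return batches

design = interleaved_batches(
    {"male": ["M1", "M2", "M3", "M4"], "female": ["F1", "F2", "F3", "F4"]},
    n_batches=2,
)
for number, batch in enumerate(design, start=1):
    print(number, dict(Counter(group for group, _ in batch)))
```

With this allocation, sex and batch are no longer confounded, because every batch contains the same number of male and female samples.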

One potential limitation of using any tissue as the source of RNA is the heterogeneity of cell types. Low-abundance transcripts present in a tissue often are not detected by microarrays. For example, the Affymetrix arrays can detect mRNAs with a relative abundance down to 5 to 10 transcripts per million.17 This corresponds roughly to a few transcripts per cell. In some cases, an abundant transcript in a rare cell type is diluted and is not detectable in a bulk analysis. Methods, such as single cell isolation,18 laser capture microdissection,1920 and flow sorting of cells,21 allow researchers to isolate specific cell types and generate their expression profile. RNA may have to be amplified in one or more rounds to produce sufficient quantities of cRNA or cDNA, but this may introduce changes in relative abundance of transcripts due to nonlinear amplification. However, several studies have compared expression profiles obtained from amplified RNA and total RNA and have found good correlations between the two samples.2223 Despite this host of somewhat intimidating factors, reliable and comparative gene expression profiles can be obtained if the study is designed correctly, controlling for variations and using appropriate replicates.
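The dilution of a cell-type-specific transcript in bulk tissue can be illustrated with simple arithmetic. The sketch assumes that all cells contribute equal amounts of mRNA and that the transcript is absent from other cell types; the abundance and cell fraction are hypothetical, and the threshold is the lower end of the detection range quoted above.

```python
DETECTION_LIMIT_PER_MILLION = 5.0  # lower end of the 5 to 10 range quoted above

def bulk_abundance(per_million_in_cell_type, cell_type_fraction):
    """Expected abundance (per million transcripts) in RNA from the whole tissue."""
    return per_million_in_cell_type * cell_type_fraction

def is_detectable(per_million_in_cell_type, cell_type_fraction,
                  limit=DETECTION_LIMIT_PER_MILLION):
    """True if the diluted abundance still reaches the assumed detection limit."""
    return bulk_abundance(per_million_in_cell_type, cell_type_fraction) >= limit

# Hypothetical transcript at 200 per million in a cell type forming 1% of the tissue
print(bulk_abundance(200.0, 0.01))  # about 2 per million in bulk RNA
print(is_detectable(200.0, 0.01))   # False: diluted below the detection limit
print(is_detectable(200.0, 1.0))    # True: detectable in the isolated cell type
```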

Data Analysis

A major challenge in obtaining biologically relevant results with microarray studies is the analysis of large, complex, and sometimes noisy data sets. We will discuss three levels of data analysis. Low-level analysis consists primarily of normalization and filtering data sets (generating lists of interesting transcripts), whereas mid- and high-level analyses consist of more advanced statistical methods, including clustering, network analysis, and integration of known biological patterns.

Low-Level Analysis

Microarray data (i.e., signal intensities) can be transformed from the original scale on which hybridization signal is measured. The original distribution usually has a right skew due to a small number of genes that exhibit high expression levels. Taking the logarithm (usually base-2) typically produces a distribution of values that is close to normal. More sophisticated data-transformation methods24 exploit an accurate statistical model for the raw responses and account for background intensity variations that can cause biases in log-transformed data. To compare expression levels effectively across multiple samples and arrays, data must be adjusted in several additional steps. For example, there is often large variation within and between slides (especially problematic with cDNA arrays). Factors including differences in dye incorporation, hybridization efficiencies, and scanner settings can contribute to technical variation and idiosyncratic nonlinearities. Normalization procedures, in general, attempt to soften the impact of these extrinsic sources of variation while preserving the underlying biological variation that is the focus of the research. There are two levels of normalization: within-array and across arrays. Within-array normalization is often used to compensate for uneven spotting and hybridization. It can be performed locally (i.e., at pin level) or globally. Local normalization may be more appropriate when there are inconsistencies within an array, such as those caused by differences in the spotting pins, surface variation within slide, and differences in hybridization quality across the slide. Several methods of normalization are available for each type of array, and new methods are continually being developed.
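The log transformation and the simplest form of global within-array normalization can be sketched as follows. Median centering is used here as one common global adjustment, and the floor applied before taking logarithms is an assumption introduced to avoid undefined values after background subtraction; the intensities are hypothetical.

```python
import math
from statistics import mean, median

def log2_transform(intensities, floor=1.0):
    """log2 of each intensity; values below the floor are raised to it first."""
    return [math.log2(max(value, floor)) for value in intensities]

def median_center(log_values):
    """Global normalization: shift an array so that its median log value is zero."""
    center = median(log_values)
    return [value - center for value in log_values]

# Hypothetical raw intensities with one highly expressed gene (right skew)
raw = [64.0, 128.0, 256.0, 512.0, 8192.0]
logged = log2_transform(raw)
print(mean(raw), median(raw))  # mean far above the median on the raw scale
print(logged)                  # [6.0, 7.0, 8.0, 9.0, 13.0]
print(median_center(logged))   # [-2.0, -1.0, 0.0, 1.0, 5.0]
```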

We briefly mention a few widely used normalization methods (for details see Refs. 25, 26, 27). With respect to cDNA arrays, a robust, locally weighted scatterplot smoothing (lowess) method28 has been developed for normalization. It offers the advantage of removing intensity-dependent effects on the log2 ratio values by reducing the contribution from array spots that are on the two extremes of signal intensity. Moreover, lowess and other normalizations may be applied globally or locally. Another approach is data-driven lowess normalization, which uses a cluster of least-altered genes on the array identified by a rank-based algorithm.7 In the case of short oligonucleotide arrays from Affymetrix, the key programs, MAS 5.0 and its successor, GREX, offer the option of using different methods for global normalization. A robust multiarray analysis (RMA) has been developed that offers the advantage of performing normalization at probe level for all the probes on an array.29 RMA and the Affymetrix implementation called GREX allow users to conduct a global background adjustment and across-array normalization. Another normalization method is z-score transformation, which allows standardization of data across multiple arrays and comparison of the data independent of the hybridization intensities.30 Bolstad et al.29 describe quantitative comparisons between some of the normalization methods for oligonucleotide arrays.
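The idea behind intensity-dependent normalization of two-color data can be sketched with a simplified local regression. This is a stand-in for lowess and not the published method: it fits a tricube-weighted local line of the log ratio M on the mean log intensity A and omits the robustness iterations, and the window fraction of 0.4 is an arbitrary assumption. In practice an established implementation would be used; the simulated intensities are hypothetical.

```python
import math

def ma_values(cy5, cy3):
    """Per-spot log ratio M and mean log intensity A for a two-color array."""
    m = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    a = [0.5 * math.log2(r * g) for r, g in zip(cy5, cy3)]
    return m, a

def local_linear_fit(a_values, m_values, frac=0.4):
    """Tricube-weighted local linear fit of M on A, evaluated at every spot."""
    n = len(a_values)
    k = min(n, max(3, round(frac * n)))  # number of spots in each window
    fitted = []
    for a0 in a_values:
        distances = sorted(abs(a - a0) for a in a_values)
        h = distances[k - 1] or 1e-12    # window half-width
        w = [max(0.0, 1.0 - (abs(a - a0) / h) ** 3) ** 3 for a in a_values]
        sw = sum(w)
        mean_a = sum(wi * a for wi, a in zip(w, a_values)) / sw
        mean_m = sum(wi * m for wi, m in zip(w, m_values)) / sw
        sxx = sum(wi * (a - mean_a) ** 2 for wi, a in zip(w, a_values))
        sxy = sum(wi * (a - mean_a) * (m - mean_m)
                  for wi, a, m in zip(w, a_values, m_values))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted

def normalize_log_ratios(cy5, cy3, frac=0.4):
    """Subtract the fitted intensity-dependent trend from each log ratio."""
    m, a = ma_values(cy5, cy3)
    trend = local_linear_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

# Simulated spots with no biological change, only a dye bias that grows with intensity
a_true = [6.0 + 0.5 * i for i in range(20)]
m_bias = [0.1 * a - 0.5 for a in a_true]
cy5 = [2.0 ** (a + m / 2.0) for a, m in zip(a_true, m_bias)]
cy3 = [2.0 ** (a - m / 2.0) for a, m in zip(a_true, m_bias)]
normalized = normalize_log_ratios(cy5, cy3)
print(max(abs(value) for value in normalized))  # near zero once the trend is removed
```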

Mid-Level Analysis

The next step is to determine whether an observed change in gene expression represents a true biological effect or a random fluctuation. Many studies have used arbitrary fold change threshold levels (often twofold) on sample mean probe response to identify differentially expressed genes, without accounting for statistical variations. Although it may seem logical to pursue transcripts that exhibit a comparatively high level of modulation, large differences in sample mean response can be statistically insignificant, whereas much smaller differences may be highly significant and biologically relevant. Fold change cutoffs are too arbitrary to be useful in general and should be accompanied by statistical tests. Conventional t-tests are appropriate when the errors within each group appear to be normally distributed. If not, then nonparametric, rank-order tests are more appropriate and more robust. In either type, P-values must be adjusted for the number of comparisons performed. Given the large number of comparisons in microarrays, the likelihood is high that many of the putative differentially expressed genes are false positives. Several modified versions of t-tests have been developed for analysis of microarray data (Bioconductor Web site). Alternatively, false-discovery rate (FDR) can be used to estimate the proportion of false positives among all the differentially expressed genes. Controlling the FDR is more powerful than controlling the family-wise error rate (FWER), for example with Bonferroni’s correction of the P-values, for multiple testing. Fold change cutoff criteria can be incorporated into the FDR framework by implementing false discovery rate confidence interval (FDR-CI) analysis.31 This allows the experimenter to control statistical significance and biological significance simultaneously, in reporting positive differential responses. Finally, analysis of variance (ANOVA) can be used in more complex microarray experiments involving more than two conditions. For an excellent review, see Cui and Churchill.32
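The difference in power between FWER and FDR control can be illustrated on a short list of P-values. The sketch uses the Benjamini-Hochberg step-up procedure as one common way to control the FDR; the text does not specify a particular procedure, so this choice is an assumption, and the P-values are hypothetical.

```python
def bonferroni(pvalues):
    """FWER control: multiply each P-value by the number of tests (capped at 1)."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]

def benjamini_hochberg(pvalues):
    """FDR control: Benjamini-Hochberg adjusted P-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P-value down
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical P-values for five transcripts
pvalues = [0.001, 0.008, 0.012, 0.03, 0.5]
fwer_hits = sum(p <= 0.05 for p in bonferroni(pvalues))
fdr_hits = sum(q <= 0.05 for q in benjamini_hochberg(pvalues))
print(fwer_hits, fdr_hits)  # 2 4
```

In this small example, Bonferroni's correction retains two transcripts at the 0.05 level, whereas the FDR procedure retains four.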

High-Level Analysis

It is often helpful to identify transcripts with similar patterns of change, because they may provide insights into functional pathways or common regulatory mechanisms. Hierarchical clustering, K-means clustering, self-organizing maps, and mixture-model clustering have been used for grouping and discrimination among the many “signatures” implicit in array data sets. Hierarchical clustering is an unsupervised method that uses similarity or distance measures to distinguish among samples. Using a list of differentially expressed genes, genes with similar expression covariance across a set of conditions/samples are clustered together on the vertical axis. If a large number of samples are available, a subset could be used for supervised clustering by cross validation. Here, a subset of data is used as a training set, and the learned information is then used to test how well the remaining data fit into the predicted classes. This has been used successfully in cancer research to predict cancer subtypes and to generate prognostic and diagnostic biomarkers.33 Self-organizing maps (SOMs) and K-means clustering are also used to visualize similar gene expression patterns across multiple samples or conditions or time-points. K-means clustering can optimally partition genes into a fixed number of clusters by minimizing within-group dissimilarity for a specified number of groups. The quality of clusters obtained may depend on the number of clusters specified. Hence, the number of clusters must be chosen carefully, and the clustering should be repeated several times with different numbers to find the best clusters. Similarly in SOM, the user must define the number of clusters. The clusters are arranged so that neighboring clusters are similar to each other.
In general, as the number of clusters increases, the expression patterns of genes within any given cluster become more similar; however, this is accompanied by higher statistical variation of clusters, leading to unstable clustering results. The tradeoff between high specificity (many clusters) and stability (few clusters) makes a priori selection of the number of clusters difficult. Unsupervised mixture-model-based clustering algorithms34 that automatically select the number of clusters from the data have been adapted recently to clustering gene time courses in cDNA microarrays.8 These algorithms characterize the clusters as locally Gaussian and use a complexity-penalized expectation-maximization (EM) maximum-likelihood strategy to fit cluster groupings to the data. A very different grouping strategy, called Pareto analysis, was recently introduced for detecting trends in gene expression from a time-course microarray experiment.35 In this method, genes are filtered through multiobjective optimization to identify the genes that maximize a set of user-specified fitness criteria yet are statistically stable.36 Various other clustering methods for gene microarray experiments are described in detail in a book.37
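A minimal K-means sketch illustrates how the within-cluster dissimilarity depends on the number of clusters specified. Squared Euclidean distance and the deterministic farthest-first initialization are assumptions chosen to keep the example reproducible; repeated random starts are more usual in practice. The expression profiles are hypothetical log2 ratios over four time points.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def k_means(profiles, k, iterations=100):
    """Return (cluster labels, within-cluster sum of squares) for k clusters."""
    # Farthest-first initialization: start at the first profile, then repeatedly
    # add the profile that is farthest from all centers chosen so far.
    centers = [list(profiles[0])]
    while len(centers) < k:
        farthest = max(profiles,
                       key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in profiles]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for j in range(k):        # move each center to the mean of its members
            members = [p for p, label in zip(profiles, labels) if label == j]
            if members:
                centers[j] = [sum(column) / len(members) for column in zip(*members)]
    wss = sum(squared_distance(p, centers[label]) for p, label in zip(profiles, labels))
    return labels, wss

# Hypothetical time courses: three rising and three falling genes
profiles = [
    [0.0, 1.0, 2.0, 3.0], [0.0, 1.2, 2.1, 2.9], [0.1, 0.9, 2.0, 3.1],
    [0.0, -1.0, -2.0, -3.0], [0.0, -1.1, -1.9, -3.0], [0.1, -0.9, -2.1, -2.8],
]
for k in (1, 2, 3):
    labels, wss = k_means(profiles, k)
    print(k, labels, round(wss, 3))
```

The within-cluster sum of squares drops sharply from one to two clusters and only slightly afterward, which is the kind of pattern one looks for when repeating the analysis with different numbers of clusters.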

After generating a list of differentially expressed genes, one can ask how these genes interact with each other and how they contribute to the phenotype of interest. In yeast38 and Drosophila,39 extensive networks of transcriptional regulators have been constructed using microarray data to identify interacting transcripts. In mammalian systems, the progress is hampered by our lack of knowledge about function of many known genes and the organization of pathways. One available approach involves exploiting gene classification schemes formalized as gene ontologies. A gene ontology is a structure that can help highlight pathway components and categories of molecules affected by a treatment or disease process. Because a gene may be involved in more than one pathway, many different pathways may be represented. Additional publicly available resources are Gene MicroArray Pathway Profiler, the Kyoto Encyclopedia of Genes and Genomes, and Biocarta. One of the most powerful tools for exploring networks of interacting genes is WebQTL, which allows one to explore arbitrary complex associative networks built around essentially any transcript, including poorly annotated expressed sequence tags (ESTs).40 In the next few years we can expect many novel programs to be developed to explore and dissect gene networks and pathways.
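One common way to ask whether a gene ontology category is over-represented in a list of differentially expressed genes is a hypergeometric test. The text does not prescribe this test, so the sketch below is an assumption about how such a question might be posed, and the gene counts are hypothetical.

```python
from math import comb

def enrichment_p_value(n_universe, n_category, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random.

    n_universe: genes represented on the array
    n_category: genes on the array annotated to the category
    n_selected: differentially expressed genes
    n_overlap:  differentially expressed genes annotated to the category
    """
    upper = min(n_category, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(comb(n_category, x) * comb(n_universe - n_category, n_selected - x)
               for x in range(n_overlap, upper + 1))
    return tail / total

# Hypothetical array of 10,000 genes, 100 in the category, 200 selected, 10 shared
expected_overlap = 200 * 100 / 10000
p_value = enrichment_p_value(10000, 100, 200, 10)
print(expected_overlap, p_value)
```

Because a gene may belong to several categories, many categories are usually tested at once, and the resulting P-values are themselves subject to the multiple-testing adjustments discussed under mid-level analysis.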

Validation

Microarray studies often yield a large number of differentially expressed genes that may require validation depending on study design and statistical methods used. In simple comparisons involving a few groups and samples, the expression changes in a subset of genes (defined based on functional interest, fold change levels, or FDRs) should be confirmed by independent methods (such as real-time RT-PCR). In more complex studies involving large sample sizes, validations may be impractical. Thus, considerable attention must be paid to proper study design and statistical methods so that results can be interpreted with confidence.

Applications

A PubMed search revealed over 70 published studies in which microarrays were used in vision research. Although similar studies have been performed in lens4142 and other eye tissues,4344 a few examples from retinal tissue are given here. The microarray studies have ranged from identification of genes altered during aging of human retina1145 and genes involved in organization of developing retina in embryonic mice,46 to identification of a disease-causing gene based on differential expression between wild-type and rhodopsin knockout mice,47 and identification of novel genes that may be preferentially expressed in the retina.48 Although all the studies have made significant contributions to ophthalmic research, greater success could be achieved if arrays that contain all the genes expressed in the eye are used. Currently, there is an underrepresentation of the eye-specific genes in the publicly available databases. Several groups have spent considerable effort and resources to produce custom eye expression arrays. cDNA arrays containing more than 3000 ESTs from three mouse eye/retina cDNA libraries are available.49 The NEIbank has constructed cDNA libraries from many sections of the eye, including retina and RPE/choroid, that are available to the vision-research community.5051 Recently, annotation and analysis of 10,000 ESTs were reported from adult mouse retina and from mouse eyes at embryonic day 15.5 and postnatal day 2.52

Summary

Microarray technology has been a rapidly evolving field and will continue to advance at a swift pace. As the use of arrays in ophthalmic research becomes more pervasive, it is essential that this technology be used appropriately. With this much experimental power, there is substantial risk of generating a large body of intriguing experimental artifacts. There is an equally serious possibility of missing significant results by applying procrustean or superficial statistical analyses. Figure 2 provides a flow chart that should be helpful for designing appropriate microarray experiments. It is imperative to consider study designs and the objectives carefully before embarking on microarray experiments. Appropriate normalization and statistical treatment of microarray data will extract useful new information, thereby allowing efficient generation and validation of hypotheses.

Looking to the Future

Genome-wide expression profiling with microarrays is expected to yield tremendous amounts of data. Efficient mining of these data is currently a challenge for vision scientists and computational biologists. It is critical to develop acceptable standards for microarray experimentation and guidelines for data presentation, storage, and sharing.53 We are approaching a new frontier of vast information. Translation of this technology to uncover fundamental interacting gene networks and to perform targeted drug design for treatment of blinding diseases will require a conscious and concerted effort.

Electronic Database Information

A few useful web sites for obtaining additional information on microarrays are provided below.

The FDR-CI method has been implemented in an R package (umarray.fdrci) and is now publicly available at http://www.umich.edu/~retina/microarray.html.

Supported by grants from the National Eye Institute, The Foundation Fighting Blindness (Owings Mills, Maryland), the Macula Vision Research Foundation (West Conshohocken, Pennsylvania), the Elmer and Sylvia Sramek Charitable Foundation (Chicago, Illinois), and Research to Prevent Blindness (RPB; New York, New York). AS is the Harold F. Falls Collegiate Professor and DJZ the Guerrieri Professor of Genetic Engineering and Molecular Ophthalmology. AS and DJZ are RPB Senior Scientific Investigators.

A schematic representation of microarray experiments. cDNA arrays use a two-color hybridization system, in which cDNAs from one sample are labeled with one dye, often Cy-3, and cDNAs from another sample are labeled with a different dye, often Cy-5. By measurement of the amount of each dye bound to every spot, the relative abundance of mRNA species is estimated. With Affymetrix (Santa Clara, CA) oligonucleotide arrays, mRNAs from a single sample are reverse transcribed into cDNAs, which serve as a template in an in vitro transcription reaction to produce biotin-labeled cRNAs that are fragmented and hybridized to the array.

Figure 1.
