Microarray Comparison Analysis Information

Once single-array analysis is complete, samples hybridised to different GeneChips can be compared to determine whether gene expression changes between the two sample sets. One chip is designated the ‘experiment’ chip and is compared against the other, ‘baseline’ chip. After the two chips have been scaled and normalised, the values of corresponding probe sets are compared.

Two sets of algorithms are applied to each probe set. The first produces a ‘Change p-value’, a measure of statistical significance, and an associated ‘Change Call’. The second produces a Signal Log Ratio (SLR), a quantitative estimate of the change in gene expression.

Signal Log Ratio Algorithm (SLR)

The Signal Log Ratio algorithm measures the magnitude and direction of the change in transcript level between the experiment and baseline chips. Working with logs prevents one very high data point in a probe set from masking information from lower-valued data points. The log scale is base 2, so an SLR of 1 represents a two-fold increase in the abundance of an mRNA and an SLR of -1 represents a two-fold reduction in transcript expression.

The following formula can be used to calculate linear fold-change from the Signal Log Ratio value.

When SLR is greater than or equal to one, then Fold change = 2^(SLR)

When SLR is less than or equal to minus one, then Fold change = (-1) × 2^(-SLR)
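The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not part of any Affymetrix software: the function name slr_to_fold_change is hypothetical, and applying the same sign-based rule to SLR values between -1 and 1 (which the formulas above do not cover) is an assumption.

```python
def slr_to_fold_change(slr: float) -> float:
    """Convert a base-2 Signal Log Ratio to a signed linear fold change.

    Increases are reported as positive fold changes and decreases as
    negative fold changes, following the two formulas in the text.
    Assumption: values with |SLR| < 1 are handled by the same rule,
    split on the sign of the SLR.
    """
    if slr >= 0:
        # Increase: fold change = 2^SLR
        return 2.0 ** slr
    # Decrease: fold change = (-1) x 2^(-SLR)
    return -(2.0 ** (-slr))


if __name__ == "__main__":
    for slr in (1, -1, 2, -3):
        print(slr, slr_to_fold_change(slr))
```

With this convention an SLR of 2 corresponds to a four-fold increase and an SLR of -3 to an eight-fold decrease, reported as -8.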

Affymetrix Problems

Analysis of Arabidopsis gene expression using Affymetrix GeneChips is complicated by difficulties with the associated Affymetrix data files. For example, the Affymetrix Arabidopsis annotation file does not contain GenBank identifiers, and the genes in the Affymetrix annotation tables quickly became outdated because the TIGR Arabidopsis genome annotation is continuously updated.

To address these problems, Microsoft Excel files containing tables of updated AGI identifiers plus gene, protein, and promoter sequences were created (Ghassemian et al., 2001; www-biology.ucsd.edu/labs/Schroeder/genechip.html). Similarly, analysis of the AtGenome1 GeneChip by the Sheen laboratory (http://genetics.mgh.harvard.edu/sheenweb) revealed three categories of discrepancy with the Arabidopsis genome data. First, some BAC accession numbers had protein prediction errors; these were corrected by performing BLAST searches against the genome databases using specific GeneChip sequences.

Second, the MIPS and TIGR databases sometimes gave different AGI identifiers for the same gene sequence; when such discrepancies arose, the TIGR genome database was used. Finally, target sequences used in the GeneChip were sometimes too short and not unique, so that more than one cDNA could bind to the probe.

The annotations of the Arabidopsis ATH1 GeneChip were recently improved to take account of these issues (Ghassemian et al., 2001). This analysis revealed that 22,132 of the 22,746 GeneChip probe and target sequences have either 100% identity and a match length >= 50, or 98% identity and a match length equal to the length of the target sequence. Furthermore, 133 genes had different AGI identifiers in the Affymetrix list compared to the TIGR database.
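The two match criteria quoted above can be written as a simple test on a sequence alignment result. The sketch below is illustrative only and is not the code used in the cited analysis: the function name and its parameters are hypothetical, and reading "98% identity" as "at least 98%" is an assumption.

```python
def passes_match_criteria(identity_pct: float,
                          match_length: int,
                          target_length: int) -> bool:
    """Return True if an alignment meets either criterion from the text.

    Criterion 1: 100% identity and a match length of at least 50.
    Criterion 2: 98% identity (assumed here to mean >= 98%) and a match
                 length equal to the full length of the target sequence.
    """
    full_identity = identity_pct == 100.0 and match_length >= 50
    near_identity = identity_pct >= 98.0 and match_length == target_length
    return full_identity or near_identity


if __name__ == "__main__":
    print(passes_match_criteria(100.0, 60, 400))   # criterion 1
    print(passes_match_criteria(98.5, 400, 400))   # criterion 2
    print(passes_match_criteria(98.5, 300, 400))   # neither

    # Share of sequences reported in the text as meeting the criteria.
    print(f"{22132 / 22746:.1%}")
```

The counts given in the text (22,132 of 22,746) correspond to roughly 97.3% of the probe and target sequences.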