Abstract

MAPPFinder is a tool that creates a global gene-expression profile across all areas
of biology by integrating the annotations of the Gene Ontology (GO) Project with the
free software package GenMAPP http://www.GenMAPP.orgwebcite. The results are displayed in a searchable browser, allowing the user to rapidly
identify GO terms with over-represented numbers of gene-expression changes. Clicking
on GO terms generates GenMAPP graphical files where gene relationships can be explored,
annotated, and files can be freely exchanged.

Background

DNA microarray experiments simultaneously measure the expression levels of thousands
of genes, generating huge amounts of data. The analysis of these data presents a tremendous
challenge to biologists and new tools are needed to help gain biological insights
from these experiments. Although the data are generated for individual genes, examining
a dataset on a gene-by-gene basis is time consuming and difficult to carry out across
an entire dataset. One way of accelerating the pace of data analysis is to approach
the data from a higher level of organization. This can be done using data-driven methods,
such as hierarchical clustering and self-organizing maps [1,2], which identify groups of genes with similar expression patterns. A complementary
approach is to view the data at the level of known biological processes or pathways.
Identifying those groups of biologically related genes that are showing a large number
of gene-expression changes will create an informative description of the biology that
is occurring in a particular dataset, making it possible to generate new hypotheses
and identify those specific areas of biology that warrant more detailed investigation.

One tool that assists in the identification of important biological processes is GenMAPP
(Gene MicroArray Pathway Profiler) [3], a program for viewing and analyzing microarray data on microarray pathway profiles
(MAPPs) representing biological pathways or any other functional grouping of genes.
When a MAPP is linked to a gene-expression dataset, GenMAPP automatically and dynamically
color codes the genes on the MAPP according to criteria supplied by the user. GenMAPP
is a useful starting point for pathway-based analysis of gene-expression data, but
there are several critical requirements to be met before this tool can be used to
identify correlated gene-expression changes across all biology. On a practical level,
pathway-based analysis of microarray data needs to be automated, so that all possible
pathways can be explored. Identifying correlated gene-expression changes in an individual
pathway is often interesting, but it is necessary to know if the gene-expression changes
seen on a particular pathway are unique to this pathway or are occurring in many other
pathways. Equally important to automation is expanding the pathway information that
is digitally represented. GenMAPP currently has over 50 MAPP files depicting various
biological pathways and gene families, but this is still only a small fraction of
all known biology [3]. Several other pathway programs such as KEGG [4], EcoCyc/MetaCyc [5], Pathway Processor (which uses KEGG) [6] and ViMAc [7] are available for integration with microarray data analysis, but these programs
focus on well-defined metabolic pathways, and like GenMAPP, would benefit from a broader
base of pathway information.

To address this issue, we have used information available from the Gene Ontology (GO)
Consortium [8]. The GO Consortium is creating a defined vocabulary of terms describing the biological
processes, cellular components and molecular functions of all genes. The GO is built
in a hierarchical manner, with a parent-child relationship existing between GO terms.
Curators at the public gene databases are assigning genes to GO terms to provide annotation
and a biological context for individual genes. In addition to providing gene annotation,
GO also provides a structure for organizing genes into biologically relevant groupings.
These groupings can serve as the basis for identifying those areas of biology showing
correlated gene-expression changes in a microarray experiment. While GO has been used
to annotate microarray data both by hand and by some software packages [9,10,11], there has been no automated way to use it for pathway-based analysis.

We have developed a tool called MAPPFinder that dynamically links gene-expression
data to the GO hierarchy. For each of the 11,239 ([12]; as of May 6, 2002]) GO biological process, cellular component and molecular function
terms, MAPPFinder calculates the percentage of the genes measured that meet a user-defined
criterion. This is done for each specific GO node, and for the cumulative total of
the number of genes meeting the criterion in a parent GO term combined with all of
its children, giving a complete picture of the number of genes associated with a particular
GO term. Using this percentage and a z score (see Materials and methods), the user can rank the GO terms by their relative
amounts of gene-expression changes. MAPPFinder therefore generates a gene-expression
profile at the level of biological processes, cellular components and molecular functions,
rapidly identifying those areas of biology that warrant further study (Figure 1).

Figure 1. How MAPPFinder works. Microarray data is imported into MAPPFinder as a GenMAPP gene-expression
dataset. Using a relational database and the gene-association files from GO, MAPPFinder
assigns the thousands of genes in the dataset to the thousands of GO terms. Using
a user-defined criterion for a significant gene-expression change, MAPPFinder calculates
the percentage of genes meeting the criterion and a statistical score for each GO
term. Using the ranked list and GO browser generated by MAPPFinder the user can quickly
identify interesting GO terms with high levels of gene-expression changes. The specific
genes involved in these GO terms can be examined on automatically generated MAPPs
using GenMAPP.

Results and discussion

To demonstrate the utility of MAPPFinder, we used the program to analyze the publicly
available mouse microarray dataset, the FVB benchmark set for cardiac development,
maturation and aging [14]. This dataset measures gene-expression levels in the hearts of 12.5-day embryos
and adult mice. We have used the 12.5-day embryonic time point to identify those biological
processes that show differentially expressed genes between embryonic and adult hearts.
We ran the MAPPFinder analysis on this dataset using two criteria, either an increase
(fold change > 1.2 and p < 0.05) or decrease (fold change < -1.2 and p < 0.05) in gene expression for the 12.5-day embryo. We chose this dataset for demonstration
because of the large number of differences in gene expression observed in the 12.5-day
embryo compared to the adult mouse heart tissue.

MAPPFinder linked the 9,946 probe sets measured in this experiment to the 11,239 GO
terms [12] in the hierarchy and calculated the percentage of genes meeting the criterion and
a z score for each GO term. Table 1 gives an overall summary of the linkages made between the dataset and GO and calculations
carried out by MAPPFinder. Nearly half of the 9,946 probe sets measured in the FVB
benchmark dataset were connected to a GO term, representing approximately 70% of the
mouse genes associated with GO terms [15] and covering a good portion of what is currently known about mouse biology. The
proportion of genes in the microarray dataset that link to GO terms will increase
as more GO terms and gene associations are added by the Mouse Genome Database (MGD)
[16].

After MAPPFinder assigns the genes in the microarray dataset to the GO structure,
it calculates for each GO term the percentage and z score (see Materials and methods) for the genes that meet the user's criterion. These
two values can be used to identify GO terms with an over- (or under-) represented
number of gene-expression changes. The MAPPFinder results are displayed in two forms.
The first is a GO browser that graphically displays the MAPPFinder results in the
structure of the GO hierarchy (Figures 2a,3a). The second is a text file listing all the GO terms measured, ranked by the z score. The number of genes meeting the criterion, the number of genes measured in
the experiment, and the number of genes assigned to each GO term by MGD are given,
along with the respective percentages and z score, in the text file and GO browser (Figure 2b). Table 2 shows the list of process, component and function terms with a z score greater than 2 for the significantly increased and decreased criteria at the
12.5-day embryonic time point. GO terms that had fewer than 5 or more than 100 genes
changed were removed from the list because these terms were either too specific or
too general for our data analysis. This filter identified the top 108 (8.0%) GO terms
for the significantly increased criterion and the top 63 (4.8%) GO terms for the significantly
decreased criterion. The stringency of this filter can be increased or decreased by
raising or lowering the z score cutoff, or by including terms with larger or smaller numbers of genes. The filtered
list was then pruned by hand for related GO terms to remove any over-represented branches
of the GO hierarchy (for the complete results, see Additional data files). When both
a parent and a child term were present in the list, the parent term was removed if
its presence was due entirely to genes meeting the criterion for the child term. The
remaining terms on the list still have a large degree of interrelatedness, but have
been retained here for completeness.

Figure 2. The MAPPFinder browser. (a) The branch of the GO hierarchy rooted at the biological process term 'RNA processing'
is shown. The terms are colored with the MAPPFinder results for genes significantly
increased in the 12.5-day embryo versus the adult mice. Terms with 0-5% of genes changed
are colored black, 5-15% purple, 15-25% dark blue, 25-35% light blue, 35-45% green,
45-55% orange, and greater than 55% red. The term RNA processing is highlighted in
yellow, indicating that it meets the search or filter requirements. (b) The MAPPFinder results. The term RNA processing is shown with the various MAPPFinder
results labeled. The percentage of genes meeting the criterion and the percentage
of genes in GO measured in this experiment are calculated. The results are calculated
for both this node individually and in combination with all of its child nodes (that
is, nested results). The z score indicates whether the number of genes meeting the criterion is higher or lower
than expected. A positive score indicates that more genes are changed than expected;
a negative score means fewer genes are changed than expected, and a score near 0 indicates
that the number of changes approximates to the expected value for that GO term.

Figure 3. Linking MAPPFinder to GenMAPP. (a) The MAPPFinder browser displaying the 12.5-day embryo increased results for the GO
process term 'glycolysis'. Color-coding of GO terms is the same as in Figure 2. (b) Clicking on the GO term glycolysis in the MAPPFinder browser builds the corresponding
GenMAPP MAPP file. This MAPP file contains a list of genes associated with this term
and all of its children. (c) Genes in the GO list were rearranged with the tools in GenMAPP to depict the glycolysis
pathway with the metabolic intermediates and cellular compartments. Color-coding of
genes for (b) and (c) is as follows: Red, fold change >1.2 and p < 0.05 in the 12.5-day embryo versus adult mice. Blue, fold change < -1.2 and p < 0.05. Gray, neither of the above criteria met. White, gene not found on the array.

The MAPPFinder results present a global picture of the biological processes, cellular
components and molecular functions that are increased and decreased in the 12.5-day
embryo compared with the adult mouse (Table 2). Using the criterion for a significantly increased gene-expression change, MAPPFinder
primarily identified GO terms involved in cell division and growth. Notable GO terms
include the processes 'mitotic cell cycle' (62.9% of 70 genes, z score of 8.1), 'mRNA splicing' (90.5% of 21 genes, z score of 7.5), and 'protein biosynthesis' (50% of 104 genes, z score of 6.8). The top-ranked component and function terms reflected the same biological
processes. For example, the component term 'spliceosome' shows that 17 out of 20 genes
(85%, z score of 6.7) were upregulated. The upregulation of these processes is consistent
with the fact that cardiomyocytes remain mitotically active throughout embryonic development
[17]. Apart from processes involved in cell division and growth, the MAPPFinder results
indicate that the processes 'transmembrane receptor protein serine/threonine kinase
signaling pathway' and 'induction of apoptosis' are upregulated, with a z score of approximately 2. The presence of the term 'transmembrane receptor protein
serine/threonine kinase signaling pathway' is due to the upregulation of genes involved
in transforming growth factor-β (TGFβ) receptor signaling, which is thought to regulate
the induction of apoptosis required for morphogenesis during heart development [18,19].

Genes involved in energy metabolism showed the highest levels of downregulation in
the 12.5-day embryo heart versus the adult heart. In particular, the process terms
'fatty acid metabolism' (63.3% of 30 genes, z score of 5.9) and 'main pathways of carbohydrate metabolism' (51.3% of 39 genes, z score 4.8), which is the parent of the terms 'glycolysis' and 'tricarboxylic acid
cycle', indicate that metabolic genes as a whole are downregulated in an embryo when
compared to an adult mouse. In addition, the component term 'mitochondrion' shows
that 88 out of 187 genes (47.1%, z score of 9.1) are downregulated. The downregulation of genes involved in fatty-acid
metabolism is consistent with research that has shown that the developing heart, unlike
the adult heart, does not derive its energy from fatty acids [20].

Overall, the MAPPFinder results provide a global perspective of the processes that
are up- and down-regulated in the 12.5-day embryonic heart compared to an adult heart.
The results confirmed what was expected: when compared to the adult heart, the embryonic
heart is undergoing increased cell division and growth and has decreased energy metabolism.
In addition, the global gene-expression profile presented by MAPPFinder allows the
gene-expression changes observed for cell division and growth and energy metabolism
to be put in the context of other regulatory and developmental processes such as TGFβ
signaling and apoptosis.

The MAPPFinder browser

Viewing the MAPPFinder results as a ranked list is informative, but it does not take
full advantage of the fact that GO is arranged in a hierarchy. MAPPFinder also presents
the results in the context of the GO hierarchy (Figures 2a,3a) showing the entire hierarchy, color-coded by the percentage of genes changed. Users
can step through the hierarchy, expanding those branches of the tree that are showing
gene expression changes, moving from broad terms to more specific categories. Often
the ranked list of terms will show many interrelated terms, and it is necessary to
view the results in the hierarchy to identify the relationships among them. For example,
the terms 'RNA metabolism', 'RNA processing', 'mRNA processing', and 'mRNA splicing'
appear as upregulated in Table 2. However, the tree view (Figure 2a) clearly shows that mRNA splicing is a child term of both RNA splicing and mRNA processing,
which are in turn child terms of RNA metabolism. Similarly, the terms 'main pathways
of carbohydrate metabolism', 'catabolic carbohydrate metabolism', and 'glycolysis'
also appear as downregulated in Table 2. The MAPPFinder browser (Figure 3a) shows that 'glycolysis' is related to 'main pathways of carboyhydrate metabolism'
through the hierarchical relationship between these terms.

The MAPPFinder browser also provides three search and navigation functions. First,
the user can search by a keyword or an exact GO term name. Second, the user can search
by a gene identifier to find which GO term(s) the gene is associated with. For example,
searching for the gene alpha-myosin heavy chain using its SWISS-PROT identifier MYH6_MOUSE
or its MGD identifier MGI:97255 finds the GO process terms 'striated muscle contraction',
'cytoskeleton organization and biogenesis', 'protein modification', and 'muscle development'.
Third, the user can expand the GO tree automatically to show all nodes with a minimum
number of genes or minimum percentage of genes meeting the criterion or with a minimum
z score. The terms meeting the filter are highlighted in yellow to clearly indicate
the results of the search.

Once the GO terms of interest have been identified with MAPPFinder, the user will
want to know exactly which genes are associated with these terms and exactly which
genes are being differentially expressed. This can be accomplished using GenMAPP.
Selecting a GO term in the MAPPFinder browser automatically builds a MAPP containing
the genes associated with that GO term and all of its children, and opens this MAPP
in GenMAPP. Figure 3b shows the MAPP generated by selecting the GO term 'glycolysis' in the MAPPFinder
browser. The genes on the MAPP are color-coded with the same criteria used to calculate
the MAPPFinder results, significantly increased and decreased at the 12.5-day embryo
time point. Clicking on a gene on the MAPP opens a 'back page' containing annotations,
gene-expression data and hyperlinks to that gene's page in the public databases. By
integrating GenMAPP and MAPPFinder, it is possible to seamlessly move from a global
gene-expression profile at the level of all biological processes, components and functions
to a detailed description of the gene-expression levels for the specific genes involved.
For example, a closer examination of the glycolysis MAPP indicates that hexokinase
I is upregulated in the 12.5-day embryo and isoforms II and IV are downregulated,
as compared with the adult heart. This is consistent with hexokinase I being the predominant
isoform in the embryonic heart [21].

Expanding MAPPFinder beyond GO

GO is a good starting point for analyzing microarray data in the context of biological
pathways, but this is by no means the only way to group related genes. Instead of
representing each GO process as an alphabetical list on a MAPP, it would be more useful
to represent the relationships between these genes as a fully delineated pathway.
As a start in this direction, GenMAPP.org [13] has created over 50 MAPPs depicting metabolic pathways, signaling pathways and gene
families. MAPPFinder can incorporate any MAPP file into its analysis to augment the
GO hierarchy. For the FVB benchmark developmental dataset, we have run MAPPFinder
on an archive of 54 mouse MAPPs available from [13] (see Additional data files for the complete results). These results for the 12.5-day
embryonic time point agree with the GO results, showing that the expression of genes
involved in the metabolic pathways 'tricarboxylic acid cycle' (83.3% of 12 genes measured,
z score of 5.91) and 'fatty acid degradation' (69.2% of 13 genes measured, z score 4.82) is significantly decreased. In addition, the significantly increased criterion
identified genes encoding ribosomal proteins (71.1% of 45 genes, z score 6.75) and genes involved in the cell cycle (53.3% of 15 genes, z score 2.4).

The archive of MAPPs provided by GenMAPP is in no way comprehensive. The growth of
this archive depends on assistance from the entire biological community. Our hope
is that, as MAPPFinder users see the added utility of viewing the GO biological processes
as fully delineated pathways, they will use GenMAPP to organize the gene lists into
more descriptive biological pathways. Figure 3c gives an example of how the genes from the GO term 'glycolysis' can be rearranged
using the tools in GenMAPP to depict the full pathway showing the direction of the
enzymatic cascade, metabolic intermediates and cellular compartments. GenMAPP.org
is currently accepting submissions of new MAPP files. MAPPs contributed by the community
will be included in the downloadable MAPP archive.

MAPPFinder is a necessary complement to current analysis tools

By approaching large datasets from a higher level or organization, MAPPFinder helps
to ease the data analysis and shorten the time necessary to gain a biological understanding
of the microarray data. MAPPFinder has greatly expanded current pathway-based tools
by using the large amount of annotations available from the GO. This broad analysis
will help identify biological processes that have not yet been implicated in a particular
experimental condition and begin to make connections between biological processes
previously thought to be unrelated.

MAPPFinder is available for yeast, mouse and human data. We plan to extend the program
to many of the other species that are in GO and updates will be available at [13].

Materials and methods

Gene-expression data

The publicly available mouse microarray dataset, the FVB benchmark set for cardiac
development, maturation and aging, was obtained from the CardioGenomics Program for
Genomics Applications [14]. These data compare healthy mouse hearts at different time points during development,
using male and female FVB/N mice. Specifically, this dataset examines heart tissue
from 12.5-day embryos, 1-day neonatal mice, 1-week mice, 4-week mice, and adult mice
at 5 months and 1 year. Our analysis focused on the 12.5-day embryonic time point
and the control adult mice. Three Affymetrix U74A version 1 arrays were used for each
time point. For the embryonic time point, three hearts were pooled for each array
because of their small size. To improve the statistical power in our analysis, the
5-month and the 1-year mice were combined into a single group of normal adult mice.
Signal intensity values were obtained with Affymetrix MAS 5.0 software. Signal values
less than 20 were raised to 20 and the log base 2 was taken. Log folds were determined
from the average of each time point when compared with the average of the combined
control group. P values were calculated with a permutation t test. The statistical analysis was done using the multest package of the R statistical
programming language [22]. These data were imported into GenMAPP, and the resulting GenMAPP Expression Dataset
file (.gex) was exported to MAPPFinder.

MAPPFinder requires a user-defined criterion for a meaningful gene-expression change.
In this case we combined a fold change with a statistical filter to determine significance.
We are using a fold change of greater than 1.2 with a p value of less than 0.05 to define a significant gene-expression increase, and a fold
change of less than -1.2 with a p-value of less than 0.05 to define a significant gene-expression decrease. To determine
the overall number of gene-expression changes in each GO term, an additional criterion
of a fold change greater than 1.2 or less than -1.2 and a p value of less than 0.05 is used (data not shown).

It is important to note that while we have used gene-expression data generated from
Affymetrix GeneChips, data from other microarray platforms and other techniques such
as SAGE (serial analysis of gene expression) can be used equally easily.

Linking the expression data to Gene Ontology

MAPPFinder builds a local copy of the GO hierarchy using the three ontology files
(Process, Component and Function) available from GO [12]. The directed acyclic graph (DAG) structure of GO [23] allows a node to be a child of multiple parents. This makes the navigation, visualization
and computation of the MAPPFinder results more difficult than if the GO were stored
in a classical tree structure. To ease the programming necessary to implement the
MAPPFinder algorithm, the DAG structure was converted to a classical tree. For each
node of the DAG that contained multiple parents, multiple copies were inserted into
the tree representation of the GO using local identifiers to handle duplicate GO terms.
This tree structure maintains the 'true path' rule enforced in the GO DAG structure.
MAPPFinder handles this conversion internally, and to the user the GO hierarchy seen
in the MAPPFinder browser will be identical to that seen in other GO browsers.

The links between the GO terms and the genes in the expression dataset are made with
the gene-association files [15]. These associations are taken from the European Bioinformatics Institute [24] for human genes, the Mouse Genome Database (MGD) [16] for mouse genes, and the Saccharomyces Genome Database (SGD) [25] for yeast genes. Currently, the genes in the input data must be identified with
GenBank, SWISS-PROT or SGD identifiers.

MAPPFinder uses a relational database to link the expression dataset to the gene-association
files. The MAPPFinder database relates gene-expression data to the appropriate gene-identifrcation
systems for each species (Figure 1). For human data, the gene-association files use SWISS-PROT identifiers, requiring
a SWISS-PROT-to-GenBank relational table to link datasets using GenBank accession
numbers to the GO annotations. For yeast data, the gene-association files use SGD
identifiers. A SWISS-PROT-to-SGD relational table is also included for expression
datasets using SWISS-PROT identifiers. For mouse data, the GO gene-association files
use MGD identifiers, requiring a GenBank-to-MGD relational table, and a SWISS-PROT-to-MGD
relational table. MAPPFinder takes advantage of the fact that MGD is also related
to UniGene, allowing additional ESTs that are not in the MGD-GenBank relational table
to be used as gene identifiers. With this intermediate step, many more GenBank identifiers
can be linked to GO annotations. Currently, there is no direct relationship between
SWISS-PROT and UniGene, so a similar intermediate step was not used for human data.

Calculating the MAPPFinder results

MAPPFinder calculates the percentage of genes measured within each GO term that meet
a user-defined criterion, and this measurement is known as the 'percent changed'.
MAPPFinder also calculates the percentage of the genes associated with a GO term that
are measured in the experiment, and this measurement is known as the 'percent present'.
Calculating the percent present is necessary to determine how well represented a GO
term is in the dataset.

The GO gene-association files [17] are potentially problematic, because they treat each GO term independently, removing
the implicit parent-child relationship. As a result, looking at the GO terms individually
is often uninformative because the number of genes associated with any one term is
smaller than the actual number of genes involved in that process, component, or function.
To address this issue, we calculate the nested percentage for a parent term with all
its children below it in the hierarchy. By combining the child terms with their parent,
the results incorporate genes associated with the entire branch of the hierarchy,
providing a much more accurate representation of the number of genes involved in that
process, component or function. As more specific branches of the GO are examined,
the denominator of the two equations will become smaller and the user can find their
desired level of specificity. One complication that arises from this method is that
in some cases a gene is associated with both the parent and child terms or multiple
child terms. When the percentages are calculated for the sub-tree, we ensure that
each gene is only counted once, so that genes with multiple annotations are not weighted
more heavily.

Another complication that arises while calculating the MAPPFinder results is the issue
of multiple probes of the same gene on the array. In this case, the features or duplicate
genes are clustered to one unique gene. If any of the instances of this gene on the
array meet the user-defined criterion, then that gene meets the user-defined criterion.
The number of unique genes is also used to calculate the z score, meaning that the statistics are based only on a single occurrence of each gene
in the dataset.

A statistical rating of the relative gene-expression activity in each MAPP and GO
term is also provided. It is a standardized difference score (z score) using the expected value and standard deviation of the number of genes meeting
the criterion on a GO term under a hypergeometric distribution. The z score is useful for ranking GO terms by their relative amounts of gene expression
changes. Positive z scores indicate GO terms with a greater number of genes meeting the criterion than
is expected by chance. Negative z scores indicate GO terms with fewer genes meeting the criterion than expected by chance.
A z score near zero indicates that the number of genes meeting the criterion approximates
the expected number. Extreme positive scores suggest GO terms with the greatest confidence
that the correlation between the expression changes of the genes in this grouping
are not occurring by chance alone. P values are not assigned to the GO terms or MAPPs because, while such a standardized
difference score could approximate a normal z score for an individual MAPP, the lack of independence between GO terms and the multiple
testing occurring among them most certainly makes the normal p value for such a z score unreliable. As a result, p values are not assigned to the GO terms and MAPPs.

The z score is calculated by subtracting the expected number of genes in a GO term (or MAPP)
meeting the criterion from the observed number of genes, and dividing by the standard
deviation of the observed number of genes. The equation used is

or

where N is the total number of genes measured, R is the total number of genes meeting the criterion, n is the total number of genes in this specific MAPP, and r is the number of genes meeting the criterion in this specific MAPP.

Therefore, if two GO terms contain the same number of genes, the term with the greater
number of genes meeting the criterion will receive a higher score. Dividing by the
standard deviation adjusts for the size of the GO term, ranking a GO term (or MAPP)
with a large number of genes meeting the criterion higher than a GO term (or MAPP)
with the same percentage of genes changed, but fewer total genes.

The MAPPFinder results are generated in the GO browser for analysis in the context
of the GO hierarchy and as tab-delimited text files that can be used for sorting and
filtering the data in a spreadsheet program.

Additional data files

The following additional data files are available:

The FVBN developmental data in the form of a 1 (.gex). It contains the microarray dataset and the criteria used to define increased
and decreased gene-expression change. It can be opened for editing in GenMAPP and
is the appropriate data type for use with MAPPFinder.

The FVBN developmental data as a 2 generated by MAPPFinder (.gdb). It contains the relationships between the genes in
the dataset and the GO hierarchy. The file can be opened for viewing in Microsoft
Access. This file must be present to build GenMAPP MAPPs from existing MAPPFinder
results.

The MAPPFinder results for the 12.5-day embryos versus the adult mice are contained
in the files: 3, 4, 5, 6, 7, 8. These text files contain the MAPPFinder results for both criteria and both the GO
hierarchy and the GenMAPP.org MAPPs. These files can be loaded into MAPPFinder for
view in the MAPPFinder GO browser. These files are tab-delimited and can also be viewed
as tables in Microsoft Excel. The 'All Changes' files contain the results for a criteria
looking for either increased or decreased gene expression changes. A 9 containing all aditional data files is available.

Additional data file 1. The FVBN developmental data in the form of a GenMAPP expression dataset file (.gex).
It contains the microarray dataset and the criteria used to define increased and decreased
gene-expression change. It can be opened for editing in GenMAPP and is the appropriate
data type for use with MAPPFinder.

Additional data file 2. The FVBN developmental data as a database file generated by MAPPFinder (.gdb). It
contains the relationships between the genes in the dataset and the GO hierarchy.
The file can be opened for viewing in Microsoft Access. This file must be present
to build GenMAPP MAPPs from existing MAPPFinder results.

Acknowledgements

We thank A. Zambon, W. Tingley, T. Speed, P. Bacchetti and J. Myers for helpful conversations
about the design and implementation of MAPPFinder, B. Taylor for help with the preparation
of this manuscript, and S. Izumo and CardioGenomics for making the microarray dataset
publicly available. This work is supported by the J. David Gladstone Institutes, the
San Francisco General Hospital General Clinical Research Center, the National Heart,
Lung, and Blood Institute, San Francisco General Hospital General Clinical Research
Center MO1RR00083 (B.R.C.) and the NHLBI Programs for Genomic Applications (BayGenomics).