YFGdb Help Contents:

YFGdb searches: view and download yeast data sets

The YFGdb searches may be used to search and view the functional genomics data set collection.
Data and annotation files can be downloaded as tarballs or zip files.
Note that these files may be in a variety of formats, including
MAGE-ML, GEO SOFT format, PCL or CDT files, or tab-delimited files.
For more information on these file formats, see the File formats
section below. Our longer-term goal is to incorporate all of these data
into the YFGdb PostgreSQL database, so that all data can be exported
either as files in common formats or as an SQL dump of the database.

The Quick Search, available at the top of every YFGdb page, can
currently be used to query YFGdb by PubMed ID (e.g. 16963631) or by
YFGdb study ID.

The Advanced Search may be used to query YFGdb by study type, experimental technology, Gene Ontology (GO) biological process terms, file format, and curation status in YFGdb. Only the study type is required; the remaining categories are optional and may be used to further refine your search.

Step 1 (required): Select one or more study types. You may sort the results by author, PubMed ID, file format, or experimental technology. By default, results are sorted by the first author's last name.

Step 2 (optional): To further narrow your results, you may also make the following selections:

Experimental technology: Select one. Note that this is an "AND" search, so the selected experimental technology should be consistent with the study type(s) chosen in Step 1.

GO Biological Process Terms: Where appropriate, YFGdb curators manually associate each study with the GO processes it addresses, using the yeast Gene Ontology (GO) process slim terms obtained from SGD. You may select one or more GO process terms to further refine your query; note, however, that each resulting study will match ALL of the selected GO terms.

File formats: To further narrow your results, select one or more file formats. Each resulting study will match ALL of the selected file formats in addition to any query selections made above.

Curation status: You may further restrict your query based on curation status in YFGdb. The current options are the following:

Curated: a study curated by YFGdb curators that includes relevant data sets, a README file, and an archetype file, as appropriate.

Not yet curated: a study in YFGdb that includes only collected data sets in various formats. The entry has not yet been curated by a YFGdb curator.

No data available: a study in YFGdb for which there are no data available at this time, according to the original authors contacted by YFGdb curators.
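The way the Advanced Search categories combine can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not YFGdb's actual query code: the Study record and its field names are hypothetical, and we assume a study matches Step 1 if its type is any one of the selected study types. The other categories follow the rules given above: the technology must match, and every selected GO term and file format must be present.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    # Hypothetical record; field names are illustrative, not YFGdb's schema.
    first_author: str          # first author's last name
    pubmed_id: int
    study_type: str
    technology: str
    go_terms: set = field(default_factory=set)
    file_formats: set = field(default_factory=set)
    status: str = "Not yet curated"


def advanced_search(studies, study_types, technology=None, go_terms=(),
                    file_formats=(), status=None,
                    sort_key=lambda s: s.first_author):
    """Filter studies the way the Advanced Search describes its categories."""
    if not study_types:
        raise ValueError("Step 1 requires at least one study type")
    wanted_go, wanted_formats = set(go_terms), set(file_formats)
    hits = [
        s for s in studies
        if s.study_type in study_types                          # any selected type (assumed)
        and (technology is None or s.technology == technology)  # AND with Step 1
        and wanted_go <= s.go_terms                             # ALL selected GO terms
        and wanted_formats <= s.file_formats                    # ALL selected formats
        and (status is None or s.status == status)
    ]
    # Default order: first author's last name; pass another key to re-sort.
    return sorted(hits, key=sort_key)
```

For example, passing sort_key=lambda s: s.pubmed_id mimics sorting the results by PubMed ID instead of by author.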

Query Results are organized by study type and list the studies that match your search criteria. When available, the publication column provides links to the relevant YFGdb entry, PubMed entry, SGD curated paper, and web supplements:

Click on the YFGdb icon in order to access the YFGdb entry for that study.

Click on the arrow icon in order to access the author's or journal's web supplement for the paper.

To access the relevant PubMed entry for the paper, click on the PubMed icon.

To obtain more information on the relevant paper, click on the "SGD curated paper" icon.

Downloading Data Sets: Click on the tar or zip link in the archive column of the query results to download all files associated with a particular study. If the study has not yet been curated in YFGdb, only the data files in their original formats (e.g. text, PDF, Excel, SOFT) will be available. If the study has been curated in YFGdb, the downloadable archive will contain the data files associated with the study and a README file describing all of the downloaded files in detail.
These archives can be extracted with any standard compression/decompression software (for
example, Stuffit Expander). For help, and to download free
programs that can decompress and extract these tar files,
see the gzip home page.
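If you prefer to script the download step, Python's standard tarfile and zipfile modules can unpack either archive type. The sketch below is a convenience, not an official YFGdb tool; the function name extract_archive is our own, and it simply returns the member names so you can check that the README and data files arrived.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive, dest):
    """Extract a downloaded .tar/.tar.gz or .zip study archive into dest.

    Returns the sorted member names so you can check that the README
    and data files are present.
    """
    archive, dest = Path(archive), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            return sorted(zf.namelist())
    if tarfile.is_tarfile(str(archive)):
        # "r:*" lets tarfile detect gzip/bzip2/xz compression itself.
        with tarfile.open(str(archive), "r:*") as tf:
            # Use the safer "data" extraction filter where this Python has it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tf.extractall(dest, **extra)
            return sorted(tf.getnames())
    raise ValueError(f"not a tar or zip archive: {archive}")
```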

In addition to the README and data files, we also provide an archetype
gene file for some data sets when appropriate. The archetype
genes are meant to help indicate what constitutes a significant result
for a particular study; for example, CLN2 is an archetype gene for the
cell cycle data sets.

YFGdb study viewer

Each individual study associated with a paper has a study viewer page in YFGdb. Some papers have multiple studies associated with them, in which case a disambiguation page is provided. Clicking on any of the YFGdb study IDs on the disambiguation page opens the relevant study viewer. The study viewer page contains the following information:

Publication:

The full citation of the paper associated with the data set is
provided at the top of the study viewer page. Links to the relevant
PubMed entry, SGD curated paper, and web supplements are also
provided when available. If the paper is associated with any entries in a public repository such as GEO and/or ArrayExpress, the accession IDs are also provided and serve as direct links to the relevant entries in those repositories.

YFGdb study ID:

The YFGdb study ID is a unique accession ID that corresponds to a single study from a particular paper. A single paper may have more than one study associated with it, and the studies associated with one publication may or may not be of the same study type. The YFGdb study ID consists of the PubMed ID (e.g. 17314980) followed by "id" and the study number, e.g. 17314980id466. YFGdb may also be searched by study ID using the Quick Search.
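Because the study ID embeds the PubMed ID, it can be split apart mechanically. The sketch below assumes the layout suggested by the example 17314980id466 (digits, the literal "id", digits); that pattern is inferred from one example and is not a published specification. The helper names parse_study_id and quick_search_kind are hypothetical.

```python
import re

# Pattern inferred from the example 17314980id466: PubMed ID digits,
# the literal text "id", then the study number (an assumption, not a spec).
STUDY_ID = re.compile(r"(?P<pubmed_id>[0-9]+)id(?P<study_number>[0-9]+)")


def parse_study_id(study_id):
    """Split a YFGdb study ID into (PubMed ID, study number)."""
    m = STUDY_ID.fullmatch(study_id.strip())
    if m is None:
        raise ValueError(f"not a YFGdb study ID: {study_id!r}")
    return int(m["pubmed_id"]), int(m["study_number"])


def quick_search_kind(query):
    """Guess whether a Quick Search query is a PubMed ID or a study ID."""
    q = query.strip()
    if re.fullmatch(r"[0-9]+", q):
        return "pubmed_id"
    if STUDY_ID.fullmatch(q):
        return "study_id"
    return "unknown"
```

For example, parse_study_id("17314980id466") returns (17314980, 466).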

Status:

The current curation status in YFGdb is provided for each study:

Curated: a study curated by YFGdb curators that includes relevant data sets, a README file, and an archetype file, if appropriate.

Not yet curated: a study in YFGdb that includes only collected data sets in various formats. The entry has not yet been curated by a YFGdb curator.

No data available: a study in YFGdb for which there are no data available at this time, according to the original authors contacted by YFGdb curators.

Study Type:

The study type assigned by SGD or YFGdb curators is given for each study associated with a PubMed ID.

Study Description:

If the study has been curated by YFGdb, an overview of the study design is also provided. Most of these study descriptions are written by curators as they curate the data set, whereas some are parsed from MAGE-ML files (i.e. provided by the authors).

Web supplement:

Data set files in any format obtained from author or journal web supplements.

Contact:

The contact person for the data set with a link to their email address. Often this is the corresponding author on the original paper, although in some cases another author may serve as the contact for the data set. Authors may be contacted for more information.

Visualization and analysis tools:

Where applicable, study viewer pages offer visualization tools such as
Java TreeView, together with a brief description of each tool and
links that launch the application for viewing the results. More
visualization and analysis tools will be added over time.

Download data files:

Click on the tar or zip link to download all files associated with a particular study. If the study has not yet been curated in YFGdb, only the data files in their original formats (e.g. text, PDF, Excel, SOFT) will be available. If the study has been curated in YFGdb, the downloadable archive will contain the data files
associated with the study and a README file, written by YFGdb curators, describing all of the downloaded files in detail.
These archives can be extracted with any standard compression/decompression software (for
example, Stuffit Expander). For help, and to download free
programs that can decompress and extract these tar files,
see the gzip home page.

In addition to the README and data files, we also provide an archetype
gene file for some curated data sets when appropriate. The archetype
genes are meant to help indicate what constitutes significant expression
for a particular study; for example, CLN2 is an archetype gene for the
cell cycle data sets. The individual files associated with the study are also listed in a table showing each file's size and type, and they may be downloaded individually.

README file:

A detailed README file is written by YFGdb curators for each curated study. The README file includes the full citation, PubMed ID, study description, a brief description of the raw and/or processed data files, web supplement links and author contact information. For more information on the different file types, please see the file formats section below.

Archetype gene file:

Where appropriate, YFGdb curators create archetype gene files for studies. Archetype genes are intended to indicate what constitutes a significant result for a particular experiment (e.g. the G1 cyclin CLN2 is an archetype gene for cell cycle data sets). Each archetype has a curated description indicating the criteria used to identify significant genes in the data set and how the archetype genes meet these criteria. Archetype genes are meant to serve as benchmarks both for biologists looking at their favorite genes in a data set and for computational biologists writing algorithms to detect significance automatically.

The archetype gene files contain the following columns:

Type/group of genes: used to group archetype genes
together in a set

ORF name

Type of expression/behavior: Amplified, Increased, Decreased,
Deleted, Periodic, or Other

Description: description of the group of archetype genes as well
as information about the individual gene
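As a benchmark, an archetype file is typically loaded and grouped by its first column. The sketch below assumes, since this page does not say so, that the file is tab-delimited with one header line and the four columns in the order listed above. It also rejects behavior values outside the six listed. The function name read_archetypes is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Values listed for the "Type of expression/behavior" column.
BEHAVIORS = {"Amplified", "Increased", "Decreased", "Deleted", "Periodic", "Other"}


def read_archetypes(text):
    """Group archetype genes by their "Type/group of genes" column.

    Assumes (not confirmed here) a tab-delimited file with one header line
    and the four columns in the order listed above.
    """
    groups = defaultdict(list)
    rows = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(rows, None)  # skip the assumed header line
    for row in rows:
        cells = [c.strip() for c in row] + [""] * 4  # pad short rows
        group, orf, behavior, description = cells[:4]
        if not any((group, orf, behavior, description)):
            continue  # blank line
        if behavior not in BEHAVIORS:
            raise ValueError(f"unexpected behavior {behavior!r} for ORF {orf!r}")
        groups[group].append({"orf": orf, "behavior": behavior,
                              "description": description})
    return dict(groups)
```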

File formats

Depending on the type of experiment and the particular data set, the
files provided may be in a variety of formats. Microarray data are
currently provided in several formats, while data from other types of
experiments are most often available as basic tab-delimited files.

Possible file formats:

CEL (Affymetrix)

The Affymetrix CEL file contains raw microarray results that must be interpreted by specific software programs (e.g. MAS5, dChip). The CEL file contains the raw intensities of all of the oligonucleotide probes (25-mers) synthesized in situ on the array, which typically number ~200,000. The CEL file itself does not associate an individual probe intensity with the gene it might be reporting on. In designing these arrays, Affymetrix typically chose about 16 oligos, termed a "probeset", that collectively report on the level of a transcript. Software (MAS5, dChip, etc.) typically reads the CEL file together with a library file (which maps each probe to its probeset/transcript/gene) to make an aggregate call on the transcript level that the probes report on. The file resulting from this analysis typically has as many lines as there are genes in the organism (plus any other hypothetical transcripts).
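The probe-to-probeset step described above amounts to a group-by over the library mapping. The sketch below takes already-parsed CEL intensities and library mappings as plain dictionaries keyed by (x, y) cell position; reading the binary or text CEL file is out of scope here. It summarizes each probeset with a simple median, which is a deliberate simplification and not the algorithm used by MAS5 or dChip.

```python
from collections import defaultdict
from statistics import median


def summarize_probesets(intensities, probe_to_probeset):
    """Collapse probe-level intensities into one value per probeset.

    intensities:       {(x, y): raw intensity} as read from a CEL file
    probe_to_probeset: {(x, y): probeset ID} as given by the library file
    Cells missing from the library (controls, background) are ignored.
    Uses a plain median, NOT the MAS5 or dChip algorithms.
    """
    grouped = defaultdict(list)
    for cell, value in intensities.items():
        probeset = probe_to_probeset.get(cell)
        if probeset is not None:
            grouped[probeset].append(value)
    return {ps: median(values) for ps, values in grouped.items()}
```

The output has one entry per probeset, which mirrors the point above that the derived file has roughly one line per gene rather than one per probe.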

CHP (Affymetrix)

The Affymetrix chip (.CHP) file contains microarray analysis results produced by Affymetrix software. This file can be saved as .chp or .txt, or exported as an Excel (.xls) file. Over the years, this file format has been generated by at least four different versions of Affymetrix software using two different types of algorithms. Older software versions (MAS4 Microarray Suite and GeneChip Analysis Suite) used an empirical expression algorithm, i.e. the calculations were not based on standard statistical methods. The file Affymetrix_Empirical.txt describes the column headers generated by the empirical algorithm. More recent software versions (MAS5 Microarray Suite and GCOS GeneChip Operating Software) use statistical expression algorithms. The file Affymetrix_Statistical.txt describes the column headers generated by the statistical algorithm. For more information, see the summary of the analysis methods, or refer to Affymetrix's statistical algorithms technote. Note that there are three Affymetrix yeast arrays: Ye6100, YG-S98, and Yeast 2.0. Information on these arrays is available from the YG-S98 and Ye6100 datasheet and the Yeast 2.0 datasheet.

GFF

The "Gene Finding Format" or "General Feature Format" (GFF) is a standardized file format for describing genes and other features associated with DNA, RNA, and protein sequences. Please refer to the GFF specification document for more information.
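Each GFF feature line has nine tab-separated columns. The sketch below assumes GFF version 3, where the ninth column holds key=value attributes separated by ";" and special characters are percent-encoded. Older GFF versions format that column differently, so check which version a given file uses before relying on this. The function name parse_gff3 is our own.

```python
from urllib.parse import unquote

COLUMNS = ("seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes")


def parse_gff3(lines):
    """Yield one dict per feature line of a GFF3 file."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section, not features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and ## directives
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        feature = dict(zip(COLUMNS, fields))
        feature["start"], feature["end"] = int(fields[3]), int(fields[4])
        attrs = {}
        if fields[8] != ".":
            for pair in fields[8].split(";"):
                if pair:
                    key, _, value = pair.partition("=")
                    # Multiple values are comma-separated; decode %XX escapes.
                    attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
        feature["attributes"] = attrs
        yield feature
```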

GPR (GenePix)

GenePix results files (.gpr) are the output from
a Molecular Devices "GenePix" microarray scanner, which takes
measurements from a variety of microarrays spotted on glass. The file
GenePix_GPR.txt (html version) summarizes the information found in
the header of a GPR file and describes the column headings in the
current software version. The file gpr_history.txt (html version) describes the changes that have been
made to the GPR format since its initial public release (GenePix
Results format version 1.4). The following sample files, containing
the headers, column headings, and several rows of data, are also
available for reference: GPR v. 1.4 (3_0_6_x_truncated.gpr), GPR v. 2.0 (4_0_1_x_truncated.gpr), GPR v. 3.0 (4_1_1_x_truncated.gpr), and GPR v. 3.0 (5_0_1_26_truncated.gpr). Note that the sample files are numbered according to the GenePix Pro software version, not the GPR format version. As of GenePix Pro 5.0, Molecular Devices adopted a flexible file format in which the positions and contents of the GPR file columns are not fixed; instead, they are read and identified when the file is opened. Accordingly, Molecular Devices froze the GPR file format at v. 3.0.
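Because column positions are not fixed from GenePix Pro 5.0 onward, code that reads GPR files should look columns up by name. The sketch below assumes the Axon Text File (ATF) layout that GPR files use: an "ATF" signature line, a line giving the number of header records and data columns, the quoted key=value header records, then a tab-delimited table with a column-name row. Verify these details against GenePix_GPR.txt; the function name read_gpr is hypothetical.

```python
import csv
import io


def read_gpr(text):
    """Read a GPR file laid out as an Axon Text File (ATF).

    Returns (header_records, rows), where rows are dicts keyed by column
    name, so code does not depend on where a column happens to sit.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("ATF"):
        raise ValueError("missing ATF signature line")
    # Line 2: number of optional header records, number of data columns.
    n_records, n_columns = (int(x) for x in lines[1].split()[:2])
    header = {}
    for record in lines[2:2 + n_records]:
        key, _, value = record.strip().strip('"').partition("=")
        header[key] = value
    reader = csv.DictReader(io.StringIO("\n".join(lines[2 + n_records:])),
                            delimiter="\t")
    rows = list(reader)
    if len(reader.fieldnames or []) != n_columns:
        raise ValueError("column count does not match the ATF header")
    return header, rows
```

With rows keyed by name, an expression such as row["F635 Median"] keeps working even if a later software version moves that column.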

MAGE-ML

MicroArray and Gene Expression (MAGE) XML format for
microarray data exchange. For details on MAGE-ML, see the MAGE page. Note that most of the MAGE-ML files on this site were provided by ArrayExpress.

MAGE-TAB

MicroArray and Gene Expression (MAGE) tab-delimited text format for
microarray data. For details on MAGE-TAB, see the MAGE page. Note that most of the MAGE-TAB files on this site were provided by ArrayExpress.

PCL

Pre-Clustered File: a file format developed by the Stanford
Microarray Database (SMD). The file pcl_format.txt describes the PCL file format and column headers. For more information, see SMD or PUMAdb.
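A PCL file is essentially a gene-by-array matrix with a few annotation columns in front. The sketch below assumes the commonly used SMD layout (an identifier column, NAME, GWEIGHT, then one column per array, with an optional EWEIGHT row just under the header and empty cells for missing values); pcl_format.txt remains the authority on the exact convention. The function name read_pcl is hypothetical.

```python
import csv
import io


def read_pcl(text):
    """Return (experiment names, {UID: [value or None, ...]}) from PCL text.

    Assumes the common SMD layout: UID, NAME, GWEIGHT columns followed by
    one column per array, and an optional EWEIGHT row under the header.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t",
                           quoting=csv.QUOTE_NONE))
    experiments = rows[0][3:]
    body = rows[1:]
    if body and body[0] and body[0][0].upper() == "EWEIGHT":
        body = body[1:]  # per-array weights, not data
    matrix = {}
    for row in body:
        if not row:
            continue
        cells = row[3:3 + len(experiments)]
        cells += [""] * (len(experiments) - len(cells))  # pad short rows
        # Empty cells are missing values.
        matrix[row[0]] = [float(c) if c.strip() else None for c in cells]
    return experiments, matrix
```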

PSI-MI

The Proteomics Standards Initiative Molecular Interaction (PSI-MI) XML format is a standardized data exchange format for molecular interactions such as protein-protein interactions. See the Proteomics Standards Initiative site for more information on the PSI-MI format.

SOFT

SOFT (Simple Omnibus Format in Text) is the text format used by GEO.
Usually both SOFT (.soft) and annotation (.annot) files are
available. See GEO for more information.
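SOFT files are line-oriented: a line starting with "^" opens an entity (for example a sample or platform), "!" lines hold that entity's attributes, "#" lines describe data table columns, and the data table itself sits between "!..._table_begin" and "!..._table_end" lines. The sketch below reads that structure into plain dictionaries; it is a minimal illustration, not GEO's own tooling, and the function name parse_soft is our own.

```python
def parse_soft(lines):
    """Group SOFT lines into entities with attributes, column notes and a table."""
    entities = []
    current = None
    in_table = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("^"):
            # e.g. "^SAMPLE = GSM..." opens a new entity
            kind, _, accession = line[1:].partition("=")
            current = {"type": kind.strip(), "accession": accession.strip(),
                       "attributes": {}, "columns": {}, "table": []}
            entities.append(current)
            in_table = False
        elif current is None:
            continue  # anything before the first entity is ignored
        elif line.startswith("!") and line.lower().endswith("_table_begin"):
            in_table = True
        elif line.startswith("!") and line.lower().endswith("_table_end"):
            in_table = False
        elif line.startswith("!"):
            key, _, value = line[1:].partition("=")
            # Attributes may repeat (e.g. several characteristics lines).
            current["attributes"].setdefault(key.strip(), []).append(value.strip())
        elif line.startswith("#"):
            key, _, value = line[1:].partition("=")
            current["columns"][key.strip()] = value.strip()
        elif in_table and line:
            current["table"].append(line.split("\t"))  # first row = column names
    return entities
```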

TXT

In some cases, particularly in older data sets, data are not
available in a standard format, but instead are in author-defined
tab-delimited files. In addition, in some MAGE-ML sets, the actual data
matrices are provided in separate *.txt files. ImaGene, microarray analysis software from BioDiscovery, Inc., also produces .txt files; additional information on the column headings of ImaGene .txt files can be found in this summary, the user's manual, or this sample .txt file.

Alternative sources of functional genomics data and other useful links

There are several other sources for functional genomics data available
for both yeast and other species:

Repositories of miscellaneous functional genomics data:

SGD: provides a tool for searching microarray expression data sets,
called Expression Connection. In addition, the SGD Genome-wide
Analysis page lists all yeast publications that describe some
large-scale analysis, categorized by the type of experiment. This
page also includes links to web supplements and data files. A
variety of data sets is also available at the SGD FTP site.

MIPS/CYGD: provides various types of functional
genomics data, including interaction and phenotype data.

Major microarray repositories:

GEO: the Gene Expression Omnibus at NCBI; provides data in a
tab-delimited format.

YMGV: Yeast
Microarray Global Viewer, provided by the Jacq group in Paris, France.

Sources of interaction data:

BioGRID:
provided by Mike Tyers' lab at the Samuel Lunenfeld Research
Institute, Toronto. Contains the most extensive set of large-scale data sets as well as
individual interactions manually curated from the literature.