TransfactomeDB: a resource for exploring the nucleotide sequence specificity and condition-specific regulatory activity of trans-acting factors.

Department of Biological Sciences, Columbia University, New York, New York 10027, USA.

Abstract

Accurate and comprehensive information about the nucleotide sequence specificity of trans-acting factors (TFs) is essential for computational and experimental analyses of gene regulatory networks. We present the Yeast Transfactome Database (TransfactomeDB), a repository of sequence specificity models and condition-specific regulatory activities for a large number of DNA- and RNA-binding proteins in Saccharomyces cerevisiae. The sequence specificities in TransfactomeDB, represented as position-specific affinity matrices (PSAMs), are estimated directly from genome-wide measurements of TF binding using our previously published MatrixREDUCE algorithm, which is based on a biophysical model. For each mRNA expression profile in the NCBI Gene Expression Omnibus, we used sequence-based regression analysis to estimate the post-translational regulatory activity of each TF for which a PSAM is available. The resulting trans-factor activity profiles (TFAPs), which span multiple experiments, allow the user to explore the potential regulatory roles of hundreds of TFs in any of thousands of microarray experiments. Our resource is freely available at http://bussemakerlab.org/TransfactomeDB/

The flow of data. Publicly available microarray data and genomic sequences were integrated by MatrixREDUCE and other computational procedures to infer TF sequence specificities (PSAMs) and post-translational regulatory activities (TFAPs). These two data types can be displayed and interrogated using five different ‘tabs’ in the Yeast Transfactome Database interface.

Regressing microarray data on genomic sequence. (A) Inferring a PSAM from a ChIP-chip experiment. Shown is the result for the transcription factor Abf1p. The parameters of the PSAM are chosen so as to maximize the correlation, across all genes, between the chromatin enrichment ratio and the total affinity of the promoter region. (B) Regressing the change in mRNA expression on the total promoter affinity predicted by a previously computed PSAM yields the change in regulatory activity (the slope of the regression line) of the TF whose sequence specificity the PSAM represents. In this example, between rich media and media containing copper sulphate (GEO accession number GSM17192), mRNA expression levels are downregulated in proportion to the affinity of the promoter region for Abf1p.
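The regression in panel (B) can be illustrated with a minimal Python sketch. It is not the MatrixREDUCE implementation: the function names, the toy 3-bp PSAM, the promoter sequences and the synthetic log-ratios are all hypothetical, and the total affinity is assumed to be the sum, over all forward-strand windows, of the product of per-position relative affinities (the reverse strand is ignored for brevity).

```python
import numpy as np

BASES = "ACGT"  # column order assumed for the PSAM

def total_affinity(seq, psam):
    """Sum over all sequence windows of the product of per-position
    relative affinities (forward strand only; a simplifying assumption)."""
    idx = np.array([BASES.index(b) for b in seq])
    width = psam.shape[0]
    total = 0.0
    for start in range(len(idx) - width + 1):
        window = idx[start:start + width]
        total += np.prod(psam[np.arange(width), window])
    return total

def regression_slope(affinity, log_ratio):
    """Least-squares slope of log_ratio on affinity, plus the Pearson r."""
    slope, _intercept = np.polyfit(affinity, log_ratio, 1)
    r = np.corrcoef(affinity, log_ratio)[0, 1]
    return slope, r

# Hypothetical 3-bp PSAM: rows are positions, columns are A, C, G, T.
# The preferred base at each position has relative affinity 1.
psam = np.array([
    [1.0, 0.1, 0.1, 0.1],
    [0.1, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.1],
])

# Hypothetical promoter sequences (real promoters are far longer).
promoters = ["ACGACG", "TTTTTT", "ACGTTT", "GGGACG"]
affinity = np.array([total_affinity(p, psam) for p in promoters])

# Synthetic expression log-ratios constructed to decrease with affinity.
log_ratio = -0.5 * affinity + 0.2

slope, r = regression_slope(affinity, log_ratio)
print(f"inferred activity change (slope): {slope:.3f}, Pearson r: {r:.3f}")
```

In this sketch a negative slope plays the role of the reduced regulatory activity described for Abf1p: genes whose promoters have higher predicted affinity show proportionally lower expression.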

Comparison with weight matrices from MacIsaac et al. (15) and TRANSFAC. Each weight matrix from MacIsaac et al. (15) or TRANSFAC (5) was converted into a pseudo-PSAM (see Methods). The correlation between the total affinity of each promoter region predicted by the pseudo-PSAM and the fold-enrichment in the ChIP-chip experiment was then computed. These Pearson r values were then compared with the Pearson r values achieved by PSAMs optimized for the same ChIP-chip data by MatrixREDUCE. In all but nine instances, the correlations were better for PSAMs fit by MatrixREDUCE than for pseudo-PSAMs. In those cases where the pseudo-PSAM had a higher correlation, MatrixREDUCE could still improve the fit of the pseudo-PSAM (green lines).

Determining the parameters of the empirical P-value calculation for MatrixREDUCE quality of fit. Shown in black are the values of |r|, the absolute value of the Pearson correlation, obtained for randomized data at N = 6505 genes over a range of PSAM widths Lw. The red line shows a linear fit to these data, which gives rise to the results shown in Equation 1.