Abstract

Genes with small open reading frames (sORFs; <100 amino acids) represent an untapped source of important biology. sORFs have largely escaped analysis because they were difficult to predict computationally and less likely to be targeted by genetic screens. Thus, the substantial number of sORFs and their potential importance have only recently become clear. To investigate sORF function, we undertook the first functional studies of sORFs in any system, using the model eukaryote Saccharomyces cerevisiae. Independent experimental approaches and computational analyses provide evidence for 299 sORFs in the S. cerevisiae genome, representing approximately 5% of the annotated ORFs. We determined that a similar percentage of sORFs are annotated in other eukaryotes, including humans, and that 184 of the S. cerevisiae sORFs exhibit similarity with ORFs in other organisms. We constructed a collection of gene-deletion mutants for 140 newly identified sORFs, each mutant carrying a strain-specific "molecular barcode," bringing the total number of sORF deletion strains to 247. Phenotypic analyses of the new gene-deletion strains identified 22 sORFs required for haploid growth, growth at high temperature, growth in the presence of a nonfermentable carbon source, or growth in the presence of DNA damage and replication-arrest agents. We provide a collection of sORF deletion strains that can be integrated into the existing deletion collection as a resource for the yeast community to elucidate gene function. Moreover, our analyses of the S. cerevisiae sORFs establish that sORFs are conserved across eukaryotes and have important biological functions.
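The counts quoted in the abstract can be related to one another with simple arithmetic. The following is a minimal Python sketch; the variable names are illustrative, and the derived values (pre-existing deletion strains and the two percentages) are computed here from the quoted counts rather than stated in the text.

```python
# Counts quoted in the abstract.
total_sorfs = 299
sorfs_with_similarity = 184
new_deletion_strains = 140
total_deletion_strains = 247
sorfs_with_phenotype = 22

# Derived values: simple arithmetic on the counts above, not figures
# reported in the abstract itself.
preexisting_deletion_strains = total_deletion_strains - new_deletion_strains
pct_with_similarity = 100 * sorfs_with_similarity / total_sorfs
pct_new_with_phenotype = 100 * sorfs_with_phenotype / new_deletion_strains

print(f"sORF deletion strains available before this work: {preexisting_deletion_strains}")
print(f"sORFs with similarity to ORFs in other organisms: {pct_with_similarity:.1f}%")
print(f"newly deleted sORFs with a scored phenotype: {pct_new_with_phenotype:.1f}%")
```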

Evidence of S. cerevisiae sORFs. (A) Gene expression-based analyses and homology searching reveal 170 potential sORFs, bringing the total number of annotated sORFs in S. cerevisiae to 299. Reports in the literature provided empirical evidence of transcription from SAGE, microarray, RT-PCR, Northern blot, and gene-trap experiments, and empirical evidence of translation from mass-spectrometry and gene-trap experiments. Comparative genomic studies provided evidence of homology. (B) The bar graph depicts the subcategorization of the evidence for the 170 new sORFs, showing that the largest number of sORFs were identified from evidence of transcription, followed by evidence of translation and homology.

sORFs constitute a similar percentage of annotated ORFs in representative eukaryotes. The percentage of sORFs for S. cerevisiae and representative eukaryotes was calculated and is depicted in the bar graph. The genome size (megabases) and the number of RefSeq ORFs for each eukaryote are displayed below the graph.
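The percentage plotted for each organism could be computed along the following lines. This is a minimal Python sketch, assuming an sORF is any annotated ORF encoding fewer than 100 amino acids (the definition given in the abstract); the function name `sorf_percentage` and the toy list of protein lengths are hypothetical and are not data from the study.

```python
SORF_MAX_AA = 100  # sORFs are defined as ORFs encoding fewer than 100 amino acids


def sorf_percentage(orf_lengths_aa):
    """Return the percentage of ORFs whose protein length is below the sORF cutoff."""
    if not orf_lengths_aa:
        raise ValueError("need at least one ORF length")
    n_sorfs = sum(1 for length in orf_lengths_aa if length < SORF_MAX_AA)
    return 100.0 * n_sorfs / len(orf_lengths_aa)


# Hypothetical toy input: protein lengths (aa) for a made-up set of annotated ORFs.
toy_lengths = [45, 99, 100, 250, 512, 73, 1200, 330, 88, 640]
print(f"{sorf_percentage(toy_lengths):.1f}% of toy ORFs are sORFs")
```

In practice the input would be the protein lengths of all RefSeq ORFs for a given organism; an ORF of exactly 100 amino acids is not counted as an sORF under this cutoff.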

sORFs required for growth at 37°C (Ts) or in the presence of a nonfermentable carbon source (petite phenotype). (A) sORFs required for growth at 37°C. Growth was assayed by spotting 3 μL of fivefold serial dilutions of logarithmic-phase cells of the sORF deletion strains on YPD plates and incubating at 30°C or 37°C for 2 to 3 d. The wild-type strain BY4741 and the temperature-sensitive strain ndc10-1 (JK421) served as controls. (B) sORFs required for growth on a nonfermentable carbon source. Growth was assayed by spotting 3 μL of fivefold serial dilutions of logarithmic-phase cells of the sORF deletion strains on YP Dextrose or YP Glycerol plates and incubating at 30°C for 2 to 3 d.
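The relative cell density of each spot in a fivefold dilution series can be sketched as follows. This is an illustrative Python example only: the caption does not state how many dilutions were spotted, so the default of five spots is an assumption, and the function name is hypothetical.

```python
def serial_dilution_factors(fold=5, n_spots=5):
    """Relative cell density of each spot, starting from the undiluted culture.

    n_spots is an assumed value; the caption does not give the number of dilutions.
    """
    return [1 / fold**i for i in range(n_spots)]


for i, rel in enumerate(serial_dilution_factors()):
    print(f"spot {i + 1}: 1/{5**i} of the starting density ({rel:.4f})")
```

Each successive spot therefore carries one fifth of the cells in the previous spot, which is why a growth defect appears as loss of the more dilute spots first.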

YGR271C-A and YGR272C constitute a contiguous ORF required for growth at 37°C, growth on hydroxyurea (HU)-containing media, and cell cycle progression. (A) The ygr271c-aΔ, ygr272cΔ, and ygr271c-aΔ/ygr272cΔ strains exhibit slow growth, Ts, and HU-sensitive phenotypes. Growth assays were carried out as in Figures 3 and 4. (B) YGR271C-A and YGR272C exhibit similarity to the A. gossypii ORF PABR143C (Brachat et al. 2003). By sequencing the genomic locus in S. cerevisiae, we determined that YGR271C-A and YGR272C constitute a single larger ORF. (C) For cell cycle analysis, wild-type (BY4741) and ygr271c-aΔ/ygr272cΔ strains were arrested with α-factor (0 min) and then released into pheromone-free media. Samples were analyzed for DNA content by flow cytometry at 20-min intervals, as indicated. The 1C and 2C peaks denote cells with 1N and 2N DNA content, corresponding to cells in the G1 or G2/M phase of the cell cycle, respectively.