Abstract

We assess five years of usage of the major genome-wide collections of mutants from Saccharomyces cerevisiae: single deletion mutants, double mutants conferring 'synthetic' lethality and the 'TRIPLES' collection of mutants obtained by random transposon insertion. Over 100 experimental conditions have been tested and more than 5,000 novel phenotypic traits have been assigned to yeast genes using these collections.

In April 1996, the completely annotated genome sequence of the yeast Saccharomyces cerevisiae was made publicly available [1,2], the first eukaryotic genome sequence to be completed. Eight years later, thanks to the united efforts of the large yeast research community and to the unique genetic and physiological properties of yeast, this humble servant of mankind provides by far the best annotated eukaryotic genome [3]. The completeness of the yeast genome sequence has allowed the development of many novel tools for analyzing all molecular components of the cell and their interactions. These tools include three high-throughput collections of mutants that were first produced in 1999 and that have been analyzed in the five years since then. Here, we review the uses of these collections and their contribution to the identification of the components of basic physiological and developmental pathways of S. cerevisiae.

The yeast deletion mutant collection

A set of over 20,000 knockout strains was created by a consortium of European and North American laboratories [4,5]. The collection currently contains homozygous and heterozygous diploid strains corresponding to deletions of each of 5,916 genes (including 1,159 essential genes) and one haploid strain of each mating type for every non-essential gene (4,757 genes). Each knockout strain is marked by two unique 20-nucleotide 'bar codes', allowing quantitative and qualitative identification by DNA microarray hybridization of each strain in the pools used to assess the strains under different growth conditions (see Figure 1). The original article [4] describing this collection has been cited more than 560 times in the five years since its publication, according to the ISI Web of Science [6]. The complete collection of strains can be obtained at low cost from Euroscarf [7], ATCC [8] and Invitrogen [9].

Figure 1. Construction and screening of the yeast deletion strain collection. (a) The cassette used consists of a kanamycin-resistance gene (KanMX4) flanked by two tags (also called barcodes), the UPTAG and the DOWNTAG, which are unique to each gene. The yeast...

The deletion collection has been used in dozens of novel exhaustive screens for phenotypes that occur under a variety of physiological conditions; these include growth in minimal medium, in high salt and low salt, in galactose or sorbitol, at pH 8, after heat or cold shock, under stress by hydrogen peroxide (all in [10]); growth on non-fermentable carbon substrates [11], in saline conditions [12] or after treatment by ionizing radiation or DNA-damaging agents [13-17]; and the collection has also been screened for defects in meiosis, sporulation and germination [18,19]. This approach has uncovered numerous new putative components of well-known pathways; for instance, the number of genes known to have sporulation or germination phenotypes when deleted has been doubled by these analyses [18].

More sophisticated screens, for example for suppressors of the accumulation of mutations [20], have been developed more recently, as well as screens involving transformation of the deletion strains in order to identify genes needed for non-homologous DNA end-joining [21]. Novel protocols requiring individual transformations of each mutant have allowed the identification of host factors that influence the fate of the Ty family of long-terminal-repeat retrotransposable elements [22] and of genes involved in the unfolded protein response induced by heterologous introduction of mutant human Huntingtin protein or fragments of α-synuclein, both of which form disease-associated aggregates [23]. Similarly, proteins that interfere with the assembly of endoplasmic-reticulum structures termed karmellae, which are induced by elevated levels of HMG-CoA reductase under specific growth and genetic conditions, have been identified using the collection [24]. Several morphological screens have been developed, for example for defects in the selection of the bipolar bud site [25], in cell-size distribution [26,27], in cell morphology [28] and in meiotic chromosomal segregation [29]. Another approach is the screening of individual colonies. For instance, mis-sorting and secretion of vacuolar carboxypeptidase Y were detected by colony immunoblotting [30]. A second example of colony screening is the transformation of each single-deletion strain to express viral replicase proteins and an RNA replication template in which the capsid gene was replaced by a luciferase reporter gene, which was used to monitor viral expression in yeast colonies [31]. Finally, to identify the genes affecting glycogen storage, the deletion mutant colonies were blotted and stained by iodine vapor; the intensity of coloration allowed assessment of glycogen accumulation [32].

Use of the yeast deletion collection in screens for synthetic lethal mutants and to study drug targets

Synthetic lethality is the phenomenon that occurs when two mutations that are each viable are combined and the double mutant is lethal. A method has been developed for the systematic construction of double mutants, which is called synthetic genetic array analysis (SGA) [33,34]. Haploid strains with mutations in non-essential genes were crossed to an array of the whole haploid deletion collection; the resulting diploid cells were made to sporulate and the lethal combinations identified, indicating the existence of essential interactions between gene products. In a first SGA screen using eight query genes, 291 interactions among 204 genes were identified [33]. Three years later [34], the search was expanded to 132 query genes, and 4,000 interactions were identified among 1,000 genes with roles in cytoskeletal organization, cell-wall biosynthesis, microtubule-based chromosome segregation and DNA metabolism.
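The logic of scoring an SGA-style cross can be illustrated with a minimal Python sketch. It is not the published SGA pipeline: the gene names, the viability tables and the function name `synthetic_lethal_pairs` are all hypothetical, and real screens score colony size rather than a simple viable/inviable flag. The sketch only encodes the definition given above, namely that a pair is synthetic lethal when each single deletion is viable but the double mutant is not.

```python
from itertools import product

def synthetic_lethal_pairs(single_viable, double_viable, queries, array):
    """Return (query, array) pairs whose single deletions are both viable
    but whose double mutant is inviable. All inputs are illustrative."""
    hits = []
    for q, a in product(queries, array):
        if q == a:
            continue  # a gene cannot be paired with itself
        # Synthetic lethality is only defined when both singles are viable.
        if not (single_viable.get(q, False) and single_viable.get(a, False)):
            continue
        key = frozenset((q, a))  # order of the two genes does not matter
        if key in double_viable and not double_viable[key]:
            hits.append((q, a))
    return hits

# Hypothetical toy data: geneD is an essential gene, so it is excluded.
single_viable = {"geneA": True, "geneB": True, "geneC": True, "geneD": False}
double_viable = {
    frozenset(("geneA", "geneB")): False,  # double mutant is dead
    frozenset(("geneA", "geneC")): True,   # double mutant grows
    frozenset(("geneA", "geneD")): False,  # dead, but geneD alone is lethal
}
hits = synthetic_lethal_pairs(
    single_viable, double_viable, ["geneA"], ["geneB", "geneC", "geneD"]
)
print(hits)
```

Storing each pair as a `frozenset` reflects the fact that a genetic interaction is symmetric: crossing query A to array strain B tests the same double mutant as crossing B to A.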

A more recent development of this approach is the use of DNA-DNA hybridization protocols to assess lethality, termed 'synthetic lethality analysis by microarray' (SLAM). In this method, a pool of haploid deletion strains is transformed with a cassette that replaces the gene of interest with either a deletion construct or the wild-type form. Transformants are pooled and genomic DNA is isolated; the barcodes are amplified by PCR and labeled with either Cy3 (green) or Cy5 (red) fluorescent dyes; and hybridization to an array containing all the deletion tags allows identification of the synthetic-lethal combinations (which are missing). The SLAM method has been validated by identifying members of the DNA helicase interaction network [35]. Synthetic genetic arrays have also been used for high-resolution genetic mapping of suppressor mutations [36]; this method, termed SGA mapping (SGAM), is in principle also applicable to the analysis of multigenic traits.
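The read-out step of a SLAM-type experiment can be sketched as a comparison of barcode signals between the two pools. The Python example below is an illustration under stated assumptions, not the analysis used in [35]: the strain names, the signal values, the cut-offs `min_control` and `log2_cutoff`, and the function name `slam_candidates` are all hypothetical, and the channels are labeled 'control' (wild-type construct) and 'query' (deletion construct) rather than being assigned to a particular dye. A strain whose barcode is well detected in the control pool but strongly depleted in the query pool is reported as a candidate synthetic-lethal partner.

```python
import math

def slam_candidates(control_signal, query_signal,
                    min_control=100.0, log2_cutoff=-2.0):
    """Flag strains whose barcode is depleted in the query pool.

    control_signal / query_signal map strain name -> hybridization intensity.
    The thresholds are arbitrary illustrative values, not published ones.
    """
    candidates = {}
    for strain, ctrl in control_signal.items():
        if ctrl < min_control:
            continue  # barcode near background even in the control pool
        # Floor the query signal at 1.0 so the logarithm is always defined.
        qry = max(query_signal.get(strain, 0.0), 1.0)
        log_ratio = math.log2(qry / ctrl)
        if log_ratio <= log2_cutoff:
            candidates[strain] = log_ratio
    return candidates

# Hypothetical intensities for three deletion strains.
control = {"yfg1": 1600.0, "yfg2": 1500.0, "yfg3": 50.0}
query = {"yfg1": 100.0, "yfg2": 1400.0, "yfg3": 40.0}
result = slam_candidates(control, query)
print(result)
```

Strains with a background-level control signal are skipped rather than scored, because a missing barcode in both pools carries no information about the interaction.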

The yeast deletion collection has also been used to identify members of the pathways modified by more than 25 different chemical ligands. This approach has identified the L-carnitine transporter Agp2p as a novel transporter of bleomycin in yeast, implicating membrane transport as a key determinant of resistance to this widely used anticancer agent [37]. In another study, the transcription factor Rpn4p was shown to compensate for proteasome inhibition by PS-341, a drug that is being studied as a treatment for cancer [38]. Other ligands tested include the phosphatidylinositol kinase inhibitor wortmannin [39]; anticancer agents [10,37,40,41]; antifungal agents including nystatin and calcofluor [10,41]; antibiotics including hygromycin B [42] and rapamycin [43]; statins (a class of drugs that lower cholesterol), the smooth muscle relaxant alverine citrate, the local anaesthetic dyclonine, yeast K1 killer toxin [44], 6-azauracil (a drug that depresses cellular nucleoside triphosphate levels [45]) and even the detergent sodium dodecyl sulfate [44].

Major scientific contributions arising from the use of the yeast deletion collection

In a total of 33 publications that have used the deletion collection, in which over 100 different conditions were explored, a phenotype was attributed to more than 5,000 genes (see Table 1 for an overview). Many of these genes were not previously known to contribute to the pathways scrutinized, even though in many cases - such as in studies of sporulation, the cell cycle, radiation damage and growth in defined media - there had previously been apparently exhaustive classical mutagenesis or DNA microarray screens. Altogether, these analyses have shown 71 genes previously thought to be non-essential to have a nonviable phenotype [46]; and the systematic deletion approach has been claimed to be five-fold more sensitive than a microarray gene-expression analysis for finding genes involved in pathways of interest [11].

Table 1. Screens performed so far on the yeast deletion strain collection and transposon-insertion collection

Some new phenotypes have emerged from these analyses, although many known phenotypes have not yet been explored [47]. As an example of recent novel information provided by the systematic-deletion approach, over 100 new genes whose deletion results in a defective bipolar budding pattern [25] have been identified. Other examples are the identification of a new putative bleomycin transporter [37]; the NEJ1 gene involved in non-homologous DNA end-joining and claimed to be one of the guardians of the cancer cells [21]; and 52 genes, the deletion of which causes lethality in yeast transfected with a fragment of heterologous human Huntingtin [23]. The latter set, which includes genes involved in lipid metabolism and vesicle transport as well as genes of unknown function, may help to define the still poorly defined pathway termed the unfolded protein response [23].

Uncertainties in the results obtained from systematic deletion mutants

The yeast deletion collection allows powerful novel screens and has revealed many novel components of complex metabolic and developmental pathways. The data produced should be examined with great care, however, as there are several problems with the use of the collection (for further details, see two reviews by Elisabeth Winzeler [46,48]). A first issue is that the collection is incomplete, as a few hundred genes were missed in the first annotation of the genome that was used as the basis of the deletions (see updates of yeast genome data in the Saccharomyces Genome Database (SGD) [49]). Moreover, relative amounts of some deleted strains may have been distributed unequally during the initial creation of the pools and further amplified [4]. Secondly, there can be problems with mutations or variations elsewhere in the genome that affect the phenotype. For instance, because the transformation procedure that was used to create the deletion library is a mutagenic event, mutations in sites some distance away from the intended deletion might be present in some strains [50]. Also, up to 8% of the deleted strains are known to retain a wild-type copy of the targeted gene, presumably because of aneuploidy or some duplication event [50]. Even in diploid strains, mutations that apparently have no phenotype may in fact modify the phenotype of other mutations in subtle ways (a phenomenon termed haploinsufficiency) [51].

When deletions are moved into a different genetic background, their phenotypes may change; for instance, up to 18% of synthetic-lethal interactions were not confirmed in different backgrounds [33,46]. Another source of discrepancy might be genetic variation among the individual isolates of the strains that are amplified under given selective conditions. For instance, the phenotype for a given mutant strain can be masked by extracellular complementation by a protein product of another strain in the pool.

Other factors are related to the use of the barcodes. Their presence may modify the phenotype or, conversely, the stress created by the knockout procedure may mutagenize the barcode. About 17% of the 4,600 homozygous strains in the collection have a hybridization signal similar to that of the background. Moreover, trivial technical factors may affect the hybridization data, for example when the wrong piece of DNA is deposited on the microarray or when the tag modifies the secondary structures and thus causes poor hybridization. Also, some published datasets have too few repetitions of experiments to be statistically valid.

These and other problems concerning the specificity and sensitivity of the screen used must be taken into account when interpreting data obtained using the yeast deletion collection. As a gross estimate, over one third of the data may consist of false positives or false negatives. As a golden rule, therefore, the data from a primary screen of the collection should always be confirmed by independent approaches.

Collections of strains mutated using transposons

Transposons are mobile genetic elements that can be used to disrupt genes in a non-targeted fashion. Insertional mutagenesis using transposons can disrupt non-coding as well as coding regions and can lead to partial loss or gain of function; it is thus complementary to deletion of genes [52]. The 'mini-Mu' transposon modified for insertional mutagenesis in yeast was pioneered by Daignan-Fornier and Bolotin-Fukuhara [53]. Ross-Macdonald et al. [52] used a version of the same transposon including a reporter gene (lacZ), an epitope tag (encoding hemagglutinin) and selectable markers for yeast and bacteria to perform random transposon mutagenesis. The approach uses a minitransposon that was introduced into the yeast genome by a two-step method: first, a yeast genomic DNA library was mutagenized in Escherichia coli using the minitransposon; second, the library plasmids were cut to excise the yeast genomic DNA with the transposon insertion, which was transformed into diploid yeast, replacing one chromosomal copy by homologous recombination [52]. This transposon-mutagenesis system can be used for the analysis of gene expression, detection of subcellular localization and identification of phenotypes caused by the insertion, and the transposon can be deleted using the Cre-LoxP recombinase system, removing the selectable marker. The insertion point in the genome can be determined for each mutant by sequencing outwards from the transposon sequence into the surrounding yeast sequences. The article describing the first such screen [52] has been cited 209 times in the five years since its publication [6]. The collection of transposon-inserted strains created in this work [52] is available free of charge from Michael Snyder's laboratory [54].

A database called 'transposon-insertion phenotypes, localization and expression in Saccharomyces cerevisiae' (TRIPLES) [55,56] contains the results obtained from exploiting the mutants obtained by the transposon-insertion approach. A total of 240,768 yeast insertional mutants were screened for transposon-encoded β-galactosidase activity, and a collection of 28,428 yeast strains was generated that produce β-galactosidase under conditions of vegetative growth and/or sporulation. The insertion point of 23,191 of these strains has been sequenced, identifying 3,750 different yeast genes, and the subcellular localization of 2,744 different yeast proteins has been documented (A. Kumar, personal communication).

The transposon-insertion collection [52,54] has been used to identify genes involved in sporulation and vegetative growth. Analysis of the collection has also identified 137 new genes that were not originally annotated by the yeast genome sequencing consortium [57]. Ferreira et al. [58] have used the collection - which was produced in the S288C strain used to sequence the yeast genome [52] - to mutagenize the W303-1A strain (which is known to be more responsive to gene inactivation than S288C). Ferreira et al. [58] identified genes required for growth in high-salt medium (61 insertions in 31 genes) and survival of hypo-osmotic shock and growth at 15°C (31 insertions in 10 genes). A similar approach was also used by Takahashi et al. [59] to screen in other strains for ethanol tolerance and modification of the cell wall, and by Suzuki et al. [60] to screen for the effects of insertions on the response to nitrogen starvation, on invasive or filamentous growth and on sporulation.

Other approaches for insertional mutagenesis have also been applied to yeast. Firstly, instead of bacterial mini-Mu transposons, T-DNA from Agrobacterium tumefaciens has been used to mutagenize yeast on a genome-wide scale [61]. A second approach is the use of non-homologous insertion of DNA cassettes not derived from mobile elements. This process, which exploits the cellular machinery that normally repairs DNA double-strand breaks by non-homologous end joining, is increased by gamma irradiation [62]. Thirdly, Blanc and Adams [63] used mutations resulting from insertion of the Ty1 transposon to identify yeast mutations that generate evolutionarily significant phenotypes by causing small but positive increments of fitness. Finally, an application of transposon mutagenesis called direct allele replacement technology (DART) allows rapid transfer of any insertion allele into any strain [64]. A transposon library consisting of a collection of plasmids containing yeast genomic DNA with transposon insertions is sequenced to identify the exact insertion point in the yeast genomic DNA [64]. After excision from the plasmid, the yeast genomic DNA containing the transposon is used to transform a yeast strain of choice by homologous recombination. The procedure was validated by identification of 29 insertions into 17 genes involved in apical growth [64].

Insertional mutagenesis using transposons has several potential advantages over targeted deletion: for instance, insertion occurs in the non-coding as well as in the coding segments, so regulatory regions and other non-genic regions can be disrupted. Moreover, depending on the site of insertion, transposon mutagenesis may lead to partial loss of function or gain of function and hence to the identification of novel functions that would not be found from studies of complete knockouts of genes [52]. Conditional alleles may be generated, as well as mutants in promoter or terminal regions. Also, apart from phenotype analysis of the mutation, the level of expression of the targeted gene can be measured in vivo and the subcellular localization of its product can be determined. A disadvantage, however, is that transposon insertions are not random, and this method may therefore never cover all the genes in the genome. Also, several of the problems that were mentioned above for the analysis of the yeast deletion collection apply equally to the transposon mutant collections, including the observation that phenotypes are often very strongly background-dependent.

Other genome-wide mutant collections and their uses

Several databases have been developed to catalog the subcellular localizations of yeast proteins as identified by fluorescence microscopy. The yeast protein localization database [65,66] describes the results obtained using a library of yeast genes fused to a green fluorescent protein (GFP) reporter. The TRIPLES database [55,56] includes the use of the transposon-insertion libraries to determine protein localization for 5,504 insertions. The yeast GFP fusion localization database [67,68] presents the localization of 4,156 proteins into 22 distinct subcellular locations, as determined using a library of GFP-tagged proteins compared with reference strains expressing proteins of known localization tagged with red fluorescent protein (the strains used are available from Invitrogen [9]).

In a recent study [69], an exhaustive global analysis of protein expression in yeast was reported. Each open reading frame was marked by an insertion cassette consisting of a modified version of the tandem affinity purification (TAP) tag, a yeast selectable marker to drive homologous recombination, and regions homologous to yeast genes. The level of expression of each protein was determined by very sensitive western-blotting analysis that could detect fewer than 50 protein molecules per cell. The level of expression of 4,251 gene products was determined in exponential growth conditions [69].

Like the sequencing of the yeast genome, the construction of the single and double deletion libraries and that of the major transposon insertion library has required a considerable investment by the international yeast research community. The three original papers [4,33,52] presenting the libraries have a total of 83 authors, funded by public agencies from the USA, from the European Union and from Canada. Was this effort worthwhile?

Over the last five years, the three seminal papers have been cited more than 950 times [6]. We have found over 50 experimental papers reporting on different phenotypic screens carried out with these genome-wide libraries. More than 100 experimental conditions have been tested, representing nearly a million individual mutant screens. We estimate that more than 5,000 novel phenotypic traits have been assigned to yeast genes of known or unknown molecular functions. This undoubtedly represents a considerable amount of progress towards the ultimate goal of a full description of the functions and interactions of all the molecular components of a basic eukaryotic cell. Several recent improvements in the use of the original genome-wide mutant libraries have been reported, such as the DART [64] and SLAM [35] approaches described above. New genome-wide libraries using more sensitive protein tagging are being developed [69] for global measurements of protein levels or subcellular expression. In all this work, yeast continues to be a very convenient test-bed for the development of novel high-throughput tools that rely on the availability of the complete genome sequence.

But a word of caution is appropriate. As was the case for other high-throughput tools that were also pioneered using yeast, such as two-hybrid protein-interaction screening, DNA-hybridization microarrays and proteomic analyses, the information provided by large screens of genome-wide libraries contains an appreciable number of false-positive and false-negative data points. It is therefore essential to confirm each result by several independent approaches. Ultimately, we will have to return to 'reductionist' biochemical approaches to demonstrate fully the molecular function suggested by a large primary screen.

Finally, there is an urgent need to build databases to collect and organize all the data obtained from all the large screening approaches. The SGD [49] and the Munich information center for protein sequences (MIPS) database [70] are making good progress in this respect. The authors of these databases should, however, be encouraged to further develop procedures that take into account the difficult assessment of the uncertainties associated with much of the data. Here again, yeast is expected to become a pioneer.