Abstract

Large-scale projects are providing rapid global access to a wealth of mouse genetic
resources to help discover disease genes and to manipulate their function.

Review

It is a little-known fact that Gregor Mendel, remembered for his studies of trait
inheritance in pea plants, also experimented with breeding mice to understand coat
color traits. Had it not been for the disapproval of Bishop Anton Ernst Schaffgotsch,
who oversaw the Augustinian monastery where Mendel studied, he might well have been credited
as the father of mouse genetics [1]. Instead, CC Little started generating inbred lines of mice half a century later,
driven by a desire to understand cancer biology and recognizing the importance of
reproducible genetic crosses [1]. From these beginnings more than 300 strains of laboratory mice have been developed;
each line has been faithfully replicated and cryo-preserved, making them a renewable
genetic resource. Most are derived from a blend of Mus musculus subspecies, including domesticus and musculus, with some contribution from castaneus and molossinus, so that each inbred line carries a distinctive genetic mosaic of these progenitors
[2].

Today's geneticists usually turn to one of these inbred mouse strains when attempting
to model human disease because mice offer advantages that few species can match. Importantly,
the mouse genome can be easily manipulated with greater speed, scale and sophistication
than that of other mammals, and the efforts of the International Mouse Genome Sequencing
Consortium have resulted in a high-quality reference genome sequence that is the envy
of other model organism users [3]. The future for mouse genetics promises to be even more exciting now that high-throughput
sequencing of mouse strain genomes has started, and efforts are under way to systematically
disrupt every gene in the mouse genome and phenotype the resulting mutant animals
[4]. Here, we outline the tools and technologies that have emerged for using mice to
discover and characterize disease genes, and the resources that are being developed
to accelerate these discoveries.

Sequencing mouse genomes

In 2002 the International Mouse Genome Sequencing Consortium released the first draft
of the genome from C57BL/6J, an inbred strain of the laboratory mouse [3], and a finished genome was released in 2009 [5]. As one of the most globally used lines, C57BL/6J was a wise choice for the reference
mouse strain, but it is by no means the only strain used in research. Therefore, subsequent
efforts were initiated to generate genomic sequence of other inbred strains. Firstly,
four different strains of the laboratory mouse were included by Celera in a whole-genome
shotgun sequencing project: A/J, DBA/2J, 129X1/SvJ and 129S1/SvImJ [6]. This resulted in 27.4 million sequencing reads, giving a total of 5.3x coverage
of the mouse genome. Secondly, more than 150,000 short insert clones were sequenced
from the 129S5/SvEvBrd strain, covering 4.7% of the reference genome [7]. Thirdly, Perlegen Sciences used hybridization to re-sequence 15 inbred mouse strains
[8]; this set included 11 classical strains and four strains derived from the wild. Unlike
the other resources, Perlegen's approach did not generate sequence reads, and their
hybridization sequencing technology queried only 1.49 gigabases of the reference genome
(equivalent to about 58% of the C57BL/6J sequence that is non-repetitive). Furthermore,
to generate high-accuracy calls, high-stringency cutoffs were used, resulting in a
false negative rate estimated to be as high as 50% [2]. Consequently, the available sequence data lacked the coverage and breadth of strains needed
to make them a widely used resource.
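Fold-coverage figures such as the 5.3x quoted above are conventionally computed as c = N × L / G, where N is the number of reads, L the mean read length and G the genome size. The minimal Python sketch below shows that arithmetic; the 2.7 Gb genome size is an assumption chosen for illustration, and the read length it implies is a derived example, not a value reported in [6].

```python
# Sketch of the coverage arithmetic c = N * L / G.
# GENOME_SIZE_BP is an assumed, approximate value used only for illustration.
GENOME_SIZE_BP = 2.7e9


def coverage(n_reads: float, mean_read_len_bp: float, genome_size_bp: float) -> float:
    """Mean fold coverage: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def implied_read_length(n_reads: float, fold_coverage: float, genome_size_bp: float) -> float:
    """Mean read length (bp) needed for n_reads to reach the given fold coverage."""
    return fold_coverage * genome_size_bp / n_reads


# 27.4 million reads at 5.3x over the assumed 2.7 Gb genome.
length = implied_read_length(27.4e6, 5.3, GENOME_SIZE_BP)
print(f"implied mean read length: {length:.0f} bp")
```

Under these assumptions the implied mean read length is roughly 520 bp; a different genome-size estimate would shift it proportionally.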

The first non-reference mouse chromosomes to be sequenced were A/J and CAST/EiJ chromosome
17, revealing significant variation at the nucleotide level and also considerable
structural differences [9]. Building on that work, we commenced the Mouse Genomes Project, which has sequenced
the genomes of 17 key mouse strains using next-generation sequencing on the Illumina
platform (Box 1). At the last data freeze, in December 2009, an average of 25x sequence
coverage had been generated for each strain, and a deep catalog of variants had been compiled [10]. These data provide a comprehensive insight into the genomes of the 17 strains, allowing
immediate access to background genetic information for most mouse models of disease
in addition to facilitating the analysis of the molecular basis of complex traits
with unparalleled resolution.
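What an average depth such as 25x means for completeness can be estimated with the idealized Lander-Waterman model, in which per-base read depth is Poisson-distributed with mean equal to the coverage. The Python sketch below computes the expected fraction of bases covered by at least k reads under that assumption; real data fall short of these values in repeats and in regions of extreme GC content, so the numbers are idealized.

```python
import math


def fraction_at_least(mean_coverage: float, k: int) -> float:
    """Idealized fraction of bases covered by at least k reads.

    Assumes read start positions are uniformly random, so per-base depth is
    Poisson-distributed with mean `mean_coverage` (Lander-Waterman model).
    """
    # P(depth < k) = sum over i = 0..k-1 of e^{-c} * c^i / i!
    p_below = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - p_below


# Expected fraction of bases with depth >= 1, 5 and 10 at 25x average coverage.
for k in (1, 5, 10):
    print(k, round(fraction_at_least(25.0, k), 4))
```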

Genetic manipulation of mice in the post-genomic era

Technologies for modifying the mouse genome can be split into two broad classes: those
for gene-driven analyses and those for random mutagenesis.

The collection and propagation of mice harboring spontaneous mutations with striking
phenotypes, such as the obese mouse, served mouse geneticists well for most of the 20th century. When it became
clear that the rate of random germline mutation can be significantly increased by
exposure to radiation or to chemical mutagens such as N-ethyl-N-nitrosourea (ENU) [11], large-scale mutagenesis programs followed, resulting in an explosion in the number
of mutant lines. Phenotypic screens of these lines led to the identification of many
hundreds of new mutations and candidate disease genes [12,13]. One notable example of a successful forward genetic screen, reviewed in [14], identified 89 ENU-induced mutants that influence the immune system, of which at
least 69 have now been characterized at the molecular level. However, mapping random
mutations and identifying the affected gene can be an arduous process, often taking
years; therefore, causal mutations for only a fraction of mutant lines have been identified
thus far. Screening DNA from archived mutagenized lines for mutations in a specific
gene of interest is a parallel 'gene driven' strategy that has proven successful [15,16]. With the advent of new sequencing technologies it is now cost effective to sequence
mutagenized mouse exomes in their entirety, enabling the rapid identification of candidate
disease genes from existing resources and meaning that mutagenesis-driven approaches
may return as a powerful tool for studying disease genes. Other methods of random
mutagenesis include retroviruses, transposons (reviewed in [17]), and 'gene traps' [18]. These DNA-based mutagens can be easily mapped using approaches such as splinkerette
PCR [19] and are discussed in more detail below.
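Sequencing the exomes of ENU mutants usually ends in a filtering step that narrows thousands of variant calls to a few candidates. The Python sketch below shows one simple segregation filter, assuming a fully penetrant recessive mutation and that each animal's calls are reduced to homozygous variants only; the variant tuple format is hypothetical and this is an illustration, not a published pipeline.

```python
from typing import List, Set, Tuple

# Hypothetical variant representation: (chromosome, position, ref_allele, alt_allele).
Variant = Tuple[str, int, str, str]


def candidate_recessive_variants(
    affected: List[Set[Variant]],
    unaffected: List[Set[Variant]],
    background: Set[Variant],
) -> Set[Variant]:
    """Homozygous variants shared by every affected mouse, absent from unaffected
    mice and from the unmutagenized background strain.

    Assumes a fully penetrant recessive ENU allele and that each set holds only
    homozygous calls (heterozygous carriers would otherwise be filtered out).
    """
    if not affected:
        return set()
    shared = set.intersection(*affected)      # seen in all affected animals
    in_unaffected = set().union(*unaffected)  # homozygous in any unaffected animal
    return shared - in_unaffected - background
```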

The genome of the mouse can also be manipulated by pronuclear injection of DNA into
fertilized oocytes or by modification of embryonic stem (ES) cells, which can then be injected
into blastocysts to make chimeras, allowing modified alleles to be transmitted through
the germline. Direct pronuclear injection results in random integration of the injected
DNA [20]; consequently, transgene copy numbers and integration sites differ between lines,
potentially resulting in very different phenotypes. Large genomic fragments such as
bacterial artificial chromosomes (BACs) may also be injected (reviewed in [21]); these have proven particularly useful in complementation studies or 'rescue' experiments
for identifying genes contributing to a genetically mapped disease trait of interest
[22]. By contrast, DNA introduced into ES cells in culture can undergo site-specific,
homology-directed recombination [23], thus enabling the generation of targeted gain- and loss-of-function alleles as well
as the engineering of large-scale rearrangements of entire mouse chromosomes (Figure
1) [24,25]. Other recently developed techniques include transgenic small hairpin RNAs (shRNAs),
which are often delivered by lentiviral transgenesis [26,27], single-stranded oligonucleotides (ssODNs; reviewed in [28]), and zinc-finger nucleases (ZFNs) [29], which can be used to generate subtle sequence-specific genomic modifications. Here
we will address in more detail a few of these technologies, focusing on recent advances
uniquely available to mouse geneticists.

Figure 1. Gene targeting strategies used in mouse ES cells. Targeting is achieved by recombination (black crosses) between homology arms (red
lines). (a) A knockout vector replaces an entire gene with a selection cassette containing drug
resistance (DR), enabling the selection of successfully targeted ES cell clones. (b) A knock-in vector allows the expression of a transgene, such as LacZ or Cre, by the promoter (gray arrow) of the targeted gene. (c) Insertion vectors can disrupt a target gene by interfering with splicing, through the introduction
of an exon with an early termination codon or a 5' splice acceptor site (SA). They
typically target the genome with a single crossover event. (d) A conditional allele with directional DNA sequences (LoxP, green triangles) either side of a critical exon. Recombination between the sites
will result in a null allele. (e) LoxP sites can also be targeted megabases apart, either side of a larger cluster of genes,
enabling chromosome engineering. (f) Heterospecific Lox sites, such as LoxP and Lox511, are targeted by the site-specific recombinase Cre. Recombinase-mediated cassette
exchange (RMCE) enables the efficient swapping of one targeted cassette containing
incompatible target sites for another cassette flanked by an identical pair of sites.
This enables the rapid generation of new alleles, such as introducing a point mutation
in a critical exon.

ES cell gene-targeting

ES cell technology has been a profound advance in mouse genetics (detailed in [30]). Historically, the majority of manipulations have been performed in ES cells derived
from 129 sub-strains (Table 1). Recently, robust and highly germline-competent ES cells derived from the popular
C57BL/6 strains have been developed, such as JM8 and C2 (Table 1). To assist in tracking the contribution of these ES cells to chimerism, and to identify
mice that have transmitted their genome through the germline, a dominant Agouti (yellow) coat color allele was engineered in JM8 cells [31]. This now enables the study of mutant alleles on a common, controlled genetic background
without the need for generations of backcrossing.

Gene targeting in mouse ES cells can be achieved by homologous recombination, using
replacement, insertion or knock-in vectors, all of which contain a region of homology
with the locus to be targeted. In replacement vectors, crucial exons (or entire genes)
are replaced by a selection cassette to generate a null knockout allele (Figure 1a). Knock-in vectors are designed such that a transgene or reporter is transcriptionally
regulated by the endogenous promoter of the locus (Figure 1b; reviewed in [32]). By contrast, insertion vectors, which typically integrate by a single crossover event, interfere with splicing
to disrupt a target gene (Figure 1c). Significant resources are available for obtaining suitable genomic DNA for targeting
vector construction, including genome-wide end-sequenced BAC libraries for C57BL/6J-derived
[33] and 129-derived strains [7]. Homology arms (the part of the vector that aligns with the genome to facilitate
recombination) were typically generated by restriction digest of large DNA fragments
or by PCR amplification, but increasingly 'recombineering' technologies are being
used [34], which make it possible to engineer virtually any mutation into the mouse genome
with base pair resolution. In addition, customized targeting vectors can be generated
on a contract basis by several companies.
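In silico, the core of replacement-vector design is simply taking the sequence immediately upstream and downstream of the region to be deleted as homology arms. The Python sketch below does this on a toy reference string; the coordinates, arm length and the "[DR]" cassette placeholder are hypothetical, and real arms are typically several kilobases long and are built by recombineering or PCR rather than string slicing.

```python
from typing import Tuple


def homology_arms(reference: str, start: int, end: int, arm_len: int) -> Tuple[str, str]:
    """5' and 3' homology arms flanking the region [start, end) (0-based, half-open)."""
    if start >= end or start - arm_len < 0 or end + arm_len > len(reference):
        raise ValueError("target region or arms fall outside the reference")
    return reference[start - arm_len:start], reference[end:end + arm_len]


def replacement_vector(reference: str, start: int, end: int, arm_len: int, cassette: str) -> str:
    """Assemble 5' arm + selection cassette + 3' arm, as in a replacement vector (cf. Figure 1a)."""
    five_arm, three_arm = homology_arms(reference, start, end, arm_len)
    return five_arm + cassette + three_arm


# Toy example: replace the 'CCCCC' block with a drug-resistance placeholder.
print(replacement_vector("AAAAACCCCCGGGGGTTTTT", 5, 10, 5, "[DR]"))
```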

Gene modification with conditions

Conditional gene modification is used to enable spatial and/or temporal control over
the modification of the gene of interest. To this end, site-specific recombinase (SSR)
systems are used, including Cre-LoxP, Flp-FRT, φC31 integrase-attB/attP and most recently Dre-rox [35]. For a comprehensive review of the use of site-specific recombinases for manipulation
of the mouse genome, see [36]. The DNA sequences that the SSRs recognize are typically directional and can either
flank the target DNA for excision from the genome or be used to invert segments of
DNA. SSRs can be used for the generation of single gene knockouts or rearrangements,
and for chromosome engineering on a megabase scale (Figure 1d,e) [37,38].
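Whether a recombinase excises or inverts DNA depends only on the relative orientation of its two target sites. The toy Python model below, with hypothetical element names and strands, captures that rule for Cre and loxP: sites in the same orientation excise the intervening DNA and leave one site behind, while sites in opposite orientation invert the segment between them. It deliberately ignores that inversion between wild-type sites is reversible in real cells.

```python
from typing import List, Tuple

Element = Tuple[str, str]  # (name, strand), strand is "+" or "-"; names are hypothetical


def flip(strand: str) -> str:
    return "-" if strand == "+" else "+"


def cre_recombine(construct: List[Element], site: str = "loxP") -> List[Element]:
    """Toy model of Cre acting on the first two sites named `site`."""
    idx = [i for i, (name, _) in enumerate(construct) if name == site]
    if len(idx) < 2:
        return list(construct)  # fewer than two sites: nothing to recombine
    a, b = idx[0], idx[1]
    if construct[a][1] == construct[b][1]:
        # Same orientation: excise the intervening DNA, leaving a single site.
        return construct[:a + 1] + construct[b + 1:]
    # Opposite orientation: invert the segment (reverse order, flip strands).
    middle = [(name, flip(strand)) for name, strand in reversed(construct[a + 1:b])]
    return construct[:a + 1] + middle + construct[b:]


floxed = [("exon2", "+"), ("loxP", "+"), ("exon3", "+"), ("loxP", "+"), ("exon4", "+")]
print(cre_recombine(floxed))  # floxed exon3 removed, one loxP left
```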

SSRs can be expressed from endogenous promoters (as shown in Figure 1b) and in a tissue- or cell-specific manner. This is particularly useful when studying
the organ-specific function of genes that are widely expressed and essential for embryonic
development. For example, a conditional allele of Sox9, a gene implicated in campomelic dysplasia in humans, is necessary to study its function
in cartilage in mice because germline deletion of Sox9 results in perinatal lethality [39]. For somatic mutagenesis, inducible gene-modification systems may be used. These
systems allow temporal 'inducible' control of SSR expression. There are several inducible
expression systems available, including those based on tetracycline [40], the lac operator/repressor [41] and tamoxifen [42]. These systems have been invaluable in studying genes and neural circuits involved
in learning and memory, by turning genes and cellular markers 'on' or 'off' during
controlled time periods (reviewed in [43]), and in a range of other biological systems.
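The combination of spatial and temporal control described here can be summarized as a logical AND: a floxed allele is deleted only in cells that express the inducible recombinase and only once the inducer is given. The Python sketch below models this for a tamoxifen-inducible Cre under a tissue-specific promoter; the cell types are hypothetical, and the `efficiency` parameter is an assumption added to mimic incomplete recombination.

```python
import random
from typing import Dict, Set


def induce_recombination(
    cells: Dict[str, int],
    creer_expressing: Set[str],
    tamoxifen: bool,
    efficiency: float = 1.0,
    seed: int = 0,
) -> Dict[str, int]:
    """Count cells per type in which a floxed allele is deleted (toy model).

    Deletion requires spatial control (the cell type expresses the inducible Cre)
    and temporal control (tamoxifen has been given). `efficiency` < 1 mimics
    incomplete recombination.
    """
    rng = random.Random(seed)
    recombined = {}
    for cell_type, n in cells.items():
        if tamoxifen and cell_type in creer_expressing:
            # each cell recombines independently with probability `efficiency`
            recombined[cell_type] = sum(rng.random() < efficiency for _ in range(n))
        else:
            recombined[cell_type] = 0
    return recombined


print(induce_recombination({"chondrocyte": 100, "hepatocyte": 50}, {"chondrocyte"}, tamoxifen=True))
```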

There are now over 500 tissue- or cell-specific Cre recombinase mice (some of which
are inducible) documented in databases such as Cre-Zoo and Cre-X-mice (Table 2) [44]. However, as conditional modification technologies become increasingly sophisticated,
the potential for non-specific effects, from mis-regulation of the targeted gene to
incomplete recombination by the SSR, must remain a consideration [45,46]. For example, a recent study highlighted the potential for protein expression from
episomal products of Cre recombinase-excised genes, particularly when deletion occurs
in cells that have a low population turnover [47].

Recombinase-mediated cassette exchange

Using homologous recombination to introduce genetic material into a desired genetic
location in the mouse genome is not always straightforward. The efficiency is often
dependent on the nature of the genomic target site and on the design of the targeting
vector. Therefore, the ability to efficiently introduce secondary modifications to
already successfully targeted cassettes is advantageous. Recombinase-mediated cassette
exchange (RMCE) is a process in which site-specific recombinases exchange one gene
cassette, flanked by a pair of incompatible target sites, for another cassette flanked
by an identical pair of sites (Figure 1f) [48]. Apart from the naturally occurring heterotypic SSR sites (attB and attP for φC31), several variant sites have been developed for Cre and Flp, providing the
required heterospecificity crucial for RMCE (for example, LoxP/Lox511 and FRT/FRT3; see [49] for a complete list). In RMCE, typically one cassette is present in the host genome,
whereas the other cassette (and the recombinase) is introduced into the host ES cell
by electroporation, chemical-mediated or adenoviral-mediated gene transfer [50]. Transient expression of the recombinase will direct integration of the SSR site-flanked
cassette, which can then be selected by drug resistance. RMCE-based techniques are
proving to be useful in the rapid production of custom allelic series [51]: they have recently been used to compare the impact of different tumor-associated
mutations in p53 [52], and to study the effect of multiple enhancer elements on the expression of a targeted
cassette [53].
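The logic of RMCE can be illustrated with a toy model in which recombination is allowed only between identical sites (loxP with loxP, Lox511 with Lox511); because the two heterospecific sites cannot recombine with each other, the net outcome of the two exchanges is a clean cassette swap. In the Python sketch below the cassette names are hypothetical, and orientation, transient whole-plasmid integration and drug selection are ignored.

```python
from typing import List


def rmce(genome: List[str], donor: List[str], site_a: str = "loxP", site_b: str = "lox511") -> List[str]:
    """Toy RMCE: swap the genomic cassette flanked by site_a...site_b for the
    donor cassette flanked by the same heterospecific pair of sites."""
    ga, gb = genome.index(site_a), genome.index(site_b)  # ValueError if a site is missing
    da, db = donor.index(site_a), donor.index(site_b)
    if not (ga < gb and da < db):
        raise ValueError("expected site_a upstream of site_b in both constructs")
    # Keep the genomic flanks and both sites; take the payload from the donor.
    return genome[:ga + 1] + donor[da + 1:db] + genome[gb:]


genome = ["5'arm", "loxP", "hygro", "lox511", "3'arm"]   # hypothetical host cassette
donor = ["loxP", "exon_mut", "neo", "lox511"]             # hypothetical exchange cassette
print(rmce(genome, donor))
```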

Transposons for mutagenesis

Unlike most of the methods described so far, which allow manipulation of the genome
with base pair precision, transposable elements provide the power to molecularly tag,
and therefore rapidly map, random mutagenic events. The application of transposons
to the field of mouse genetics has become possible only in the past decade. So far,
four distinct DNA transposons have been shown to function in mice: Tol2, Minos, Sleeping Beauty (SB) and PiggyBac (PB) (reviewed in [17]), with the latter two being the most widely used. DNA transposons use a 'cut-and-paste'
transposition mechanism. When both the transposase enzyme and a transposon vector
are present in the same nucleus, the transposase can mediate excision of the transposon
from the donor site and integration into another target site in the host cell genome.
RNA-mediated transposition, driven by a 'copy-and-paste' mechanism, has also been
introduced into mice for mutagenesis [54].
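The difference between the two transposition mechanisms can be shown with a toy model of a genome as an ordered list of loci: in cut-and-paste transposition the element leaves its donor site, whereas in copy-and-paste transposition the donor copy remains. The locus names in the Python sketch below are hypothetical, and the model ignores target-site sequence preferences, excision footprints and local hopping.

```python
from typing import List


def transpose(genome: List[str], element: str, insert_after: str, mechanism: str) -> List[str]:
    """Toy model of a single transposition event on an ordered list of loci.

    'cut-and-paste' (DNA transposons such as SB or PB): the element leaves its donor site.
    'copy-and-paste' (RNA-mediated transposition): the donor copy stays in place.
    """
    if element not in genome:
        raise ValueError(f"{element} is not present in the genome")
    new = list(genome)  # never modify the caller's list
    if mechanism == "cut-and-paste":
        new.remove(element)  # excision from the donor site
    elif mechanism != "copy-and-paste":
        raise ValueError(f"unknown mechanism: {mechanism}")
    pos = new.index(insert_after) + 1  # integration just after the target locus
    return new[:pos] + [element] + new[pos:]


g = ["geneA", "SB", "geneB", "geneC"]
print(transpose(g, "SB", "geneC", "cut-and-paste"))   # element moves
print(transpose(g, "SB", "geneC", "copy-and-paste"))  # element is duplicated
```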

Transposons can be used for germline mutagenesis in mice (reviewed in [55]). However, this technique is inefficient for genome-wide forward genetic screens,
owing to the low rate of transposition (one to three de novo insertions per gamete) and the tendency for local hopping exhibited by most of the
transposons, although some researchers have taken advantage of this local hopping to saturate
smaller genomic regions [56,57]. So far, the most common use for transposons has been in the field of cancer genetics
[17]. Retroviral insertional mutagenesis has traditionally been used to study the genetics
of hematopoietic and mammary cancers (Box 2), but the study of other tumor types has
been limited by viral tropism. Initial studies demonstrating the validity of transposon-mediated
insertional mutagenesis (using SB) identified both known and novel cancer genes involved in sarcoma and lymphoma [58,59]. Since then, transposons have been engineered to produce gain-of-function mutations
in epithelial cells resulting in the development of a wide variety of carcinomas [60]. In addition, Cre-inducible SB transposase alleles can restrict mutagenesis to specific tissues, permitting studies
into colorectal cancer and hepatocellular carcinoma [61,62]. More recently, PB has been used for somatic mutagenesis, representing another tool for cancer gene discovery
in the mouse [63].

Transposons can also be used to generate transgenic mice by loading them with genetic
cargo. SB, PB and Tol2 are all efficient in delivering large transgenes, up to 70 kb in size [64]. PB has also been used together with SSR technology to generate large-scale rearrangements
of the mouse genome, including duplications, deletions, and translocations [65]. Recently, transposons have been used to deliver the reprogramming factors required
for generating induced pluripotent stem (iPS) cells [66,67].

Gene trap mutagenesis

Gene trapping in mouse ES cells is an efficient method for mutagenesis of the mammalian
genome. Insertion of a gene trap vector can disrupt gene function and/or report gene
expression, and because these vectors integrate into the genome they provide a convenient
tag that facilitates the identification of their insertion site. A typical strategy
involves electroporating into ES cells a vector containing a 5' splice acceptor that
splices to the upstream exon of the trapped gene, and thus the endogenous promoter
of the trapped gene is used to drive the expression of the reporter gene [18]. Alternatively, the vector can be delivered by retroviral infection or transposon-mediated
insertion. In either case, the trap insertion sites in the resulting ES cell
clones can be identified by splinkerette PCR (detailed in [68]).
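Once splinkerette PCR products are sequenced, each read contains the end of the vector or transposon followed by flanking genomic DNA, and the insertion site is found by placing that flank on the reference genome. The Python sketch below performs both steps with exact string matching and keeps only uniquely placed flanks; the element-end sequence and reference are hypothetical placeholders, and real pipelines trim linker sequence and use a mismatch-tolerant aligner instead.

```python
from typing import Dict, List, Optional, Tuple


def extract_flank(read: str, end_seq: str, min_len: int = 15) -> Optional[str]:
    """Return the genomic sequence that follows the vector/transposon end in a read."""
    i = read.find(end_seq)
    if i == -1:
        return None  # read does not contain the element end
    flank = read[i + len(end_seq):]
    return flank if len(flank) >= min_len else None


def map_flank(flank: str, reference: Dict[str, str]) -> Optional[Tuple[str, int]]:
    """Exact-match the flank to the reference; return (chrom, 0-based start) only if unique."""
    hits: List[Tuple[str, int]] = []
    for chrom, seq in reference.items():
        start = seq.find(flank)
        while start != -1:
            hits.append((chrom, start))
            start = seq.find(flank, start + 1)  # also catches overlapping repeats
    return hits[0] if len(hits) == 1 else None
```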

Recent developments in trapping technology involve the use of 'conditional traps',
which enable the induced modification of trap alleles in vitro or in vivo using SSRs, as well as the exchange of trap vectors for other functional cassettes by RMCE
[69]. Gene trapping strategies have also been successfully developed to screen for genes
that have specific expression patterns ('enhancer traps' [70]) or act in specific biological pathways ('induction trapping' [71,72]). Another approach to direct gene trapping toward genes in a specific pathway is
to perform a phenotypic screen in ES cells. However, most insertions will cause heterozygous
mutations (which will generate detectable phenotypes only for haploinsufficient genes).
One strategy to overcome this has been to use ES cells that have a deficiency in the
Bloom (Blm) DNA helicase. These cells show high levels of mitotic recombination, which facilitates
the generation of homozygosity in cell lines from colonies carrying heterozygous mutations
[73].

'ES cell-driven' mouse production

Another advantage that the mouse has over other model organisms is in the rapid generation
of mutant mice using ES cell-driven approaches. These enable the production of mice
that are entirely, or almost entirely, derived from ES cells without the requirement
for germline transmission. These approaches involve either the injection of ES cells into
eight-cell embryos or a process called tetraploid complementation, and allow the generation
of mutant mice in weeks rather than months [74,75]. By combining these approaches with shRNA-mediated knockdown, several groups have
shown that it is possible to rapidly generate knockdown mice for the analysis of somatic
gene function [76,77]. Mice somatically overexpressing genes in an inducible and regulated way have also
been developed using these approaches [78].

Mouse genetics on a grand scale

The success of the genome sequencing consortia over the past two decades established
a model for further large-scale, collaborative projects aimed at functionally characterizing
genomes (Table 2). Examples include the International Knockout Mouse Consortium (IKMC), and its constituent
regional projects, which collectively aim to generate mutant alleles for every protein-coding
gene in the mouse genome and to make the resources available to the scientific community
[4]. Researchers can now search the IKMC website and acquire, at minimal cost, mice or
ES cells that lack a gene of interest [79], thereby accelerating the path from a gene of interest to a mutant mouse line. By May
2011, IKMC had over 16,000 ES cell lines with mutations in protein-coding genes. Many
of these alleles are 'knockout first' alleles, which introduce a LacZ expression marker into the target gene and can then be tailored using Cre
and Flp to generate null and conditional alleles, respectively [80].
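The conversions of a knockout-first allele can be written as a small state machine. The Python sketch below uses the tm1a/tm1b/tm1c/tm1d labels commonly applied to IKMC knockout-first alleles; these labels and the exact cassette layout are not defined in this review, so treat them as an assumption, and the model records only the net outcome of each recombinase.

```python
from typing import Dict, Iterable, Tuple

# Simplified knockout-first allele conversions (IKMC-style labels assumed).
CONVERSIONS: Dict[Tuple[str, str], str] = {
    ("tm1a", "Flp"): "tm1c",  # Flp removes the FRT-flanked lacZ/neo cassette -> conditional allele
    ("tm1a", "Cre"): "tm1b",  # Cre deletes neo and the floxed critical exon -> lacZ-tagged null
    ("tm1c", "Cre"): "tm1d",  # Cre deletes the floxed critical exon -> null without reporter
}


def convert(allele: str, recombinases: Iterable[str]) -> str:
    """Apply recombinases in order; pairs with no listed effect leave the allele unchanged."""
    for recombinase in recombinases:
        allele = CONVERSIONS.get((allele, recombinase), allele)
    return allele


print(convert("tm1a", ["Flp", "Cre"]))  # conditional allele, then tissue-specific null
```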

In parallel, a number of past and ongoing standardized phenotyping projects have documented
traits in inbred strains and mutant lines for phenotypes relevant to human disease,
including the Mouse Phenome Project, the European Mouse Disease Clinic (EUMODIC) and
the Mouse Genetics Project (MGP; based at the Wellcome Trust Sanger Institute); see
also Table 2 [81-84]. The results from many screens are made available online, enabling researchers to
identify potentially interesting phenotypes for detailed analysis (Figure 2). For example, primary MGP analysis of mice lacking the gene Slx4 identified a number of developmental and DNA instability phenotypes. Detailed secondary
analysis revealed the mouse to phenocopy a new sub-type of the human genetic illness,
Fanconi anemia [85-87].

In an effort to identify quantitative trait loci (QTLs), large populations of genetically
heterogeneous stock (HS) mice have been generated [88,89]. Individual mice have been phenotyped and genotyped to facilitate high-precision
QTL mapping. The Collaborative Cross (CC) is a resource that is using a similar strategy
by interbreeding eight strains of mice to generate around 300 new inbred lines [90], which, unlike HS mice, are being cryopreserved for posterity. It is estimated that
the CC will capture approximately 90% of the genetic variability in laboratory mice
and will allow the mapping of genetic networks that underlie complex diseases. Moreover,
the progenitor strains of the CC were selected for sequencing in the Mouse Genomes
Project (Box 1), which should allow the QTLs identified by phenotyping CC mice to
be rapidly resolved into a list of candidate variants. When complete, the CC will
mark a new era in the discovery of the molecular basis of complex traits in the mouse.
Meanwhile, large-scale phenotyping of the strains developed so far is well under way.
Finally, EuTRACC is a project to generate ES cells that carry a targeted tandem affinity
purification tag (TAP-tag). Initially this will be several hundred transcription factor
genes, but this is an effort that is likely to extend genome-wide. This resource will
facilitate mass spectrometry of native protein complexes to better understand the
mouse 'interactome' [91].
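The QTL mapping described above for HS and CC mice rests, at its simplest, on asking at each genotyped marker how much of the phenotypic variance is explained by grouping mice according to their genotype or founder haplotype. The Python sketch below implements that single-marker scan as a one-way ANOVA R²; marker names and data are hypothetical, and real analyses add haplotype reconstruction, correction for relatedness and permutation-based significance thresholds.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def variance_explained(genotypes: Sequence[str], phenotypes: Sequence[float]) -> float:
    """R^2 of a one-way ANOVA: share of phenotypic variance explained by genotype class.

    Labels can be biallelic calls or founder haplotypes (e.g. the eight CC founders).
    """
    grand = mean(phenotypes)
    total_ss = sum((y - grand) ** 2 for y in phenotypes)
    if total_ss == 0:
        return 0.0
    groups: Dict[str, List[float]] = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    between_ss = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    return between_ss / total_ss


def marker_scan(markers: Dict[str, Sequence[str]], phenotypes: Sequence[float]) -> List[Tuple[str, float]]:
    """Score every marker and rank them from strongest to weakest association."""
    scores = [(name, variance_explained(g, phenotypes)) for name, g in markers.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```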

Towards the future

Mouse genetics has a bright future. Genome-wide association studies have identified
hundreds of alleles statistically associated with human disease, which now demand
detailed functional analysis. Early examples suggest the mouse will be the ideal model
in moving from genetic association studies to understanding molecular mechanisms leading
to complex disease [92]. The ablation of a large proportion of the coding mouse genome within the next 5
years, at least in ES cells, should rapidly accelerate these studies.

The modular design of modern gene targeting cassettes, together with SSRs and RMCE,
makes for an incredibly flexible system of genetic engineering in mice. This is establishing
the mouse as a leading model in scientific disciplines previously dominated by work
in simpler organisms. For example, gene targeting combined with channelrhodopsin,
which enables neural activation to be controlled with light [93], permits the visualization and fine manipulation of precise neural circuits in the
mammalian brain, something that until recently was possible only in Drosophila and C. elegans [94,95].

However, there are also challenges ahead. A significant number of regions across the
mouse genome, typically those containing clusters of highly homologous, tightly arrayed
genes, are not amenable to efficient gene targeting. Moreover, the same loci are often
difficult to sequence, with some lacking complete coverage even in the high quality
reference genome [96]. Thus, as much as 5 to 10% of the functional mouse genome may fall through the cracks
of the present large-scale projects unless new technologies, or clever combinations
of current technologies, are developed and used to investigate these genes. Nevertheless,
the mouse is likely to remain the non-human vertebrate with the most sequenced, and
best-studied, genome for the foreseeable future. Together, the advances described
here will, within the current decade, underpin an understanding of mouse genetics that would have been unthinkable
to CC Little when he first began generating inbred lines over a century ago [1].

Competing interests

The authors declare that they have no competing interests.

Box 1: A genome for all reasons

The 17 strains being sequenced as part of the Mouse Genomes Project were carefully
selected to support other major mouse genetics resources. Three 129 strains were chosen
because they serve as the background for thousands of existing gene knock-outs. The
C57BL/6N strain is the origin of the highly germline-competent JM8 ES cells that are
being used in large-scale gene targeting programs [31]. Nine common lab strains were chosen because of their historical utility, and also
because they include the progenitors of the heterogeneous stock and Collaborative
Cross mice that are used in dissecting complex traits [88,89]. Finally, four wild-derived strains have been sequenced because they represent some
of the founder sub-species of many inbred laboratory lines, and are also important
models of cancer and infection resistance [2].

Box 2: Exploiting viruses in mouse genetics

The first transgenic mice were generated by infecting embryos with viruses [97], and today viral vectors remain an integral part of the mouse genetics toolkit. Lentiviruses
integrate their genome into the host's DNA, making them an effective transgene delivery
vector. The lentiviral genome, derived from immunodeficiency viruses, has been deconstructed
and distributed across multiple plasmids to minimize the potential formation of replication-competent
viruses [98]. A transgene of interest may be included in a plasmid containing a viral packaging
signal. This is co-transfected into a cell line (typically human embryonic kidney
HEK293T cells) with other plasmids expressing proteins required for viral production,
such as envelope proteins. Viruses produced in this way can be introduced into oocytes
for transgenesis (reviewed in [99]). In its simplest form, this method requires only a few weeks between target
selection and phenotypic analysis, offering a distinct advantage over other approaches.
To enable pooled loss-of-function screens to identify complex genetic interactions,
lentiviral short hairpin RNA (shRNA) libraries targeting most mouse genes have been
generated [27]. Some groups have recently used ultrasound-guided microinjections of lentiviruses
to deliver genes to organs and tissues of early mammalian embryos in utero [100].
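Pooled shRNA screens are usually read out by sequencing the hairpins in a population before and after selection and asking which hairpins are depleted or enriched. The Python sketch below computes a library-size-normalized log2 fold change per hairpin with a pseudocount; the hairpin identifiers are hypothetical, and real analyses also use replicates, filter low-count hairpins and aggregate several hairpins per gene.

```python
import math
from typing import Dict


def log2_fold_changes(
    before: Dict[str, int], after: Dict[str, int], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Library-size-normalized log2 fold change of each hairpin's read count."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    lfc = {}
    for hairpin, n_before in before.items():
        n_after = after.get(hairpin, 0)  # hairpins lost from the pool count as zero
        frac_before = (n_before + pseudocount) / total_before
        frac_after = (n_after + pseudocount) / total_after
        lfc[hairpin] = math.log2(frac_after / frac_before)
    return lfc


print(log2_fold_changes({"sh_gene1": 100, "sh_gene2": 100}, {"sh_gene1": 50, "sh_gene2": 150}, pseudocount=0))
```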

Slow-transforming retroviruses have been widely used to generate mouse models of cancer
[101]. They can re-infect the same cell, randomly inserting their genome into the host
DNA multiple times, resulting in an accumulation of mutations. This process of progressive
mutagenesis recapitulates the multi-step progression of human tumorigenesis (reviewed
in [102]). The development of next-generation sequencing technologies has dramatically enhanced
the process of identifying retroviral insertion sites, and databases, such as the
Retroviral Tagged Cancer Gene Database, have been developed to map insertion sites
to the reference genome [103].
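After insertion sites from many tumors have been mapped, candidate cancer genes are sought as common insertion sites (CISs): genomic windows hit in more independent tumors than expected by chance. The Python sketch below is a simplified fixed-window count of distinct tumors per window, assuming a hypothetical (chromosome, position, tumor) input format; published CIS calls rely on statistical methods such as kernel convolution or Monte Carlo simulation rather than this fixed threshold.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Insertion = Tuple[str, int, str]  # (chromosome, position, tumor_id), hypothetical format


def common_insertion_sites(
    insertions: List[Insertion], window: int, min_tumors: int
) -> List[Tuple[str, int, int, int]]:
    """Greedy fixed-window clustering of insertion sites.

    Returns (chromosome, first_position, last_position, n_tumors) for each window
    in which at least `min_tumors` distinct tumors carry an insertion.
    """
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for chrom, pos, tumor in insertions:
        by_chrom[chrom].append((pos, tumor))
    calls = []
    for chrom in sorted(by_chrom):
        hits = sorted(by_chrom[chrom])
        i = 0
        while i < len(hits):
            start = hits[i][0]
            j = i
            tumors = set()
            while j < len(hits) and hits[j][0] - start <= window:
                tumors.add(hits[j][1])  # count each tumor once per window
                j += 1
            if len(tumors) >= min_tumors:
                calls.append((chrom, start, hits[j - 1][0], len(tumors)))
            i = j  # next window starts after this cluster
    return calls
```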

Acknowledgements

We thank the Sanger Mouse Genetics Project for generating, managing and phenotyping
the mice shown in Figure 2.

References

Paigen K: One hundred years of mouse genetics: an intellectual history. I. The classical period (1902-1980).