Abstract

The International Crocodilian Genomes Working Group (ICGWG) will sequence and assemble
the American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus) and Indian gharial (Gavialis gangeticus) genomes. The status of these projects and our planned analyses are described.

Keywords:

Genomics; evolution; Crocodylia; Archosauria; amniote

The importance of reptilian genomics

The study of reptilian genomes is essential if we are to understand the patterns of
genomic evolution across amniotes (mammals, birds and non-avian reptiles). Non-avian
reptiles differ from mammals and birds in several ways: they have diverse sex-determining
systems, are ectothermic ('cold blooded') and have extreme physiology. Non-avian reptiles
are divided into four extant orders: Crocodylia (crocodiles and alligators; approximately
25 species), Sphenodontia (tuatara; two species), Squamata (lizards and snakes; approximately
7,900 species) and Testudines (turtles; approximately 300 species). The clade's most
recent common ancestor is thought to have lived around 275 million years ago (Mya)
[1], and birds (class Aves) are nested within reptiles (class Reptilia) (Figure 1). Although they are more diverse than birds and mammals, non-avian reptiles have
not been a major focus of genome sequencing efforts [2,3]. The green anole (Anolis carolinensis) is the only non-avian reptilian genome sequence published to date [4]. There are, however, ongoing initiatives to sequence the genomes of the painted turtle
(Chrysemys picta; see NHGRI Genome Sequencing Proposals [5]), the garter snake (Thamnophis sirtalis [6]), the king cobra (Ophiophagus hannah; M.K. Richardson, personal communication) and the Burmese python (Python molurus bivittatus [7]). Although these projects will provide considerable insight into the evolution of
both reptilian and amniote genomes, they only begin to address the diversity represented
within reptiles, and do not include any crocodilians.

Figure 1. Amniote phylogeny emphasizing the crocodilians. The geographic ranges of the three crocodilians of interest are shown, along with
approximate times of divergence of each group based upon the Timetree of Life [1]. On the basis of the fossil record, the origins of dinosaurs and birds were Triassic
and upper Jurassic, respectively [86], with birds arising from within dinosaurs [86,87]. The phylogenetic position of turtles is unclear [2,88,89]; however, for simplicity we chose the consensus estimated position and divergence
time presented in the Timetree of Life [1]. The photos of the American alligator (Alligator mississippiensis) and the saltwater crocodile (Crocodylus porosus) were kindly provided by Louis Guillette, and the photo of the Indian gharial (Gavialis gangeticus) was provided by Alan Wolf. Mya, million years ago.

Order Crocodylia is a key group within Reptilia and genome drafts from crocodilians
would provide insights into ancestral reptilian and amniote genomes. These genome
assemblies will also enable more detailed inferences on the evolution of three additional
lineages of substantial interest to vertebrate biologists: dinosaurs, pterosaurs and
birds. Crocodilians and birds are the only extant members of Archosauria (a clade
that also includes dinosaurs and pterosaurs along with several extinct lineages) [8]. Among archosaurs, only the genomes of chicken (Gallus gallus [9]), turkey (Meleagris gallopavo [10]) and zebra finch (Taeniopygia guttata [11]) have been sequenced, although several additional avian genomes, such as the Mallard
duck (Anas platyrhynchos [12]), budgerigar (Melopsittacus undulatus, a type of parrot) and a set of other avian taxa [13], are currently underway [14]. Crocodilians are the best extant outgroup for comparative analysis of avian genomes,
and, as such, would substantially enhance analyses of the large set of bird genomes
that are expected to be available shortly. Avian and crocodilian genomes provide the
best hope for elucidating the gene and genomic properties of dinosaurs and other extinct
archosaurs, about which we have learned surprising amounts (for example, genome size
and limited protein sequences) considering we have no access to the DNA of these organisms
[15-19]. In the broadest sense, Crocodylia represent an important vertebrate clade, and their
genomes hold information that will illuminate the underlying relationships among all
amniotes. In addition, crocodilians present several interesting biological questions
that can be approached from a genomic perspective; many of these are discussed
below.

Background on crocodilians and project justification

The order Crocodylia, which typically refers to the clade that includes the extant
crocodilians [20], is an ecologically successful group of reptiles that originated in the mid- to upper-Cretaceous
period (approximately 100 Mya) [21,22]. Crocodilians are apex predators in the marine and freshwater habitats where they
reside. They play a major role in warm-water ecosystems throughout the world. Extant
crocodilians are members of a larger group, termed the Crocodylomorpha, that appeared
in the fossil record by the upper Triassic (about 200-250 Mya) [8,1], a date coincident with molecular estimates of the avian-crocodilian divergence [2,22,23]. Crocodylia is divided into three families with extant members, Alligatoridae (alligators
and caimans), Crocodylidae (crocodiles) and Gavialidae (gharials) [21,23]; the Gavialidae are traditionally thought to be the outgroup of a clade comprising
Alligatoridae and Crocodylidae [21]. However, recent phylogenetic analyses of both molecular data [22,24] and combined molecular and morphological data [25] support a closer relationship between Crocodylidae and Gavialidae (Figure 1).

Crocodilians have been a part of the human narrative for centuries, appearing in modern
popular culture (for example, the wildlife documentary series The Crocodile Hunter), scientific documentaries, as ancient mummies and in cave paintings. They are prized
for their hides and meat, and some species, such as the American alligator, the Nile
crocodile (Crocodylus niloticus) and the saltwater crocodile, are ranched (that is, their eggs are brought in from
the wild) and/or farmed (in which captive breeding stock produce the eggs). Globally,
crocodilians are a source of trade worth more than US$500 million [26]. However, crocodilians likely have their most profound economic impact as tourist
attractions [27,28]. Thoughtful ecotourism could be the best hope for saving endangered crocodilians,
such as the critically endangered gharial, from extinction and their habitats from
destruction.

Given their popularity, their status as the sister group of dinosaurs, and their inherent
public fascination, efforts focused on crocodilian genomics are ideally suited for
education and outreach focused on evolution and comparative genomics. Indeed, the
preliminary data from our efforts have been used in a pilot genomics course at the
University of Florida that integrates with undergraduate research. The consortium
plans to make material for genomics pedagogy and public outreach available in parallel
with the release of the genome assemblies.

In addition to their ecological, sociological and economic significance, crocodilians
have genomes that will be useful sources of data for biological and biomedical research.
Alligator serum has been shown to contain broad spectrum antibiotic peptides [29-32]. The American alligator has been used extensively as a model for examining the environmental
impact of various contaminants, including endocrine disrupting xenobiotics [33-36]. Crocodilians represent important research organisms for diverse fields that include
evolution and phylogenetics [25,37-39], functional morphology [37,40], osmoregulation [37], sex determination [41-45], hybridization [46-48] and population genetics [49-51]. To provide the genomic resources necessary to expand our understanding of these
fascinating organisms, the ICGWG is obtaining and assembling genome sequences for
the American alligator, saltwater crocodile, and gharial, one representative from
each of the extant crocodilian families. For further information about the project
and preliminary assemblies, see Ref. [52].

Properties of crocodilian genomes and available genomic resources

Short of whole genome sequencing, much work has been done on crocodilian genomes,
especially the American alligator and Australian saltwater crocodile. The genome of
the American alligator is approximately 2.5 gigabases [53] comprising 16 pairs of chromosomes [54,55]. The genome size of the saltwater crocodile is around 2.78 gigabases [56] with 17 pairs of chromosomes [54,57]. The genome size of the gharial is currently unknown, although it is likely to be
approximately 2-3 gigabases, given the genome sizes of other crocodilians. Like the
American alligator, the gharial has 16 pairs of chromosomes [54]. Unlike organisms with genetic sex-determination systems, crocodilians are not thought
to have sex chromosomes [54]. Instead, sex is determined by the incubation temperature of the egg [42]. Although microchromosomes are common among other reptiles (including birds), and
there is striking variation in chromosome sizes within crocodilians, the smallest
crocodilian chromosomes are not generally regarded as small enough to be classified
as microchromosomes [54,58,57,55].

As in birds, the most common transposable elements (TEs) in crocodilian genomes are
Long INterspersed Elements (LINEs) of the chicken repeat 1 (CR1) family [59]. Earlier studies indicated that the majority of CR1 LINEs in crocodilians are fairly
short (typically < 2 kbp [59]). Indeed, our efforts to identify novel repeats in preliminary saltwater crocodile
and American alligator genome assemblies show that the most abundant repeats in the
current assemblies are less than 1 kbp (Figure 2). The observation that this relatively well-characterized and short class of TE insertions
is the predominant family of repeats in crocodilians suggests that assembling the genomes
of these organisms will be a manageable project, compared with a typical repeat-rich
mammalian genome that contains a greater proportion of longer repetitive elements.

Figure 2. The distribution of repeats of different lengths in the alligator and crocodile assemblies. Overlaid are some of the library insert or fragment sizes we have generated for
the various assemblies. Note, however, that the current crocodile assembly in this figure
does not include the 454 data.

Libraries of bacterial artificial chromosomes (BACs) are available for all three species
of interest and these will be used for each genome project. The American alligator
BAC library currently has about 10× clone coverage [60], the saltwater crocodile library has approximately 3.7× clone coverage [56] and the gharial library has about 5.7× clone coverage, assuming it is a 2.7 gigabase
genome (X. Shan, unpublished data). Several large-scale nucleotide datasets have been
collected for the American alligator, including 21 assembled BAC sequences completed
through the NISC Comparative Sequencing Initiative [61], and 3,276 Sanger BAC-end reads [59]. A linkage map based on microsatellite loci [62] for the saltwater crocodile is also available. Additionally some saltwater crocodile
microsatellite loci have been mapped by fluorescence in situ hybridization (FISH) to physical chromosomes using fosmids and BACs ([58] and P. Dalzell, unpublished data), which will facilitate anchoring portions of the
genome assembly to chromosomes.
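
The clone coverage figures quoted above follow from a simple relationship: the number of clones multiplied by the mean insert size, divided by the genome size. The Python sketch below illustrates that arithmetic only. The function names, the clone count and the insert size are hypothetical values chosen for the example and are not properties of the actual BAC libraries; the 2.7 gigabase genome size is the same assumption stated in the text for the gharial.

```python
def clone_coverage(num_clones, mean_insert_bp, genome_size_bp):
    """Expected clone coverage: total cloned bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return num_clones * mean_insert_bp / genome_size_bp


def clones_needed(target_coverage, mean_insert_bp, genome_size_bp):
    """Number of clones required to reach a target clone coverage."""
    if mean_insert_bp <= 0:
        raise ValueError("mean_insert_bp must be positive")
    return target_coverage * genome_size_bp / mean_insert_bp


# Hypothetical library: 100,000 clones with a 135 kbp mean insert,
# evaluated against an assumed 2.7 Gbp genome.
coverage = clone_coverage(100_000, 135_000, 2.7e9)
print(f"{coverage:.1f}x clone coverage")  # prints "5.0x clone coverage"
```

Because the estimate scales inversely with genome size, a coverage value quoted for a species whose genome size is not yet measured should be read as conditional on the assumed size.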

In addition to genomic sequences and mapping information, both Sanger and 454 transcriptome
data for the crocodile and alligator are available [63,64]. Transcriptome data will be further augmented by a diversity of tissue-specific cDNA
libraries from multiple species that will be sequenced using Illumina RNA-seq to assist
gene annotations. The cDNA sequences will also enable further scaffold ordering and
orientation for transcripts that are split between multiple genomic fragments [65]. We will use these legacy and new data to further improve the initial de novo assemblies. To view the preliminary assemblies, see Ref. [52].

Sequencing strategy for the three crocodilian genomes

Owing to the availability of diverse legacy data, we are pursuing different strategies
for the sequencing and assembly of each genome, as described below.

For the American alligator genome, we are following the ALLPATHS-LG recommended pipeline
[66] of a combination of high coverage pairs of overlapping reads with a second, moderate
coverage, longer insert mate-pair library. This pipeline has yielded good results
with a variety of assemblies including de novo reassemblies of mouse and human [66], and was successfully employed in an independently evaluated genome assembly contest
[67]. We have combined approximately 50× coverage from an overlapping, Illumina, short-insert
library with about 20× coverage from an Illumina 2 kbp mate-pair library. To investigate
genetic variation and increase coverage, we will combine these reads with a set of
short, non-overlapping 2 × 100 bp Illumina reads at approximately 50× coverage. In
addition to providing deeper coverage, these data will also provide information about
genetic variation in American alligators due to single nucleotide polymorphism differences
between the diploid chromosomes of an individual. We will further scaffold the assembly
using low coverage BAC-end sequences, and we will carry out FISH mapping to assign
scaffolds to chromosomes.
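
To make the coverage targets in this strategy concrete, the following Python sketch converts between read-pair counts, sequence coverage and physical (insert) coverage. It is an illustration under stated assumptions rather than a description of the actual sequencing runs: the 2.5 gigabase genome size is the alligator estimate cited earlier, while the 100 bp read length assumed for the mate-pair library and all function names are hypothetical.

```python
def pairs_for_coverage(target_coverage, genome_bp, read_len_bp):
    """Read pairs needed so that sequenced bases reach the target coverage."""
    return target_coverage * genome_bp / (2 * read_len_bp)


def sequence_coverage(n_pairs, read_len_bp, genome_bp):
    """Coverage counted from sequenced bases only (two reads per pair)."""
    return n_pairs * 2 * read_len_bp / genome_bp


def physical_coverage(n_pairs, insert_bp, genome_bp):
    """Coverage counted from the whole span between the paired reads."""
    return n_pairs * insert_bp / genome_bp


GENOME_BP = 2.5e9  # approximate alligator genome size from the text

# 2 x 100 bp reads at about 50x sequence coverage
short_pairs = pairs_for_coverage(50, GENOME_BP, 100)
print(f"short-insert pairs needed: {short_pairs:,.0f}")

# 2 kbp mate pairs at about 20x sequence coverage (100 bp reads assumed)
mate_pairs = pairs_for_coverage(20, GENOME_BP, 100)
print(f"mate-pair physical coverage: "
      f"{physical_coverage(mate_pairs, 2_000, GENOME_BP):.0f}x")
```

The distinction matters for scaffolding: a long-insert library sequenced to modest sequence coverage can still span each position many times over, which is what allows it to link contigs.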

To sequence the saltwater crocodile genome, we are combining high coverage Illumina
short insert sequencing with low coverage 454 libraries in a hybrid approach, similar
to that used for the turkey genome [10]. We currently have about 80× coverage from a non-overlapping, short-insert library
and an additional 40× from an overlapping short-insert library. We also plan to generate
about 20× coverage from an Illumina 2 kbp mate-pair library. To supplement the Illumina
data, we have generated 1× coverage of unpaired 454 reads (about 700 bp in length),
and plan to generate an additional 2× coverage from 3 kbp and 6 kbp paired 454 reads.
We will also end-sequence the crocodile BAC library using a method similar to the
fosmid-based ShARC method described by Gnerre et al. [66]. Some of these BACs are known to contain microsatellite DNA markers used in the crocodile
linkage map [62] and others have already been FISH mapped to chromosomes in the crocodile [58]. We will integrate this information for scaffolding and assigning scaffolds to chromosomes.
As with the American alligator genome, we are also generating transcriptome data for
the saltwater crocodile for both annotation and scaffolding purposes. We will also
use the 454 brain transcriptome data that exists for the American alligator [64] and the Nile crocodile [68] in our analyses. We will use these EST and RNA-seq data, along with the other resources
described above, to further order and orient scaffolds within the assembly.

Finally, we will assemble the gharial genome using a hybrid approach similar to that
used for the saltwater crocodile. To do this, we have generated 40× coverage from
an overlapping short-insert library. This will be combined with sequences from 400
bp and 700 bp paired-end Illumina libraries sequenced to give approximately 30× coverage,
as well as 2-3× genome coverage consisting of 454 shotgun reads and 3 kbp and 6 kbp
paired-end 454 libraries with FLX+ reads. Finally we will generate approximately 20×
coverage from an Illumina 2 kbp mate-pair library. The gharial is a critically endangered
species, making it nearly impossible to collect a wide variety of tissues for transcriptome
data. Nonetheless, we have collected blood, which will be used to generate Illumina
RNA-seq data. As with the American alligator and saltwater crocodile, we will use
de novo assembled transcripts to improve the assembly.

Project timeline and goals

The first phase of our sequencing effort, in which we generate high coverage short
insert and overlapping libraries, has been completed for American alligator and saltwater
crocodile and is ongoing for the Indian gharial. The data generated for alligator
and crocodile were used to generate early draft assemblies for those genomes. The
second phase will involve generating longer distance mate-pair libraries and BAC-end
sequences to improve the assemblies. We plan to have the data gathered for this phase
by mid-March 2012. The third and final phase will involve FISH mapping the BACs to
assign scaffolds to chromosomes. When all three phases are completed the assemblies
should be as contiguous as possible, given the combination of high coverage short
distance information generated in phase one with lower coverage long distance information
generated in phase two. The third phase is not critical for the most pressing questions
involving crocodilian genomics; individual genes and their regulatory regions will
be of primary interest, as opposed to the long-range linkage required for identifying
selective sweeps. Thus we will proceed with this third phase in parallel with our
other comparative genomic analyses. Once the three genomes are assembled, we will
perform comparative genomic analyses both within Order Crocodylia, and among crocodilians
and other members of Reptilia.

The completion of each of these phases will be publicly communicated via the website,
and links to the data and assemblies will be available to researchers with restrictions
as detailed below. We anticipate data collection and initial analyses to be complete
by June 2012, and we plan to submit the genome paper within one year of finalizing
these initial analyses. The Toronto Statement [69] suggests that there be a one-year period of initial analyses and publication, after
which the broader community would be free to use these data in an unrestricted manner.
Precise dates at which we complete data collection and initial analysis, and thus
the beginning of the embargo period on the genome data, will be promptly posted on
the website [52].

Status of the current preliminary genome assemblies

Preliminary assemblies for alligator and crocodile are available. The assembly for
alligator additionally uses information from a 120× physical coverage, Illumina 1.5
kbp mate-pair library. The current crocodile assembly was generated with 80× coverage
from a 380 bp paired-end Illumina library. The statistics for the length and contiguity
of these two assemblies are shown in Table 1. These assembly statistics are on par with other early stage de novo assemblies using short read data [7,70].
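
For readers unfamiliar with the contiguity statistics usually reported for draft assemblies, the sketch below shows one common definition of N50: the length of the shortest contig or scaffold in the smallest set of longest sequences that together contain at least half of the assembled bases. This is a generic illustration with a made-up list of lengths; it is not the script used to produce Table 1, and the function name is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: 32 bases in total, so the threshold is 16 bases.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

N50 rewards long sequences regardless of whether they are correctly joined, so it is best read alongside independent checks of assembly accuracy.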

To obtain early estimates of potential TE content, we analyzed the current assemblies
using RepeatMasker and a custom repeat library. The library consisted of all vertebrate
TEs identified in RepBase [71] and a set of potential TEs identified by applying RepeatScout [72] to both raw 454 data and to the current assemblies (D. Ray, unpublished data). Consistent
with earlier studies [59,73,74], much of the repetitive content of the genome comprises non-long terminal repeat
(non-LTR) retrotransposons from the CR1 family (Figure 3). There is also high content of Chompy-like miniature inverted-repeat transposable
elements (MITEs) [75], Penelope retrotransposons, ancient short interspersed repetitive elements (SINEs),
and satellite/low complexity regions. Overall, 23.44% of the alligator and 27.22%
of the crocodile genome assemblies are annotated as repetitive compared with 50.63%
seen in humans. Thus, this preliminary analysis provides further evidence that these
reptilian genomes might be easier to assemble than typical mammalian genomes due to
their lower repeat content.

Figure 3. The size of different repeat families classified in our current alligator and crocodile
assemblies. Despite more long-distance insert libraries for alligator, more repeats were found
in the crocodile assembly. This suggests that crocodiles may have more repeats
than do alligators, and perhaps the difference will become even more striking as the
crocodile assembly improves.

We also examined GC content across the assemblies (Figure 4). Alligators and crocodiles appear to have a higher mean GC content than many other
vertebrates. Additionally their large standard deviation in GC content across contigs
is similar to that of birds and mammals, suggesting that their base composition is
heterogeneous and likely contains GC-rich isochores. This is unlike the situation
in the lizard (Anolis) and frog (Xenopus), which lack strong isochores based upon analyses of genomic data [76], or the turtle Trachemys scripta, which appears to lack strong isochores based upon analyses of expressed genes [77]. However, these results are consistent with previous analyses of ESTs that suggested
the existence of GC-rich isochores in the alligator genome [62,77]. Thus, these crocodilian genome data extend the results of the previous analyses
and confirm the genome-wide nature of GC-content heterogeneity in crocodilians. We
expect improved crocodilian genome assemblies to further illuminate the details of
isochore structure in reptiles.
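
The comparison above rests on two summary statistics: the mean GC proportion and its standard deviation across contigs. A minimal Python sketch of that calculation follows. It assumes contigs are supplied as plain strings, ignores ambiguous bases such as N, and weights every contig equally; these choices and the function names are illustrative assumptions, not the procedure used to produce Figure 4.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """GC proportion of one sequence, counting only called A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    return gc / called if called else None


def gc_summary(contigs):
    """Mean and population SD of per-contig GC proportion."""
    values = [gc for gc in map(gc_fraction, contigs) if gc is not None]
    if not values:
        raise ValueError("no contigs with called bases")
    return mean(values), pstdev(values)


# Toy contigs; real input would be parsed from an assembly FASTA file.
avg, sd = gc_summary(["GGCC", "AATT", "ATGC", "NNNN"])
print(f"mean GC = {avg:.2f}, SD = {sd:.2f}")
```

Because short sequences give noisier GC estimates, the standard deviation depends on contig length; comparing species fairly would likely require fixed-size windows rather than whole contigs.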

Figure 4. The distribution of GC proportion across several species. Note that alligators and crocodiles have a higher overall proportion of GC than
many other vertebrates, as predicted by early BAC-end scans [42]. Abbreviation: SD, standard deviation.

Quality control of intermediate assemblies and raw data

For the alligator genome, we have collected nearly 1.8 billion pairs of Illumina reads
from embryos at different developmental stages that were incubated at 'male producing'
(33.5°C) and 'female producing' (30°C) temperatures. From these data, we produced
a set of rigorously filtered transcript sequences that we will use to assess the completeness
and contiguity of the alligator assembly. These transcripts were assembled using the
Oases [78] module of Velvet [79] as follows. The initial assembly of the RNA-seq paired-end reads produced 749,838
fragments. We identified the longest open reading frames from each and translated
them into putative proteins. We then compared these with the set of known protein
sequences in the Swiss-Prot database [80], removing proteins that were more than 10% different in length from the full-length
Swiss-Prot hit; this removed all but 16,972 putative transcripts. We then focused
on the CDS sequence of these genes and removed sequences with less than 5× RNA-seq
coverage in any 30-bp window of the sequence. This procedure yielded 2,570 high-confidence
alligator CDS sequences. We used these sequences to assess the quality and completeness
of the current alligator assembly with results shown in Figure 5. Overall, more than 95% of these filtered CDS sequences were full length on a single
scaffold. The improvement garnered by subsequent assemblies will be assessed using
these data in the same manner. We will assess the quality and completeness of crocodile
and gharial genomes in a similar manner.
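
The two filters described above (protein length within 10% of the Swiss-Prot hit, and at least 5× RNA-seq coverage in every 30-bp window of the CDS) can be sketched as follows. This Python example is one possible reading of the text, not the authors' code: it assumes that window coverage means the mean per-base depth within a sliding 30-bp window, that per-base depths are already available as a list, and all function names are hypothetical.

```python
def length_within_tolerance(protein_len, hit_len, tolerance=0.10):
    """True if the putative protein differs from its hit by at most 10%."""
    return abs(protein_len - hit_len) <= tolerance * hit_len


def min_window_mean_depth(depths, window=30):
    """Lowest mean depth over all sliding windows of the given size."""
    if not depths:
        return 0.0
    if len(depths) < window:
        return sum(depths) / len(depths)
    window_sum = sum(depths[:window])
    lowest = window_sum
    # Slide the window one base at a time, updating the sum incrementally.
    for i in range(window, len(depths)):
        window_sum += depths[i] - depths[i - window]
        lowest = min(lowest, window_sum)
    return lowest / window


def keep_transcript(protein_len, hit_len, cds_depths,
                    min_depth=5.0, window=30):
    """Apply both filters to one candidate CDS."""
    return (length_within_tolerance(protein_len, hit_len)
            and min_window_mean_depth(cds_depths, window) >= min_depth)


# Toy cases: uniform depth passes, a 30-bp gap in coverage fails.
print(keep_transcript(300, 300, [10] * 90))
print(keep_transcript(300, 300, [10] * 30 + [0] * 30 + [10] * 30))
```

Requiring coverage in every window, rather than on average, removes transcripts whose assembly may be unsupported over part of their length, at the cost of discarding many weakly expressed genes.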

Figure 5. Using de novo assembled alligator transcripts, the level of gene presence and fragmentation in two
alternate alligator assemblies was compared. These results suggest that the new assembly (assembly B) is an improvement over
the earlier effort (assembly A).

Because we do not yet have a set of assembled transcripts for the crocodile genome,
we instead used a comparative genomics approach for quality assessment on our early
assemblies. For example, we generated two pre-release draft saltwater crocodile assemblies,
the second of which (here called Crocodile B) had a slightly higher N50 but a lower
overall length and slightly lower mean contig size relative to the first version
(here called Crocodile A). Because these statistics conflicted, we aligned the two
competing versions of the saltwater crocodile genome to the chicken reference genome
(UCSC galGal3) using the UCSC multiz genome alignment pipeline [81]. We then analyzed regions of the multi-way alignment that overlapped chicken genes
in the N-SCAN gene track. With these gene alignments we compared the total number
of genes that could be aligned across the two assemblies and the overall level of
gene fragmentation for the genes that aligned between the two assemblies (Figure 6). Based on this analysis, we determined not only that N50 was improved in Crocodile
B but that gene contiguity was also improved. This indicates that assembly B was not
introducing false joins to achieve a higher N50, as its joins resulted in more intact
gene alignments.
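
The gene fragmentation comparison described above amounts to counting, for each chicken gene, how many distinct scaffolds of a given assembly its alignment is split between. The Python sketch below shows that bookkeeping under simplifying assumptions: the alignment has already been reduced to (gene, scaffold) pairs, and the gene and scaffold identifiers are invented for the example. It does not reproduce the multiz-based pipeline itself.

```python
from collections import Counter, defaultdict


def scaffolds_per_gene(blocks):
    """Map each gene to the number of distinct scaffolds it aligns to."""
    scaffolds = defaultdict(set)
    for gene, scaffold in blocks:
        scaffolds[gene].add(scaffold)
    return {gene: len(names) for gene, names in scaffolds.items()}


def fragmentation_histogram(blocks):
    """Count genes by the number of scaffolds their alignment spans."""
    return Counter(scaffolds_per_gene(blocks).values())


# Hypothetical alignment blocks for one assembly: (gene id, scaffold id).
blocks = [
    ("gene1", "scaf_01"), ("gene1", "scaf_02"),
    ("gene2", "scaf_03"),
    ("gene3", "scaf_04"), ("gene3", "scaf_05"), ("gene3", "scaf_06"),
]
histogram = fragmentation_histogram(blocks)
print("genes aligned:", sum(histogram.values()))
print("genes on a single scaffold:", histogram[1])
```

Running the same summary on two candidate assemblies gives two comparable quantities: the total number of genes aligned and the share of those genes contained on a single scaffold.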

Figure 6. Using gene regions in a whole genome alignment to the chicken reference genome (galGal3),
we compared the number of scaffolds of each assembly that the alignment of each gene
is split between. Although many genomic rearrangements may exist between chicken and crocodile, assuming
that breakpoints tend to happen between genes rather than within genes, this method
allows us to assess the relative quality of assemblies in the same manner as Figure
5 when assembled and verified transcripts are not yet available.

We will employ additional quality metrics to detect and describe the collapse of segmental
duplications within our assemblies. Specifically, read-depth is a sensitive measure
of this assembly artifact. Preliminary analysis suggests that such artifacts are not
common in alligator or crocodile genomes (data not shown). We will employ a final
form of quality control by examining the relative synteny of our three crocodilian
candidate assemblies. Because alligators, crocodiles, and gharials appear to have
undergone few chromosome-level rearrangements [54], we expect a high level of synteny between accurate assemblies. Once we begin scaffolding
all of our assemblies with longer mate-pair and BAC data, we will assess their relative
quality by measuring the effect on overall crocodilian synteny.
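
Read depth can reveal collapsed segmental duplications because reads from both copies pile up on the single assembled copy, roughly doubling the local depth. The Python sketch below illustrates one simple way to flag such regions. The window size, the 1.75-fold threshold relative to the median, and the function names are all hypothetical choices for illustration; they are not the parameters of the analysis mentioned in the text.

```python
from statistics import median


def window_means(depths, window):
    """Mean depth in consecutive, non-overlapping windows."""
    return [
        sum(depths[start:start + window]) / window
        for start in range(0, len(depths) - window + 1, window)
    ]


def flag_collapsed(depths, window=1000, ratio=1.75):
    """Return (start, end) windows whose depth is far above the median."""
    means = window_means(depths, window)
    if not means:
        return []
    baseline = median(means)
    if baseline == 0:
        return []
    return [
        (i * window, (i + 1) * window)
        for i, value in enumerate(means)
        if value >= ratio * baseline
    ]


# Toy scaffold: about 30x depth with one 10-bp stretch at about 62x.
depths = [30] * 40 + [62] * 10 + [30] * 50
print(flag_collapsed(depths, window=10))  # prints [(40, 50)]
```

In practice the threshold would need tuning, since GC bias and repeats also inflate depth and can produce false positives.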

Planned analyses and experiments

Here we outline major questions, types of analyses and analytical goals that will
be included in the core publication of these completed genomes. The Toronto Statement
[69] suggests these questions should be articulated to identify these topics as embargoed
during preparation of the genome publication. The ICGWG will address a number of research
questions at both the level of genome evolution and crocodilian biology that we describe
below.

A crucial step in making genome resources useful to the scientific community is generating
gene annotations. We will perform gene finding for crocodilians using the Ensembl
[82] and Augustus [83] annotation pipelines and combine the output. We will also partner with groups sequencing
additional avian genomes and update the crocodile annotations as needed. Gene finders
will initially be trained using the chicken genome and the results from the pipelines
will be compared to identify accuracy at both the gene and exon level. Genes will
be assigned standardized gene nomenclature based on chicken gene names where there
is an unambiguous 1:1 functional ortholog, or a gene identifier in cases where this
is not possible. We will also provide preliminary functional annotation for proteins
and transcripts using standard Gene Ontology Consortium methods, including functional
analysis of motifs and domains and manual curation of orthologs. The ICGWG will perform
these analyses to complement and extend those performed by NCBI and Ensembl once the
draft genomes are submitted to those organizations.

One major focus will be the large-scale structure of crocodilian genomes, focusing
on the degree of syntenic conservation at different scales within these genomes. Karyotype
analysis suggests a remarkable conservation of synteny among crocodilians, with the
alligator and crocodile having undergone fewer than five chromosomal rearrangements
visible at the microscopic level [54] despite 80 million years of evolutionary divergence. However, the level of syntenic
conservation at small scales within these genomes remains unclear, and we expect our
genome assemblies to illuminate this topic. Microchromosomes are absent in crocodilians
[54,55,59] but present in birds, lizards and snakes, tuatara, and turtles [4,84]. This absence almost certainly represents a derived feature of crocodilians.
We will examine the fate of these genetic units within crocodilian genomes. Do the sequences
that reside on microchromosomes in other reptiles remain linked within the genomes of the only
major reptilian clade without microchromosomes?

Recent work showed that the lizard, Anolis carolinensis, unlike other amniotes sequenced to date (with the possible exception of turtles
[77]), has a homogeneous genome that lacks GC-rich isochores [76,4]. Our preliminary analyses indicate that crocodilians have a higher GC-content and
greater heterogeneity than Anolis (Figure 4), but these analyses are less clear regarding the scale of the observed GC-content
variation. Do crocodilians have GC-rich isochores that are similar to those in mammals
and birds or do the patterns of GC-content heterogeneity appear distinct?

We will also carry out a number of traditional analyses of genome content using the
crocodilian genomes, focusing on repeated sequences and gene families. These analyses
include the evolution of repeat families and patterns of TE proliferation. We will
compare the repeat family content within crocodilian genomes and with other reptiles
and amniotes. Additionally, we will conduct analyses of gene family evolution within
reptiles and crocodilians to identify specific genes and other functional elements,
including the identification of ultra-conserved regions and potential micro RNA sequences,
with a special focus on those sequences that could have been gained or lost both within
the crocodilians and in comparison to the other relevant lineages that are now available
for investigation.

We will use these three crocodilian genomes to infer their ancestral genome. This,
combined with existing and soon to be released bird genomes, will enable some inference
of the ancestral archosaur genome. Reconstructing the ancestral archosaur genome has
obvious implications for expanding our understanding of the genomes of extinct archosaurs,
like the non-bird dinosaurs and pterosaurs (Figure 1).

There are also several biological questions specific to crocodilians that we will
address by analyzing genomic and RNA-seq data and via experimental techniques. For
example, despite having a temperature-dependent sex-determination system seemingly
without sex chromosomes, the sexes of crocodilians have been shown to have very different
recombination rates [62]. Identification of the genes that are differentially expressed in the male and female
crocodilian gonads might provide insight into this perplexing observation.

SNP discovery arising from the genome sequencing is particularly relevant to farm-bred
saltwater crocodiles. Large panels of SNP markers will enable more refined linkage
maps [62], more precise mapping of quantitative trait loci (QTL) than is currently possible
with microsatellite markers [62] and eventually the implementation of genomic selection in crocodile breeding programs.

Eventually members of the ICGWG hope to address additional questions beyond the scope
of the initial genome paper. These might be presented in satellite publications. One
of these involves the sex determination system of American alligators. Which genes
are the initial temperature sensitive regulators that trigger the downstream, largely
conserved [85] sex-determination system? Having the genome sequences available for these three crocodilians
will enable a new wave of discoveries about the evolutionary histories of crocodilians,
non-avian reptiles and birds, and amniotes generally.

How other groups can join the consortium, or publish independently with our early
release data

This project is affiliated with the Genome 10K (G10K) initiative [14]. We invite other G10K affiliates and the broader scientific community to access and
make use of the draft assembly and raw read data that we have produced. Any group
performing non-genome-scale analyses that are sufficiently independent of the analyses
described above is welcome to use these data without restriction. As a matter of
courtesy and to avoid duplicated effort, we request that competing genome-scale projects
or analyses that overlap with the areas stated above disclose their status to the
ICGWG consortium (formal inquiries and requests to join the working group should be
made to D.A.R.) and cite this and subsequent papers that provide the data. Versioned
assemblies, further project description, and a complete list of current ICGWG members
can be accessed on the website dedicated to this project [52].

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

This work was supported by grants to D.A.R. (MCB-1052500, MCB-0841821, DEB-1020865
from the U.S. National Science Foundation) and funds from the Institute for Genomics,
Biocomputing and Biotechnology at Mississippi State University. E.L.B., E.W.T., and
collaborators at the University of Florida were supported by funds from the U.S. National
Science Foundation (DUE-0920151). T.I. received financial support from the National
Institute for Basic Biology and Grants-in-Aid for Scientific Research from the Ministry
of Education, Culture, Sports, Science and Technology of Japan. S.R.I., L.G.M., J.G.,
P.D. and C.M. were supported by Australian Rural Industries Research and Development
Corporation grants (RIRDC PRJ-000549, RIRDC PRJ-005355, RIRDC PRJ-002461). M.K.F.
received financial support from a U.S. National Science Foundation Biological Informatics
Postdoctoral Fellowship (DBI-0905714). R.E.G. is a Searle Scholar and a Sloan Fellow.
E.D.J. was supported by the Howard Hughes Medical Institute and the National Institutes
of Health. We are grateful to Kent Vliet (University of Florida) and the Alligator
Farm (St. Augustine, Florida) for providing access to fresh gharial blood.

References

Hedges SB, Kumar S: The Timetree of Life. Oxford University Press, USA; 2009.