Resources

The combined resources provided by the preliminary Sea Urchin Genome Project are included on this site. The individual resources listed below are useful for gene discovery approaches, expressed sequence tag analysis and, most importantly, studies of gene regulation in the sea urchin.

BACs, BAC-ends and Gene Number

A virtual map of the genome was constructed by sequencing the ends of 76,020 BAC recombinants (average length 125 kb) (see Figure). The BAC-end sequence tag connectors (STCs) occur on average 10 kb apart and can be used to assemble contigs surrounding any gene of interest. Using Blast matches to sequences from BAC-ends and complete BACs, together with confirmations from cDNA sequences, we estimate that the sea urchin genome contains a total of (22 ± 5) × 10^3 genes.

Repeats

Since the first sea urchin genomic sequencing project was undertaken, a number of collections of repeat sequences from the purple sea urchin have been assembled. A survey of the 76,000 BAC ends for repeat sequences is described here.

Genomic sequence segments are maintained in bacterial artificial chromosome (BAC) libraries. We currently have libraries for seven species of lower deuterostomes in our resource (see Table). The vector used is pBACe3.6, which originates from Children's Hospital Oakland Research Institute in Oakland, California, USA. This vector carries chloramphenicol resistance and is fully described in an article by Frengen and colleagues. The vector sequence is available here. The average insert size for our BAC libraries is about 140 kb, so 1X coverage of the 800-megabase genome of S. purpuratus requires about 5,700 clones on average. Our libraries contain at least 100,000 clones, providing about 17X genome coverage.
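The coverage figures above follow from simple arithmetic, which the short Python sketch below reproduces. The function and constant names are illustrative only and are not part of any tool on this site; the input values are the rounded figures quoted in the text (800 Mb genome, 140 kb average insert, 100,000 clones).

```python
# Rounded figures quoted in the text above.
GENOME_SIZE_BP = 800_000_000   # S. purpuratus genome, about 800 Mb
AVG_INSERT_BP = 140_000        # average BAC insert, about 140 kb
LIBRARY_CLONES = 100_000       # minimum library size quoted above


def clones_per_1x(genome_bp: int, insert_bp: int) -> float:
    """Number of clones whose summed insert length equals one genome."""
    return genome_bp / insert_bp


def fold_coverage(n_clones: int, insert_bp: int, genome_bp: int) -> float:
    """Total cloned bases divided by genome size (redundancy, in X)."""
    return n_clones * insert_bp / genome_bp


clones_1x = clones_per_1x(GENOME_SIZE_BP, AVG_INSERT_BP)
coverage = fold_coverage(LIBRARY_CLONES, AVG_INSERT_BP, GENOME_SIZE_BP)

print(f"Clones for 1X coverage: {clones_1x:.0f}")
print(f"Coverage of a {LIBRARY_CLONES:,}-clone library: {coverage:.1f}X")
```

With these inputs the calculation gives roughly 5,700 clones per 1X and about 17X for a 100,000-clone library, in line with the figures stated above. Real coverage also depends on the actual insert-size distribution and on non-recombinant clones, which this sketch ignores.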

cDNA sequences

Approximately 13,000 cDNA sequences were obtained from the primary mesenchyme cell library. These comprise 7,400 unique sequences once all of the overlaps are assembled. When these are searched against the BAC-end sequences, 1087 unique matches occur. Thus, the sequence matches between the BAC-ends, the ESTs, and the published databases are all consistent with the conclusion that the collection of sequences we have obtained is of a quality suitable for gene discovery investigations in the sea urchin embryo. (Cameron et al., Proc. Natl. Acad. Sci. USA, Vol. 97, Issue 17, 9514-9518, August 15, 2000)

Gene Model

During the sea urchin genome project, several groups produced gene predictions based on diverse approaches (ab initio, homology-based or empirical). Baylor used the GLEAN methodology to combine those gene sets into 28,944 unique genes, whose structures were derived from the V0.5 genome assembly. At SpBase, we adopted Baylor's GLEAN genes and renamed each GLEAN ID to an SPU ID (for example, GLEAN3_12345 became SPU_012345). New SPU genes will be added with IDs starting from 030000. This first release only changed the gene IDs from GLEAN and adopted them into SpBase; no changes were made to gene structures.
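For readers who need to map older GLEAN identifiers to SPU identifiers in their own files, the renaming can be sketched in Python as follows. This is a minimal illustration inferred from the single example given above (GLEAN3_12345 to SPU_012345); the function name `glean_to_spu` is hypothetical, and zero-padding the numeric part to six digits is an assumption based on that example and on the stated starting ID of 030000. It is not SpBase's own conversion code.

```python
import re

# Assumed input format: the prefix "GLEAN3_" followed by 1 to 6 digits.
_GLEAN_RE = re.compile(r"^GLEAN3_(\d{1,6})$")


def glean_to_spu(glean_id: str) -> str:
    """Convert a GLEAN3 gene ID to the corresponding SPU-style ID.

    Assumption: the numeric part is kept and zero-padded to six digits.
    """
    match = _GLEAN_RE.match(glean_id.strip())
    if match is None:
        raise ValueError(f"Not a recognised GLEAN3 ID: {glean_id!r}")
    return f"SPU_{int(match.group(1)):06d}"


print(glean_to_spu("GLEAN3_12345"))
```

The function raises `ValueError` on anything that does not look like a GLEAN3 ID, so malformed entries surface early instead of being silently passed through.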

Genome sequence traces

In March 2003, the Human Genome Sequencing Center at Baylor College of Medicine (www.hgsc.bcm.tmc.edu) began to produce sea urchin sequences. First, a whole genome shotgun (WGS) project was undertaken, and the individual sequences are deposited in the GenBank Trace Repository at NCBI. We have downloaded these traces, analyzed them by Blast and posted the matches in a searchable form on this web site. We will continue to do so until assembled genome sequences are posted at NCBI.

Quantitative PCR primers

The Davidson laboratory at Caltech has generated a panel of quantitative PCR primers for measuring mRNA abundance of genes involved in early development in general and in the endomesoderm gene regulatory network in particular. A table of primer sequences and comments can be viewed (here) or downloaded (here).

Nanostring Codeset

The Nanostring nCounter identifies and counts RNA molecules based on a fluorescent barcode attached to a sequence-specific hybridization (Geiss, G. K. et al., 2008). Our newly designed probe set contains codes for 341 genes covering the majority of active and spatially restricted regulatory genes in the Strongylocentrotus purpuratus embryo up to the pluteus stage (72 h post-fertilization). A description of the use of our previous code set was published by Materna et al., who showed that the Nanostring nCounter yields high-fidelity measurements over 5 orders of magnitude, at levels down to a few transcripts per embryo. The genes and sequences in the new codeset are tabulated here.