The CephSeq Consortium has a strategy

I approve this plan. A number of researchers have gotten together and worked out a grand strategy for sequencing the genomes of a collection of cephalopods. This involves surveying the phylogeny of cephalopods and trying to pick species to sample that adequately cover the diversity of the group, while also selecting model species that have found utility in a number of research areas. Those two criteria are often in conflict with one another. Fortunately, the authors seem to have found a set that satisfies both (although it would have been nice to see the Spirulida and Vampyromorpha make the cut; next round!). Here’s the initial group; the table is taken directly from the text, with the addition of a few pretty pictures for those of you unfamiliar with the Latin names.

Table 1: Cephalopod species proposed for initial sequencing efforts.

| Species | Estimated genome size | Current sequencing coverage | Geographic distribution | Lifestyle (juvenile/adult) | Research importance |
|---|---|---|---|---|---|
| O. vulgaris | 2.5-5 Gb | 46× | world-wide | planktonic/benthic | classic model for brain and behavior, fisheries science |
| O. bimaculoides | 3.2 Gb | 50× | California, Mexico | benthic | emerging model for development and behavior, fisheries science |
| H. maculosa | 4.5 Gb | 10× | Indo-Pacific | benthic | toxicity |
| S. officinalis | 4.5 Gb | – | East Atlantic-Mediterranean | nectobenthic | classic model for behavior and development, fisheries science |
| L. pealeii | 2.7 Gb | – | Northwest Atlantic | nectonic | cellular neurobiology, fisheries science |
| E. scolopes | 3.7 Gb | – | Hawaii | nectobenthic | animal-bacterial symbiosis, model for development |
| I. paradoxus | 2.1 Gb | 80× | Japan | nectobenthic | model for development, small genome size |
| I. notoides | – | 50× | Australia | nectobenthic | model for development, small genome size |
| A. dux | 4.5 Gb | 60× | world-wide | nectonic | largest body size |
| N. pompilius | 2.8-4.2 Gb | 10× | Indo-Pacific | nectonic | “living fossil”, outgroup to coleoid cephalopods |
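For a rough sense of what those coverage figures mean in raw data, you can multiply fold coverage by estimated genome size. The little Python sketch below does that for the species in Table 1 that list both numbers. It is only a back-of-the-envelope illustration: the dictionary name, the function name, and the choice to take the midpoint of a size range are my own assumptions, not anything from the paper.

```python
# Back-of-the-envelope estimate: raw sequence ~ fold coverage x genome size.
# Values are copied from Table 1. Species missing either a genome size or a
# coverage figure are left out. Using the midpoint of a size range is an
# assumption made for illustration only.

TABLE_1 = {
    # species: ((low Gb, high Gb), fold coverage)
    "O. vulgaris": ((2.5, 5.0), 46),
    "O. bimaculoides": ((3.2, 3.2), 50),
    "H. maculosa": ((4.5, 4.5), 10),
    "I. paradoxus": ((2.1, 2.1), 80),
    "A. dux": ((4.5, 4.5), 60),
    "N. pompilius": ((2.8, 4.2), 10),
}


def raw_sequence_gb(size_range_gb, coverage):
    """Return approximate gigabases sequenced for a given fold coverage."""
    low, high = size_range_gb
    midpoint = (low + high) / 2  # assumed point estimate for a range
    return midpoint * coverage


for species, (size_range, coverage) in TABLE_1.items():
    total = raw_sequence_gb(size_range, coverage)
    print(f"{species}: ~{total:.0f} Gb of raw sequence")
```

For O. bimaculoides, for instance, 50× coverage of a 3.2 Gb genome works out to roughly 160 Gb of reads.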

It’s a nice balance. There’s a pair of related octopuses (Octopus vulgaris and Octopus bimaculoides) and a pair of related squid (Idiosepius paradoxus and Idiosepius notoides), so features common to each group can be recognized; a couple of model organisms used in neuroscience (Loligo pealeii) and developmental biology (Euprymna scolopes); and a couple of just plain cool animals, the blue-ringed octopus (Hapalochlaena maculosa) and the giant squid (Architeuthis dux). And of course you have to include a cuttlefish (Sepia officinalis, also an important research model), and a nautilus (Nautilus pompilius) for the outgroup.

It’s going to be challenging — cephalopods are like us in having large, sloppy genomes with lots of repeats and accumulated junk.

Like all good science, too, this is going to be open and accessible.

We therefore propose to adopt a liberal opt-in data sharing policy, modeled in part on the JGI data usage policy, which will support the rapid sharing of sequence data, subject to significant restrictions on certain types of usage. Community members will be encouraged to submit their data, but not required to do so. We plan to provide incentives for this private data sharing by (1) developing a community data and analysis site with a simple set of automated analyses such as contig assembly and RNAseq transcript assembly; (2) offering pre-computed analyses such as homology search across the entire database; and (3) supporting simple investigative analyses such as BLAST and HMMER. We also plan to provide bulk download services in support of analysis and re-analysis of the entire dataset upon mutual agreement between the requesting scientist and the CephSeq Consortium Steering Committee (see below), who will represent the depositing scientists. Collectively, these policies would provide for community engagement and participation with the CephSeq Consortium while protecting the interests of individual contributors, both scientifically and with respect to the Convention on Biological Diversity. Policy details will need to be specified and implementation is subject to funding. Our intent is to build an international community by putting the fewest barriers between the data and potential researchers, while still protecting the data generators.

I also like that there’s an appreciation of the importance of wider communication of this information beyond the sphere of nerdy genomics researchers and obsessed cephalofreaks. The authors recognize that cephalopods are important barometers of climate change and the ocean environment, and that people are just plain fascinated with them.

People are fascinated by cephalopods, from Nautilus to the octopus to the giant squid. The coupling of genomics to cephalopod biology represents a fusion of two areas of great interest and excitement for the public. This fusion presents a tremendous educational platform, particularly for K-12 students, who can be engaged in the classroom and through the public media. Public outreach about cephalopod genomics will help build support for basic scientific research, including study of marine fauna and ecology, and will add to the public’s understanding of global changes in the biosphere.

Unfortunately, this short paper is a little thin on details of particular interest to me: “Education and outreach will be emphasized for broad dissemination of progress in cephalopod genomics at multiple levels, including K-12, undergraduate and graduate students, and the public at large.” I’d be curious to see more about the how of doing that, but I’m glad it’s on their list of priorities. Part of their plan is building a website, but unfortunately when I just checked it wasn’t yet available.

It’s going to be challenging — cephalopods are like us in having large, sloppy genomes with lots of repeats and accumulated junk.

Newb biology question: are there any species or families where this doesn’t hold? Are there any sorts of environments that would provide selective pressure not to have a sloppy genome? My intuition is that having a sloppy genome generally isn’t disadvantageous, and without anything working against it the tendency is to build up cruft.

Most viruses have nice, neat genomes without junk. They can do some pretty weird transcriptional and post-transcriptional modifications, and most will make fairly major post-translational modifications to the product proteins. Polio, for example, has a genome that encodes a polyprotein that is cleaved into the functional proteins after translation. Ebola produces four proteins and a peptide from one gene by a combination of poly-A-induced frameshifts during transcription, post-transcriptional modification, and post-translational cleavage and glycosylation. As another example of viral genomic weirdness, the Hepatitis B virus has a partially double-stranded genome that is converted to a fully double-stranded one by a reverse transcriptase during replication.

Thank you both for the clarifications! I thought that was a possibility with non-eukaryotes, but didn’t consider it too much due to metazoan chauvinism. A.R., thanks for the details; I had no clue viruses worked that way. Very weird.

My intuition is that having a sloppy genome generally isn’t disadvantageous, and without anything working against it the tendency is to build up cruft.

DNA replication uses energy, so a sloppy genome would be disadvantageous if your energy budget is low, and if the time required to replicate the DNA is the rate-limiting step to your reproduction.

These two hold true in most prokaryotes, and as a result, prokaryotes tend to have a streamlined genome.

But they don’t generally hold true for eukaryotes (a eukaryote can simply increase its number of mitochondria if it needs more energy, and their cell cycles are more complicated with DNA replication only being a small part of the whole), so eukaryotes have tended to accumulate sloppy genomes.

Thanks for the great write-up. It didn’t make it in, but I advocated for:

CephSeq Phase 1: Random initial projects (what you see above)

CephSeq Phase 2: Remainder of favorite species

CephSeq Phase 3: CephSeq 1000 – community-based sequencing of all cephalopod species. Similar to the Vertebrate 10,000 and Arthropod 3,000.

The relatively small size of the class makes its sequencing feasible, increasingly so as sequencing costs drop and sequencing quality improves. Comparisons of, say, octopus to human will be fascinating, but imagine full class comparisons of cephalopods to vertebrates, mapping traits in parallel across their trees at genomic and organismal levels. Fortunately, Phase 1 already captures much of the major diversity in cephalopods, so we’ll be able to start doing analyses such as this using the representative genomes in the not-so-distant future. But still, I like the idea of sequencing the entire class.

And no, there aren’t 1,000 described species yet, but the numbers are growing.

And yes, many samples will be difficult to get – but start now and we’ll get there :)

The authors recognize that cephalopods are important barometers of climate change and the ocean environment, and that people are just plain fascinated with them.

It’s amazing really. Just like many other South Australians I’d never previously cottoned on to the fact that Spencer Gulf had a seriously interesting giant cuttlefish population in a unique(ish) habitat. Once I knew about them, I couldn’t get enough.

Lots and lots of people you’d never have thought had much scientific interest generally were deeply miffed that the Gulf’s Sepia apama animals missed out on threatened species status. Though everyone knew that one main driver of the application was a proposed industrial development slap bang in the middle of their breeding grounds, they still thought “our” giant cuttlefish were special. Of course, I might just be overlooking the boating and fishing enthusiasts, who seem to have a lot of good knowledge if you can tolerate listening to them rattle on about some quite tedious stuff as well.