Perhaps most excitingly (terrifyingly?), we’re going to raise some of the funds for the genome sequencing by crowdfunding, using the Experiment.com platform. So please keep an eye on the project site, follow our Twitter feed, and Like our Facebook page to make sure you don’t miss your chance to help understand Joshua trees’ evolutionary past and ensure their future.

At the CGRI, we would like to understand, first, how much genetic variation there is among the numerous pure C. sativa, C. indica, and C. ruderalis accessions and heirloom varieties. This will lead us to understand the relationships among the major lineages within the genus, the spread of Cannabis throughout the globe, and rates of historical hybridization between the named species.

For Daniela’s detailed run-down of important evolutionary questions in Cannabis, go read the whole thing.◼

Do you like evolution, genetics, and evolutionary genetics? Would you like to think of things to do with a whole lot of genetic data and a flagship model legume? Well, my boss, Peter Tiffin, is looking for another postdoc. Here’s the post description from EvolDir:

I have available a post-doctoral position to work on association and evolutionary genomics of the model legume Medicago truncatula. Collaborators and I have recently collected genome sequence for > 200 accessions and have used these data for GWAS and population genomic analyses. We are currently working to refine our understanding of genomic variation segregating within this species and are particularly interested in the evolutionary genetics of the symbiosis between Medicago and Sinorhizobia. The successful applicant will have considerable freedom to develop research in their area of interest.

The deadline for submissions is 15 September 2013, so get in touch with Peter pronto if you’re interested. (See the full ad for contact information and the application package requirements—it’s standard stuff.) Benefits of the position include working with population genomic data from the cutting edge of current technology in a collegial lab with some very smart people (and me) in the midst of a fantastic community of biologists at the University of Minnesota—as well as living in the Twin Cities, which are empirically awesome. Yes, even in winter.◼

Equipped with the core genome sequence, the team collected still more sequence data from ten male flycatchers of each species, and aligned these additional sequences to that reference, identifying millions of sites that vary within the two species, and millions of sites where they share variants. They scanned through all these sites to identify points in the genome where differences between the two small samples of flycatchers were completely fixed — that is, sites where all the collared flycatcher sequences carried one variant, and all the pied flycatcher sequences carried a different variant. The frequency of these fixed differences varies considerably across the genome, but there are dozens of spots where they’re especially concentrated, forming peaks of differentiation.
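The scan described above boils down to a simple per-site test. Here’s a minimal sketch with toy genotype data — not the authors’ actual pipeline — where a site counts as a fixed difference when each sample is monomorphic, but for different alleles:

```python
# Toy fixed-difference scan: a site is a fixed difference when every
# sequence in sample A carries one allele and every sequence in sample B
# carries a different one. (Illustrative only, not the published pipeline.)
def is_fixed_difference(alleles_a, alleles_b):
    """True if both samples are monomorphic, for different alleles."""
    return (len(set(alleles_a)) == 1
            and len(set(alleles_b)) == 1
            and set(alleles_a) != set(alleles_b))

# Hypothetical genotypes at three sites, four haplotypes per species:
collared = [["A", "A", "A", "A"], ["G", "G", "T", "G"], ["C", "C", "C", "C"]]
pied     = [["T", "T", "T", "T"], ["G", "G", "G", "G"], ["C", "C", "C", "C"]]

fixed = [i for i, (a, b) in enumerate(zip(collared, pied))
         if is_fixed_difference(a, b)]
print(fixed)  # only site 0: collared are all A, pied are all T
```

Sliding a window along the genome and counting how many such sites fall in each window is what produces the “peaks of differentiation.”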

To learn what all those “islands of divergence” could tell us about how the two flycatchers came to be different species, go read the whole thing.◼

Does evolutionary change happen in big jumps, or a series of small steps? The question may seem a little esoteric to non-scientists—how many mutations can dance on the head of a pin?—but it has direct implications for how we identify the genetic basis of human diseases, or desirable traits in domestic plants and animals.

That’s because the evolutionary path by which a particular phenotype, or visible trait, first evolved in a population is closely related to the genetics that underlie the trait in the present. Phenotypes that arose in a single mutational jump will probably remain connected to one or a few genes with large effects; phenotypes that evolved more gradually did so because they are built by the collective action of many genes. So which kind of evolutionary change is most common determines which kind of gene-to-phenotype relationships we should expect to find.

In an excellent recent review article for the journal Evolution, Matthew Rockman, a biologist with the Department of Biology and Center for Genomics and Systems Biology at New York University, makes the case that the era of genomics has, so far, been much too focused on finding genes of large effect. Fortunately, Rockman also sees the beginnings of a new movement towards acknowledging the importance of small-effect genes—one which may ultimately make genomic association studies more useful.

Biologists are about to have access to all the genetic data we could ever want. Unfortunately, once we have that data, we have to figure out where to put it—and some way to sift out the bits that answer the questions we want to answer.

That’s the first day of the NESCent workshop in next-generation sequencing methods in a nutshell.

Brian O’Connor, who gave the morning lectures, framed the immediate future of biology as a race between technologies for collecting genetic sequence data and technologies for storing and analyzing that data. Moore’s Law holds that computer processor speed (really, the number of transistors packed into a single processor chip) doubles about every two years; Kryder’s Law holds that computer storage capacity roughly quadruples in the same amount of time. But in the last few years, and for the foreseeable future, DNA sequence collection capacities are growing on the order of ten times every couple of years.
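A quick back-of-envelope calculation shows why those growth rates are a problem. Taking the figures above at face value (storage capacity quadrupling and sequencing throughput growing tenfold every two years, both starting from the same baseline), the gap compounds fast:

```python
# Compare compounding growth of storage vs. sequencing capacity,
# using the rough rates quoted above (illustrative, not measured data).
def growth(factor, years, period=2):
    """Total multiplier after `years`, given `factor`-fold growth per `period` years."""
    return factor ** (years / period)

for years in (2, 6, 10):
    storage = growth(4, years)       # ~4x per two years (Kryder's Law)
    sequencing = growth(10, years)   # ~10x per two years (sequencing trend)
    print(f"after {years:2d} yr: storage x{storage:,.0f}, "
          f"sequencing x{sequencing:,.0f}, gap x{sequencing / storage:,.1f}")
```

After a decade at these rates, sequencing capacity has grown 100,000-fold while storage has grown only about 1,000-fold: data generation outruns the capacity to keep it by roughly two orders of magnitude.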

In other words, there may very well come a day when the cost of storing and using a genome (or genomes!) belonging to your favorite study organism will exceed the cost of obtaining those data.

O’Connor suggests that one major way to stave off the point where computing capacity limits data collection and analysis will be to use more “cloud” systems—remote servers and storage. Lots of institutions have their own servers and computing clusters. I’m already working with data sets too big to carry, much less process, on my laptop; I extract the subset of sites I want on the server where the data are stored, and download (some of) that smaller data set for local work.
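That server-side filtering step is conceptually simple. Here’s a minimal sketch — with made-up file contents and a VCF-like tab-separated layout, not my actual pipeline — of pulling out only the sites in one chromosome region before downloading:

```python
# Sketch of server-side subsetting: keep header lines plus variant sites
# that fall in a requested chromosome region. (Toy data and format; real
# work would use an indexed tool on compressed files.)
def filter_sites(lines, chrom, start, end):
    """Yield header lines and variant lines within [start, end] on `chrom`."""
    for line in lines:
        if line.startswith("#"):      # keep VCF-style header lines
            yield line
            continue
        fields = line.split("\t")     # columns: CHROM, POS, REF, ALT, ...
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line

variants = [
    "#CHROM\tPOS\tREF\tALT",
    "chr1\t150\tA\tG",
    "chr1\t900\tC\tT",
    "chr2\t200\tG\tA",
]
subset = list(filter_sites(variants, "chr1", 100, 500))
print(subset)  # header plus the chr1:150 site only
```

The point is that the filtering runs where the data live; only the (much smaller) subset ever crosses the network to my laptop.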

However, high-capacity computing facilities need a lot of lead time, and infrastructure investment, to scale up. That isn’t practical for individual projects. In such situations, and for researchers at institutions that don’t have their own high-capacity computing resources, commercial services may become a major alternative.

In the afternoon, we got started with one such alternative, Amazon EC2, or “Elastic Compute Cloud.” Yes, that’s Amazon as in Amazon.com, the place where you buy used textbooks. It’s possible to rent processing capacity and storage from Amazon, and the services are provided in such a way that when you need more, you can just request it. “Instances” running on Amazon’s computing facilities can run Unix or Windows—you can interact with an instance via a remote desktop-type interface such as NoMachine’s NX system—and will run any program you’d care to have chew its way through your data.

All of this, of course, assumes you have the budget. It’s not clear to me how easy it’d be to estimate computing needs ahead of time for grant-writing purposes; but on the other hand, you can expect that whatever estimate you come up with will go that much further by the time you finally start working a year later. Over beer at the end of the day, Karen Cranston, the Informatics Project Manager for NESCent, told me that Amazon’s pricing is close enough to that of the high-capacity computing facility at Duke University that it’s often worthwhile to use EC2 for short-term, high-volume projects simply because it’s so quick and easy to bring new resources to bear.

As a not-yet faculty member, the cloud means I can plan to do genome-scale work even if I end up at an institution without the on-campus resources to build its own cluster. That’s potentially pretty liberating. ◼

Sighted in the woods near Northgate Park, Durham. For real. Photo by jby.

I’m spending the next two weeks in Durham, North Carolina, for the NESCent workshop on next-generation sequencing. Which is to say, a workshop about collecting great big genetic datasets, and what you can do with them once you have them. I’ll be stretching my programming skills to the maximum, and hopefully getting a head start on some ideas I’ve had for good old Medicago truncatula.

If time permits, I may take a page from Carl Boettiger’s literally open lab notebook and post some notes and thoughts here as the workshop progresses, but it’s looking likely to be a full two weeks, and time may very well not permit. ◼