We’ve been promised that as genome sequencing becomes faster and simpler, we’ll start seeing practical dividends as well as parlour tricks like sequencing Watson’s genome. Some of the dividends are already paying out, as a paper in the latest PLoS Pathogens1 shows.

Most of you probably remember the outbreaks of foot-and-mouth disease (FMD) in Britain in 2001, and again last year. FMD is a viral disease that affects many hooved animals; it’s not usually fatal, but it causes productivity loss. FMD outbreaks are economically devastating because, aside from the productivity loss, many countries that are free of the disease will refuse to take meat or other agricultural products from outbreak areas. The goal of FMD management, then, is to keep the virus out and, if it ever does hit, to contain it by slaughtering all infected and potentially infected animals.

The 2001 outbreak in Great Britain came from outside the country. The 2007 outbreak, though, was clearly from a local source: the FMD research lab at the Institute for Animal Health (IAH), Pirbright, Surrey. The latest paper discusses the epidemiology of that outbreak, and how the researchers used whole-genome sequencing to track the virus and predict undetected sites of infection.

(This is timely, because the US is planning to move the sole American FMD research center, now on Plum Island, to the mainland. There’s obvious concern that the virus could escape from containment within research labs and infect neighboring animals, causing the first American FMD outbreak since 1929. I am not particularly knowledgeable about the field, but I have to think that, at best, the timing of the planned move is unfortunate.)

FMD is caused by a picornavirus, a member of the same broad family as polio and cold viruses. Like those viruses, FMD virus mutates rapidly, traveling around as a quasispecies cloud. The clouds can be easily divided into 7 broad groups (serotypes), and within the most common serotype (O) there are 8 distinct subgroups, or topotypes (see the map2 to the right for their geographical distribution).

The FMD genome is 8134 nucleotides long, and the sequence analysis that has been used for epidemiology, such as defining the 7 serotypes and their topotypes, has been based on no more than 8% of that length, usually the VP1 gene. That’s enough to track high-level changes, because of FMD’s rapid mutation rate:2

the rate of evolution is approximately 1% per year …. If the concept of a constant evolutionary rate is accepted and there are no constraints on virus evolution then it would expected that new topotypes could arise in approximately 15 years. In reality, this extent of evolution probably takes much longer. For example, FMD viruses belonging to the Asia 1 serotype, first identified in samples from Pakistan in 1954 … have not yet exceeded 15% nucleotide difference …

But 8% of the genome is not nearly enough to track changes within a single epidemic, like the one in Surrey last year; it simply isn’t long enough to pick up the handful of variations. It was known in the previous outbreak, in 2001, that the information was there in the genome (“virus recovered from closely housed animals can differ by 1 to 2 nucleotides and is likely to pass through a “bottleneck” on passage between farms”).3 The issue was a practical, technological one — being able to sequence entire virus genomes quickly enough to pass back information to people in the field.
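To put rough numbers on that, here is a back-of-envelope sketch in Python. It assumes, purely for illustration, that the roughly 1% per year rate quoted above applies uniformly across the whole genome and over the short span of an outbreak, which is a simplification (the quoted figure comes from long-term comparisons of partial sequences). The function name and the choice of time spans are mine, not the paper's.

```python
# Back-of-envelope estimate of how many substitutions to expect in a
# sequenced region over the course of an outbreak.
# ASSUMPTION: a uniform rate of ~1% of sites per year everywhere in the genome.

GENOME_LENGTH_NT = 8134          # genome length given in the text
VP1_FRACTION = 0.08              # "no more than 8%" of the genome
RATE_PER_SITE_PER_YEAR = 0.01    # "approximately 1% per year"


def expected_substitutions(region_length_nt, weeks,
                           rate_per_site_per_year=RATE_PER_SITE_PER_YEAR):
    """Expected number of nucleotide changes in a region after `weeks` weeks."""
    years = weeks / 52.0
    return region_length_nt * rate_per_site_per_year * years


vp1_length_nt = GENOME_LENGTH_NT * VP1_FRACTION

for weeks in (1, 4, 8):
    whole = expected_substitutions(GENOME_LENGTH_NT, weeks)
    partial = expected_substitutions(vp1_length_nt, weeks)
    print(f"{weeks} week(s): whole genome ~{whole:.1f}, 8% region ~{partial:.2f}")
```

Under those assumptions the whole genome picks up on the order of one or two changes per week, while an 8% slice would be expected to change only about once every two months, which is too slow to separate farms infected days or weeks apart.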

By 2007, the technology was there. The people at the IAH were able to sequence genomes from viruses isolated in the outbreak at fine enough resolution to track changes throughout the spread, and fast enough to pass information back to the field within 24-48 hours. Their sequencing confirmed that the virus was in fact a lab escapee, because it was almost identical to a couple of lab strains but different from circulating viruses.4

The 40-odd viral genomes yielded a fair bit of useful information (see the figure to the left for a summary). For example,

The small number of nucleotide substitutions observed between viruses from source and recipient IP suggests that there has been direct transmission without the involvement of other susceptible species, e.g. sheep or deer.

It’s obviously useful to know if there’s a wild-animal reservoir of disease, but an even more important insight came from this work as well.

the virus from IP3b was nine nucleotides different from the virus from IP1b … This is a high number of changes for a single farm-to-farm transmission … and we predicted that there were likely to be intermediate undetected infected premises between the first outbreaks in August and IP3b. … Serosurveillance of all sheep within 3 km of the September outbreaks revealed another infected premises (IP5), on which it was estimated that disease had been present for at least two, and possibly up to five weeks. As Figure 2B shows, IP5 is a likely link between the August and September outbreaks.
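The reasoning in that passage can be sketched in a few lines of Python: count the nucleotide differences between the genomes from a presumed source and recipient, and flag any link where the count looks too high for a single farm-to-farm hop. This is only an illustration, not the authors' method. The sequences below are made-up 20-letter toys, the premises names are hypothetical, and the cutoff of 3 is an arbitrary value I picked for the example.

```python
# Toy illustration: flag transmission links with "too many" nucleotide changes.
# ASSUMPTIONS: sequences are already aligned and of equal length; the cutoff
# and all data below are invented for the example.

def nucleotide_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def flag_possible_gaps(links, sequences, max_expected=3):
    """Return (source, recipient, differences) for links above the cutoff."""
    flagged = []
    for source, recipient in links:
        diff = nucleotide_differences(sequences[source], sequences[recipient])
        if diff > max_expected:
            flagged.append((source, recipient, diff))
    return flagged


# Hypothetical toy data (real genomes are about 8000 nucleotides long).
toy_sequences = {
    "premises_A": "ACGTACGTACGTACGTACGT",
    "premises_B": "ACGTACGTACGAACGTACGT",
    "premises_C": "TCGAACGAACGAACCTACGA",
}
toy_links = [("premises_A", "premises_B"), ("premises_B", "premises_C")]

for source, recipient, diff in flag_possible_gaps(toy_links, toy_sequences):
    print(f"{source} -> {recipient}: {diff} differences, "
          "possible undetected intermediate")
```

A flagged link is only a prompt to look harder, which is how the nine-nucleotide gap was used in the outbreak: it pointed to serosurveillance, and the surveillance found IP5.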

I would be interested in hearing from the people on the ground just how useful this information was — for example, were they impelled to search more for an intermediate source based on this information, or did they already suspect it from other, classical ways? But in any case, it’s clear that genomics is capable of pushing epidemiology a lot further in the future.

As far as I know, it’s not yet known how exactly the virus escaped from the IAH. I’ve read what seems to be informed speculation that it may have come from the drains, as decontamination systems designed to prevent that weren’t properly maintained; but I don’t know if that’s true, an educated guess, or mere rumor and guesswork.