BGI has sequenced one of these isolates using Ion Torrent, and Nick Loman assembled the data. Without getting too technical, the genome is actually in about 3,000 pieces, but with those data (and thanks to Nick for assembling and releasing them) I was able to perform multilocus sequence typing (‘MLST’). Basically, we look at the partial sequences of several genes (in this case, seven) to identify the isolate’s sequence type; think of it as a molecular barcode (for the scheme and details, see here).
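To make the ‘barcode’ idea concrete, here is a minimal Python sketch of how an allele profile can be compared against known sequence types. It is only an illustration: the one-entry database holds the ST678 profile quoted later in this post, and the names `KNOWN_PROFILES`, `parse_profile`, `locus_differences` and `closest_sequence_type` are made up for this example rather than taken from any MLST tool. Representing a novel allele as `None` is likewise just a convention chosen here.

```python
import re

# The seven loci typed in this post.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Toy database with a single entry: the ST678 profile quoted in this post.
# A real lookup would query the full scheme database instead (assumption).
KNOWN_PROFILES = {
    "ST678": "adk6-fumC6-gyrB5-icd136-mdh9-purA7-recA7",
}


def parse_profile(text):
    """Turn 'adk6-fumC6-...' into {'adk': 6, 'fumC': 6, ...}."""
    profile = {}
    for token in text.split("-"):
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", token)
        if match is None:
            raise ValueError(f"cannot parse allele token: {token!r}")
        profile[match.group(1)] = int(match.group(2))
    missing = set(LOCI) - set(profile)
    if missing:
        raise ValueError(f"profile lacks loci: {sorted(missing)}")
    return profile


def locus_differences(query, reference):
    """Return the loci at which two profiles carry different alleles."""
    return [locus for locus in LOCI if query.get(locus) != reference.get(locus)]


def closest_sequence_type(query):
    """Return (name, differing_loci) for the nearest known profile."""
    best = None
    for name, text in KNOWN_PROFILES.items():
        diffs = locus_differences(query, parse_profile(text))
        if best is None or len(diffs) < len(best[1]):
            best = (name, diffs)
    return best


if __name__ == "__main__":
    # Hypothetical query: ST678 with novel (unnumbered) adk and recA alleles.
    query = parse_profile(KNOWN_PROFILES["ST678"])
    query["adk"] = None
    query["recA"] = None
    print(closest_sequence_type(query))
```

A profile that matches a known sequence type at most of its seven loci is reported as a close relative of that type, which is the sense in which ‘close relative’ is used below.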

So what did I find?
This EHEC strain is most likely a very close relative of ST678 (details in a bit). In fact, according to the mlst.net strain database, there is a strain “Jan-91”, isolated in 2001* from Europe (no further geographic information is provided). That strain belongs to phylogroup D, and is associated with HUS…just like the outbreak strain. And the older strain also has the exact same serotype as the outbreak strain, O104:H4.

Now, the outbreak strain sequence isn’t identical (the data are at the end of this post). “Jan-91” has an allele profile of adk6-fumC6-gyrB5-icd136-mdh9-purA7-recA7 (to orient you, adk is the gene and 6 is the particular variant, or allele, of adk). There are three differences:

1) In the outbreak strain, adk is a novel allele that differs from adk6 by one point mutation at position 30 (if I counted correctly; it’s late as I write this…)

2) In the outbreak strain, the icd allele matches icd136 exactly; however, the genome sequence lacks the last two bases. Given that the genome assembly is in over 3,000 pieces (‘contigs’), I think this is missing data, not biology.

3) In the outbreak strain, the recA allele differs from recA7 by one insertion. “Jan-91” has a sequence of “AAAA”, while the outbreak strain has a sequence of “AAAAA” (below, it’s recorded as “aAAAA” to indicate the difference). With Ion Torrent (and other high-throughput sequencing technologies), when you have ‘runs’ of the same nucleotide, such as “AAAA”, it’s not unusual for a base to be added or deleted, which could yield a ‘false’ “AAAAA.” This could be sequencing error, but I can’t rule out a real insertion (i.e., an extra A that’s real).
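One rough way to flag differences like the recA one is to check whether two sequences differ only in the length of a run of identical bases. The sketch below does that with run-length encoding. The function names and the flanking bases in the example are hypothetical (only the “AAAA” versus “AAAAA” run comes from the comparison above), and a flagged difference is merely a candidate for sequencing error, not proof of one.

```python
from itertools import groupby


def run_lengths(seq):
    """Run-length encode a sequence: 'GAAAAT' -> [('G', 1), ('A', 4), ('T', 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq.upper())]


def homopolymer_only_differences(seq_a, seq_b):
    """Compare two sequences run by run.

    Returns a list of (base, length_in_a, length_in_b) for every run whose
    length differs, provided the order of bases is otherwise identical.
    Returns an empty list for identical sequences, and None when the
    sequences also differ in some other way (for example a substitution).
    """
    runs_a, runs_b = run_lengths(seq_a), run_lengths(seq_b)
    if [base for base, _ in runs_a] != [base for base, _ in runs_b]:
        return None
    return [
        (base, n_a, n_b)
        for (base, n_a), (_, n_b) in zip(runs_a, runs_b)
        if n_a != n_b
    ]


if __name__ == "__main__":
    # Hypothetical flanking bases around the run of A's.
    reference = "GCAAAATC"   # run of four A's
    assembled = "GCAAAAATC"  # run of five A's
    print(homopolymer_only_differences(reference, assembled))
```

Note that this treats a single base as a run of length one, so in practice you might only trust the flag for runs above some minimum length.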

While this is obviously a very preliminary analysis of a very preliminary assembly, I don’t understand why this strain is being called ‘new’, ‘mutant’, or anything else. It’s not a bolt from the blue: it looks like a nearly identical strain that caused HUS a decade ago in Europe. I would add the obvious qualifier that there very well could be massive gene gain and loss (I haven’t looked at that yet). I’m guessing that the reports of this strain being very different were based on comparisons to the genomes of other HUS strains, which are pretty divergent. But we have seen this MLST sequence type before, associated with this serotype and this disease syndrome.

All that being said, this is a very serious outbreak–I don’t mean to downplay the seriousness of this as a public health and agricultural crisis by raising this issue. And it will be very interesting to see how different this strain is from other HUS strains. If we’re lucky, the “Jan-91” E. coli strain still exists in someone’s freezer, and we can see how it’s evolved over the last decade. It’s especially disconcerting that this strain is resistant to so many antibiotics.

An aside: Many kudos to BGI for publicly releasing the data.

Update: There’s a new assembly using a different method. I haven’t checked that yet.

Quoting scientists at the University of Münster, the institute rebutted earlier reports that the newest strain of E. coli had never been previously identified, calling it a “hybrid clone” that drew together the virulent properties of other strains. “Reports that this is a completely new type of pathogen are not accurate,” the institute said.

Well, over here at the meeting, there has been some discussion. Some people even appeared on the BBC to comment last night (much fun at the Red Lion). Seeing as this was a conference on genomics and public health, at the Sanger, it was very apropos.

Anyway, I think Mike got it right when he said that the MLST didn’t handle the potentially interesting issues – gene gain/loss. I’m agnostic on whether this is new or not; I’ve only been hearing things second hand. However, the backbone of the genome is not the place to look. Especially when we have low coverage genomes. Maybe try something like mummer against the references? The fun is likely to be in mobile elements and the accessory genome.

I thought the ‘mutant’ description was because there are additional virulence and/or toxin genes that haven’t been found in this serotype before, which I assume wouldn’t show up in an MLST analysis, as that looks at housekeeping genes. Not that I have been able to find out exactly what has changed – just quotes from people saying there are differences.

(I believe the WHO is saying that it is a ‘variant’ that has never caused an outbreak before, rather than a new serotype. And I think it was Der Spiegel that said this strain had been seen in a single case before.)

I may be totally wrong – microbiology is still a fairly new field to me – and am quite happy to be corrected.

I’ve also heard the suggestion that increased HUS rates in this outbreak (~30 percent) compared to normal (~10 percent) point toward something new. It seems to me, though, that a likelier explanation is that we have an artificially low overall case count that misses mild, self-limiting illness which never sends people to the doctor. Or maybe German doctors inadvertently administered antibiotics, which is a no-no for STEC infections.
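The under-counting argument can be put into numbers. As a back-of-the-envelope sketch, assume every HUS case gets reported while only a fraction of all infections do; that assumption, and the function names below, are introduced here purely for illustration. The observed HUS fraction is then the true rate divided by the reported fraction.

```python
def observed_hus_fraction(true_hus_rate, ascertainment):
    """HUS fraction among *reported* cases.

    Assumes all HUS cases are reported, while only `ascertainment`
    (a fraction between 0 and 1) of all infections are reported.
    """
    if not 0 < ascertainment <= 1:
        raise ValueError("ascertainment must be in (0, 1]")
    return min(1.0, true_hus_rate / ascertainment)


def implied_ascertainment(true_hus_rate, observed_hus_rate):
    """Fraction of infections that must be reported to explain the observed rate."""
    if observed_hus_rate <= 0:
        raise ValueError("observed_hus_rate must be positive")
    return true_hus_rate / observed_hus_rate


if __name__ == "__main__":
    # Rates from the comment above: ~10 percent normally, ~30 percent observed.
    print(implied_ascertainment(0.10, 0.30))
```

Under these assumptions, a usual HUS rate of about 10 percent would appear as about 30 percent if only roughly one in three infections were being counted.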

New findings on the E. coli O104 that is causing deaths in Europe
New York. USA. A private biotechnology company used its DNA scanning algorithms to determine that E. coli O104 has genomic signatures specific to Stx2 converting phage I and Stx2 converting phage II, previously found in strains from the outbreak in Sakai city, Japan, in 1996. These genomic signatures are absent in the Central African E. coli EAEC 55989.

I don’t know how cucumbers could be the original culprit. The cukes I see in the pictures from Europe show that part of the stem is still on (unlike cukes in U.S. supermarkets). Also, one of my brothers harvested cucumbers for a summer and said that they are so prickly that if you don’t wear gloves, your hands will get torn up.

In the tomato crisis in the U.S., it was determined that tomatoes on the vine were OK. I ate them without a problem.

Why isn’t this also true of cucumbers?

Yes, the gloves can be contaminated, but with stems and gloves, how can this crisis be so serious?

The real question here should be the identification of the natural host of this E. coli strain, which is of course a mammalian gut bacterium.

If it got onto vegetables or sprouts, it must have been through the use of manure contaminated with this strain. That is, there must be populations of livestock in Europe that are hosting this strain, perhaps without any problems (a virulent strain in one species may be benign in another, depending).

A similar case (E coli O157:H7) occurred in the U.S. in 1993, linked to Jack in the Box suppliers and poor food handling practices:

“The ground beef had been distributed to Jack in the Box stores in Washington, Idaho, California, and Nevada, and by the end of February 1993 the states had reported the following:

P.S. What about the antibiotic resistance package that this strain contains? There’s been very little about that – but typically such antibiotic resistance kits are carried on plasmids taken up by E. coli from other bacterial strains. They are closely associated with the use of antibiotics to enhance growth and prevent disease in crowded, unsanitary factory farms.

Germany’s meat industry has been taking a beating lately (dioxin feed contamination, etc.) and it seems that the German government is bent on protecting that industry – probably the main reason they pointed to Spanish cucumbers in the first place.

Of course the strain is not completely new. The core genome (to which MLST analysis is usually applied) is quite stable and is not involved in host specificity, aggressiveness or pathogenicity. It is very likely that a few virulence factors (on plasmids or genomic islands) were acquired by this strain through horizontal gene transfer from other E. coli or related (pathogenic) enterobacteria. I am not surprised that these strains show MLST types that have been known for a long time. Anyway, it is very clear that “new” highly pathogenic variants of a single species may arise easily as a result of genome flexibility, recombination and/or DNA uptake by (pathogenic) species.
