Here is one final post about our recent paper on rhizobium population genomics (1). For me, it marks an important milestone in my long investigation into the diversity of Rhizobium leguminosarum, which began thirty years ago with the publication of my very first paper on rhizobial diversity (2). Curiously, that paper raised some of the issues that are still being addressed today, as it showed that isolates of symbiovars viciae and trifolii shared a number of distinct chromosomal genotypes. Of course, the tools available at the time allowed only a very blurred picture – it is wonderful to return to the story with the clarity of genome sequencing. If you wonder what I have been doing in the intervening thirty years, you will find a complete list of my publications on Google Scholar. (I recommend that everyone gets a Google Scholar profile – it is such a useful way to keep track of your publications – and those of others.)

Back in 1985, publishing a paper involved massive sheaves of typescript, with glossy photos and hand-inked drawings for the figures. Months after acceptance, your article would arrive in the library in a paper issue of the journal. In the weeks and months after that, you would send out paper reprints in response to requests on postcards and airmail letters (those from India always smelt of curry). Finally, after a year or two, people would start to cite your paper in their own work.

Now, like all aspects of life, publication has accelerated as a result of the internet. Our latest paper was accepted a week before Christmas and the final version was published online just two weeks into the new year. One great thing about online publication is almost instant gratification, because you can immediately see how much interest a publication is generating. The “Info & Metrics” tab on the article’s web page tells me that, since publication on 14 January, our article’s abstract has been viewed 552 times, the full text 1160 times, and the PDF has been downloaded 192 times. Not bad for less than two weeks. I hope all those readers are going to cite our paper!

The article also has an Altmetric score of 19. This is not something I was really familiar with, but it is based on the attention an article receives on social media. The article has been blogged once (not including this blog), tweeted by 15 people, and featured on 1 Facebook page. So, thank you to all the rhizobium fans who have tweeted us to stardom! Apparently, an Altmetric score of 19 already puts the article in the top 5% of all articles.

Of course, attention on social media does not measure scientific quality, but this did alert me to a couple of interesting web pages. SciGuru provides short commentaries on interesting recent papers covering all aspects of science. They headlined with “How bacteria are like smartphones”, picking up on an analogy that I used in the press release that I wrote for our university. A Facebook community called Microbiology also featured us on 15 January. One more news item that did not make it into Altmetrics was in an online newspaper called The Speaker.

I don’t usually think of writing a press release when I publish a paper, but in this case I thought the work could be of wider interest, and I was also worried that Open Biology is a relatively new journal, and not yet widely associated with microbiology, so people might miss it. The wider attention came largely through two popular analogies – the idea that bacteria are all individuals with “personalities”, and that they achieve this by acting like smartphones. Each phone comes out of the factory with standard hardware and operating system (core genome), but gains a unique combination of capabilities through apps (accessory genes) downloaded through the internet (by horizontal gene transfer).

A gimmick, perhaps, but rhizobia do not get enough attention from the rest of the world, so sometimes we rhizobiologists need to wave our arms a bit.

First, I should note that our paper has been covered in an article in the online newspaper, The Speaker.

Phenotype: the problem with polyphasic taxonomy

Ganesh Lad grew our 72 strains on Biolog plates to test their ability to grow on each of 95 substrates. He found that almost every strain had a unique pattern, just as we had found with gene content. As a geneticist, I naturally believe that such phenotypes are determined by genes. We were able to prove that connection for one substrate, gamma-hydroxybutyrate, but it was not easy to map phenotypes to genes more generally. There were no patterns of substrate utilisation that were characteristic of particular genospecies – by contrast, there were some accessory genes that were found only in one of the genospecies, so there might be some genospecies-specific phenotypes, but they were not among the “easy” ones that we looked at.
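To make “unique patterns” and “genospecies-specific phenotypes” concrete, here is a minimal Python sketch. It assumes the Biolog results have been reduced to a set of positive substrates per strain, plus a genospecies label; the strain names, substrates and growth calls are invented for illustration and are not our data. The second function asks whether any substrate perfectly separates one genospecies from all the others. It is an illustrative sketch, not the analysis used in the paper.

```python
from collections import Counter

def unique_patterns(growth):
    """Count how many strains share each growth pattern (set of positive substrates)."""
    return Counter(frozenset(subs) for _, subs in growth.values())

def characteristic_substrates(growth, substrates):
    """For each genospecies, list substrates whose use perfectly separates it from the rest.

    A substrate is characteristic of genospecies g if every member of g grows on it
    and no other strain does, or if no member of g grows on it and every other strain does.
    """
    result = {}
    for g in sorted({gs for gs, _ in growth.values()}):
        inside = [subs for gs, subs in growth.values() if gs == g]
        outside = [subs for gs, subs in growth.values() if gs != g]
        hits = []
        for s in substrates:
            pos_in = [s in subs for subs in inside]
            pos_out = [s in subs for subs in outside]
            if (all(pos_in) and not any(pos_out)) or (not any(pos_in) and all(pos_out)):
                hits.append(s)
        result[g] = hits
    return result

# Hypothetical toy data: strain -> (genospecies, substrates it grew on).
growth = {
    "s1": ("gsA", {"glucose", "GHB"}),
    "s2": ("gsA", {"glucose", "arabinose"}),
    "s3": ("gsB", {"glucose", "GHB", "arabinose"}),
    "s4": ("gsB", {"arabinose"}),
}
substrates = ["glucose", "GHB", "arabinose"]

print(len(unique_patterns(growth)))                   # 4: every strain has its own pattern
print(characteristic_substrates(growth, substrates))  # {'gsA': [], 'gsB': []}
```

In this toy example every strain is unique, yet no substrate distinguishes either genospecies.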

Many of these “easy” phenotypes are the ones that taxonomists conventionally report when describing a new species. When we looked up the species description for Rhizobium leguminosarum, we found a long list of substrates that this species can, or cannot, grow on. However, when we cross-checked this list against our data for the same substrates, we found that the majority of these assertions were actually wrong – they were contradicted by some of our strains, even though we have good evidence that they do belong to this species, as normally defined. It seems clear that the tables of differences between species in substrate utilisation, beloved of taxonomists, are largely a fiction based on an inadequate sampling of the variation within species. Fifty years ago, phenotypes were all that bacteriologists had to go on when trying to classify bacteria, but nobody should rely on these metabolic phenotypes to identify bacteria these days, when DNA sequence is so much easier and more conclusive. Ernesto Ormeño-Orrillo and Esperanza Martínez-Romero (2013) have already made a similar point, and I agree with them.
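The cross-check itself is simple bookkeeping. As a minimal sketch, a species description can be treated as a list of grow/no-grow assertions, and an assertion counts as wrong if any strain of the species contradicts it. The substrates, assertions and strains below are hypothetical, not the actual R. leguminosarum description or our Biolog results.

```python
def contradicted_assertions(description, growth):
    """Return {substrate: sorted strains whose growth contradicts the description}.

    description maps substrate -> True (species is said to grow on it) or False (not).
    growth maps strain -> set of substrates that strain actually grew on.
    """
    contradictions = {}
    for substrate, expected in description.items():
        offenders = sorted(strain for strain, subs in growth.items()
                           if (substrate in subs) != expected)
        if offenders:
            contradictions[substrate] = offenders
    return contradictions

# Hypothetical species description and strain data, invented for illustration.
description = {"glucose": True, "lactose": False, "citrate": False, "sucrose": True}
growth = {
    "s1": {"glucose", "lactose", "sucrose"},
    "s2": {"glucose", "sucrose"},
    "s3": {"citrate", "lactose", "sucrose"},
}
wrong = contradicted_assertions(description, growth)
print(f"{len(wrong)} of {len(description)} assertions contradicted")  # 3 of 4 assertions contradicted
```

A single contradicting strain is enough to falsify a claim about what the whole species does, which is why wider sampling within a species tends to erode such tables.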

Now that we know so much more about bacterial genomes, thanks to thousands of genome sequences, we can propose a bacterial classification that reflects the reality of core and accessory genomes. The sequences of core genes provide a robust phylogeny that allows stable species to be defined and recognised, even if strains differ in phenotype. On the other hand, the possession of important phenotypes, conferred by clusters of accessory genes, should play no part in defining species, but can be recognised as “biovars”. These phenotypes include pathogenesis, metabolic traits, or – in the special case of rhizobia – symbiotic capabilities. They are the “apps” that bacteria download from the microbial internet, and they determine most of the “interesting” things that bacteria do.

Gene content and gene transfer

The 72 strains that we sequenced are all unique. A small part of their individuality stems from allelic variation in core genes. Although core genes do not appear to recombine between genospecies very often, they certainly experience a lot of recombination within the genospecies. Nitin Kumar demonstrated this by showing that most core genes have phylogenies that are significantly different from the consensus, and by using the ClonalFrame software to quantify the effect of recombination on core genes.

A more important part of the individuality of strains is conferred by the accessory genome: almost every strain had a unique set of genes, differing from its nearest relative by at least one cluster of five or more adjacent genes. All these strains were collected from one square metre, and sometimes even from separate nodules on the same plant. This implies that the gain and loss of accessory genes occur very often. A nodule is most often founded by a single rhizobial lineage. When the nodule senesces and releases its bacteria, we assume that they are still more or less clonal (has anybody tested that?). By the time they form nodules of their own, though, these bacteria are likely to have shed some of their genes, or gained new ones from a donor, so that they have clearly diverged from each other.
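One way to find such differences is to walk along a shared gene order and look for runs of at least five consecutive genes carried by one strain but not the other. The minimal Python sketch below assumes, hypothetically, that gene presence has already been called for each strain and that the genes can be placed in a common order (for example, by synteny with a reference genome). It illustrates the criterion rather than reproducing how it was applied in the paper.

```python
def divergent_blocks(gene_order, present_a, present_b, min_len=5):
    """Find runs of >= min_len adjacent genes carried by one strain but not the other.

    gene_order: gene IDs in an assumed common order along the genome.
    present_a, present_b: sets of gene IDs carried by strains A and B.
    Returns (first_index, last_index, owner) tuples, where owner is "A" or "B".
    """
    blocks = []
    run_start, run_owner = None, None
    # A trailing None acts as a sentinel so that a run reaching the end is closed.
    for i, gene in enumerate(list(gene_order) + [None]):
        if gene in present_a and gene not in present_b:
            owner = "A"
        elif gene in present_b and gene not in present_a:
            owner = "B"
        else:
            owner = None  # shared, absent from both, or the sentinel
        if owner != run_owner:
            if run_owner is not None and i - run_start >= min_len:
                blocks.append((run_start, i - 1, run_owner))
            run_start, run_owner = i, owner
    return blocks

# Toy example: strain B lacks a six-gene block (g3 to g8) and one isolated gene (g10).
gene_order = [f"g{i}" for i in range(12)]
strain_a = set(gene_order)
strain_b = set(gene_order) - {"g3", "g4", "g5", "g6", "g7", "g8", "g10"}
print(divergent_blocks(gene_order, strain_a, strain_b))  # [(3, 8, 'A')]
```

The isolated single-gene difference at g10 is ignored, because it falls below the five-gene threshold.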

A reference genome like that of 3841 is of limited use when exploring the accessory gene pool of a population. Nitin looked at all the contigs that could be assembled from the 72 genomes but had no similarity to sequences in 3841. He found 13,252 putative complete genes in addition to those that were in 3841 – more than twice the typical total number of genes in any strain! When considering the whole population, the accessory genome is much larger than the core genome. A few years ago, the concept of a species “pangenome” was popular. This comprised all the core and accessory genes found in a bacterial species. As long as only a few strains were sequenced, this was manageable, but as more and more genomes became available, the number of accessory genes in most species just seemed to grow without limit – an “open pangenome”. Every new genome contributed new genes, just as we are seeing in R. leguminosarum. A species seems to sample very widely from the pool of genes available to bacteria in general. The species pangenome concept does not seem very useful if it just means “all the genes there are in bacteria”.
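The difference between an open and a closed pangenome shows up in an accumulation curve: add genomes one at a time and count the distinct gene families seen so far. The minimal Python sketch below assumes each genome has already been reduced to a set of gene-family identifiers (the toy families are invented). In practice the curve would be averaged over many random orderings rather than the single seeded shuffle used here.

```python
import random

def pangenome_curve(genomes, seed=0):
    """Cumulative number of distinct gene families as genomes are added in random order.

    genomes: list of sets of gene-family IDs, one set per strain.
    A curve that keeps rising with every genome added suggests an "open" pangenome.
    """
    order = list(genomes)
    random.Random(seed).shuffle(order)
    seen, curve = set(), []
    for families in order:
        seen |= families
        curve.append(len(seen))
    return curve

def core_families(genomes):
    """Gene families present in every genome (the core genome)."""
    return set.intersection(*genomes)

# Toy data: each genome shares families a and b but carries one accessory family of its own.
genomes = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "e"}]
print(pangenome_curve(genomes))        # [3, 4, 5]
print(sorted(core_families(genomes)))  # ['a', 'b']
```

Here every added genome contributes a new family, so the curve never levels off, which is the toy equivalent of the open pangenome described above.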

For those interested in Rhizobium, the first point is that, although we only looked in one square metre, we found five genetic clusters within our R. leguminosarum population that were sufficiently distinct in sequence that each could be described as a new species. Furthermore, these genospecies, and a small number of others, can also be identified in locations around the world. All of these fall within R. leguminosarum as presently defined, and are definitely more similar to each other than to related species such as R. pisi, R. fabae, R. phaseoli or R. etli. On the other hand, the recently described R. laguerreae seems much closer, though it is not possible to match it with any one of these genospecies on the basis of the very limited sequence information that is currently available for it. At the moment, our five genospecies have no distinguishing phenotypic features, so traditional taxonomists are unlikely to let us describe them as formal species. I wouldn’t want to, anyway – it would just create unnecessary complications for people who wanted to use R. leguminosarum to do some real science.
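The exact criterion is not given here, but a common convention is to treat genomes sharing at least about 95% average nucleotide identity (ANI) as one species. Assuming pairwise ANI values have already been computed with some external tool, a minimal Python sketch of grouping strains into genospecies by single linkage at that threshold could look like this (the strains and ANI values are invented):

```python
def cluster_by_ani(strains, ani, threshold=95.0):
    """Group strains into clusters by single linkage at an ANI threshold.

    ani maps frozenset({s1, s2}) -> average nucleotide identity (%) for each pair.
    Two strains are joined if their ANI is >= threshold; clusters are the
    connected components of that graph, found with a small union-find.
    """
    parent = {s: s for s in strains}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps the trees shallow
            s = parent[s]
        return s

    for pair, value in ani.items():
        if value >= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    clusters = {}
    for s in strains:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical pairwise ANI values (%) for four strains.
strains = ["s1", "s2", "s3", "s4"]
ani = {
    frozenset({"s1", "s2"}): 98.5,
    frozenset({"s3", "s4"}): 97.9,
    frozenset({"s1", "s3"}): 93.0,
    frozenset({"s2", "s4"}): 92.7,
}
print(cluster_by_ani(strains, ani))  # [['s1', 's2'], ['s3', 's4']]
```

Single linkage is the simplest choice; it can chain distant strains together through intermediates, so real analyses usually check such clusters against a core-gene phylogeny.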

Another interesting point is that a large part of the extrachromosomal genome seems to have little mobility between the genospecies. This is not just true of the two chromids (chromosome-like plasmids) that are homologous to pRL12 and pRL11, but also of the genes that occur on the smaller pRL10 and pRL9. Incidentally, a search for the repABC genes reveals that all 72 isolates have replicons equivalent to pRL12 and pRL11, and nearly all have one like pRL10. The other replicons of strain 3841 are relatively rare in our population, though all of them are found. Much more frequent is the replicon of the pR132503 plasmid of strain WSM 1325. This information on plasmids is not in the paper – it comes from an extensive analysis in the PhD thesis of Nayoung Kim, who has now returned to a bioinformatics job in Korea.
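Tallying replicon frequencies like these is straightforward once each strain’s repABC genes have been assigned to a replicon type. The minimal Python sketch below assumes that assignment has already been made. The carriage table is hypothetical, with labels borrowed from the text, and the function simply reports the fraction of strains carrying each type.

```python
from collections import Counter

def replicon_frequencies(carriage):
    """Fraction of strains carrying each replicon type.

    carriage maps strain -> set of replicon types inferred from its repABC genes.
    """
    counts = Counter(rep for reps in carriage.values() for rep in set(reps))
    n = len(carriage)
    return {rep: counts[rep] / n for rep in sorted(counts)}

# Hypothetical carriage table for four strains.
carriage = {
    "s1": {"pRL12", "pRL11", "pRL10"},
    "s2": {"pRL12", "pRL11"},
    "s3": {"pRL12", "pRL11", "pRL10", "pR132503"},
    "s4": {"pRL12", "pRL11", "pRL10"},
}
for rep, freq in replicon_frequencies(carriage).items():
    print(f"{rep}: {freq:.0%}")
```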

A study of 72 genome sequences is bound to reveal a lot of things, and there are plenty more points I would like to draw to your attention, but I will save them for future posts.

Today sees the publication of a paper that I regard as a landmark in my decades-long study of Rhizobium leguminosarum. The paper, ‘Bacterial genospecies that are not ecologically coherent: population genomics of Rhizobium leguminosarum’ (Open Biol. 5: 140133), is published in the Royal Society’s journal Open Biology (rsob.royalsocietypublishing.org) and describes a population genomics study that changes our perception of the species. It also illustrates a changing view of bacterial diversity in general, and that is what I have emphasised in the press release that our university is issuing, which I have copied below. Readers of this blog will be more interested in the detailed picture of a rhizobium population that it reveals. The paper is Open Access, so you can read it here:

Bacteria are as individual as people, according to new research by Professor Peter Young and his team in the Department of Biology at the University of York. Bacteria are essential to health, agriculture and the environment, and new research tools are starting to shed more light on them.

The York team dug up a square metre of roadside verge on the University campus in search of a bacterium called Rhizobium leguminosarum. The name means “root dweller of the legumes”, and these bacteria are natural fertilizer factories that extract nitrogen from the air and make it available to peas, beans, clover and their wild relatives.

In the laboratory, the team extracted the bacteria from the plant roots and established 72 separate strains. They determined the DNA sequence of the genome of each strain. Their research, published today in Open Biology, shows that each of those 72 strains is unique – each has different genes and is capable of growing on different food sources.

People are unique because each of us inherits half our genes from our mother and half from our father, but bacteria reproduce by binary fission, making two identical daughters. What bacteria are good at, though, is passing packages of genes from one cell to another. It is this process of horizontal gene transfer that made every rhizobium unique.

“We can think of the bacterial genome as having two parts,” says Professor Young. “The core genome does the basic housekeeping and is much the same in all members of the species, while the accessory genome has packages of genes that are not essential to the operation of the cell, but can be very useful in coping with aspects of the real world.

“Bacteria are like smartphones. Each phone comes out of the factory with standard hardware and operating system (core genome), but gains a unique combination of capabilities through apps (accessory genes) downloaded through the internet (by horizontal gene transfer).”

We increasingly recognise the vital roles played by bacterial communities, such as those in our gut or on the roots of plants. Many researchers have used variation in a standard core gene to draw up lists of the species in a community, but the new research shows that a list of names is not sufficient.

“There may be 300 people called Baker in your city, but you can’t assume that there are 300 people baking bread,” explains Professor Young.

It is possible, with more sequencing effort, to look at all the genes in a bacterial community – an approach called “metagenomics” – but to understand how they are functioning we also need to know which genes occur together in the same bacterium. This new study helps us to understand the way in which bacterial genomes are assembled.

The 11th European Nitrogen Fixation Conference will be held from 7 to 10 September 2014 in Tenerife, Canary Islands, Spain. I would like to remind you that today is the deadline for early registration: the price goes up tomorrow. Today is also the deadline for offering an oral presentation.

There is already a great line-up of invited speakers for the conference – check the web site – but the full programme is not yet decided because the organisers will be choosing more speakers from the abstracts that have been offered. This is your opportunity for 15 minutes of fame, so get your abstract in today!

As usual, I am organising a satellite workshop on the genomics of N-fixers, and this will take place on Sunday 7 September before the main meeting opens. The workshop consists entirely of offered talks, and I have had plenty of offers, so the programme is already full and I can share it with you here and now.

10.40 – 11.10 Coffee

11.10 – 12.50 Session 2

Dagmar Krysciak (Hamburg, Germany): RNA-seq analysis of Sinorhizobium fredii NGR234 identifies a large set of genes linked to quorum sensing-dependent regulation in the background of a traI and ngrI deletion mutant

15.30 End of workshop

I hope you agree that this programme looks really exciting, so don’t forget to register for the Genomics Workshop at the same time as you register for the main ENFC. I look forward to seeing you in Tenerife!

No doubt many of you are planning to attend the 11th European Nitrogen Fixation Conference, which will be held in Tenerife on 7-10 September 2014. This is one of the major international conferences for rhizobium researchers, attended by people from around the world, not just Europe. I am, once again, organising a satellite meeting on “Genomics of Nitrogen-Fixing Organisms” on Sunday 7 September before the main conference begins. Similar workshop meetings at previous conferences have been very popular.

The workshop covers the analysis of genomes of N-fixing bacteria or archaea, as well as post-genomic studies such as transcriptomics and proteomics. The format is a series of short offered talks, probably 15 minutes plus 5 minutes for questions. There has been a flood of new genomes since the last workshop in 2012, so there should be plenty to talk about. I would welcome analyses that compared multiple genomes, as well as detailed studies of individual strains.

If you, or one of your colleagues, would like to give a talk at the workshop, please let me have a title and a very brief abstract (<100 words) by email to peter.young@york.ac.uk. This does not have to be your final abstract – just a few words so that I can check that your contribution is relevant and can decide where to place it in the programme. I will accommodate as many speakers as possible, but may have to be selective if we get too many offers. To have the best chance, please apply as soon as possible. The deadline is 15 May, but don’t wait till then as we may fill the programme sooner. You do not need to register before contacting me.

If you know of any other labs who are working on the genomics of N-fixers and may be interested, please let me know. Note that this workshop is restricted to the genomics of the N-fixing organisms themselves, not their symbiotic partners, and I also want to avoid any overlap with the talks already scheduled for the main meeting. However, our workshop could be a good place for postdocs or students to present more detailed aspects of the work than will be possible in the more formal setting. This can include material presented as posters in the main conference – but if your contribution is selected for oral presentation in the main sessions, we will take it out of the workshop to avoid duplication.

The workshop will be held in the morning and afternoon of Sunday 7 September, before the opening of the main conference. There will be an additional registration charge for the genomics workshop (€50, €25 for students), which will include the cost of lunch and coffee breaks, so please make sure that you sign up for this when you register for the main conference. This charge is payable by everyone who attends, including the speakers.