Interspecies recombination and its effect on species assignments based on a single gene sequence. The relatedness among isolates of three species is inferred from a tree constructed using the sequences of a single house-keeping gene. Isolates of species A are well resolved from those of species B, and from the strain of the more distantly related species C used as an outgroup. Consider a homologous recombination event in a strain of species B (arrow) that replaces the single locus used to assign the species with the corresponding sequence from a strain of a relatively divergent species. The recombinant strain will then not be recognized as a strain of species B and will be incorrectly placed (dotted line) as more distantly related to species B than the outgroup.
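The effect described in this legend can be reproduced with a toy nearest-neighbour assignment. The sketch is illustrative only: the three "loci", their sequences and the helper names (`p_distance`, `nearest_species`) are hypothetical, and assignment to the closest reference by uncorrected p-distance is a simplification swapped in for tree building. A species B strain whose first locus has been replaced by the species C allele is assigned to C when only that locus is used, but to B when all three loci are concatenated.

```python
# Toy illustration only: made-up aligned sequences for three hypothetical loci.

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# One reference strain per species; each strain is a list of locus sequences.
# A and B are close relatives; C is the divergent outgroup.
REFERENCES = {
    "A": ["AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"],
    "B": ["AAAAAAATTT", "CCCCCCCAAA", "GGGGGGGCCC"],
    "C": ["TTTTTTTTTT", "TTTTTTTTTT", "TTTTTTTTTT"],
}

# A species B strain whose locus 0 was replaced by the species C allele.
RECOMBINANT = ["TTTTTTTTTT", "CCCCCCCAAA", "GGGGGGGCCC"]

def nearest_species(query, loci_to_use):
    """Return the species whose reference is closest over the chosen loci."""
    query_seq = "".join(query[i] for i in loci_to_use)

    def distance(species):
        ref_seq = "".join(REFERENCES[species][i] for i in loci_to_use)
        return p_distance(query_seq, ref_seq)

    return min(REFERENCES, key=distance)

if __name__ == "__main__":
    print("single locus:", nearest_species(RECOMBINANT, [0]))        # C
    print("concatenated:", nearest_species(RECOMBINANT, [0, 1, 2]))  # B
```

The recombinant is still pulled away from the B reference in the concatenated comparison (7 of 30 sites differ), which mirrors the "creeping" of recombinant strains discussed later in the text, but the remaining loci are enough to keep it with species B.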

Resolving populations of B. pseudomallei, B. mallei and B. thailandensis. All isolates in the B. pseudomallei MLST database (which includes isolates of closely related species) were extracted, the sequences at the seven MLST loci were concatenated for each different multilocus genotype (strain), and a tree was constructed using MrBayes v. 3.1. The dataset included 400 different strains (STs) of B. pseudomallei, 17 of B. thailandensis, and two each of B. mallei and B. oklahomensis. The scale shows genetic distance, corrected for the best-fitting substitution model determined using MrModeltest and MrBayes. All nucleotide sites were used in the analysis. A general time-reversible model was implemented with rate matrix r(A↔C) 0.012: r(A↔G) 0.419: r(A↔T) 0.020: r(C↔G) 0.024: r(C↔T) 0.509: r(G↔T) 0.016; nucleotide frequencies A 0.18: C 0.35: G 0.32: T 0.15; gamma parameter α=0.11; and proportion of invariant sites Pinvar=0.82. All trees and model parameters are based on 10 000 samples from the posterior probability at stationarity.
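The substitution model in this legend is specified by six exchangeability rates and four base frequencies. As a hedged sketch (not MrBayes code; the function names are hypothetical), the general time-reversible rate matrix can be assembled as Q_ij = r_ij × π_j for i ≠ j, with diagonal entries chosen so that each row sums to zero, and then rescaled so that the mean substitution rate at equilibrium is one. The gamma shape α and Pinvar describe rate variation among sites rather than Q itself, so they are omitted here.

```python
BASES = "ACGT"

# GTR parameters copied from the figure 2 legend (B. pseudomallei group).
EXCHANGEABILITIES = {
    ("A", "C"): 0.012, ("A", "G"): 0.419, ("A", "T"): 0.020,
    ("C", "G"): 0.024, ("C", "T"): 0.509, ("G", "T"): 0.016,
}
FREQUENCIES = {"A": 0.18, "C": 0.35, "G": 0.32, "T": 0.15}

def exchangeability(x, y, exch):
    """Symmetric lookup: r(x<->y) equals r(y<->x)."""
    return exch[(x, y)] if (x, y) in exch else exch[(y, x)]

def gtr_rate_matrix(exch, freqs):
    """Build the 4x4 GTR rate matrix Q (rows/columns in ACGT order), scaled
    so that the expected number of substitutions per unit time is 1."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                # Off-diagonal rate: Q_ij = r_ij * pi_j
                q[i][j] = exchangeability(x, y, exch) * freqs[y]
        q[i][i] = -sum(q[i])  # diagonal (still 0.0 here) makes the row sum to zero
    mean_rate = -sum(freqs[b] * q[i][i] for i, b in enumerate(BASES))
    return [[value / mean_rate for value in row] for row in q]

if __name__ == "__main__":
    Q = gtr_rate_matrix(EXCHANGEABILITIES, FREQUENCIES)
    for base, row in zip(BASES, Q):
        print(base, " ".join(f"{v:8.4f}" for v in row))
```

Printing Q makes the dominance of the two transition classes visible: r(A↔G) and r(C↔T) together account for 0.928 of the six exchangeabilities, which sum to 1.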

The other human Neisseria species (e.g. N. lactamica) are all colonizers of the nasopharynx and are considered to be non-pathogenic commensals, although some have occasionally been associated with disease. These named commensal species may therefore coexist with N. meningitidis within an individual human nasopharynx, providing opportunities for recombination within and between the closely related species. The public Neisseria MLST database (http://pubmlst.org/neisseria) contains the sequences of seven house-keeping genes from many thousands of strains of N. meningitidis and much smaller numbers of N. lactamica and N. gonorrhoeae strains. We extracted from this database the sequences of the seven MLST loci of 500 different strains of N. meningitidis, 171 strains of N. lactamica and 67 strains of N. gonorrhoeae, and used the concatenated sequences of the seven loci to explore the patterns of clustering and to examine the relationships between the observed clusters and the species names assigned by standard microbiological procedures (Hanage et al. 2005a).
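A minimal sketch of the concatenation step, assuming each strain (ST) is stored as an allelic profile (locus → allele number) and that allele sequences are already aligned within each locus. The seven locus names are those of the Neisseria MLST scheme; the toy allele sequences and the function names `concatenate_profile` and `concatenate_sts` are hypothetical. Joining the alleles in one fixed locus order is what makes the concatenated sequences of different STs comparable site by site.

```python
# The seven house-keeping loci of the Neisseria MLST scheme, in a fixed order.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

def concatenate_profile(profile, alleles, loci=LOCI):
    """Join the allele sequences of one ST in a fixed locus order.

    profile: dict locus -> allele number for one sequence type (ST)
    alleles: dict locus -> {allele number -> aligned allele sequence}
    """
    missing = [locus for locus in loci if locus not in profile]
    if missing:
        raise KeyError(f"profile lacks loci: {missing}")
    return "".join(alleles[locus][profile[locus]] for locus in loci)

def concatenate_sts(profiles, alleles, loci=LOCI):
    """Return one concatenated sequence per distinct ST (dict ST id -> sequence)."""
    return {st: concatenate_profile(p, alleles, loci) for st, p in profiles.items()}

if __name__ == "__main__":
    # Toy data: every locus has two hypothetical 3-bp alleles.
    toy_alleles = {locus: {1: "AAA", 2: "AAC"} for locus in LOCI}
    toy_profiles = {
        "ST-x": {locus: 1 for locus in LOCI},
        "ST-y": {**{locus: 1 for locus in LOCI}, "adk": 2},
    }
    for st, seq in concatenate_sts(toy_profiles, toy_alleles).items():
        print(st, seq)
```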

Resolving populations of N. meningitidis, N. lactamica and N. gonorrhoeae. Bayesian tree constructed using the concatenated sequences (seven loci) of the first 500 different strains (STs) of N. meningitidis in the public Neisseria MLST database and all the different strains of N. lactamica (171) and N. gonorrhoeae (67). The arrow shows the two strains of N. lactamica that cluster anomalously and have probably been misidentified (see text). Only third codon positions were used in the analysis. The scale shows genetic distance, corrected for the best-fitting substitution model determined using MrModeltest and MrBayes. Details as in figure 2 with rate matrix r(A↔C) 0.044: r(A↔G) 0.541: r(A↔T) 0.018: r(C↔G) 0.044: r(C↔T) 0.299: r(G↔T) 0.053; nucleotide frequencies A 0.11: C 0.44: G 0.24: T 0.21; gamma parameter α=0.481; and Pinvar=0.30.
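The legend notes that only third codon positions were used for the Neisseria tree. A minimal sketch of that filtering step, assuming each locus fragment is supplied separately together with a known reading-frame offset; the function name `third_codon_positions` and the offset convention are hypothetical. Extracting positions per locus before concatenating matters because fragments of different loci need not start at the same codon position.

```python
def third_codon_positions(locus_seqs, offsets=None):
    """Concatenate codon position 3 from each locus fragment.

    locus_seqs: aligned locus sequences in a fixed locus order
    offsets: per-locus index (0, 1 or 2) of the first base of the first
             complete codon; assumed to be 0 for every locus when omitted
    """
    if offsets is None:
        offsets = [0] * len(locus_seqs)
    if len(offsets) != len(locus_seqs):
        raise ValueError("need one offset per locus")
    # seq[offset + 2::3] picks codon positions 3, 6, 9, ... from the first codon
    return "".join(seq[offset + 2::3] for seq, offset in zip(locus_seqs, offsets))

if __name__ == "__main__":
    print(third_codon_positions(["ATGGCC", "CAT"]))  # GCT
    print(third_codon_positions(["NATGGCC"], [1]))   # GC
```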

Recombining species that colonize the same body site can therefore be resolved using MLSA, but recombination between similar species can lead to strains ‘creeping’ along the branch that separates the species clusters. If we knew nothing about these strains, it would not be clear where to put the dividing line between the N. meningitidis and N. lactamica clusters. Recombining species can therefore appear fuzzy, and it may be difficult using MLSA to assign a few strains unambiguously to one species rather than the other. Furthermore, it would not be clear that the N. gonorrhoeae cluster should be considered distinct from N. meningitidis. These results stress the need for MLSA to be used as the basis for pragmatic decisions by expert groups about where to draw distinctions between species, and the need to map onto the observed patterns of clustering whatever additional information is available. In this way, the special nature of the strains within the N. gonorrhoeae cluster would be very apparent from their different ecological niche and disease association.

A tree constructed from the concatenated sequences (six loci) of 39 different serotypable strains of S. pneumoniae, representing the diversity among more than 2000 different strains in the pneumococcal MLST database (http://spneumoniae.mlst.net), and of 121 atypical pneumococci showed a clear resolution into two clusters (Hanage et al. 2005b). One cluster includes all the serotypable pneumococci and a subset of the atypical pneumococci, and the other includes the remaining atypical pneumococci. The former class of atypical pneumococci are almost certainly pneumococci that for various reasons do not express a capsular polysaccharide, whereas the latter group appear to be a very closely related but distinct population (Hanage et al. 2005b). As in the Neisseria example, there was a good separation of the clusters (100% posterior probability), but some ‘fuzziness’, as one non-serotypable strain arose from the branch separating the two clusters (figure 4). Recently, Arbique et al. (2004) have also concluded (using DNA–DNA hybridization) that a subset of atypical pneumococci should be assigned to a different species, Streptococcus pseudopneumoniae. MLSA of the two reference strains of this new species, obtained from these authors, shows that S. pseudopneumoniae corresponds to the cluster in figure 4 that is similar to, but distinct from, authentic pneumococci.

Figure 4

Resolving populations of S. pneumoniae, S. pseudopneumoniae, S. mitis and S. oralis. Bayesian tree constructed using the concatenated sequences of six of the MLST loci of the authentic pneumococci and atypical pneumococci (now called S. pseudopneumoniae; Arbique et al. 2004) studied by Hanage et al. (2005b), and strains assigned as S. mitis and S. oralis. NT26 is a non-serotypable presumptive pneumococcus that arises from the branch leading to the S. pneumoniae cluster. The scale shows genetic distance, corrected for the best-fitting substitution model determined using MrModeltest and MrBayes. All nucleotide sites were used in the analysis. Details as in figure 2 with rate matrix r(A↔C) 0.016: r(A↔G) 0.027: r(A↔T) 0.010: r(C↔G) 0.001: r(C↔T) 0.939: r(G↔T) 0.007; nucleotide frequencies A 0.31: C 0.18: G 0.24: T 0.27; and gamma parameter, with a covarion model allowing rates to change across the tree: s(off→on)=0.33 and s(on→off)=1.33.

Alongside S. pneumoniae and S. pseudopneumoniae within the mitis group of streptococci are the closely related named species Streptococcus mitis and Streptococcus oralis. Isolates identified by API RapidID 32 strep as either S. mitis or S. oralis fell, when subjected to MLSA, into two clusters that were distinct from each other and from the other two species (figure 4). Each of these two clusters included strains identified as both species, presumably owing to limitations of the API tests in identifying them correctly. The names shown in figure 4 reflect the predominant species identification of the strains within the clusters, enabling us to define one cluster as associated mainly with S. mitis strains and the other as containing the majority of those identified as S. oralis. With the exception of one strain, all the ‘S. oralis’ strains were grouped together in 100% of trees drawn from the posterior probability at stationarity. Likewise, the ‘S. mitis’ group is found with 100% posterior probability and is closely allied to the S. pneumoniae and S. pseudopneumoniae clusters. The topology of the four mitis group clusters differed, with substantially more diversity within the S. mitis and S. oralis clusters (average sequence diversity of 5.1 and 6.2%, respectively) than within those of S. pneumoniae and S. pseudopneumoniae (average diversity of 1.1 and 3.0%, respectively). Furthermore, the average sequence divergence between the S. mitis and S. pneumoniae clusters was 5.8%, only slightly greater than that within the S. mitis cluster. This is not necessarily evident in the tree shown in figure 4, because its branch lengths are corrected using the best-fitting model of nucleotide substitution implemented across the tree. However, the diversity within the S. mitis and S. oralis clusters is further shown by the fact that each isolate examined had a different multilocus genotype.
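The within- and between-cluster figures quoted above are averages over pairs of sequences. A minimal sketch of that calculation, using the uncorrected proportion of differing sites (p-distance); whether the published values were computed exactly this way is an assumption, and the toy sequences and function names are hypothetical. Because the tree's branch lengths are model-corrected, such raw averages need not match distances read off the tree.

```python
from itertools import combinations, product

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_within(cluster):
    """Average p-distance over all pairs of sequences in one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        raise ValueError("need at least two sequences in the cluster")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def mean_between(cluster1, cluster2):
    """Average p-distance over all pairs with one sequence from each cluster."""
    pairs = list(product(cluster1, cluster2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    toy_x = ["AAAAAAAAAA", "AAAAAAAAAT"]
    toy_y = ["CAAAAAAAAA", "CCAAAAAAAA"]
    print(f"within x: {mean_within(toy_x):.1%}")           # 10.0%
    print(f"within y: {mean_within(toy_y):.1%}")           # 10.0%
    print(f"between:  {mean_between(toy_x, toy_y):.1%}")   # 20.0%
```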
As in the Neisseria example, the individual gene trees completely fail to resolve the streptococcal species clusters identified using the concatenated sequences (figure 5).

Figure 5

Failure of single loci to resolve S. pneumoniae and related species. The individual gene trees (minimum evolution; all nucleotide sites) for three of the MLST loci used to produce figure 4. Sequences are coloured according to the species cluster in which they are present, as shown in figure 4.


7. Clusters: lineages or species?


In the two examples discussed in the previous section, MLSA resolves clusters that have a clear relationship to named species, even though recombination between strains in different clusters is very apparent from inspection of the trees obtained from the sequences of individual loci. Even in recombining populations that co-colonize the nasopharynx, clusters can be resolved using MLSA, which suggests that evolutionary forces have led to distinct, non-overlapping genotypic clusters that microbiologists have recognized as species. It remains to be seen whether greatly expanding the number of strains used in these analyses will maintain the resolution between the existing clusters, and whether including many examples of the less well-studied members of each genus will resolve clusters that support current species designations or suggest new ones. Expanding the datasets can also change our current views of the clusters. For example, the identification of a group of strains that are much more similar to one of the S. oralis strains than to any of the others would produce a cluster of strains that are similar to each other and clearly resolved from the other S. oralis strains. The phenotypic, biochemical and ecological properties of any such cluster could then be examined to see whether a new species name is justified.

A major problem with the sequence-based approach to the definition of species is deciding whether resolved clusters should be considered different lineages within a species or deserve species status. In the examples above, which involve very well-studied groups of bacteria, clusters correlated with the previous species designations; to be a useful taxonomic approach, however, MLSA must also be capable of informing the division of large populations of poorly studied bacteria into species. Sequence clusters exist at all taxonomic levels, from the clusters of very similar genotypes that result from the diversification of a clone into a clonal complex (Feil et al. 2004) to deeper clusters that may be assigned as lineages within a species or as separate species.

With the MLSA approach, it may be difficult to decide the taxonomic level of a cluster, and to distinguish clusters that appear to be irreversibly set on different evolutionary trajectories, and which will continue to diverge from each other, from those that are best considered as different lineages of a single species. General criteria for recognizing the nature of clusters (e.g. the sharing or non-sharing of alleles, or the presence of fixed polymorphisms in different clusters) are probably not achievable for bacteria in which recombination may be very frequent or very rare. In the former case, allele sharing may still occur between clearly distinct species owing to interspecies recombination, whereas if recombination is absent, allele sharing between distinct species or divergent lineages within a species is unlikely, raising the question of whether each clonal lineage should be considered genotypically to form its own species. These difficulties are not inherent weaknesses of the MLSA approach, but rather inherent problems in finding universal principles of species definition in bacteria. The advantage of the MLSA approach is that it is pragmatic rather than based on strict rules, and allows clusters to be identified and used as the basis for informed judgements on nomenclature, taking account of whatever additional data are available.
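To make the two candidate criteria concrete, the sketch below counts, for two clusters of aligned sequences, the sites that are fixed differences (monomorphic within each cluster but different between them) and the sites that are shared polymorphisms (at least two bases present in both clusters), and lists the MLST allele numbers found in both clusters. These definitions, the toy data and the function names are illustrative assumptions; as argued above, neither count yields a universal threshold for species status.

```python
def classify_sites(cluster1, cluster2):
    """Count fixed differences and shared polymorphisms between two clusters.

    Both clusters are lists of sequences aligned to the same length.
    fixed:  each cluster is monomorphic at the site, with different bases
    shared: at least two bases occur at the site in both clusters
    """
    fixed = shared = 0
    for col1, col2 in zip(zip(*cluster1), zip(*cluster2)):
        bases1, bases2 = set(col1), set(col2)
        if len(bases1) == 1 and len(bases2) == 1 and bases1 != bases2:
            fixed += 1
        elif len(bases1 & bases2) >= 2:
            shared += 1
    return {"fixed": fixed, "shared": shared}

def shared_alleles(profiles1, profiles2, loci):
    """Per locus, the MLST allele numbers that occur in both clusters."""
    return {
        locus: {p[locus] for p in profiles1} & {p[locus] for p in profiles2}
        for locus in loci
    }

if __name__ == "__main__":
    x = ["ACGT", "ACGA"]
    y = ["TCGA", "TCGT"]
    print(classify_sites(x, y))  # {'fixed': 1, 'shared': 1}
    px = [{"adk": 1, "pgm": 2}, {"adk": 3, "pgm": 2}]
    py = [{"adk": 3, "pgm": 5}]
    print(shared_alleles(px, py, ["adk", "pgm"]))  # {'adk': {3}, 'pgm': set()}
```

In a frequently recombining group the shared-polymorphism and shared-allele counts can stay high even between clearly distinct clusters, while in a clonal group fixed differences accumulate between every lineage; this is why the counts are descriptive aids rather than decision rules.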


8. Species assignment on the Internet: electronic taxonomy


The MLSA approach appears to be a fruitful way forward, but to define the limits of clusters it requires the analysis of large populations of strains that cover the diversity within the genus (or part of the genus) of interest. The approach is ideal for collaborative groups with common interests, which can deposit their sequence data in a single database, along with strain characteristics, and can use their experience and knowledge of the genus to interpret the observed patterns of clustering of multilocus genotypes and to derive a consensus view of which clusters deserve species names. This process would take account of whatever additional phenotypic, biochemical, genomic, biogeographical or ecological information is available. Once the initial database has been established, and the clustering patterns have been used to guide the assignment of species, the concatenated sequence from any new strain can easily be compared via the Internet with a reference set that covers the diversity within each species cluster, to identify the cluster into which the strain falls and to assign its species or sub-species name. This facility is already available online at some of the MLST databases at http://www.mlst.net, for example to distinguish S. pneumoniae from S. pseudopneumoniae and B. pseudomallei from B. thailandensis.
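The electronic assignment step can be sketched as a nearest-reference search. This is a simplified stand-in, not the procedure used by the mlst.net databases: it assumes the query and reference concatenated sequences are already aligned to the same length, uses uncorrected p-distance, and treats `max_distance` as a hypothetical cut-off for flagging strains that fall outside every reference cluster. The function name and toy sequences are also hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def assign_species(query, references, max_distance=None):
    """Assign a query to the species of its closest reference strain.

    query: concatenated sequence of the new strain
    references: list of (strain_id, species, concatenated_sequence) tuples
    max_distance: hypothetical cut-off; if even the closest reference is
                  further away, the species is reported as None
    Returns (species or None, closest strain_id, distance).
    """
    if not references:
        raise ValueError("reference set is empty")
    strain_id, species, seq = min(references, key=lambda ref: p_distance(query, ref[2]))
    distance = p_distance(query, seq)
    if max_distance is not None and distance > max_distance:
        return None, strain_id, distance
    return species, strain_id, distance

if __name__ == "__main__":
    toy_refs = [
        ("ref1", "S. pneumoniae", "AAAAAAAAAA"),
        ("ref2", "S. pseudopneumoniae", "AAAAATTTTT"),
    ]
    print(assign_species("AAAAAAAATT", toy_refs))
    print(assign_species("AAAAAAAATT", toy_refs, max_distance=0.1))
```

As the paragraph above stresses, the reference set must cover the diversity within each cluster: with only one reference per species, a strain from a diverse cluster could easily lie closer to the single reference of a neighbouring cluster.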

New methods need to be accepted and used by taxonomists. The suggestion that MLSA may require the initial analysis of a thousand strains in order to uncover the patterns of clustering within a single genus, or even part of a genus, and to assign species names, may appear daunting. The experience gained in developing MLST for the characterization of isolates of bacterial pathogens provides encouragement, as three of the MLST databases now contain over 3000 isolates and several others contain well over a thousand. These databases have been built up as collaborative ventures by academic microbiology, clinical microbiology and public health laboratories that share an interest in these important pathogens and submit their data to the databases; many of these laboratories had no prior experience of sequence-based approaches. The databases are large because MLST has proved its worth and has provided a gold standard for the precise and unambiguous characterization of strains of these pathogens. If the MLSA approach is similarly shown to be valuable, providing a much improved way of defining species within a genus and of assigning new strains to species electronically via the Internet, there is little doubt that large curated databases can be developed for genera where there is sufficient interest. The great strength of the approach is that sequence-based taxonomy allows collaboration among laboratories with common interests and a pragmatic, consensual approach to defining species.


9. Concluding remarks

Sequence clusters are not, of course, necessarily species, but whatever other characteristics strains assigned to the same species should share, they should possess house-keeping genes that have similar sequences and that are, on average, more closely related to those from the same species than to those of other species (Ward 1998). We argue that clustering patterns should be the basis for defining species, by looking for natural discontinuities in the distribution of related genotypes, which may then be further elaborated upon through classical phenotypic approaches. There may be situations in which large groups of related bacteria fail to form discrete genotypic clusters, and in such cases it is very unlikely that other taxonomic methods will give consistent species groups. If such situations occur, they seem likely to be within groups of bacteria whose taxonomy remains the subject of continuing disagreement.

The MLSA approach needs to be tested using large sets of strains that are considered to fall into a number of closely related species, so that the clustering patterns can be related to other information about the strains. This approach should be applied to sets of strains of related species that have high and low rates of recombination, as different patterns of clustering are expected (Palys et al. 1997). The patterns of clustering need to be related to species designations obtained by the current polyphasic approach (Vandamme et al. 1996), and where they differ, the validity of the species divisions proposed by both methods needs to be assessed, without assuming that new methods are only valid if they reproduce the species divisions established with older methods.

The division of genera into species, and the development of simple phenotypic tests to recognize each species, has worked well in many cases; in other cases, however, there have been constant taxonomic revisions and heated debate, which may reflect an underlying lack of clear species boundaries or methodological inadequacies. Single genes or single phenotypic tests are inadequate for reliable species identification in genera where recombination between species is relatively frequent, which adds weight to the need for polyphasic approaches in taxonomy (Vandamme et al. 1996). The analysis of clustering patterns in well-studied genera should lead to guidelines on how best to apply the MLSA approach to define species within groups of similar bacteria that have not been well studied. In such cases, the identification of resolved clusters provides a basis for a search for biological correlates of the clusters, in terms of phenotypic, biochemical or ecological differences, or biogeography (Gevers et al. 2005).

A distinguishing phenotypic difference is required for the acceptance of a new species. If no such difference is found, groups of similar bacteria that appear to be genetically distinct have to be described by other terms (e.g. genomospecies). This requirement hinders the assignment of new species and needs to be reconsidered: promiscuous recombination between closely related bacteria may prevent the identification of a single defining phenotypic difference, and the requirement can be challenged if, in future, isolates can be identified online by using multiple house-keeping gene sequences to assign them to accepted species clusters through interrogation of an MLSA database.

One serious drawback, shared with other taxonomic approaches, is that MLSA cannot at present be applied to unculturable organisms, which are generally considered to be more numerous than those that can be cultivated (Rappe & Giovannoni 2003). However, defining the properties of culturable populations, and how they relate to differences in their 16S rRNA sequences, is only one way in which MLSA could help place this field on a more secure foundation.


References

Arbique, J.C. et al. 2004 Accuracy of phenotypic and genotypic testing for identification of Streptococcus pneumoniae and description of Streptococcus pseudopneumoniae sp. nov. J. Clin. Microbiol.

Multilocus sequence typing scheme that provides both species and strain differentiation for the Burkholderia cepacia complex
