Materials and Methods

The genome sequence data were uploaded to the Type (Strain) Genome
Server (TYGS), a free bioinformatics platform available at
https://tygs.dsmz.de, for a whole genome-based taxonomic
analysis [1]. The results were provided by the TYGS on 2019-10-02.
In brief, the TYGS analysis was subdivided into the following steps:

Determination of closely related type strains

Determination of closest type strain genomes was done in two complementary
ways: First, all user genomes were compared against all type strain
genomes available in the TYGS database via the MASH algorithm, a fast
approximation of intergenomic relatedness [2], and the ten type
strains with the smallest MASH distances were chosen per user genome.
Second, an additional set of ten closely related type strains was
determined via the 16S rDNA gene sequences. These were extracted from the
user genomes using RNAmmer [3] and each sequence was subsequently
BLASTed [4] against the 16S rDNA gene sequence of each of the
currently 11819 type strains available in the TYGS database. This
was used as a proxy to find the 50 best-matching type
strains (according to the bitscore) for each user genome and to
subsequently calculate precise distances using the Genome BLAST
Distance Phylogeny approach (GBDP) under the algorithm 'coverage' and
distance formula d5 [5]. These distances were
finally used to determine the 10 closest type
strain genomes for each of the user genomes.
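
The two selection steps above can be illustrated with a minimal Python
sketch. It is not the TYGS implementation: the function names, the
dictionary inputs and the merging of both candidate sets into one list are
assumptions made for illustration only, and the distances and bitscores
would in practice come from MASH, BLAST and GBDP.

```python
import heapq


def closest_type_strains(distances, k=10):
    """Return the k type strains with the smallest distance to one user genome.

    distances: dict mapping type strain name -> distance (hypothetical input).
    """
    return heapq.nsmallest(k, distances, key=distances.get)


def best_hits_by_bitscore(bitscores, k=50):
    """Return the k type strains with the highest 16S BLAST bitscore."""
    return heapq.nlargest(k, bitscores, key=bitscores.get)


def candidate_type_strains(mash_dist, gbdp_16s_dist, k=10):
    """Combine the MASH-based and the 16S-based selections.

    Assumption: both sets are merged and duplicates dropped, keeping order.
    """
    combined = closest_type_strains(mash_dist, k) + closest_type_strains(gbdp_16s_dist, k)
    return list(dict.fromkeys(combined))
```

With k=10, each user genome receives between 10 and 20 candidate type
strains under this assumption, depending on how much the two selections
overlap.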

Pairwise comparison of genome sequences

All pairwise comparisons among the set of genomes were
conducted using GBDP, and accurate intergenomic distances were inferred
under the algorithm 'trimming' and distance formula d5 [5].
For each comparison, 100 distance replicates were calculated.
Digital DDH values and confidence intervals were calculated using the
recommended settings of the GGDC 2.1 [5].

Phylogenetic inference

The resulting intergenomic distances were used to infer a balanced
minimum evolution tree with branch support via FastME 2.1.6.1 including
SPR postprocessing [6]. Branch support was inferred from 100
pseudo-bootstrap replicates each. The trees were rooted at the midpoint
[7] and visualized with PhyD3 [8].
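
Midpoint rooting places the root halfway along the longest path between
any two leaves. The following Python sketch illustrates the idea on a toy
tree; it is not the code used by the TYGS, and the tree representation
(a nested dictionary of branch lengths), the function names and the branch
lengths are hypothetical.

```python
from itertools import combinations


def path_between(tree, start, goal):
    """Return the nodes on the unique path from start to goal in a tree."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for neighbour in tree[node]:
            if neighbour not in path:
                stack.append((neighbour, path + [neighbour]))
    raise ValueError("nodes are not connected")


def path_length(tree, path):
    """Sum the branch lengths along a path of adjacent nodes."""
    return sum(tree[a][b] for a, b in zip(path, path[1:]))


def midpoint(tree):
    """Locate the midpoint of the longest leaf-to-leaf path.

    Returns (node_a, node_b, offset): the root lies on branch a-b,
    at distance `offset` from node_a.
    """
    leaves = [node for node, nbrs in tree.items() if len(nbrs) == 1]
    longest = max(
        (path_between(tree, a, b) for a, b in combinations(leaves, 2)),
        key=lambda p: path_length(tree, p),
    )
    half = path_length(tree, longest) / 2
    walked = 0.0
    for a, b in zip(longest, longest[1:]):
        if walked + tree[a][b] >= half:
            return a, b, half - walked
        walked += tree[a][b]


# Toy unrooted tree with four leaves (A-D) and two internal nodes (X, Y).
toy_tree = {
    "A": {"X": 1.0},
    "B": {"X": 2.0},
    "C": {"Y": 4.0},
    "D": {"Y": 1.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 3.0},
    "Y": {"C": 4.0, "D": 1.0, "X": 3.0},
}
root_branch = midpoint(toy_tree)
```

The exhaustive search over all leaf pairs keeps the sketch short; it is
adequate for trees of a few dozen genomes but would be replaced by a
linear-time traversal for large trees.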

Type-based species and subspecies clustering

The type-based species clustering using a 70% dDDH radius around each of
the 30 type strains was done as previously
described [1]. The resulting groups are shown in Tables 1 and 4.
Subspecies clustering was done using a 79% dDDH threshold as previously
introduced [9].
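
As a simplified illustration of the two thresholds, the Python sketch
below assigns a query genome to the type strain with the highest dDDH
value, provided that value reaches the threshold. The complete clustering
procedure is described in [1] and [9]; the function names, the input
format, the dDDH values in the usage note and the treatment of a value
exactly at the threshold as a match are assumptions.

```python
SPECIES_DDH = 70.0      # species radius in % dDDH, as stated in the text
SUBSPECIES_DDH = 79.0   # subspecies threshold in % dDDH, as stated in the text


def assign(ddh_to_types, threshold):
    """Return the type strain with the highest dDDH if it reaches the threshold.

    ddh_to_types: dict mapping type strain name -> dDDH in percent
    (hypothetical input). Returns None if no type strain qualifies.
    """
    if not ddh_to_types:
        return None
    best = max(ddh_to_types, key=ddh_to_types.get)
    return best if ddh_to_types[best] >= threshold else None


def classify(ddh_to_types):
    """Report species and subspecies assignments for one query genome."""
    return {
        "species": assign(ddh_to_types, SPECIES_DDH),
        "subspecies": assign(ddh_to_types, SUBSPECIES_DDH),
    }
```

For example, a query with made-up dDDH values of 74.1 % to type strain P
and 30.0 % to type strain Q would fall within the species radius of P but
not within its subspecies cluster, and a query below 70 % to every type
strain would remain unassigned.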

Results

Type-based species and subspecies clustering

The resulting species and subspecies clusters are listed in Table 4,
whereas the taxonomic identification of the query strains is found
in Table 1. Briefly, the clustering yielded 20 species clusters and the
provided query strains were assigned to 6 of these. Moreover, user
strains were located in 8 of 22 subspecies clusters.

Figure caption genome tree

Figure 1. Tree inferred with FastME 2.1.6.1 [6] from GBDP
distances calculated from genome sequences. The branch lengths are scaled
in terms of GBDP distance formula d5. The numbers above
branches are GBDP pseudo-bootstrap support values > 60 % from 100
replications, with an average branch support of 66.4 %.
The tree was rooted at the midpoint [7].

Figure caption SSU tree

Figure 2. Tree inferred with FastME 2.1.6.1 [6] from GBDP
distances calculated from 16S rDNA gene sequences. The branch lengths are
scaled in terms of GBDP distance formula d5. The numbers
above branches are GBDP pseudo-bootstrap support values > 60 % from 100
replications, with an average branch support of 62.6 %.
The tree was rooted at the midpoint [7].