Abstract

Dehalogenimonas lykanthroporepellens is the type species of the genus Dehalogenimonas, which belongs to a deeply branching lineage within the phylum Chloroflexi. This strictly anaerobic, mesophilic, non-spore-forming, Gram-negative staining bacterium was first isolated from chlorinated solvent-contaminated groundwater at a Superfund site located near Baton Rouge, Louisiana, USA. D. lykanthroporepellens was of interest for genome sequencing for two reasons: (a) an unusual ability to couple growth with reductive dechlorination of environmentally important polychlorinated aliphatic alkanes and (b) a phylogenetic position that is distant from previously sequenced bacteria. The 1,686,510 bp circular chromosome of strain BL-DC-9T contains 1,720 predicted protein coding genes, 47 tRNA genes, a single large subunit rRNA (23S-5S) locus, and a single, orphan, small subunit rRNA (16S) locus.

Keywords:

Introduction

Strain BL-DC-9T (=JCM 15061, =ATCC BAA-1523) is the type strain of the species Dehalogenimonas lykanthroporepellens, which is the type species of the genus Dehalogenimonas [1]. At the time of publication, D. lykanthroporepellens is the only validly named species in this genus. The type strain was isolated from moderately acidic groundwater (pH=5.1) collected at a waste recovery well at the Petro-Processors of Louisiana, Inc. Superfund Site, near Baton Rouge, Louisiana (USA), in an area contaminated by high concentrations of several chlorinated alkanes and alkenes [1-3]. The strain is able to reductively dehalogenate a variety of environmentally important polychlorinated alkanes but not monochlorinated alkanes, chlorinated alkenes, or chlorinated benzenes [1,3]. Quantitative real-time PCR experiments indicate that bacteria closely related or identical to D. lykanthroporepellens are present throughout the contaminated site from which strain BL-DC-9T was first isolated [4]. In this report, we present a summary classification and a set of features for D. lykanthroporepellens BL-DC-9T together with the description of the complete genomic sequencing and annotation.

Classification and features

At present, D. lykanthroporepellens strain BL-DC-9T is phylogenetically isolated within the domain Bacteria, with no other species assigned to the genus Dehalogenimonas. On the basis of 16S rRNA gene sequences, strain BL-DC-9T clusters within the phylum Chloroflexi; the closest related type strains are Caldilinea tarbellica D1-25-10-4T [5] and Caldilinea aerophila STL-6-O1T [6], with sequence identities of 81.7% and 81.5%, respectively [7]. Aside from the closely related strain BL-DC-8 that was isolated from the same groundwater source as strain BL-DC-9T, the closest previously cultured phylogenetic relatives of strain BL-DC-9T are “Dehalococcoides” strains [1,3]. Although some variable regions of the 16S rRNA genes of D. lykanthroporepellens and “Dehalococcoides” strains are highly similar [4], the overall identity of these genes is ~90%, indicating a distant relationship [1].

Figure 1 shows the phylogenetic neighborhood of D. lykanthroporepellens strain BL-DC-9T in a 16S rRNA gene based phylogenetic dendrogram. The sequence of the lone 16S rRNA gene copy in the genome differs from the previously published 16S rRNA gene sequence (EU679419) by a single nucleotide position.

The cells of D. lykanthroporepellens stain Gram-negative and are non-spore-forming, irregular cocci with a diameter of 0.3-0.6 µm (Figure 2). Strains of D. lykanthroporepellens were isolated in liquid medium using a dilution-to-extinction approach. Growth was not observed on agar plates or on medium solidified with gellan gum, even after long-term (2 months) incubation [3]. The temperature range for growth of strain BL-DC-9T is between 20°C and 37°C with an optimum between 28°C and 34°C [3]. The pH range for growth is 6.0 to 8.0 with an optimum of 7.0 to 7.5 [3]. The organism grows in the presence of 2% (w/v) NaCl and is resistant to ampicillin and vancomycin at concentrations of 1.0 and 0.1 g/l, respectively [3].

Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e. a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e. not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [14]. If the evidence code is IDA, then the property was directly observed for a living isolate by one of the authors or an expert mentioned in the acknowledgements.

Chemotaxonomy

The major cellular fatty acids of strain BL-DC-9T, as identified and quantified with the Sherlock MIS v. 6.0 system (Microbial Identification, Inc.) using the Aerobe (TSBA) and MOORE libraries, are C18:1ω9c, C16:1ω9c, C16:0, and C14:0 [1]. The same fatty acids were also present in the closely related strain BL-DC-8 [1]. Cellular fatty acids present in lower proportions include C18:0, C12:0, and the summed features listed in the MIDI Sherlock system as summed feature 5 (C18:2ω6,9c and/or anteiso-C18:0) and summed feature 3 (one or more of C16:1ω7c, C16:1ω6c, iso-C15:0 3OH) [1].

Genome sequencing and annotation

Genome project history

D. lykanthroporepellens strain BL-DC-9T was selected for sequencing on the basis of its phylogenetic position and the importance of reductive dechlorination in the field of environmental microbiology and bioremediation. A detailed understanding of the metabolic capabilities of chloroalkane-dehalogenating bacteria such as D. lykanthroporepellens has the potential to impact decision-making regarding site clean-up at thousands of DOE and non-DOE sites across the USA and around the world. The D. lykanthroporepellens strain BL-DC-9T genome project is deposited in the Genomes OnLine Database [9] and the complete genome sequence is available from GenBank. Sequencing, finishing, and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Table 2. Genome sequencing project information

| MIGS ID | Property | Term |
|---------|----------|------|
| MIGS-31 | Finishing quality | Finished |
| MIGS-28 | Libraries used | Three genomic libraries: one 454 standard library, one 454 paired end (19.8 kb insert size), and one Illumina library |

Growth conditions and DNA isolation

D. lykanthroporepellens strain BL-DC-9T (=JCM 15061, =ATCC BAA-1523) was cultured in liquid anaerobic basal medium supplemented with 1,1,2-trichloroethane as described previously [1]. Cells were harvested from 2.0 L of culture medium by centrifugation (10,000×g, 10 min, 4°C). Total DNA was extracted from the resulting cell pellet using a combination of lysozyme/SDS/proteinase K treatment, followed by purification using hexadecyltrimethyl ammonium bromide (CTAB) in conjunction with phenol-chloroform-isoamyl alcohol purification, and ethanol precipitation [15].

Genome sequencing and assembly

The genome of D. lykanthroporepellens BL-DC-9T was sequenced at the JGI using a combination of Illumina [16] and 454 technologies [17]. An Illumina GAii shotgun library with reads of 152.4 Mb, a 454 Titanium draft library with average read length of 356.7±167 bases, and a paired end 454 library with average insert size of 19,767±4,941 bp were generated for this genome. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website [18]. Illumina sequencing data were assembled with VELVET [19], and the consensus sequences were shredded into 1.5 kb overlapped fake reads and assembled together with the 454 data. Draft assemblies were based on 103.1 Mb 454 standard draft data and all of the 454 paired end data (38,136 reads that were both mapped and non-redundant). Newbler parameters were -consed -a 50 -l 350 -g -m -ml 20.
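The shredding step described above can be illustrated with a minimal Python sketch. It is not the JGI pipeline: the function name `shred` and the 500-base overlap are assumptions made for illustration, since the text specifies only the 1.5 kb read length and that the reads overlap.

```python
def shred(consensus: str, read_len: int = 1500, overlap: int = 500) -> list[str]:
    """Cut a consensus sequence into overlapping pseudo-reads.

    read_len follows the 1.5 kb stated in the text; overlap is an assumed
    value chosen only for this illustration.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")
    step = read_len - overlap
    reads = []
    for start in range(0, len(consensus), step):
        reads.append(consensus[start:start + read_len])
        if start + read_len >= len(consensus):
            break  # this read already reaches the end of the contig
    return reads


# Hypothetical 4 kb contig, used only to exercise the function.
contig = "ACGT" * 1000
fake_reads = shred(contig)
```

Consecutive pseudo-reads share `overlap` bases, so an overlap-based assembler can rejoin them alongside the longer 454 reads.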

The initial Newbler assembly contained 64 contigs in 1 scaffold. The initial 454 assembly was converted into a phrap [20] assembly by making fake reads from the consensus, and collecting the read pairs in the 454 paired end library. The Phred/Phrap/Consed software package [18] was used for sequence assembly and quality assessment [21-23] during the finishing process. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with gapResolution [Cliff Han, unpublished], Dupfinisher [24], or sequencing cloned bridging PCR fragments with subcloning (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 296 additional reactions were necessary to close gaps and to raise the quality of the finished sequence.

The error rate of the completed genome sequence is less than 1 in 100 kb. Together, the combination of the Illumina and 454 sequencing platforms provided 170× coverage of the genome. The final assembly contained 36,976 reads.

Genome annotation

Genes were identified using Prodigal [25] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [26]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [27], RNAMMer [28], Rfam [29], TMHMM [30], and SignalP [31]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [32].

Genome properties

The genome of D. lykanthroporepellens strain BL-DC-9T comprises a single circular chromosome of 1,686,510 bp with 55.04% G+C content (Table 3 and Figure 3). Of the 1,771 genes predicted, 1,720 were protein-coding genes and 51 were RNAs; 61 pseudogenes were also identified. The majority of the protein-coding genes (68.8%) were assigned a putative function and those remaining were annotated as hypothetical proteins. The distribution of the predicted protein coding genes into COG functional categories is presented in Table 4.

Table 3. Genome Statistics

| Attribute | Value | % of Total |
|-----------|-------|------------|
| Genome size (bp) | 1,686,510 | 100.00% |
| DNA coding region (bp) | 1,479,636 | 87.73% |
| DNA G+C content (bp) | 928,329 | 55.04% |
| Number of replicons | 1 | |
| Extrachromosomal elements | 0 | |
| Total genes | 1,771 | 100.00% |
| RNA genes | 51 | 2.65% |
| rRNA operons | 1 (a) | |
| Protein-coding genes | 1,720 | 97.33% |
| Pseudo genes | 61 | 3.44% |
| Genes with function prediction | 1,219 | 68.83% |
| Genes in paralog clusters | 240 | 13.55% |
| Genes assigned to COGs | 1,257 | 70.98% |
| Genes assigned Pfam domains | 1,290 | 72.84% |
| Genes with signal peptides | 417 | 23.55% |
| Genes with transmembrane helices | 347 | 19.59% |
| CRISPR repeats | 0 | |

(a) The genome contains a single large subunit rRNA (23S-5S) locus and a single, orphan, small subunit rRNA (16S) locus.
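As a quick consistency check, the base-pair percentages in Table 3 can be recomputed from the raw counts. The short Python sketch below uses only values taken from the table; the variable and function names are illustrative.

```python
# Raw base-pair counts taken from Table 3.
GENOME_SIZE_BP = 1_686_510
CODING_BP = 1_479_636
GC_BP = 928_329


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


gc_percent = percent(GC_BP, GENOME_SIZE_BP)          # 55.04
coding_percent = percent(CODING_BP, GENOME_SIZE_BP)  # 87.73
```

Both values agree with the "% of Total" column for the G+C content and the DNA coding region.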

Insights from the genome sequence

Analysis of the complete genome sequence of strain BL-DC-9T and its comparison with the genomes of “Dehalococcoides” strains sequenced previously provide several insights into the evolution and adaptation of the organism to its niche. Transposon-mediated horizontal gene transfer appears to have played a major role in creating the genomic diversity and metabolic versatility in strain BL-DC-9T.

Horizontal gene transfer

Strain BL-DC-9T contains a prophage region (1,604,159 to 1,672,879 bp, ~60% GC), which accounts for ~4% of the chromosome. Of the 76 ORFs identified within this region, 45 have been annotated as encoding hypothetical proteins and the vast majority of these have no homologs in the public databases. In addition to the prophage region, manual curation indicated that ~4.3% of the genome of strain BL-DC-9T (~73,000 bp) consists of insertion sequence (IS) elements encoding 74 full-length or truncated transposases. These IS elements are scattered throughout the chromosome and their GC content varies from 47% to 57%. The IS elements of strain BL-DC-9T belong to the families IS256 (29 of 74), IS3/IS911 (14 of 74), IS3/IS600 (10 of 74), IS4/IS5 (7 of 74), IS4/IS5/ISMca7 (5 of 74), IS1182 (4 of 74), IS116/IS110/IS902 (2 of 74), IS204/IS1001/ISL3 (2 of 74), and IS6/ISCpe7 (1 of 74).
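The figures in this paragraph can be cross-checked with a few lines of Python. The sketch below tallies the IS family counts and recomputes the prophage and IS fractions of the chromosome. All numbers come from the text; the inclusive treatment of the prophage coordinates and the variable names are assumptions.

```python
GENOME_SIZE_BP = 1_686_510

# IS family counts as listed in the text.
IS_FAMILY_COUNTS = {
    "IS256": 29,
    "IS3/IS911": 14,
    "IS3/IS600": 10,
    "IS4/IS5": 7,
    "IS4/IS5/ISMca7": 5,
    "IS1182": 4,
    "IS116/IS110/IS902": 2,
    "IS204/IS1001/ISL3": 2,
    "IS6/ISCpe7": 1,
}
total_transposases = sum(IS_FAMILY_COUNTS.values())

# Prophage coordinates from the text, treated as inclusive (an assumption).
prophage_start, prophage_end = 1_604_159, 1_672_879
prophage_len = prophage_end - prophage_start + 1
prophage_percent = 100.0 * prophage_len / GENOME_SIZE_BP

# Approximate IS content, using the ~73,000 bp quoted in the text.
is_percent = 100.0 * 73_000 / GENOME_SIZE_BP
```

The family counts sum to the 74 transposases reported, and the two fractions come out close to the ~4% and ~4.3% quoted above.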

tRNAs and Selenocysteine utilization

The chromosome of strain BL-DC-9T contains 47 tRNA genes, including those for all 20 standard amino acids as well as the unusual amino acid selenocysteine. Proteins containing selenocysteine are found in all three domains of life and many organisms contain genes encoding the complex molecular machinery required for the incorporation of this modified amino acid during the translation process [33,34]. Strain BL-DC-9T contains an operon (selCDAB) putatively involved in selenocysteine biosynthesis. selC encodes a selenocysteine-inserting tRNA (tRNAsec), which contains the complementary UCA anticodon for the internal UGA stop codon (Dehly_R0051). A gene that is not part of this operon encodes a seryl-tRNA synthetase (Dehly_0621), which catalyzes the aminoacylation of tRNAsec with serine. selD encodes a selenophosphate synthetase (Dehly_1500), an enzyme that produces monoselenophosphate using selenide and ATP as substrates. selA encodes a selenocysteine synthase (Dehly_1501), which utilizes monoselenophosphate as the selenium donor during the conversion of serine-acylated tRNAsec into selenocysteine-tRNAsec. selB encodes a GTP-dependent selenocysteine-specific elongation factor (Dehly_1502), which forms a quaternary complex with selenocysteine-tRNAsec and the selenocysteine inserting sequence (SECIS), which is a hairpin loop found immediately downstream of the UGA codon in the selenoprotein-encoding mRNA molecule [35]. This complex ensures reading through the UGA codon and incorporation of selenocysteine, instead of termination of translation [36]. Consistent with the presence of the genes encoding the synthesis and incorporation of selenocysteine, strain BL-DC-9T also contains a gene encoding a selenocysteine-containing formate dehydrogenase (Dehly_0033). This gene has an internal in-frame UGA stop codon (574 bp from the AUG start codon), which is followed by a 48 bp putative SECIS element.
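The notion of an internal in-frame UGA codon can be made concrete with a short Python sketch. The function names and the toy sequences are hypothetical and are not taken from Dehly_0033. The last line assumes that the reported 574 is the 1-based position of the first base of the UGA codon, which is one possible reading of the text.

```python
def internal_inframe_stops(cds: str, stop_codon: str = "TGA") -> list[int]:
    """Return 1-based start positions of in-frame stop codons,
    ignoring the terminal codon of the coding sequence."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA alphabets
    n_codons = len(cds) // 3
    positions = []
    for i in range(n_codons - 1):  # skip the final codon
        if cds[3 * i:3 * i + 3] == stop_codon:
            positions.append(3 * i + 1)
    return positions


def is_in_frame(position_1based: int) -> bool:
    """True if a codon starting at this 1-based position is in frame with position 1."""
    return (position_1based - 1) % 3 == 0


# Hypothetical toy sequences, for illustration only.
toy_with_internal_stop = "AUGGCUUGAGGCUAA"  # AUG GCU UGA GGC UAA
toy_out_of_frame = "ATGCTGAGGTAA"           # contains TGA, but not in frame

hits = internal_inframe_stops(toy_with_internal_stop)  # [7]
no_hits = internal_inframe_stops(toy_out_of_frame)     # []
reported_position_in_frame = is_in_frame(574)
```

A real analysis would also look for the SECIS hairpin downstream of each candidate codon, which this sketch does not attempt.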

Strain BL-DC-9T contains a putative IS256 element immediately downstream of the selCDAB operon (Dehly_1503, transposase). Previous phylogenetic analysis has provided evidence for horizontal transfer of these traits [37]. The presence of an IS element adjacent to the sel genes in strain BL-DC-9T suggests horizontal transfer and may explain the absence of this locus in “Dehalococcoides” strains sequenced previously [38-40].

Comparative genomics

The chromosome of strain BL-DC-9T is 216,790 bp larger than that of “Dehalococcoides ethenogenes” strain 195, which has the largest genome among “Dehalococcoides” strains sequenced to date. The difference in the size of the chromosomes of strain BL-DC-9T and “Dehalococcoides” strains is partly due to the presence of multiple IS elements and IS element-associated genes in the former. These putative horizontally transferred genes appear to have played a major role in creating genomic diversity and phenotypic variability of strain BL-DC-9T vis-à-vis “Dehalococcoides” strains.

Although the chromosomes of strain BL-DC-9T and “Dehalococcoides” strains contain a similar number of rRNA and tRNA encoding genes, they differ in their GC content, gene density, and percentage of sequence that encodes proteins. The lack of synteny between the chromosomes as well as the observed differences in the general features of the genomes of strain BL-DC-9T and “Dehalococcoides” strains suggest highly divergent evolutionary paths. This divergent evolutionary past is further supported by a phylogenetic tree constructed based on 432 core orthologous protein encoding genes shared between D. lykanthroporepellens BL-DC-9T and “Dehalococcoides” (meta)genomes [41]. In contrast, the chromosomes of four “Dehalococcoides” strains obtained from diverse geographic regions share a conserved core (1,029 orthologous groups of protein encoding genes conserved across all four genomes, with genes generally sharing the same order, orientation and synteny) that is interrupted by two high plasticity regions, indicating comparatively recent divergence from a common ancestor [40].

BLAST comparisons of protein sets of strain BL-DC-9T and “Dehalococcoides ethenogenes” strain 195 revealed that the two strains contain ~950 protein coding genes in common (bidirectional best hits, 20-90% identity at the predicted protein level). Pairwise BLAST comparisons indicated that strain BL-DC-9T contains ~700 protein-coding genes with no homologs in strain 195. The latter contained ~600 protein-coding genes with no homologs in BL-DC-9T. Genome-specific genes identified in strains BL-DC-9T and 195 encoded transposases, DNA endonucleases/methylases, heterodisulfide reductases, acetyltransferases, kinases, phosphatases, and dehalogenases. Some of these strain-specific genes were found within prophage-like regions or were associated with IS elements.
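The bidirectional best hit criterion used above can be sketched in Python as follows. This is a generic illustration rather than the authors' workflow: the input layout (query mapped to subject mapped to score), the function name, and the gene labels are hypothetical, and the scores stand in for BLAST bit scores parsed from two reciprocal searches.

```python
def bidirectional_best_hits(a_vs_b: dict, b_vs_a: dict) -> list[tuple[str, str]]:
    """Return (a, b) pairs where b is the best hit of a and a is the best hit of b.

    Each argument maps a query ID to a dict of {subject ID: score}.
    """
    best_ab = {q: max(hits, key=hits.get) for q, hits in a_vs_b.items() if hits}
    best_ba = {q: max(hits, key=hits.get) for q, hits in b_vs_a.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores for three proteins of genome A and two of genome B.
a_vs_b = {
    "A1": {"B1": 250.0, "B2": 90.0},
    "A2": {"B2": 120.0},
    "A3": {"B1": 80.0},
}
b_vs_a = {
    "B1": {"A1": 245.0, "A3": 75.0},
    "B2": {"A1": 95.0, "A2": 60.0},
}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)  # [("A1", "B1")]
```

Here A2 and A3 each have a best hit in genome B, but that hit's own best match is A1, so neither counts as shared under the reciprocal criterion.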

Biosynthesis and transport of compatible solutes

A number of microorganisms accumulate low molecular weight organic compounds alternately referred to as osmolytes, osmoprotectants, or “compatible solutes” to convey the notion that the compounds help the microorganisms survive osmotic stress but do not interfere with metabolism [42]. Ectoine, glycine-betaine, and proline are compatible solutes of many mesophilic bacteria capable of survival at high salt concentrations [42]. Many thermophilic organisms accumulate compatible solutes, such as mannosylglycerate and di-myo-inositol phosphate, which generally do not occur in mesophilic organisms [43].

Strain BL-DC-9T contains an operon (ectABC) encoding putative homologs of the enzymes involved in ectoine biosynthesis and regulation (Dehly_1306, Dehly_1307, Dehly_1308). The closest homologs of strain BL-DC-9T ectABC are found in Halomonas elongata, Wolinella succinogenes, and Desulfococcus oleovorans (48-75% identity at the predicted protein level). At least two putative transport systems for the compatible solutes proline/glycine-betaine have been identified in strain BL-DC-9T (proVWX and opuABCD). proV, proW, and proX encode an ATPase subunit (Dehly_0378), a permease protein (Dehly_0377), and a periplasmic subunit (Dehly_0376), respectively. opuA, opuB, opuC, and opuD encode a periplasmic substrate-binding protein (Dehly_0909), a permease protein (Dehly_0908), an ATPase subunit (Dehly_0907), and a permease protein (Dehly_0906), respectively. Although the permeases encoded by opuB, opuD, and proW as well as the ATPase subunits encoded by opuC and proV appear to be related to each other (34-40% identity at the predicted protein level), the periplasmic proteins encoded by opuA and proX are unrelated. The closest homologs of proVWX are found in Trichodesmium erythraeum, Marinomonas sp. MED121, and Fulvimarina pelagi (50% identity at the predicted protein level), whereas those of opuABCD are found in Pseudovibrio sp. JE062, Chromohalobacter salexigens DSM 3043, and Denitrovibrio acetiphilus DSM 12809 (44-60% identity at the predicted protein level). Strain BL-DC-9T also contains genes involved in the biosynthesis of proline (Dehly_0299, Dehly_0308). “Dehalococcoides” strains lack homologs of ectABC, proVWX, and opuABCD, but contain homologs of Dehly_0299 and Dehly_0308 (57 and 68% protein identity, respectively).

Homologs of a gene encoding a bifunctional mannosylglycerate synthase (mgsD) are found in “Dehalococcoides” strains (e.g., DET1363), an unusual occurrence for mesophilic bacteria [43]. Although the synthesis and accumulation of mannosylglycerate could not be proven to occur in “D. ethenogenes” because of insufficient biomass, the role of the bifunctional mgsD was confirmed by cloning and expression in Saccharomyces cerevisiae [43]. Comparative analysis revealed that BL-DC-9T contains a homologous gene (Dehly_0877, 54% protein identity). This expands the range of species containing genes putatively involved in the biosynthesis of compatible solutes and may offer D. lykanthroporepellens a stress response mechanism that allows growth under conditions of changing osmolarity.

Reductive dehalogenases

Genes encoding the enzymes that are involved in catalyzing the reductive dehalogenation of chlorinated solvents are organized in rdhAB operons encoding two components: a 50-65 kDa protein (RdhA) that functions as a reductive dehalogenase and a ~10 kDa hydrophobic protein with transmembrane helices (RdhB) that is thought to anchor the RdhA to the cytoplasmic membrane [44-53]. Comparative genomic analyses revealed that strain BL-DC-9T contains several loci related to rdhA and/or rdhB genes scattered throughout the chromosome. The multiple rdhA and rdhB ORFs of strain BL-DC-9T have 24-74% and 24-65% identities at the predicted protein level, respectively. The closest homologs of rdhA ORFs are found among “Dehalococcoides” strains (31-78% identity at the predicted protein level). A twin-arginine motif, with the predicted amino acid sequence SRRXFMK followed by a stretch of hydrophobic amino acids, was identified in the N-terminus of a large majority (19 of 25) of predicted RdhA sequences. Consistent with the presence of the twin-arginine sequence in the N-terminus of most of its RdhA sequences, strain BL-DC-9T contains an operon encoding proteins that constitute a putative twin-arginine translocation (TAT) system (Dehly_1346-1349). This specialized system is involved in the secretion of folded proteins across the bacterial inner membrane into the periplasmic space [54,55]. “Dehalococcoides” strains also contain an operon encoding an analogous TAT system that is partially related to the TAT system of strain BL-DC-9T (38 and 64% protein identity).

Two conserved motifs, each containing four cysteine residues, a feature associated with binding iron-sulfur clusters [56], were identified near the C-terminus of 22 of the predicted RdhA sequences of strain BL-DC-9T. The first of these motifs had a consistent number of amino acids between the cysteine residues (CX2CX2CX3C). In the second motif, there were variable numbers of intervening amino acids after the first and second cysteine residues (CX2-21CX2-7CX3C). If a “full-length” rdhA is predicted to encode a protein containing a twin-arginine sequence in the N-terminus, two iron-sulfur cluster binding motifs in the C-terminus, and an intervening sequence of ~450-500 aa, then strain BL-DC-9T contains 17 such genes. Two rdhA genes (Dehly_0069 and 1582) appear to be truncated and are predicted to encode proteins lacking the twin-arginine sequences in their N-termini. Five rdhA genes appear to be substantially truncated and are predicted to encode proteins consisting of only the N-terminus (Dehly_0479 and 1520) or the C-terminus (Dehly_0075, 1523, and 1534).
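The sequence motifs described for RdhA (SRRXFMK, CX2CX2CX3C, and CX2-21CX2-7CX3C) translate directly into regular expressions. The Python sketch below is an illustrative screen under stated assumptions, not the procedure used for the annotation. The second iron-sulfur motif is searched only downstream of the first, because the first pattern is itself a special case of the second. The function name and the toy protein strings are hypothetical.

```python
import re

# X in the published motifs is written here as any uppercase residue letter.
TWIN_ARGININE = re.compile(r"SRR[A-Z]FMK")
FES_MOTIF_1 = re.compile(r"C[A-Z]{2}C[A-Z]{2}C[A-Z]{3}C")
FES_MOTIF_2 = re.compile(r"C[A-Z]{2,21}C[A-Z]{2,7}C[A-Z]{3}C")


def rdha_features(protein: str) -> dict:
    """Report which of the three motifs occur in a protein sequence."""
    protein = protein.upper()
    first = FES_MOTIF_1.search(protein)
    # Assumption: the second motif lies downstream of the first one.
    second = FES_MOTIF_2.search(protein, first.end()) if first else None
    return {
        "twin_arginine": TWIN_ARGININE.search(protein) is not None,
        "fes_motif_1": first is not None,
        "fes_motif_2": second is not None,
    }


# Hypothetical toy sequences; not real RdhA proteins.
toy_core = "A" * 10 + "CAACGGCTTTC" + "L" * 5 + "CAAAAACPPPCWWWCK"
toy_full = "MSRRDFMK" + toy_core
toy_truncated = "M" + toy_core

full_features = rdha_features(toy_full)
truncated_features = rdha_features(toy_truncated)
```

The sketch does not check that the twin-arginine motif sits near the N-terminus or that the cysteine motifs sit near the C-terminus. A stricter screen would restrict each search to the relevant end of the protein.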

Within strain BL-DC-9T, only six of the rdhA ORFs have a cognate rdhB, and an additional rdhB gene (Dehly_1504) appears to be an orphan with no cognate rdhA ORF nearby. The predicted RdhB sequences of strain BL-DC-9T contain two or three transmembrane helices. Similar features have been observed among the predicted RdhB sequences of “Dehalococcoides” strains [38-40]. Furthermore, the predicted RdhB sequences of strain BL-DC-9T and the “Dehalococcoides” strains share 40-72% identity. In at least seven loci (Dehly_0069, 0075, 0479, 1504, 1530, 1534, and 1541), it appears that transposon insertion has truncated one or both of the rdh genes. Interestingly, genes involved in the regulation of rdhAB operons (e.g., MarR-type or two-component transcriptional regulators) were present only in seven loci (Dehly_0121, 0274, 0479, 1148, 1355, 1530, and 1582). In addition to the genes encoding reductive dehalogenases, the strain BL-DC-9T genome contains two genes encoding putative haloacid dehalogenases (Dehly_0588, Dehly_1126) that have homologs among the “Dehalococcoides” strains (40-44% identity at the predicted protein level).

The presence of IS elements adjacent to some rdhA/rdhB loci in strain BL-DC-9T suggests their acquisition from an unknown host. Previous studies of “Dehalococcoides” strains have also suggested horizontal transfer of reductive dehalogenase genes [41,57]. It remains to be determined if strain BL-DC-9T rdhA genes lacking an rdhB ORF downstream encode functional reductive dehalogenases and whether/how they are membrane-bound. It is possible that a non-cognate or a non-contiguous rdhB (e.g., the orphan Dehly_1504) could complement one or more of the strain BL-DC-9T rdhA genes lacking an rdhB ORF downstream. Alternatively, some of these genes may encode reductive dehalogenases that function by an unknown mechanism. An enzyme involved in the reductive dehalogenation of tetrachloroethene by Dehalospirillum multivorans was found in the cytoplasmic fraction [58], suggesting that some reductive dehalogenases are either loosely membrane-bound or soluble entities. The same may be the case for the majority of reductive dehalogenases of strain BL-DC-9T. Regardless, the repertoire of rdhA/rdhB loci identified by complete genome sequencing sets the stage for future efforts to elucidate the mechanism of reductive dehalogenation by strain BL-DC-9T and other Dehalogenimonas strains.

Declarations

Acknowledgements

The work conducted by the U.S. Department of Energy Joint Genome Institute was supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The authors gratefully acknowledge Xiao Ying for assistance with microscopy.