Keywords:

methylotroph, methanol, OM43, marine, bacterioplankton, Methylophilaceae

Strain HIMB624 is a planktonic marine bacterium within the family Methylophilaceae of the class Betaproteobacteria isolated from coastal seawater of Oahu, Hawaii. This strain is of interest because it is one of few known isolates from an abundant clade of Betaproteobacteria found in cultivation-independent studies of coastal seawater and freshwater environments around the globe, known as OM43. Here we describe some preliminary features of the organism, draft genome sequence and annotation, and comparative genomic analysis with one other sequenced member of this clade (strain HTCC2181). The 1,333,209 bp genome of strain HIMB624 is arranged in a single scaffold containing four contigs, and contains 1,381 protein encoding genes and 39 RNA genes.

Introduction

Strain HIMB624 was isolated from surface seawater of Kaneohe Bay, a subtropical bay on the northeastern shore of Oahu, Hawaii, via dilution-to-extinction culturing methods [1,2]. This strain is of interest because it belongs to a globally ubiquitous clade of aquatic bacterioplankton known as OM43, within the obligately methylotrophic family Methylophilaceae of the class Betaproteobacteria. The OM43 lineage was first described in 1997 from a 16S rRNA gene survey of coastal bacterioplankton from the Atlantic coast of the United States [3], and the first report of the isolation of OM43 strains, via modified dilution-to-extinction culturing methods, was published in 2002 [1]. Recently, the genome sequence of a member of the OM43 lineage was reported for a strain isolated from the Pacific coast of the United States (HTCC2181) [4]. Here we present a preliminary set of features for strain HIMB624 (Table 1), together with a description of the genomic sequencing and annotation, as well as a preliminary comparative analysis with the genome of strain HTCC2181.

Table 1

Classification and general features of strain HIMB624 according to the MIGS recommendations [5].

Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [12]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Classification and features

Strain HIMB624 was isolated from seawater collected off the coast of Hawaii, USA, in the subtropical North Pacific Ocean by a high throughput, dilution-to-extinction approach [1,2]. The strain was re-grown in seawater that was sterilized by tangential flow filtration and by autoclaving. Attempts to cultivate cells on solidified seawater media or artificial seawater media (liquid or solidified) failed. However, amendment of sterile seawater with either methanol or formaldehyde increased the maximum cell density from ca. 1×10⁶ cells ml⁻¹ to ca. 1×10⁷ cells ml⁻¹.

Phylogenetic analyses based on 16S rRNA gene sequence comparisons revealed strain HIMB624 to be closely related to a large number of environmental gene clones obtained predominantly from seawater. Alignment of the HIMB624 16S rRNA gene sequence with the Silva release 104 reference database containing only high quality, aligned 16S rRNA sequences with a minimum length of 1,200 bases for Bacteria released in October 2010 (n=512,037 entries) [13], revealed 350 entries that belong to the same phylogenetic lineage within the Betaproteobacteria. Of these, only the entries from HTCC2181, HIMB624 and one other strain (AB022337) originated from cultivated isolates and all entries in the lineage were derived either from seawater, freshwater, or the marine environment. In phylogenetic analyses with taxonomically described members of the Betaproteobacteria, strains HIMB624 and HTCC2181 formed a monophyletic lineage within the family Methylophilaceae (Figure 1; 96.5% sequence similarity). The 16S rRNA gene of strain HIMB624 was most similar to the type strains of Methylophilus luteus strain Mim (94.4%) and Methylophilus flavus strain Ship (94.3%), both isolated from plants [18]; Methylophilus methylotrophus strain NCIMB 10515 (93.7%), isolated from activated sludge [19]; Methylotenera mobilis strain JLW8 (93.7%), isolated from freshwater sediment [20]; Methylobacillus flagellatus strain KT (93.5%) isolated from sewage [21]; Methylovorus mays strain C isolated from maize phyllosphere (92.5%) [22]; and Methylobacillus pratensis strain F31 (91.8%), isolated from meadow grass [23].

Figure 1

Phylogenetic tree based on comparisons between 16S rRNA gene sequences from strain HIMB624, strain HTCC2181, type strains of related species within the family Methylophilaceae, and more distantly related Betaproteobacteria. Several Gammaproteobacteria and Alphaproteobacteria strains were used as outgroups. Sequence selection and alignment improvements were carried out using the ‘All-Species Living Tree’ project database [14] and the ARB software package [15]. The tree was inferred from 1,223 alignment positions using the RAxML maximum likelihood method [16]. Bootstrap support values from 1,000 replicates, determined by RAxML [17], are displayed above branches if larger than 60%. The scale bar indicates substitutions per site.

In actively growing cultures, cells of strain HIMB624 are long, thin, slightly curved rods 0.1-0.3 μm wide and 0.6-1.8 μm long (Figure 2). Cells in stationary phase are spherical and approximately 0.2 μm in diameter. Strain HIMB624 can replicate in sterile unamended seawater, reaching cell densities of approximately 1×10⁶ cells ml⁻¹. However, in the presence of either methanol or formaldehyde, HIMB624 can achieve a significantly higher growth rate and cellular abundance, similar to the phylogenetically related strain HTCC2181 [4].

Chemotaxonomy

The fatty acid profile of strain HIMB624 was dominated by anteiso-C17:1, C14:0 and C16:0. This is similar to known obligate and restricted facultative methylotrophs within the Betaproteobacteria, which are typically dominated by anteiso-C17:1 and C16:0 [20]. All of the fatty acids detected in strain HIMB624 are found either in closely related strains or in strains isolated from marine environments. C13:0 2-OH was detected in HIMB624 but not in HTCC2181, and C15:1 iso G was only found in strain HTCC2181.

Genome sequencing and annotation

Genome project history

Strain HIMB624 was selected for whole genome sequencing because of its phylogenetic affiliation with a lineage (OM43) of coastal marine bacterioplankton that is common in 16S rRNA gene surveys of coastal and estuarine systems [24], but is underrepresented in culture collections [1,4]. In addition, a sister lineage is common in freshwater systems [24]. The respective genome project is deposited in the Genomes OnLine Database (GOLD) as project Gi02451, and in GenBank under the accession number ABXG00000000. A summary of the main project is given in Table 2.

Table 2

Genome sequencing project information

| MIGS ID | Property | Term |
| MIGS-31 | Finishing quality | Final draft |
| MIGS-28 | Libraries used | Sanger (one each of 1-4 and 10-12 kbp inserts) |
| MIGS-29 | Sequencing platforms | ABI 3730XL |
| MIGS-31.2 | Sequencing coverage | 19.78× |
| MIGS-30 | Assemblers | Celera Assembler30 |
| MIGS-32 | Gene calling method | Glimmer |
|  | INSDC ID | 13602 |
|  | Genbank Date of Release | 17 March 2008 |
|  | GOLD ID | Gi02451 |
|  | NCBI taxon ID | 314607 |
|  | Database: IMG | 2503283018 |
| MIGS-13 | Source material identifier | HIMB624 |
|  | Project relevance | environmental |

Growth conditions and DNA isolation

Strain HIMB624 was grown at 27°C in 100 L of coastal Hawaii seawater sterilized by tangential flow filtration and autoclaving. Cells from liquid culture were collected on a 0.1 µm pore-sized polyethersulfone membrane filter, and DNA was isolated from the microbial biomass using a standard phenol/chloroform/isoamyl alcohol extraction protocol. A total of 74 µg of DNA was obtained.

Genome sequencing and assembly

The genome of strain HIMB624 was sequenced by the J. Craig Venter Institute (Rockville, MD) as part of the Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project. Two genomic libraries of insert sizes of 1-4 and 10-12 kb were constructed [25]. Clones were sequenced from both ends on ABI 3730XL DNA sequencers (Applied Biosystems, Carlsbad, CA) at the JCVI Joint Technology Center to provide paired-end reads. A total of 27,957 reads with average read length of 943 bp were assembled using the Celera Assembler30, resulting in four contigs of 1,272; 146,687; 709,553 and 474,927 bp in length. Sequencing provided 19.78× coverage of the genome.
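The reported coverage can be cross-checked from the read statistics given above. The following is a minimal sketch, assuming that fold coverage is simply total sequenced bases divided by genome size; the function and variable names are illustrative and not part of the original pipeline.

```python
# Values quoted in the text; names are hypothetical, chosen for illustration.
READ_COUNT = 27_957          # number of Sanger reads assembled
MEAN_READ_LENGTH_BP = 943    # average read length in bp (rounded in the text)
GENOME_SIZE_BP = 1_333_209   # total genome size in bp


def fold_coverage(read_count: int, mean_read_length_bp: float, genome_size_bp: int) -> float:
    """Return approximate fold coverage: total sequenced bases / genome size."""
    return read_count * mean_read_length_bp / genome_size_bp


coverage = fold_coverage(READ_COUNT, MEAN_READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"Estimated coverage: {coverage:.2f}x")
```

Because the average read length is a rounded figure, the estimate agrees with the reported 19.78× only to within about 0.01×.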

Genome annotation

The whole genome sequence was automatically annotated using the genome annotation pipeline in the Integrated Microbial Genomes Expert Review (IMG-ER) system [26]. Genes were identified using Glimmer [27]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [28] was used to find tRNA genes, whereas ribosomal RNAs were found using RNAmmer [29]. Other non-coding RNAs were identified by searching the genome for the Rfam profiles using INFERNAL (v0.81) [30]. Additional gene prediction analysis and manual functional annotation were performed within IMG-ER.

Genome properties

The genome is 1,333,209 bp long and comprises four contigs in a single scaffold, with an overall GC content of 35.37% (Table 3 and Figure 3). Of the 1,420 genes predicted, 1,381 were protein-coding genes and 39 were RNAs. The majority (83.59%) of the protein-coding genes were assigned a putative function, while the remaining genes were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.

Table 3

Genome Statistics

| Attribute | Value | % of total (a) |
| Genome size (bp) | 1,333,209 | 100.00 |
| DNA coding region (bp) | 1,284,895 | 96.38 |
| DNA G+C content (bp) | 471,303 | 35.37 |
| Total genes | 1,420 | 100.00 |
| RNA genes | 39 | 2.75 |
| Protein-coding genes | 1,381 | 97.25 |
| Genes with function prediction | 1,187 | 83.59 |
| Genes assigned to COGs | 1,174 | 82.68 |
| Genes assigned to Pfam domains | 1,195 | 84.15 |
| Genes with signal peptides | 229 | 16.13 |
| Genes with transmembrane helices | 315 | 22.18 |

a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
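The gene-based percentages in Table 3 can be recomputed from the raw counts. The sketch below is illustrative only; it assumes the denominator is the total gene count (1,420), since that value appears to reproduce the listed figures, and all names are hypothetical.

```python
# Counts taken from Table 3; the choice of denominator is an assumption.
GENOME_SIZE_BP = 1_333_209
TOTAL_GENES = 1_420

gene_counts = {
    "RNA genes": 39,
    "Protein-coding genes": 1_381,
    "Genes with function prediction": 1_187,
    "Genes assigned to COGs": 1_174,
    "Genes assigned to Pfam domains": 1_195,
    "Genes with signal peptides": 229,
    "Genes with transmembrane helices": 315,
}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Percentage of each gene class relative to all predicted genes.
gene_percent = {name: percent(n, TOTAL_GENES) for name, n in gene_counts.items()}

# Percentage of the genome covered by coding sequence.
coding_percent = percent(1_284_895, GENOME_SIZE_BP)

for name, value in gene_percent.items():
    print(f"{name}: {value:.2f}%")
print(f"DNA coding region: {coding_percent:.2f}%")
```

Under this assumption the Pfam row comes out at 84.15%, and the function-prediction row at 83.59%, the figure quoted in the text.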

Table 4

Number of genes associated with the 25 general COG functional categories

| Code | Value | %age (a) | Description |
| J | 142 | 11.1 | Translation |
| A | 1 | 0.1 | RNA processing and modification |
| K | 47 | 3.7 | Transcription |
| L | 67 | 5.2 | Replication, recombination and repair |
| B | 1 | 0.1 | Chromatin structure and dynamics |
| D | 22 | 1.7 | Cell cycle control, mitosis and meiosis |
| Y | 0 | 0 | Nuclear structure |
| V | 8 | 0.6 | Defense mechanisms |
| T | 27 | 2.1 | Signal transduction mechanisms |
| M | 110 | 8.6 | Cell wall/membrane biogenesis |
| N | 13 | 1.0 | Cell motility |
| Z | 0 | 0 | Cytoskeleton |
| W | 0 | 0 | Extracellular structures |
| U | 40 | 3.1 | Intracellular trafficking and secretion |
| O | 78 | 6.1 | Posttranslational modification, protein turnover, chaperones |
| C | 91 | 7.1 | Energy production and conversion |
| G | 63 | 4.9 | Carbohydrate transport and metabolism |
| E | 117 | 9.1 | Amino acid transport and metabolism |
| F | 44 | 3.4 | Nucleotide transport and metabolism |
| H | 92 | 7.2 | Coenzyme transport and metabolism |
| I | 40 | 3.1 | Lipid transport and metabolism |
| P | 45 | 3.5 | Inorganic ion transport and metabolism |
| Q | 16 | 1.3 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 123 | 9.6 | General function prediction only |
| S | 95 | 7.4 | Function unknown |
| - | 246 | 17.3 | Not in COGs |

a) The total is based on the total number of protein coding genes in the annotated genome.
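The COG category shares can be recomputed from the counts in the table. This is a minimal sketch under one assumption: that each share is taken relative to the sum of all COG category assignments rather than to the gene count. That denominator appears to reproduce most of the listed values, including the 11.08% and 9.13% quoted for HIMB624 in the comparative analysis. The variable names are illustrative.

```python
# Gene counts per COG category, copied from the table ("Not in COGs" excluded).
cog_counts = {
    "J": 142, "A": 1, "K": 47, "L": 67, "B": 1, "D": 22, "Y": 0, "V": 8,
    "T": 27, "M": 110, "N": 13, "Z": 0, "W": 0, "U": 40, "O": 78, "C": 91,
    "G": 63, "E": 117, "F": 44, "H": 92, "I": 40, "P": 45, "Q": 16,
    "R": 123, "S": 95,
}

# Assumed denominator: total number of category assignments
# (a gene may be assigned to more than one category).
total_assignments = sum(cog_counts.values())

# Share of each category as a percentage of all assignments.
cog_percent = {code: 100 * n / total_assignments for code, n in cog_counts.items()}

# List categories from largest to smallest share.
for code, share in sorted(cog_percent.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code}: {cog_counts[code]:>4} genes, {share:5.2f}%")
```

The assignments sum to 1,282, more than the 1,174 genes assigned to COGs in Table 3, which would be consistent with some genes falling into more than one category.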

Insights from the Genome

Of 1,381 protein encoding genes in the genome of HIMB624, 1,135 are shared with HTCC2181, representing 82-84% of the two genomes (Figure 4). Pathways for the synthesis of all twenty amino acids are present in both strains, as well as for the synthesis of all major vitamins except B12. The family Methylophilaceae consists of obligate methylotrophs and, while HIMB624 and HTCC2181 lack genes coding for either the large (mxaF) or small (mxaI) subunit of a confirmed methanol dehydrogenase, both organisms appear to have genes coding for a related analog of mxaF, known as xoxF. Methanol dehydrogenase activity of this paralog has been questioned for some time (see [4] and references therein), but current evidence suggests that the xoxF genes in these organisms code for a large subunit having methanol dehydrogenase activity [4]. The xoxF gene in HIMB624 is 87.4% similar in protein sequence to the xoxF gene in HTCC2181. Strains HTCC2181 and HIMB624 also have many of the other subunits required to form a methanol dehydrogenase holoenzyme including mxaA,C,D,E,G,J,K,R,L and S, and operons pqqBCDEFG. Neither strain possesses genes coding for the E1 subunit (sucA, EC:1.2.4.2) of the α-ketoglutarate dehydrogenase complex, though they do appear to possess the E2 subunit (sucB, EC: 2.3.1.61). Both subunits are required to complete the tricarboxylic acid (TCA) cycle, and the absence of the E1 subunit suggests that these strains are obligate methylotrophs.

The genomes of HIMB624 and HTCC2181 were compared to two closely related species within the family Methylophilaceae whose whole genomes are publicly available: Methylotenera mobilis (NC_012968) and Methylovorus glucosotrophus SIP3-4 (NC_012969, NC_012970, NC_012972). For this comparison only, the four strains were automatically annotated using the RAST annotation server [31] and protein sequences were compared using the sequence based analysis tool in order to identify all shared and unique gene combinations (Figure 4). In addition to a single large chromosome, Methylovorus glucosotrophus SIP3-4 has two plasmids, while the remaining three genomes each consist of a single chromosome. Strain HIMB624 contains one gene for a Type 4 fimbrial assembly/ATPase PilB that shares 43.44% protein identity with a gene located on one of the plasmids of Methylovorus glucosotrophus SIP3-4, and strain HTCC2181 contains a single DNA methylase gene that shares 31.1% protein identity with a gene on the same plasmid. Other than these, all genes located on the plasmids are exclusive to Methylovorus glucosotrophus SIP3-4, and the large majority of the genes on the plasmids encode hypothetical proteins. The genomes of Methylotenera mobilis and Methylovorus glucosotrophus SIP3-4 share over 100 genes associated with motility (twitching, flagella related, pili), along with 13 genes for chemotaxis and 13 genes for secretion, that are absent from the genomes of HIMB624 and HTCC2181. Conversely, the two smaller genomes have a higher percentage of their genomes (9.13% and 9.19%) dedicated to amino acid transport and metabolism than Methylovorus glucosotrophus SIP3-4 (6.76%) and Methylotenera mobilis (5.81%), and a higher percentage of translation, ribosomal structure and biogenesis genes (11.08% and 11.47%) than Methylovorus glucosotrophus SIP3-4 (6.12%) and Methylotenera mobilis (7.16%). Because the two OM43 lineage genomes are small, these higher percentages correspond to a similar total number of genes in these categories across all four genomes: approximately 120 genes for amino acid transport and metabolism and approximately 140 genes for translation, ribosomal structure and biogenesis. The general distribution of genes in all other predicted COG categories is comparable among the four strains, which results in smaller numbers of total genes in each COG category for the two members of the OM43 lineage due to their comparatively smaller genome sizes.

Declarations

Acknowledgments

The authors thank Cornelia Schmidt for her work in isolating strain HIMB624, and the Gordon and Betty Moore Foundation, which funded sequencing of this genome through its Marine Microbial Sequencing Project. The authors also thank the J. Craig Venter Institute for performing the sequencing and assembly, Tina Carvhalo for assistance with electron microscopy, and Steve Giovannoni and H. James Tripp for useful discussion. We also thank Steve Giovannoni for providing access to annotation and analysis tools through the Oregon State University Center for Genome Research and Biocomputing. This research was supported by National Science Foundation Grant DEB-0207085, and NSF Science and Technology Center Award EF-0424599. This is SOEST contribution 8496 and HIMB contribution 1467.