Abstract

Anaerococcus senegalensis strain JC48T sp. nov. is the type strain of A. senegalensis sp. nov., a new species within the genus Anaerococcus. This strain, whose genome is described here, was isolated from the fecal flora of a healthy patient. A. senegalensis is an obligate anaerobic coccus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 1,790,835 bp long genome (one chromosome, no plasmid) contains 1,721 protein-coding and 53 RNA genes, including 5 rRNA genes.

Keywords:

Anaerococcus senegalensis, genome

Introduction

Anaerococcus senegalensis strain JC48T (= CSUR P156 = DSM25366) is the type strain of A. senegalensis sp. nov. This bacterium is a Gram-positive, anaerobic, indole-negative coccus that was isolated from the stool of a healthy Senegalese patient as part of a “culturomics” study aimed at cultivating individually all species within human feces.

Defining bacterial species is a matter of debate. This is notably due to the elevated cost, poor reproducibility and poor inter-laboratory comparability of the “gold standard” DNA-DNA hybridization and G+C content determination [1]. In contrast, PCR and sequencing methods are now widely available and cost-effective, which has profoundly changed the way Archaea and Bacteria are classified. Using 16S rRNA sequences with internationally-validated cutoff values enabled the taxonomic classification or reclassification of hundreds of taxa [2]. More recently, high throughput genome sequencing and mass spectrometric analyses of bacteria gave unprecedented access to a wealth of genetic and proteomic information [3]. As a consequence, we propose to use a polyphasic approach [4] to describe new bacterial taxa that includes their genome sequence, MALDI-TOF spectrum and main phenotypic characteristics (habitat, Gram-stain reaction, culture and metabolic characteristics, and when applicable, pathogenicity).

The genus Anaerococcus (Ezaki et al. 2001) was created in 2001 [5]. To date, this genus, comprised of saccharolytic and butyrate-producing anaerobic and non-motile Gram-positive cocci, contains seven species including A. hydrogenalis (Ezaki et al. 1990) Ezaki et al. 2001, A. lactolyticus (Li et al. 1992) Ezaki et al. 2001, A. murdochii (Song et al. 2010), A. octavius (Murdoch et al. 1997) Ezaki et al. 2001, A. prevotii (Foubert and Douglas 1948) Ezaki et al. 2001, A. tetradius (Ezaki et al. 1983) Ezaki et al. 2001, and A. vaginalis (Li et al. 1992) Ezaki et al. 2001. Members of the genus Anaerococcus have mainly been isolated from the human vagina, but have also occasionally been identified in the nasal cavity, on the skin, and various infectious processes including ovarian, peritoneal, sacral, digital and cervical abscesses, vaginoses, bacteremias, foot ulcers, a sternal wound, and a knee arthritis [5-9]. In addition, uncultured bacteria with 16S rRNA sequences highly similar to members of the Anaerococcus genus have been detected in metagenomes from the human skin flora [10]. However, to the best of our knowledge, our report is the first to describe the isolation of a member of the genus Anaerococcus from the normal fecal flora.

Here we present a summary classification and a set of features for A. senegalensis sp. nov. strain JC48T together with the description of the complete genomic sequencing and annotation. These characteristics support the circumscription of the species A. senegalensis.

Classification and features

A stool sample was collected from a healthy 16-year-old male Senegalese volunteer patient living in Dielmo (a rural village in the Guinean-Sudanian zone in Senegal), who was included in a research protocol. The patient gave informed and signed consent, and the agreement of the National Ethics Committee of Senegal and of the local ethics committee of the IFR48 (Marseille, France) was obtained (agreement 09-022). The fecal specimen was preserved at -80°C after collection and sent to Marseille. Strain JC48 (Table 1) was isolated in June 2011 by anaerobic cultivation on 5% sheep blood-enriched Columbia agar (BioMerieux, Marcy l’Etoile, France). This strain exhibited two distinct 16S rRNA sequences, with a 97.8% nucleotide sequence similarity with A. vaginalis, the phylogenetically closest validated Anaerococcus species (Figure 1). This value was lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [2]. By comparison to the GenBank database [26], strain JC48 also exhibited nucleotide sequence similarities greater than 99% with uncultured bacterial clones detected in a metagenomic study of the human skin flora [10]. These bacteria are most likely classified within the same species as strain JC48 (Figure 1).
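The delineation check described here amounts to comparing a pairwise 16S rRNA identity with the 98.7% threshold. The following is a minimal sketch, assuming identity is computed over the gap-free columns of an existing pairwise alignment; the function names and the toy sequences are hypothetical and are not the JC48 sequences.

```python
SPECIES_THRESHOLD = 98.7  # % 16S rRNA identity recommended by Stackebrandt and Ebers [2]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def below_species_threshold(identity: float, threshold: float = SPECIES_THRESHOLD) -> bool:
    """True when the identity is low enough to suggest a distinct species."""
    return identity < threshold


# Toy aligned fragments (hypothetical), then the value reported for JC48 vs A. vaginalis.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACGAA"), 1))
print(below_species_threshold(97.8))
```

Gap columns are skipped here for simplicity; other identity definitions count gaps as mismatches and would give slightly different values.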

Figure 1

Phylogenetic tree highlighting the position of Anaerococcus senegalensis strain JC48T relative to other type strains within the Anaerococcus genus. GenBank accession numbers are indicated in parentheses. For A. senegalensis, the two different 16S rRNA sequences were included. Sequences were aligned using CLUSTALW, and phylogenetic inferences obtained using the maximum-likelihood method within the MEGA software. Numbers at the nodes are bootstrap values obtained by repeating the analysis 500 times to generate a majority consensus tree. Peptoniphilus harei was used as outgroup. The scale bar represents a 2% nucleotide sequence divergence.

Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [20]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Different growth temperatures (25, 30, 37, 45°C) were tested; no growth occurred at 25°C and 45°C, growth occurred between 30 and 37°C, and optimal growth was observed at 37°C. Colonies were 0.5 mm to 1 mm in diameter on blood-enriched Columbia agar and Brain Heart Infusion (BHI) agar. Growth of the strain was tested under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and in the presence of air, with or without 5% CO2. Optimal growth was achieved anaerobically. Weak growth was observed in microaerophilic conditions and with 5% CO2. No growth was observed in aerobic conditions. A motility test was negative. Cells grown on agar are Gram-positive cocci (Figure 2) with a mean diameter of 0.87 µm, and are mostly grouped in pairs, short chains or small clumps (Figure 3).

Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [21]. Briefly, a pipette tip was used to pick one isolated bacterial colony from a culture agar plate, and to spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonik, Leipzig, Germany). Four distinct deposits were made for strain JC48 from four isolated colonies. Each smear was overlaid with 2 µL of matrix solution (saturated solution of alpha-cyano-4-hydroxycinnamic acid) in 50% acetonitrile, 2.5% trifluoroacetic acid, and allowed to dry for five minutes. Measurements were performed with a Microflex spectrometer (Bruker). Spectra were recorded in the positive linear mode for the mass range of 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). A spectrum was obtained after 675 shots at a variable laser power. The time of acquisition was between 30 seconds and 1 minute per spot. The four JC48 spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 2,843 bacteria, including spectra from the seven validated Anaerococcus species used as reference data, in the BioTyper database. The method of identification included the m/z from 3,000 to 15,000 Da. For every spectrum, 100 peaks at most were taken into account and compared with the spectra in the database. The resulting score determined whether the tested isolate could be identified: a score > 2 with a validated species enabled the identification at the species level; a score > 1.7 but < 2 enabled the identification at the genus level; and a score < 1.7 did not enable any identification. For strain JC48, the obtained score was 1.4, thus suggesting that our isolate was not a member of a known species. We added the spectrum from strain JC48 to our database (Figure 4).
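The score interpretation rule stated above can be written as a small decision function. This is a sketch only: the function name is hypothetical, and because the text does not say how scores of exactly 2 or exactly 1.7 are handled, the sketch assumes each boundary value belongs to the higher category.

```python
def interpret_biotyper_score(score: float) -> str:
    """Map a MALDI BioTyper match score to an identification level.

    Thresholds (2 and 1.7) are those given in the text; treating the
    boundary values as inclusive is an assumption of this sketch.
    """
    if score >= 2.0:
        return "species-level identification"
    if score >= 1.7:
        return "genus-level identification"
    return "no identification"


# Score reported for strain JC48.
print(interpret_biotyper_score(1.4))
```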

Figure 4

Reference mass spectrum from A. senegalensis strain JC48T. Spectra from 5 individual colonies were compared and a reference spectrum was generated. The spectrum was made available online in our free-access URMS database.

Genome sequencing information

Genome project history

The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Anaerococcus, and is part of a “culturomics” study of the human digestive flora aiming at isolating all bacterial species within human feces. It was the second genome of an Anaerococcus species and the first genome of Anaerococcus senegalensis sp. nov. The GenBank accession number is PRJE70539, and the assembly consists of 39 contigs. Table 2 shows the project information and its association with MIGS version 2.0 compliance [5].

Table 2

Project information

MIGS ID | Property | Term
MIGS-31 | Finishing quality | High-quality draft
MIGS-28 | Libraries used | One 454 paired end 3-kb library
MIGS-29 | Sequencing platforms | 454 GS FLX Titanium
MIGS-31.2 | Sequencing coverage | 55×
MIGS-30 | Assemblers | Newbler version 2.5.3
MIGS-32 | Gene calling method | Prodigal
 | GenBank ID | PRJE70539
 | GenBank Date of Release | 02/28/2011
 | Project relevance | Study of the human gut microbiome

Growth conditions and DNA isolation

A. senegalensis sp. nov. strain JC48T, CSUR P156, was grown anaerobically on 5% sheep blood-enriched Columbia agar at 37°C. Four petri dishes were spread and resuspended in 3×100 µL of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed by glass powder on the Fastprep-24 device (Sample Preparation system; MP Biomedicals, USA) using 2×20 second cycles. DNA was then treated with 2.5 µg/µL lysozyme (30 minutes at 37°C) and extracted through the BioRobot EZ 1 Advanced XL (Qiagen). The DNA was then concentrated and purified on a Qiamp kit (Qiagen). The yield and the concentration were measured by the Quant-it Picogreen kit (Invitrogen) on the Genios_Tecan fluorometer at 50 ng/µL.

Genome sequencing and assembly

DNA (5 µg) was mechanically fragmented on a Hydroshear device (Digilab, Holliston, MA, USA) with an enrichment size at 3-4 kb. The DNA fragmentation was visualized through the Agilent 2100 BioAnalyzer on a DNA labchip 7500 with an optimal size of 3.785 kb. The library was constructed according to the 454 GS FLX Titanium paired end protocol. Circularization and nebulization were performed and generated a pattern with an optimum at 614 bp. After PCR amplification through 15 cycles followed by double size selection, the single stranded paired end library was quantified with the Quant-it Ribogreen kit (Invitrogen) on the Genios Tecan fluorometer at 96 pg/µL. The library concentration equivalence was calculated as 2.87E+08 molecules/µL. The library was stored at -20°C until further use.
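The conversion from a mass concentration to a molecule count is not spelled out in the text. The sketch below shows one way to obtain it, assuming a single stranded library with a mean fragment length of 614 nt and an average mass of 328.3 g/mol per nucleotide; both the constant and the function name are assumptions of this example rather than values stated by the authors.

```python
AVOGADRO = 6.022e23          # molecules per mole
MEAN_NT_MASS_G_PER_MOL = 328.3  # assumed average mass of one ssDNA nucleotide


def molecules_per_microliter(conc_pg_per_ul: float, mean_length_nt: float) -> float:
    """Convert a single stranded library concentration (pg/µL) to molecules/µL."""
    grams_per_ul = conc_pg_per_ul * 1e-12
    molar_mass = MEAN_NT_MASS_G_PER_MOL * mean_length_nt  # g/mol per fragment
    return grams_per_ul / molar_mass * AVOGADRO


# Values from the text: 96 pg/µL, fragment size optimum at 614 bp.
library = molecules_per_microliter(96, 614)
print(f"{library:.2e} molecules/µL")
```

Under these assumptions the result agrees with the 2.87E+08 molecules/µL reported above.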

The library was clonally amplified with 0.5 cpb and 1 cpb respectively in 2×8 emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yields of the emPCR were lower than expected at 3.92%, compared to the range of 5 to 20% from the Roche procedure.

Approximately 790,000 beads were loaded on the GS Titanium PicoTiterPlate PTP Kit 70×75 and sequenced with the GS FLX Titanium Sequencing Kit XLR70 (Roche). The run was performed overnight and then analyzed on the cluster through the gsRunBrowser and Newbler assembler (Roche). A total of 310,172 passed filter wells were obtained and generated 99 Mb with an average read length of 320 bp. The passed filter sequences were assembled using Newbler with 90% identity and 40 bp as overlap. The final assembly identified 4 scaffolds and 39 contigs (>100 bp).
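The read statistics above can be tied to the sequencing coverage listed in Table 2 with a short calculation. This is an illustrative sketch: the function names are hypothetical, the genome size is the 1,790,835 bp reported for the assembly, and the 320 bp mean read length is a rounded value, so the totals are approximate.

```python
def total_bases(n_reads: int, mean_read_length_bp: float) -> float:
    """Approximate sequencing yield in bases."""
    return n_reads * mean_read_length_bp


def fold_coverage(yield_bp: float, genome_size_bp: int) -> float:
    """Average number of times each genome position is sequenced."""
    return yield_bp / genome_size_bp


yield_bp = total_bases(310_172, 320)         # passed filter wells x mean read length
coverage = fold_coverage(yield_bp, 1_790_835)  # genome size from the assembly

print(f"yield: {yield_bp / 1e6:.1f} Mb")
print(f"coverage: {coverage:.1f}x")
```

The yield comes to roughly 99 Mb and the coverage to roughly 55×, in line with the figures reported in the text and in Table 2.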

Genome annotation

Open Reading Frames (ORFs) were predicted using Prodigal [22] with default parameters, but the predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database and the Clusters of Orthologous Groups (COG) database using BLASTP. The tRNAScanSE tool [23] was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer [24] and BLASTn against GenBank. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment length greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans. To estimate the mean level of nucleotide sequence similarity at the genome level between Anaerococcus species, we compared the ORFs only using BLASTN at a query coverage of ≥ 70% and a minimum nucleotide length of 100 bp.
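The BLASTN filtering step for the genome-level similarity estimate can be sketched as follows. This is not the authors' pipeline: the `BlastnHit` record, the function names and the toy hits are hypothetical, query coverage is simplified to alignment length divided by ORF length, and the 100 bp minimum is assumed to apply to the alignment length.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass
class BlastnHit:
    query_length: int        # length of the query ORF (bp)
    alignment_length: int    # length of the aligned region (bp)
    percent_identity: float  # nucleotide identity over the alignment (%)


def query_coverage(hit: BlastnHit) -> float:
    """Simplified query coverage: aligned fraction of the ORF, in percent."""
    return 100.0 * hit.alignment_length / hit.query_length


def passes_filter(hit: BlastnHit, min_coverage: float = 70.0, min_length: int = 100) -> bool:
    """Keep hits with query coverage >= 70% and at least 100 aligned bp."""
    return hit.alignment_length >= min_length and query_coverage(hit) >= min_coverage


def mean_identity(hits: Iterable[BlastnHit]) -> Optional[float]:
    """Mean identity of the retained hits, or None if nothing passes."""
    kept = [h.percent_identity for h in hits if passes_filter(h)]
    return mean(kept) if kept else None


toy_hits = [
    BlastnHit(900, 850, 82.0),   # kept: coverage ~94%
    BlastnHit(1200, 300, 95.0),  # dropped: coverage 25%
    BlastnHit(90, 90, 99.0),     # dropped: shorter than 100 bp
    BlastnHit(400, 300, 78.0),   # kept: coverage 75%
]
print(mean_identity(toy_hits))
```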

Genome properties

The genome is 1,790,835 bp long (one chromosome, no plasmid) with a 28.56% G + C content (Table 3). Of the 1,774 predicted genes, 1,721 were protein-coding genes, and 53 were RNAs. Two distinct copies of 16S rRNA, differing by two point mutations, were identified. A total of 1,296 genes (73.0%) were assigned a putative function. Fifty-one genes were identified as ORFans (3%). The remaining genes were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4. The properties and the statistics of the genome are summarized in Tables 3 and 4.

Table 3

Nucleotide content and gene count levels of the genome

Attribute | Value | % of total (a)
Genome size (bp) | 1,790,835 | 100
DNA coding region (bp) | 1,597,818 | 82.22
DNA G+C content (bp) | 503,715 | 28.56
Total genes | 1,774 | 100
RNA genes | 53 | 3.0
Protein-coding genes | 1,721 | 88.74
Genes with function prediction | 1,296 | 73.0
Genes assigned to COGs | 1,364 | 79.26
Genes with peptide signals | 142 | 8.0
Genes with transmembrane helices | 270 | 15.22

(a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.

Table 4

Number of genes associated with the 25 general COG functional categories

Code | Value | % | Description
J | 132 | 7.67 | Translation
A | 0 | 0 | RNA processing and modification
K | 122 | 7.09 | Transcription
L | 115 | 6.68 | Replication, recombination and repair
B | 1 | 0.06 | Chromatin structure and dynamics
D | 17 | 0.99 | Cell cycle control, mitosis and meiosis
Y | 0 | 0 | Nuclear structure
V | 63 | 3.66 | Defense mechanisms
T | 45 | 2.61 | Signal transduction mechanisms
M | 67 | 3.89 | Cell wall/membrane biogenesis
N | 5 | 0.29 | Cell motility
Z | 0 | 0 | Cytoskeleton
W | 0 | 0 | Extracellular structures
U | 19 | 1.10 | Intracellular trafficking and secretion
O | 53 | 3.08 | Posttranslational modification, protein turnover, chaperones
C | 87 | 5.06 | Energy production and conversion
G | 112 | 6.51 | Carbohydrate transport and metabolism
E | 104 | 6.04 | Amino acid transport and metabolism
F | 54 | 3.14 | Nucleotide transport and metabolism
H | 55 | 3.20 | Coenzyme transport and metabolism
I | 31 | 1.80 | Lipid transport and metabolism
P | 75 | 4.36 | Inorganic ion transport and metabolism
Q | 16 | 0.93 | Secondary metabolites biosynthesis, transport and catabolism
R | 170 | 9.88 | General function prediction only
S | 121 | 7.03 | Function unknown
- | 357 | 20.74 | Not in COGs

The total is based on the total number of protein coding genes in the annotated genome.
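As a worked check of how the percentages in Table 4 relate to the gene counts, the short sketch below recomputes a few of them against the 1,721 protein-coding genes. The function name is hypothetical and only a subset of categories is shown for brevity.

```python
PROTEIN_CODING_GENES = 1_721  # from Table 3
GENES_IN_COGS = 1_364         # from Table 3

# Subset of Table 4 counts (COG code -> number of genes).
cog_counts = {"J": 132, "K": 122, "L": 115, "R": 170, "S": 121}


def percent_of_protein_coding(count: int, total: int = PROTEIN_CODING_GENES) -> float:
    """Share of protein-coding genes, rounded to two decimals as in Table 4."""
    return round(100.0 * count / total, 2)


# Genes without any COG assignment follow from the two totals.
not_in_cogs = PROTEIN_CODING_GENES - GENES_IN_COGS

for code, count in cog_counts.items():
    print(code, count, percent_of_protein_coding(count))
print("-", not_in_cogs, percent_of_protein_coding(not_in_cogs))
```

The per-category counts add up to more than the 1,364 genes assigned to COGs, which is what one would expect if a gene can be assigned to more than one category.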

To date, the genome from A. prevotii strain PC1T is the only genome from an Anaerococcus species that has been sequenced [25]. By comparison with A. prevotii, A. senegalensis exhibited a lower G + C content (35.64% vs 28.56%, respectively) and a smaller number of genes (1,913 vs 1,774) and genes with peptide signals (337 vs 142). In contrast, A. senegalensis had higher ratios of genes per Mb (957 vs 990) and genes assigned to COGs (74.28% vs 79.26%). However, the distribution of genes into COG categories (Table 4) was highly similar in both genomes.
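The genes-per-Mb ratio quoted for A. senegalensis follows directly from the gene count and genome size in Table 3. A minimal sketch, with a hypothetical function name; the A. prevotii value is not recomputed because its genome size is not given in this section.

```python
def genes_per_mb(n_genes: int, genome_size_bp: int) -> float:
    """Gene density expressed as genes per megabase."""
    return n_genes / (genome_size_bp / 1e6)


# A. senegalensis: 1,774 genes in 1,790,835 bp.
density = genes_per_mb(1_774, 1_790_835)
print(f"{density:.1f} genes per Mb")
```

The result is close to the value of 990 reported in the comparison.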

Conclusion

On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Anaerococcus senegalensis sp. nov. that contains the strain JC48T. This bacterium has been found in Senegal.