Received November 9, 2017; Revised November 28, 2017; Accepted December 4, 2017.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Type VI secretion system (T6SS) has been discovered in a variety of gram-negative bacteria as a versatile weapon to stimulate the killing of eukaryotic cells or prokaryotic competitors. Type VI secretion effectors (T6SEs) are well known as key virulence factors for important pathogenic bacteria. In many Burkholderia species, T6SS has evolved as the most complicated secretion pathway with distinguished types to translocate diverse T6SEs, suggesting their essential roles in this genus. Here we attempted to detect and characterize T6SSs and potential T6SEs in target genomes of plant-associated and environmental Burkholderia species based on computational analyses. In total, 66 potential functional T6SS clusters were found in 30 target Burkholderia bacterial genomes, of which 33% possess three or four clusters. The core proteins in each cluster were specified and phylogenetic trees of three components (i.e., TssC, TssD, TssL) were constructed to elucidate the relationship among the identified T6SS clusters. Next, we identified 322 potential T6SEs in the target genomes based on homology searches and explored the important domains conserved in effector candidates. In addition, using the screening approach based on the profile hidden Markov model (pHMM) of T6SEs that possess markers for type VI effectors (MIX motif) (MIX T6SEs), 57 revealed proteins that were not included in training datasets were recognized as novel MIX T6SE candidates from the Burkholderia species. This approach could be useful to identify potential T6SEs from other bacterial genomes.

Keywords : Burkholderia species, T6SS, type VI effector

Introduction

Secretion in bacterial species is a critical mechanism for survival and adaptation to their natural surrounding environmental stresses (Abby et al., 2016; Alfano and Collmer, 2004; Russell et al., 2014a). Many bacteria have evolved different secretion pathways termed from type I to type IX secretion systems to deliver effector molecules such as proteins, enzymes or toxins from a bacterial cell to its exterior (Abby et al., 2016; Tseng et al., 2009). Among them, the type VI secretion system (T6SS) is involved in both bacterial-host and inter-bacterial interactions, by delivering anti-eukaryotic or anti-bacterial effectors, respectively (Filloux et al., 2008; Russell et al., 2014a). The type VI secretion effectors (T6SEs) in plant-associated bacteria have been demonstrated to operate various functions such as inter-bacterial competition, quorum sensing, stress response, biofilm formation, and symbiosis (Bernal et al., 2017b; Cianfanelli et al., 2016; Hachani et al., 2016; Russell et al., 2014a; Ryu, 2015). Anti-bacterial effectors usually have corresponding immune proteins to help self-protect bacteria from the toxicity of their own secreted effectors. Notably, many components of T6SSs such as Hcp and VgrG proteins may serve as specialized effectors with toxic extension domains delivered into target cells along with cargo effectors (Cianfanelli et al., 2016; Hachani et al., 2016).

In this report, we attempted to detect and characterize T6SS clusters and T6SE candidates in target genomes of plant-associated or enviromental Burkholderia species based on computational analyses. The component proteins of each cluster were manually checked in detail and phylogenetic trees of three components (TssC, TssD, and TssL) were established to elucidate the relationship among annotated T6SS clusters. The potential T6SEs in the bacterial genomes of Burkholderia species were inferred based on homology searching. Furthermore, we attempted to reveal a T6SEs containing marker for type six (MIX T6SEs) using a screening approach based on the profile hidden Markov model (pHMM) method.

Materials and Methods

Annotation of Type VI Secretion Systems

Annotations of available T6SS clusters and their components in some bacterial genomes were obtained from the SecreT6 database ( http://db-mml.sjtu.edu.cn/SecReT6/) (Li et al., 2015). To annotate T6SS clusters in the remaining uncharacterized genomes, we extracted nucleotide sequences from the public sequence database at National Center for Biotechnology Information NCBI ( https://www.ncbi.nlm.nih.gov/genome/) and utilized the CD-easy tool in SecReT6 to annotate the open reading frames (ORFs) along with their corresponding amino acid sequences. Next, the T6SS-BLASTP program with default values in SecreT6 was used to detect the T6SS component proteins in target genomes and loci with at least five components were considered as potential T6SS clusters. The component genes of all clusters were then manually checked in detail. Furthermore, we compared the T6SS clusters obtained here with those detected by using TXSScan MacSyFinder program (Abby et al., 2016; Abby and Rocha, 2017), to carry out the optimized validation for potential functional systems. In final, the clusters that either were detected by both programs or contained at least 10 core components explored by SecReT6 were further used in distribution and phylogenetic analyses because of their highly confident presence and functional ability in genomes. Three important component proteins (e.g., TssB, TssC, TssL) were utilized for constructing phylogenetic trees to elucidate the phylogenetic relationship among the detected T6SS clusters. Protein sequences were aligned using the ClustalW algorithm (Thompson et al., 1994) in MEGA6 ( http://www.megasoftware.net/) (Tamura et al., 2013), and phylogenetic trees were constructed based on the Neighbor-Joining (NJ) method (Saitou and Nei, 1987) with 1000 bootstrap replicates and other default parameters (phylogenetic reconstruction, substitution type: amino acids, model/methods: Jones-Taylor-Thornton (JTT) model (Jones et al., 1992), rates among sites: uniform rates and gap missing data treatment: partial deletion).

Prediction of type VI secretion effectors based on homology search

The complete sequenced genomes and protein sequences of Burkholderia and other reference species were downloaded from the NCBI genome database. In total, 105 experimentally verified T6SEs in different bacterial species were extracted from the integrated database SecReT6 and existing literature (Bernal et al., 2017a; Li et al., 2015). The evidential effectors were then used as query sequences to recover homologous proteins from the complete genomes of Burkholderia bacterial strains using BLASTP search ( https://blast.ncbi.nlm.nih.gov/Blast.cgi) (Altschul et al., 1990) with identity ≥ 30% and E-value < 10−6. Next, the amino acid sequences of the collected proteins were checked manually. Proteins with length less than 100 residues or containing a signal peptide checked by the SignalP program ( http://www.cbs.dtu.dk/services/SignalP/) (Petersen et al., 2011) were removed. To avoid overlapping results obtained from the similarity of short domains, we also checked the length skew between query sequences and the identified protein sequences in all hits; hits with a length skew greater than 200 amino acids were then eliminated. For hits with identity ranging from 30% to lower than 50%, only proteins that were detected in at least three hits by three different query effectors or having an annotation related to type VI effectors (i.e., Hcp, VgrG, Rhs/YD-repeat, PAAR, phospholipase) were retained. The TMHMM server (Krogh et al., 2001) and Conserved Domain Database CDD ( https://www.ncbi.nlm.nih.gov/cdd/) (Marchler-Bauer et al., 2011) were used to explore the transmembrane (TM) regions and conserved domains in potential T6SE sequences.

Detection of MIX T6SEs using profile hidden Markov model

To reveal MIX T6SEs, we built profile HMM models for MIX-containing effectors and used these models to scan all target genomes. In total, 217 T6SS-associated effector proteins that contained MIX motifs were extracted from a recent study by Salomon et al (2014). These T6SEs were divided into five groups as MIX I, MIX II, MIX III, MIX IV, and MIX V – containing proteins (named as dataset 1 to dataset 5 in this study), including six evidential T6SEs (i.e., VP1388, VPA1263, IdsD, BTH_I2691, VCA0020, V12G01_02265). These input protein sequences were aligned using the MUSCLE 3.8 program with default parameters (Multiple Sequence Comparison by Log-Expectation) (Edgar, 2004). Next, the HMMER ( http://hmmer.janelia.org/) (Finn et al., 2011), particularly the basic function of building a profile HMM from a dataset of sequences (hmmbuild), was used to establish five pHMMs for the five datasets of aligned sequences. The function of searching a profile HMM against a database (hmmsearch) in HMMER was then used to screen the trained pHMMs against protein sequences, to identify matched proteins that were considered MIX T6SE candidates. The E-value along with sequence score for whole sequence searching of matched proteins was valid based on the results of searching evidential MIX T6SEs and matched proteins with a signal peptide checked by SignalP (Petersen et al., 2011) were eliminated. To assess the ability of each pHMM to recognize potential MIX T6SEs, we performed cross-searching using five built pHMMs against all 217 collected T6SEs in the five training datasets. The detection rate of five models against non-T6SE datasets was also tested. The test datasets were extracted from two negative datasets of SecretomeP program, one from Gram-negative bacteria (196 proteins) and another from Gram-positive bacteria (1084 proteins). Next, to evaluate the ability of recognizing MIX T6SE candidates from separate bacterial genomes, we implemented a search of pHMMs against reference genomes of bacterial strains containing the known MIX T6SEs and some bacteria (e.g., Escherichia and Pseudomonas strains) with well-characterized T6SSs (Cascales and Cambillau, 2012; Cianfanelli et al., 2016; Filloux et al., 2008). Finally, we applied the pHMM-based screening approach described above to detect MIX T6SE candidates in bacterial genomes of the Burkholderia species. To explore the relationship among the revealed candidates, their amino acid sequences were aligned using the ClustalW algorithm (Thompson et al., 1994) and a phylogenetic tree was then constructed based on the Neighbor-Joining (NJ) method (Saitou and Nei, 1987) with 1000 bootstrap replicates and other default parameters in MEGA6 ( http://www.megasoftware.net/) (Tamura et al., 2013). In addition, the available neighboring T6SS components of MIX T6SE candidates on the genome were checked manually. The conserved domains of the proteins were detected using the CDD program (Marchler-Bauer et al., 2011) and transmembrane (TM) regions were explored using the TMHMM server (Krogh et al., 2001).

Results and Discussion

Annotation of T6SS clusters

A large amount of attention has been given to studies regarding mechanisms of T6SSs and their components because they are involved in both inter-bacterial and bacterial-host interactions (Bingle et al., 2008; Cascales and Cambillau, 2012; Cianfanelli et al., 2016). In this work, the potential function T6SS clusters in representative genomes of target Burkholderia strains which are plant-associated pathogenic and beneficial bacteria, biocontrol bacteria, or isolated from environment were annotated (Table 1). In total, 86 potential T6SS clusters were found in 30 target genomes of Burkholderia strains and the components of each cluster were screened manually (Supplementary Table 1). Of those, 66 clusters (mean number of clusters per strain is 2.2) were found to contain at least 10 components out of the 13 core components (TssA – TssM) (Table 1). All these clusters were also detected by TXSScan models as single-loci with exception of one cluster in Burkholderia sp. CCGE1002 due to strict requirements regarding number of predicted core proteins by TXSScan models which requires more than 11 core proteins over 14 full componnents (TssA-TssM, and evpJ proteins) (Abby et al., 2016; Abby and Rocha, 2017).

Next, to assess the phylogenetic relationship among T6SS clusters, we constructed phylogenetic distributions of three proteins (i.e., TssC, TssD, and TssL) which are important components and are also found in most of the assigned T6SS clusters (Supplementary Fig. 1–3). Proteins that belong to the same annotated T6SS subtypes (i.e., i3, i4b, i2, i1, i4a) were mainly grouped together, strongly indicating that the T6SS clusters could be classified based on the phylogenetic relationship of their components. Therefore, we assigned specific subtypes for T6SS-N/A clusters, which were not annotated availably based on the groups depicted in the phylogenetic trees. Remarkably, the assigned T6SS subtypes obtained from all three phylogenetic trees of TssC, TssD, and TssL proteins were the same and are specified as an inferred T6SS subtype in Supplementary Table 1. The predicted subtype number of the collected Burkholderia strains was presented in Table 1, with the highest proportions in the T6SS-i3 (39%) and T6SS-i4b (38%) subtypes, indicating the important role of these subtypes especially in plant-pathogenic Burkholderia species.

Homology-based prediction of Type VI secretion effectors

The detection of T6SEs in bacterial species may help in developing efficient strategies to inhibit or promote their competition with other neighboring bacteria dwelling in the same niches. Here, the experimental effectors were used as query sequences to search for homologous proteins in the bacterial genomes of Burkholderia species, in order to infer potential type VI effectors. At first, hits with identity ≥ 30%, E-value < 10−6, and a length skew less than 200 amino acids were collected. The proteins revealed in hits with lengths less than 100 residues or containing a signal peptide inside were then eliminated. Next, for hits with identity ranging from 30% to lower than 50%, only proteins with at least three hits or having a functional annotation related to type VI effectors were retained. Overall, a total of 322 potential T6SE proteins were inferred based on the homology search of the 30 genomes of Burkholderia species. Of these, 215 (66.8%) proteins were predicted as non-classically secreted proteins (NCSPs) by the SecretomeP program (Bendtsen et al., 2004) and only 17 proteins (5.3%) were found to contain at least one TM domain based on the TMHMM program (Krogh et al., 2001). The number distribution of the T6SE candidates along with the T6SS clusters in the analyzed Burkholderia bacterial strains are presented in Fig. 1. The plant-pathogenic Burkholderia strains obtained the highest average number 14.5 of putative T6SEs, followed by the group of environmental bacteria or biocontrol agent with 11.6, and the group of plant-associated beneficial bacteria got around 7.1. Clearly, the type VI effectors in Burkholderia plant pathogens would be associated with the ability of toxic infecting on their plant host cells. However, there seems to be no clear relationship between the number of T6SS clusters and number of T6SE candidates when considering them in all bacterial strains. The GenBank accession numbers of all induced T6SE proteins and related annotations can be found in Supplementary Table 2.

Domains or motifs may provide biologically meaningful features in the characterization of putative type 6 effectors and could be related to processes that connect the cargo effectors with the component effectors (Cianfanelli et al., 2016; Russell et al., 2014a). Based on the CDD program with concise results, conserved domains in the putative T6SEs induced in Burkholderia species were assigned with an E-value < 10−4. The distribution of domain types is depicted in Fig. 2, showing the highest fractions of VgrG-related domains (i.e., VgrG, T6SS_Vgr, VI_Rhs_Vgr) and Hcp-related domains (i.e., T6SS_HCP, VI_effect_Hcp1) with 40% and 15%, respectively. Lower fractions were observed for COG4253 (13%) and DUF2345 (8%), which are unknown functional domains usually found in type VI associated proteins that could be involved in the transport of putative effector islands (Ma et al., 2013). The Rhs – related domain type (i.e., RhsA, RHS, PAAR_RHS) also showed the fraction of up to 13% in total. Important domains like the Phospholipase D domain, included in the type VI secretion phospholipase D effectors targeting both eukaryotes and prokaryotes (Jiang et al., 2014), and especially the HNH super family comprising the conserved AHH, WHH, HNHc, and/or Tox-SHH domains that play an important role in the toxicity of proteins (Veluchamy et al., 2009) were also observed. The remaining domains (i.e., DUF3274, Cls, DUF2235, Ntox8 superfamily, PAAR, Pput2613-deam superfamily, SCP1201-deam, TVP38, and YadA_anchor superfamily) were pooled with others and hence were gathered together in term of other domain. The results suggested that these domains of type VI effectors in Burkholderia species are important in the secretion process or function in plant host cells, being consistent with findings from recent studies regarding the type VI effector – related domains (Flaugnatti et al., 2016; Jiang et al., 2014).

Detection of MIX T6SE candidates based on profile HMM

Five profile hidden Markov models (termed as pHMM1 to pHMM5) were generated separately for the five datasets of T6SE candidates (Supplementary Table 3). The number of T6SEs in the training datasets recognized by pHMMs through cross-searching are presented in Supplementary Table 4, showing the ability of each pHMM to reveal potential MIX T6SEs. As expected, all five pHMMs could recognize most of the proteins in their own training datasets (diagonal values, from 96.7% to 100%). The pHMM5 seemed to be the most unique model, mainly recognizing proteins in dataset 5. The pHMM3 was found to be the most effective model in the recognition of candidates of all three datasets 2–4, in which it could recognize proteins in dataset 2 and 4 with high fractions of 96.7% and 97.1%, respectively. Despite this, we used all three models of pHMM2, pHMM3, pHMM4 along with pHMM1 and pHMM5 for further detection of potential MIX T6SEs to obtain comprehensive results. In addition, in the tests to assess the false positive rates of pHMMs against negative datasets, only one protein (Q81J59) out of 1280 non-T6SE proteins was carried out by pHMM2 with E-value < 0.001 (Supplementary Table 4). Particularly, it got a quite low sequence-score of 15.5, suggesting that the proteins detected with higher cores would be more confident candidates of MIX T6SEs. Next, matched proteins obtained in the process of searching pHMMs against the reference genomes were determined (Supplementary Table 5), comprising all known MIX T6SEs along with four candidates included in the training datasets. Remarkably, two of the inferred MIX T6SE candidates, which were not included in the training datasets, were recently discovered to be experimental type VI effectors (EC042_4534 and BTH_I2698) (Flaugnatti et al., 2016; Russell et al., 2013) and two others (V12G01_01565 and BTH_I2701) shared significant similarity with known effectors (VP1388 and BTH_I2698) when checked by a BLASTP search (E-value < 10−6 and identity ≥ 70%). EC042_4534 and BTH_I2698 were explored only by the pHMM5 model, and have been indicated as belonging to the Tle1 (type VI lipase effector) family, one of the five divergent families (Tle1-5), suggesting that the T6SEs recognized by this model (pHMM5) are highly related to the Tle1 effector family (Russell et al., 2013). These results demonstrated that the pHMM-based screening approach could be useful to reveal potential MIX T6SEs from separate bacterial genomes and was therefore applied to the bacterial genomes of Burkholderia species.

MIX T6SE candidates inferred in the plant-associated or enviromental Burkholderia bacterial strains were then obtained using the pHMM-based screening approach described above, comprising 66 proteins in total (Table 2). Impressively, 20 of those proteins (~30.3%) were recognized by only pHMM5 and thus these proteins may belong to the Tle1 effector family, the anti-effectors aforementioned. It is also notable that we found no any MIX T6SE candidate inferred from genomes of some plant-associated beneficial bacteria including Burkholderia sp. CCGE1001, Burkholderia sp. CCGE1002, Burkholderia sp. CCGE1003, Burkholderia sp. RPE64, and Burkholderia sp. RPE67 in the screening process. Of the novel candidates, 3 proteins that were not included in the training datasets, shared significantly high similarity with at least one of the evidential T6SEs when checked by BLASTP search (E-value < 10−6 and identity ≥ 70%) (Table 2). Additionally, a larger number were found to have T6SS component proteins located in their proximity (Supplementary Table 5). This corresponded with the discovery that MIX-containing type VI effectors usually reside near components of T6SSs on the genome map (Hachani et al., 2016; Russell et al., 2014a). The phylogenetic tree of all 66 candidates clearly manifested two main groups of which group I covered most MIX T6SE candidates detected by one or more models from pHMM1-4 and group II covered all 20 MIX T6SE candidates that were recognized by pHMM5 alone (Fig. 3). The relationship presented in the phylogenetic distribution could provide hinders of translocating ability among close T6SEs in different Burkholderia bacterial strains. The conserved domains were founded in only 27 out of 66 candidates in which the fraction of DUF2235 domains (uncharacterized alpha/beta hydrolase domain) was the highest (~69%), followed by lower fraction of UhpC superfamily domain (~10%) (Supplementary Fig. 4). A considerable fraction (~62%) of potential T6SE sequences was found to contain at least one transmembrane region inside based on the TMHMM program results, in which the number of proteins with 3 TM domains received the highest proportion (~36.4%) (Supplementary Table 5). The results are in agreement with recent findings about domains associated with MIX motif - containing type 6 effector candidates (Cianfanelli et al., 2016; Flaugnatti et al., 2016; Salomon et al., 2014).

In conclusion, we focused on the detection and characterization of T6SS clusters and potential T6SEs at genome-scale in plant-associated pathogenic and beneficial Burkholderia bacteria based on computational analyses. 66 potential functional type VI systems were found in 30 collected Burkholderia strains, showing the higher number of the systems for plant pathogens compared with that of plant beneficial bacteria. Furthermore, the phylogenetic trees of important components (i.e., TssC, TssD, TssL) showed the close relationship of species that have close lifestyles, besides the distribution based on specific subtypes of T6SS clusters. Also 322 potential T6SEs in the analyzed genomes based on the homology search method were identified and their important conserved domains were explored. In addition, using the screening approach based on profile hidden Markov models of such MIX T6SEs, 57 proteins that were not included in training datasets were recognized as novel MIX T6SE candidates in Burkholderia species. Our findings of potential T6SS along with its effectors in Burkholderia species could provide useful hinders contributing to managing the inter-bacterial interactions of this genus, in order to repress the infection of phytopathogens on their host plants through encouraging the competitive ability of beneficial bacterial strains in the same niches.

Acknowledgments

This work was supported by grants from the Strategic Initiative for Microbiomes in Agriculture and Food, Ministry of Agriculture, Food and Rural Affairs, Republic of Korea (No. 916009021SB010), and from the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03930678).

Fig. 1. Distribution of putative T6SEs and annotated T6SS clusters. The number of T6SS loci ranges from 1 to 5, existing in all 30 analyzed genomes, and the number of T6SE candidates ranges from 4 to 20 in which the mean numbers of type VI effectors for three groups of plant-associated pathogenic bacteria, plant-associated beneficial bacteria, environmental bacteria and/or biocontrol agent were 14.5, 7.1, and 11.6, respectively.

Fig. 3. Phylogenic tree of 66 MIX T6SE candidates detected in Burkholderia bacterial species. The tree was constructed in MEGA6 using the Neighbor-Joining method with 1000 bootstrap replicates. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates were collapsed. All positions with less than 95% site coverage were eliminated. The evolutionary distances were computed using the JTT matrix-based method and are in the units of the number of amino acid substitutions per site. Two main groups were observed on the inferred tree; group I covered most proteins inferred by one or more models of pHMM1-4 and group II covered 20 proteins that were all recognized by only pHMM5.

Tables

Table 1

Distribution of T6SS loci with at least 10 components identified in Burkholderia species genomes

Bacterial strain

Number of T6SS locus

Potential T6SS subtype

i1

i2

i3

i4a

i4b

Plant-associated pathogenic bacteria

B. glumae BGR1

4

-

1

1

1

1

B. gladioli BSR3

3

-

-

2

-

1

B. glumae LMG 2196

3

-

1

1

-

1

B. glumae PG1

3

-

-

2

-

1

B. plantarii ATCC 43733

3

-

-

2

-

1

B. cepacia ATCC 25416

2

-

-

1

-

1

B. gladioli ATCC 10248

2

-

-

1

-

1

B. gladioli KACC 11889

2

-

-

1

-

1

Plant-associated beneficial bacteria

Burkholderia sp. CCGE1002

3

-

1

1

1

-

B. phenoliruptrix BR3459a

2

-

1

-

-

1

B. phytofirmans PsJN

2

-

-

1

-

1

B. pyrrocinia DSM10685

2

-

1

-

-

1

Burkholderia sp. KJ006

2

-

1

-

-

1

Burkholderia sp. CCGE1001

2

-

-

1

-

1

Burkholderia sp. CCGE1003

2

-

-

1

-

1

Burkholderia sp. RPE67

2

1

-

1

-

-

Burkholderia sp. RPE64

1

1

-

-

-

-

B. vietnamiensis LMG 10929

1

-

-

-

-

1

B. vietnamiensis G4

1

-

-

-

-

1

Environmental bacteria or biocontrol agent

B. ambifaria MC40-6

4

-

1

2

-

1

Burkholderia sp. YI23

4

1

1

2

-

-

B. cepacia JBK9

3

1

-

1

-

1

Burkholderia lata sp. 383

3

-

-

2

-

1

B. ambifaria AMMD

2

-

-

1

-

1

B. cepacia GG4

2

-

-

1

-

1

B. phymatum STM815

2

-

-

1

1

-

B. cenocepacia HI2424

1

-

-

-

-

1

B. cenocepacia MC0-3

1

-

-

-

-

1

B. cepacia DDS 7H-2

1

-

-

-

-

1

Burkholderia sp. Ch1-1

1

-

-

-

-

1

Total (%)

66

4

8

26

3

25

100%

6%

12%

39%

5%

38%

Table 2

Putative MIX T6SE proteins detected from the genomes of Burkholderia species