Abstract

Metazoans contain multiple complex microbial ecosystems in which the balance between
host and microbe can be tipped from commensalism to pathogenicity. This transition
is likely to depend both on the prevailing environmental conditions and on specific
gene-gene interactions placed within the context of the entire ecosystem.

Review

Metazoans and higher plants are not single-species organisms, but are complex ecosystems
composed of a multicellular eukaryotic host, with its unique genetic complement [1], and a multitude of 'microbiomes'. Each microbiome is composed of multiple prokaryotic
and eukaryotic symbionts, and the microbiomes and the host collectively make up the
'symbiome' (Table 1) [2]. Symbiotic relationships within these ecosystems exist between each of the microbial
strains and the host, and also between and among the members of each microbiome. These
interdependencies run the gamut from mutualism (in which both or all species benefit)
to commensalism (where one party benefits and does no appreciable harm to the others)
to parasitism (where one of the species benefits at the expense of the other(s)).
Finally, a pathogenic relationship exists if the parasite produces a morbid condition
in the host. These divisions are themselves an oversimplification of what is, in all
likelihood, a continuum: where a given strain of microorganism falls within this spectrum
depends not only on its genomic complement but also on the makeup of the microbiome
as well as the individual host's genetics and other environmental factors.

Table 1. Definitions of some terms used in discussing microbial-host symbiosis

Pathogenicity is not only dependent on qualitative issues such as the presence of
specific species, strains, or genes, but also on their relative abundances. Thus,
the differential growth of one microbe may result in others transitioning into or
out of pathogenic status. It is therefore likely that many pathogens did not initially
evolve as pathogens, but simply take on this role when the host fails to maintain
homeostasis [3]. Interestingly, not all bacteria associated with pathogenic processes cause disease
by their presence; some bacteria are pathogenic by their absence, such as the vaginal
lactobacilli whose loss results in an increased pH, which permits overgrowth by invasive
species [4-6]. What makes a pathogen, therefore, is the addition, or deletion, of metabolic capabilities
in the symbiome that results in a disruption of homeostasis.

Bacterial plurality embodies the following concepts: bacteria within a species display
enormous phenotypic and genotypic heterogeneity [7]; microbial colonization is nearly universally polyclonal [8-11]; and microbiomes occupying the same niche in different hosts are vastly different
with respect to phylogenetic structure [12-14]. Thus, the hologenome (see Table 1 for a definition) is not fixed, but varies with age, health, diet, and other environmental
factors. In spite of this plasticity, however, we hope to be able to characterize
a set of common features associated with a healthy hologenome as opposed to a disease-state
hologenome [15], which is the goal of the NIH Microbiome Roadmap Project [16]. We hypothesize that disease-state hologenomes will often display reduced complexity
(for example, Clostridium difficile overgrowth in the intestine following antibiotic treatment [17], or the reduced gut microflora found in patients with inflammatory bowel disease
[18]) in a manner analogous to damaged sites in the environment that have been shown to
have reduced microbial complexity [19-21].

For many bacterial pathogens, such as the non-typeable Haemophilus influenzae (NTHi) [22,23], Pseudomonas aeruginosa [24,25], Staphylococcus aureus (RJ Boissy, unpublished data), Streptococcus agalactiae [26], and Streptococcus pneumoniae [27,28], whole-genome sequencing has shown that the supragenome is several times larger than
the core genome (see Table 1 for definitions). Thus, for these species there are more distributed genes (see Table
1) than core genes. This leads to the realization that bacterial species-level diagnostics
are woefully inadequate as prognosticators of disease potential. Therefore, it was
not surprising that disease phenotyping for multiple independent isolates of NTHi
[29] and pneumococcus (Streptococcus pneumoniae) [30] revealed a spectrum of diseases - from localized chronic infections to universal
lethality.
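The relationship between core genome, distributed genes, and supragenome can be made concrete with a minimal sketch. It assumes that each strain is represented simply as a set of gene identifiers, that the core genome is the set of genes present in every strain, and that the distributed genes are the remainder of the supragenome. The strain names, gene names, and the function partition_supragenome are hypothetical and serve only as illustration.

```python
from typing import Dict, Set, Tuple


def partition_supragenome(
    strains: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split a species' genes into core and distributed (non-core) sets.

    Assumption: a gene is 'core' if it is present in every strain supplied.
    """
    if not strains:
        raise ValueError("at least one strain is required")
    gene_sets = list(strains.values())
    supragenome = set().union(*gene_sets)    # every gene seen in any strain
    core = set.intersection(*gene_sets)      # genes shared by all strains
    distributed = supragenome - core         # genes missing from at least one strain
    return core, distributed, supragenome


# Hypothetical toy data: three strains of one species.
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g2", "g3", "g6", "g7"},
}
core, distributed, supragenome = partition_supragenome(strains)
print(sorted(core))                      # core genes
print(sorted(distributed))               # distributed genes
print(len(supragenome) / len(core))      # supragenome size relative to the core
```

In this toy example the distributed genes outnumber the core genes, which is the pattern described above for the sequenced pathogens. With real data the result depends strongly on how many strains are sampled and on how orthologous genes are clustered.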

Similarly, species within the Enterobacteriaceae each reveal a broad spectrum of symbiotic
relationships with their hosts. The species Escherichia coli contains both mutualistic strains that have a role in host nutrition, and other strains
associated with either chronic urinary disease or acute enterohemorrhagic infections
[31,32]. Likewise, pathogenic strains of Enterococcus faecium have emerged from a commensal species, as we discuss below. Whole-genome sequencing
of the divergent strains in these species has revealed massive gene loss and gene
gain, resulting in intra-species genomes that vary by more than 30% in size [32].

Bacterial species are usually defined by their 16S rRNA gene. Whereas this is useful
for determining phylogenetic relationships based on vertically acquired genetic traits,
it does not account for horizontally acquired traits, that is, genes acquired by transfer
from other species, which are the major driving force in bacterial evolution [23]. Thus, 16S-rRNA-based phylogenies lump together strains that have widely divergent
gene distributions, metabolic capabilities, and pathogenic characters [23,26,28-33]. A species definition based on possession of a core genome has been proposed [7], but even this is too inclusive to be useful in clinical diagnostics. With the increasing
availability of whole-genome sequencing and comparative genomic hybridization (CGH),
it should be possible to obtain and analyze very large amounts of bacterial genomic
data, which could be cross-indexed with strain-specific disease virulence information
to develop effective clinical prognostic indicators.

Genes and gene combinations determine pathogenicity

As discussed above, within-species comparative genomics combined with disease phenotyping
can identify classes of virulence genes that are associated with different pathogenic
profiles [22-32]. These findings strongly implicate specific distributed genes and gene combinations
as the determinants of which bacterial strains are likely to act as pathogens. Both
genotypic and phenotypic heterogeneity have been demonstrated for the pneumococcus,
with some strains associated with chronic indolent infections whereas others are associated
with invasive or systemic disease [30]. Similarly, the NTHi display a broad spectrum of phenotypes [29] as well as having a highly plastic genome [22,23], making it likely that correlation studies would find virulence-specific genetic
and metabolic pathways.

This view is a departure from classical medical microbiology in which a species-level
diagnosis is used to make a prognosis. Thus, diagnostics development would profit
from large-scale bacterial genotype-phenotype correlation studies designed to provide
information on the distributed genes, which are the genes most frequently associated
with disease states. Such disease-associated genes may be largely confined to a single
species, or may be passed among related species, or may be more widely transmitted
across broader taxonomic lineages. Examples of species-specific distributed genes
include the various heme-acquiring genes found among the NTHi, and the multiple IgA-cleaving
proteases isolated among the pneumococci. Within the family Enterobacteriaceae, the
Shiga-like toxin genes have been isolated from multiple species, and at higher taxonomic
levels, gene cassettes for antibiotic resistance and for natural competence (that
is, the ability to take up DNA from the environment) have been passed between Gram-negative
and Gram-positive bacteria.

The ability to carry out whole-genome sequencing of relatively large numbers of bacterial
strains using 454-based sequencing technology [34] provides a means of rapidly and inexpensively characterizing the species' core genomes
and supragenomes. Once a relatively complete species supragenome is available [23,28], microarrays can be constructed containing probes for each distributed gene. These
CGH arrays can then be used to interrogate the genomes of large numbers of clinical
isolates with different disease phenotypes, providing the information to perform quantitative
trait locus-based gene-association studies for the identification of disease-specific
virulence genes. Such a statistical approach to bacterial genetics is new, because until
now sequence data have been too scarce to support it. The application
of this technology would also provide a comprehensive means of characterizing the
functional roles of the plurality of unannotated genes that exist in even the best-studied
bacterial species.
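As a rough illustration of the kind of gene-association analysis that such array data would allow, the sketch below tests each distributed gene for association with a binary disease phenotype. It is not the quantitative trait locus-based method referred to above: it substitutes a much simpler per-gene two-sided Fisher exact test on presence/absence calls, and it ignores population structure. The isolate names, gene names, phenotype labels, and function names are all hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def gene_associations(
    presence: Dict[str, Dict[str, bool]],  # isolate -> gene -> present on the array?
    invasive: Dict[str, bool],             # isolate -> hypothetical binary phenotype
) -> List[Tuple[str, float]]:
    """Return (gene, p-value) pairs sorted by ascending p-value."""
    genes = sorted({g for calls in presence.values() for g in calls})
    results = []
    for gene in genes:
        a = b = c = d = 0
        for isolate, calls in presence.items():
            has_gene = calls.get(gene, False)
            if has_gene and invasive[isolate]:
                a += 1      # gene present, invasive
            elif has_gene:
                b += 1      # gene present, non-invasive
            elif invasive[isolate]:
                c += 1      # gene absent, invasive
            else:
                d += 1      # gene absent, non-invasive
        results.append((gene, fisher_exact_two_sided(a, b, c, d)))
    return sorted(results, key=lambda r: r[1])


# Hypothetical toy data: 8 isolates, the first 4 labelled invasive.
invasive = {f"iso{i}": i < 4 for i in range(8)}
presence = {
    iso: {"geneX": is_inv, "geneY": idx % 2 == 0}
    for idx, (iso, is_inv) in enumerate(invasive.items())
}
results = gene_associations(presence, invasive)
for gene, p in results:
    print(gene, round(p, 4))
```

In a real study, thousands of distributed genes would be tested at once, so the p-values would need correction for multiple testing, and the clonal relatedness of isolates would have to be accounted for before any gene could be called disease-specific.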

How do pathogens evolve and where do they come from?

The distributed genome hypothesis [35,36] states that bacterial pathogens arise and acquire virulence traits primarily via
horizontal gene transfer (Figure 1). More recently, it has become clear that many bacteria are multicellular organisms
during part of their life cycle [37], and this has led to the recognition that bacteria possess a number of virulence
traits that are expressed only at the population level and are not operational at
the single-cell level [38]. These hypotheses are based on the observation that nearly all classes of pathogenic
bacteria maintain highly energy-demanding mechanisms for accessing foreign DNA [39], in spite of the fact that most of these species maintain small genomes. The importance
of this observation is that in a background of processes that favor gene deletion
[40], the maintenance of multiple horizontal gene transfer mechanisms indicates that these
traits are highly selected for. The distributed genome hypothesis also posits that
chronic pathogens utilize the distribution of non-core genes among strains of a species
as a survival strategy, whereby the continuous recombination of genetic characters
between strains serves as a supra-virulence factor that improves population survival
through the generation of new strains with novel combinations of genes. Thus, this
population-level gene reassortment acts as a counterpoint to the adaptive immune response
of vertebrates, providing a means for pathogens to constantly present the host with
novel antigens obtained from any of the constituent species of the symbiome.

Figure 1. The distributed genome hypothesis. (a) Schematic showing the distributed (non-core) genes of a species supragenome in a population
pool with individual strains below each containing the same set of core genes (green
helix). (b) Schematic showing each of the strains of a species with the core genome and a unique
distribution of non-core genes.

Many pathogenic bacteria have complex life cycles that include stages in the environment
and passage through multiple hosts. These organisms, therefore, come in contact with
many different selective pressures at various stages of their life cycle, and some
of the adaptations that provide protection from predation or competition in one stage
can induce pathogenicity in another stage. One way in which pathogens evolve is that
environmental organisms acquire, through horizontal transfer, genes that give them an
advantage within their non-pathogenic ecosystem. A classic example is the evolution
of pathogenic forms of Vibrio cholerae, non-pathogenic progenitor strains of which are principally found in aquatic ecosystems.
Pathogenic strains originate from non-pathogenic strains through a multistep process
that includes the acquisition of the type IV toxin-co-regulated pilus (TCP). This
acquisition is followed by infection with the filamentous phage CTXϕ, which uses the
pilus as a point of entry and provides the genes encoding cholera toxin [41]. Studies of cholera epidemics suggest that this general series of genomic rearrangements
occurs independently in each epidemic in response to competition among extant environmental
strains. These studies led Faruque et al. [41] to hypothesize that "continual emergence of new toxigenic strains and their selective
enrichment during cholera outbreaks constitute an essential component of the natural
ecosystem for the evolution of epidemic V. cholerae strains to ensure its continued existence."

Legionella pneumophila, a bacterium that lives intracellularly, probably also evolved its pathogenic characters
outside the human host. In humans, L. pneumophila grows and replicates in alveolar macrophages to cause pneumonia, particularly
in immunocompromised hosts. The ability to live within phagocytic cells is the critical
virulence factor for this organism and is encoded by the icm/dot secretion system
[42], which originally evolved to permit the bacterium's survival within free-living grazing
protozoa. Similarly, E. coli O157, although notorious as a highly virulent enterohemorrhagic pathogen of humans,
is primarily a commensal microorganism of cattle that also lives in the environment.
Although E. coli O157 can be transmitted from person to person, this is not its principal means of
propagation; thus, it is likely that its virulence in humans is a byproduct of other
evolutionary forces. Many E. coli strains, including O157, that contain a lambda-like prophage carrying the Shiga-like
toxin genes (stx) have been shown to have a survival advantage in the presence of the ubiquitous bacterivorous
protozoan Tetrahymena pyriformis [43]. These investigations showed that most of the survival advantage of the stx-containing strains can be attributed to better survival within the protozoan's food
vacuoles. Thus, for both L. pneumophila and O157 it would appear that the primary virulence factors associated with human
disease actually evolved to play a critical role in the organisms' survival in other
stages of their life cycles. Interestingly, however, the Shiga toxin of O157 causes
diarrhea in humans, which could lead to increased spread of this strain through fecal
contamination. Thus, it is tempting to speculate that acquisition of Shiga toxins
may be under multiple unrelated evolutionary pressures.

Competition among microorganisms can also generate strains that are pathogenic in
their host as a side effect of the intermicrobial arms race. Microorganisms rarely
live in isolation, and the myriad interactions amongst co-colonizing species and strains
impose a constant selective pressure that ensures the continual evolution of new strains.
Thus the same bacterial horizontal gene transfer mechanisms that provide a counterpoint
to the host's adaptive immune response also serve to generate more competitive strains
for interspecies competition, with some of these antibacterial mechanisms also resulting
in increased virulence towards the host. There is abundant evidence that the numerous
bacterial species colonizing the human respiratory mucosa are in competition with
each other. Both NTHi and the pneumococcus form biofilms on the middle-ear mucosa
that are associated with chronic otitis media but, even when both species are present
in the same sample, they do not form mixed biofilms [44]. NTHi can also induce an anti-pneumococcal host response during mixed infections
that is characterized by increased recruitment of neutrophils into the paranasal spaces
[45]. This favors H. influenzae, even though the pneumococcus predominates in mixed laboratory culture.
Conversely, S. pneumoniae competes against H. influenzae. Both H. influenzae and Neisseria meningitidis use sialylation of lipooligosaccharides as a mechanism to evade host immune surveillance
through mimicry, whereas S. pneumoniae expresses NanA, which desialylates the cell surface of both these bacteria [46]. NanA also alters multiple surface carbohydrates and removes sialic acid residues
from human epithelial cells [47]. Disruption of NanA decreases the ability of the pneumococcus to establish a persistent
infection, as it can no longer expose the sialylated host-cell receptors needed for
attachment [48]. Thus, NanA plays a role in pathogenesis as well as in inter-species competition.

A single molecule is, however, not always advantageous in interactions both with the
host and between competing microorganisms. The pore-forming toxin of S. pneumoniae, pneumolysin, increases access of the peptidoglycan of H. influenzae cell walls to cytoplasmic immune molecules that initiate an anti-pneumococcal response,
thus providing an advantage to H. influenzae [49]. The balance of fitness across different environmental settings is therefore critical
when considering how pathogens evolve. Mutations that offer a fitness advantage in
one environment may confer a disadvantage in another. This is perhaps best understood
in the case of microbial drug resistance, where mutations that confer an advantage
in the presence of a drug are often deleterious (resulting in slower growth rates)
in its absence.

In the monitoring of emerging pathogens it will become increasingly important to recognize
the genes and regulatory systems that facilitate transition into a new niche or that
balance gene expression within a strain such that it can survive in different environments.
In a recent study, Giraud and colleagues [50] created gnotobiotic mice by colonizing germ-free mice with E. coli. In each of eight independent experiments, after habituation, the bacteria were shown
to have mutations in the EnvZ-OmpR two-component response regulator, a signal transduction
system that controls an entire regulon. This strongly implicates this locus as providing
a fitness advantage in this particular environment [50]. This is likely to be the case for many master regulators, and given such an important
role in adaptation one might expect these genes to be mostly part of the core genome.
In the pneumococcus, however, only a subset of the predicted two-component signal-response
systems are core-encoded. Thus, it remains to be determined whether the distributed
two-component systems affect pneumococcal fitness under any particular environmental
condition, and how the presence, absence, and mutation of these master regulators
provide an advantage for one strain over another.

Many pathogens evolve in situ from species that are commensals in the eukaryotic host. This is not surprising, as
these organisms are already adapted for survival within the extant symbiome and acquisition
of virulence genes can produce a pathogen de novo. Examples of adaptation to a new niche selecting for virulence are commonly observed
within the genus Salmonella. Salmonella enterica subspecies I is well adapted to warm-blooded vertebrates. There are more than 1,000
serotypes of this subspecies with different degrees of host adaptation. The level
of host specificity among the serotypes correlates with their capacity to cause disease.
Mononuclear phagocytes are barriers to the host range of S. enterica, and mechanisms enabling survival of the bacteria within these cells allow adaptation
to individual host species [51]. The serotype Typhimurium is successful in mice, and survives well in murine, but
not human, macrophages; the reverse is true for the serotype Typhi, which causes disease
in humans. In contrast, other subspecies of S. enterica are mainly associated with cold-blooded vertebrates. It is thought that these subspecies
survive in the alimentary tract of reptiles, where they are well adapted as commensal
organisms [51].

Another example of pathogenic strains evolving from non-pathogenic ones via horizontal
gene transfer is the case of Enterococcus faecium. This bacterium has recently evolved from a commensal into a frequently isolated
nosocomial (hospital-acquired) pathogen in intensive care units [52]. Comparative genomics has shown that the pathogenic strains have arisen from multiple
backgrounds, but all show evidence of having acquired insertion elements (a type of
transposable element) that are not present in the commensal strains. Thus, the creation
of a new environmental niche, the intensive care unit, has facilitated the evolution
of a new subpopulation of this species. The degree of genetic variation among strains
in the 'hospital clade' of E. faecium (as assessed by pulsed-field gel electrophoresis and multi-locus sequence typing)
was compared with the degree of variation among all other strains. This revealed that
the diversity indices (ratio of average genetic similarities) were higher for the
hospital clade [52], strongly suggesting increased genomic plasticity within this population that is
likely to facilitate its further adaptation.

Host mutations are associated with the development of bacterial pathogenicity

An example of specific host-bacterium gene combinations resulting in pathogenesis
(and the evolution of a pathogen from a commensal) involves the human genetic disease
cystic fibrosis. This disease is caused by mutations in the human CFTR gene that lead to the loss of a chloride channel, resulting in highly viscous pulmonary
mucus that impairs the 'mucociliary escalator', the mechanism that
sweeps bacteria out of the airways. The disease first becomes apparent with colonization
and chronic infection by NTHi, which leads inexorably to secondary infection by the
opportunistic environmental bacterium P. aeruginosa, which establishes a chronic infection involving a biofilm. The pseudomonal infection
is ultimately lethal (although modern medical practice can extend life for decades).
What is most interesting is that as the P. aeruginosa infection transitions from acute to chronic, there is significant evolution of the
bacterial genome [53-56] that makes P. aeruginosa much more pathogenic in the lungs of cystic fibrosis patients. Proof of this hypothesis
came with the observation that preadolescents with cystic fibrosis who attended the
same clinics and summer camps as older adolescents with the disease were experiencing
very rapid clinical progression. Molecular typing of the P. aeruginosa isolates revealed that the young children were being infected with the highly evolved
chronic pathogens, adapted to the cystic fibrotic lung, acquired from the older patients [56]. In the final analysis, sequential colonization by multiple bacterial species, none
of which is highly pathogenic in the healthy host, evolves into what becomes a lethal
infection in the presence of a defective host gene. Thus, the cystic fibrosis lung
illustrates the concept that the entire composition of the hologenome is important
in defining pathogenicity and virulence.

Novel pathogens are constantly emerging from environmental and commensal bacterial
flora as a result of competitive selective pressures and ubiquitous horizontal gene
transfer. Many, perhaps most, virulence traits did not arise originally to damage
the host, but rather as a means to compete with other microbes or to prevent predation,
or as a means to obtain nutrients from the host. Humans come into contact with a large
range of ecological niches through agriculture, aquaculture, and other harvesting,
commercial and recreational activities. Given the enormous numbers of microbial species
in each of these niches, and the vast size of the accessible supragenomes available
to each of these species, novel pathogens are likely to be a permanent feature of
human existence.

Acknowledgements

This work was supported by Allegheny General Hospital and Allegheny Singer Research
Institute, as well as by grants from the Health Resources and Services Administration
and the NIH-NIDCD: DC02148, DC04173, and DC05659. We thank Mary O'Toole for help with
the preparation of this manuscript.

Tong HH, Blue LE, James MA, DeMaria TF: Evaluation of the virulence of a Streptococcus pneumoniae neuraminidase-deficient mutant in nasopharyngeal colonization and development of otitis
media in the chinchilla model.