An overview of the molecular phylogeny of lentiviruses

Brian T. Foley

T10, MS K710, Los Alamos National Laboratory, Los Alamos, NM
87545

Introduction

Lentiviruses are one of several groups of retroviruses. Early
molecular phylogenetic analyses of endogenous and exogenous
retroviruses suggested that the retroviruses could be divided into
four groups: two complex groups with several accessory or regulatory
genes (the lentiviruses, and the bovine and primate leukemia virus
group now known as the deltaretroviruses) and two simple groups with
just the gag, pol and env genes and few or no accessory genes (one
group including the spumaretroviruses, C-type endogenous retroviruses
and MMLV; the other including Rous sarcoma virus, HERV-K, simian
retrovirus 1 and MMTV) [1-5]. A more recent study, including many more
recently discovered exogenous and endogenous retroviruses,
illustrates the diversity of retroviruses [6]. The
Retroviridae are currently officially classified into seven different
genera, according to the Seventh Report of the International
Committee on Taxonomy of Viruses, 2000
(http://www.ncbi.nlm.nih.gov/ICTV). The seven genera are named alpha
through epsilon retroviruses plus lentiviruses and spumaviruses.
Phylogenetic trees based on pol gene sequences can be found in
several recent papers [6-8]. None of the retroviruses
in either group of complex retroviruses have been found in an
endogenous state to date.

Within the Lentiviruses, the primate lentiviruses discovered to
date form a monophyletic cluster (Figure 1). One notable difference
between the primate lentiviruses and the non-primate lentiviruses is
that all non-primate lentiviruses, except the BIV/JDV lineage, contain
a region of the pol gene encoding a dUTPase, whereas all
primate lentiviruses lack this region of pol. In the BIV/JDV
lineage, an insert is still present in this region of pol, but
the sequence similarity is low, and the region seems to be no longer
capable of encoding a functional dUTPase. This lack of dUTPase,
coupled with the distances between primate lentiviruses as compared
to the distances between other mammalian lentiviruses, suggests that
the lentiviruses originated in a non-primate mammal, and that a single
transfer of virus from a non-primate mammal into primates was
followed by spread of the virus into many different primate species.
If the primate lentiviruses were older than the non-primate
lentiviruses, one would expect greater diversity within the primate
clade than between non-primate clades. Feline immunodeficiency viruses
have been studied from wild African lions [9], as well as
North American wildcats [10], the Kazakhstan Pallas cat
[11] and housecats from around the world [12]. The
global diversity of feline immunodeficiency viruses is similar to the
diversity of the primate lentiviruses. The global diversity of
housecat FIVs is roughly twice as great as the diversity of the HIV-1
M group in humans. These facts indicate that domestic cats have been
spreading FIV around the world for a longer period of time than
humans have been spreading the HIV-1 M group, assuming that the two
viral lineages evolve at close to equal rates.

Within the primate lentivirus group, each species and subspecies
of primate seems to carry its own monophyletic lineage of lentivirus
or to be free from lentiviral infection. Humans seem to not have
their own lineage, but instead have acquired two very different
lineages, named HIV-1 and HIV-2, from chimpanzees and sooty
mangabeys, respectively. Baboons and macaques, like humans, seem only
to acquire lentiviruses from other species and not have their own
lineage [13-17]. The intra-species diversity in each
lineage is not yet fully understood, because only one or a relatively
small number of virus isolates from each species of primate have been
sequenced to date. Primates and felines captured in distant
geographic locations tend to have more diverse viral genotypes
[12], whereas animals of the same species captured in
proximity to one another tend to share similar genotypes of virus.
Cross-species transmission of virus appears to be rare in comparison
to within-species transmission, as evidenced by primate species that
share overlapping habitats, and feline species that share overlapping
habitats, yet each carry their own lineage of virus.

Figure 1. Phylogenetic Tree of Lentiviral pol
and gag Genes.

B: Nearly complete gag genes, from start codon
to the end of the EIAV sequences near the Gag-Pol ribosomal slip
site, were aligned. For both trees, columns containing gaps were
removed (thus removing the dUTPase region from the Pol data) and the
DNAdist and NEIGHBOR programs in the PHYLIP package were used. In
DNAdist, the F84 (also called maximum likelihood) model of evolution
was used with a transition/transversion ratio of 1.7. The trees were
re-sized in TREETOOL so that they both have the same distance scale.
The alignments used to build the trees are available from
btf@t10.lanl.gov.

For example, chimpanzee habitat overlaps African green
monkey habitat, but there is no evidence of exchange of virus between
chimpanzees and African green monkeys. Both interactions between the
host animals and species restrictions of the viruses can contribute
to cross-species transfer or lack of transfer. Likewise, housecats all
over the world appear to share a domestic cat FIV, rather than
picking up FIVs from local wildcats. Examples of the relatively rare
cross-species transmission events are noted in the literature. A
Peruvian wildcat sampled in a zoo appeared to have picked up a
domestic cat FIV in captivity [10]. A Japanese wildcat was
infected with domestic cat FIV in the wild [18]. Humans are
another notable example, having apparently picked up three lineages
of virus from chimpanzees to create the HIV-1 M, N and O groups, and
as many as seven lineages of virus from sooty mangabeys to create
HIV-2 subtypes A through G [19-21]. Baboons, which are
noted to lack their own strain of simian immunodeficiency virus, have
repeatedly become infected from the vervet subspecies of African
green monkeys with which they share habitats [13, 14].
Cross-species transmissions of visna and Caprine
Arthritis-Encephalitis Virus (CAEV) between domestic goats and sheep
are reported to be rather common [18, 22-26]. The SIVs
from sooty mangabeys were also transferred into several species of
macaques in captivity in the USA [15, 17, 27].
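
The Figure 1 legend describes removing alignment columns that contain gaps before computing pairwise distances. The sketch below illustrates that preprocessing step on a made-up toy alignment. It is not the PHYLIP implementation: the function names are hypothetical, and the Jukes-Cantor correction is used here only because it is short to write down, whereas the trees in Figure 1 were built with the more elaborate F84 model.

```python
import math


def strip_gap_columns(alignment):
    """Drop every alignment column in which any sequence has a gap ('-')."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal length and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(a, b):
    """Jukes-Cantor corrected distance (a simpler stand-in for F84)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return float("inf")  # sequences too divergent for the correction
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy alignment, invented for illustration only.
toy = {
    "seqA": "ATGC-ATGCA",
    "seqB": "ATGCAATGTA",
    "seqC": "AT-CAATGTT",
}
stripped = strip_gap_columns(toy)
for first, second in (("seqA", "seqB"), ("seqA", "seqC"), ("seqB", "seqC")):
    d = jc69_distance(stripped[first], stripped[second])
    print(first, second, round(d, 4))
```

A distance matrix of this kind is the input that a neighbor-joining program such as NEIGHBOR would then turn into a tree.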

Molecular evolution of the lentiviruses

Lentiviruses have been estimated to mutate at a rate as much as
10^6 to 10^7 times the rate of mutations found in eukaryotic germ-line
DNA [28, 29]. The rate of evolution of any region of a
genome, viral or cellular, is a function of the spontaneous mutation
rate, positive and negative selection pressures, and numbers of
generations per unit of time upon which selection can act. Factors
such as codon usage bias, methylation of CpG dinucleotides and
nucleotide composition bias are examples of selective pressures that
can contribute to the evolution rate. Lentiviruses not only mutate
faster than eukaryotes, but also have vastly higher numbers of
generations per unit time, and thus they are expected to, and are
observed to, evolve at a much faster rate than eukaryotes
[29]. One of the most recent estimates of the evolution rate
of members of the HIV-1 M group is 0.0024 (0.0018 to 0.0028)
substitutions per base pair per year in the env gp160 gene
region and 0.0019 (0.0009 to 0.0027) substitutions per base pair per
year in the gag gene region [30]. The
deltaretroviruses, also known as T-cell leukemia viruses (TCLVs)
[31], have many parallels to the lentiviruses. Both TCLVs and
lentiviruses infect white blood cells. Both are complex retroviruses
with regulatory genes in addition to the gag, pol and
env genes found in all retroviruses. Both have made numerous
cross-species jumps from non-human primates into humans, and between
non-human primate species [14, 32-36]. Both are found
in non-primate mammals as well as in primates, with inter-primate
distances less than inter-mammalian distances [6]. Neither
has ever been found in an endogenous state in any mammal. It has
therefore been of interest to compare and contrast these two classes
of complex retroviruses.
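
As a rough illustration of how a per-year substitution rate such as the env estimate quoted above relates to sequence distance and time, the following sketch applies a strict molecular clock. It assumes that both lineages evolve at the same constant rate and ignores repeated substitutions at the same site, so it is only reasonable for closely related sequences. The function names and the example distance are hypothetical; only the rate values come from the text.

```python
def expected_pairwise_distance(rate, years):
    """Expected substitutions per site separating two lineages that split
    `years` ago, when each lineage evolves at `rate` per site per year."""
    return 2.0 * rate * years


def years_since_split(pairwise_distance, rate):
    """Invert the strict-clock relation: time = distance / (2 * rate)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate)


# env gp160 rate estimate and its reported range, as quoted in the text [30].
ENV_RATE = 0.0024
ENV_RATE_LOW, ENV_RATE_HIGH = 0.0018, 0.0028

# Hypothetical pairwise distance, chosen only to show the arithmetic.
toy_distance = 0.10

for label, rate in (("low", ENV_RATE_LOW), ("point", ENV_RATE), ("high", ENV_RATE_HIGH)):
    print(label, round(years_since_split(toy_distance, rate), 1), "years")
```

The factor of two reflects that changes accumulate along both branches leading back to the common ancestor.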

Two recent papers have compared the molecular evolution of TCLVs
and primate lentiviruses [29, 37]. The Wodarz paper
[37] suggests that the large difference in evolution rate
between the two viruses is primarily due to the ability of HIV-1, but
not HTLV-I, to infect macrophages in addition to CD4+ T-helper cells.
The Sala paper [29] suggests that the difference in rates of
evolution between HIV-1 and HTLV-I is due to clonal expansion of
HTLV-infected cells, versus active replication of HIV-1. The Wodarz
paper specifically disagrees with the explanation offered by Sala.
Neither paper addressed the issue of the difference in nucleotide
composition between the TCLVs and lentiviruses. The lentiviruses are
all A-rich and C-poor, with nucleic acid compositions of 36% A, 18%
C, 24% G and 22% T. The A-richness and C-poorness are roughly evenly
distributed across the lentiviral genomes, with the exception of
bovine immunodeficiency-like virus and Jembrana disease virus, which
both show a dramatic difference in nucleotide composition between the
5' and 3' halves of their genomes (Figure 2 panels A and B). The
T-cell leukemia viruses are C-rich and G-poor with nucleic acid
compositions of 23% A, 33% C, 18% G and 23% T, again with a
relatively constant composition bias across the genome (Figure 2
panels C and D).

Figure 2. Nucleotide Composition of Lentiviruses
and Deltaretroviruses. A Microsoft EXCEL spreadsheet was used to
count the number of each base within a 500 base sliding window across
each genome. In the top two panels, HIV-1 isolate B-HXB2 (K03455) is
compared to bovine immunodeficiency virus isolate HXB3 (M32690) to
illustrate the marked change in composition in the BIV genome. All
other lentiviruses examined (FIV, Visna, CAEV, HIV-2, SIV-SMM,
SIV-AGM) were A-rich across the genome, similar to HIV-1 (data not
shown). In the lower panels, HTLV-I isolate CAR (D13784) and BLV
isolate B19 (AF257515) are plotted for comparison to each other and
to the lentiviral genomes.

Although Figure 2 uses a 500 base window to smooth out the nucleotide
composition plots, smaller window sizes can be used to detect regions
of the genome, such as the TAR element and the RRE, where RNA
secondary structures require a more G+C-rich nucleotide composition
bias. No biological
reason for the difference in nucleotide base composition bias between
the T-cell leukemia viruses and lentiviruses is apparent at this
time. The fact that there is a striking difference suggests
differences in the pattern of evolution as well as differences in
rates of evolution between these two types of virus. Within the
lentiviruses, the bovine viruses were exceptional in their nucleotide
composition pattern across their genome (Figure 2), and also
displayed nucleotide composition within the pol gene that
differed from the other lentiviruses in being less A-rich (Figure
3).
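
The sliding-window base counts described in the Figure 2 legend were made in a spreadsheet. The same calculation can be sketched in a few lines of Python, as below. The function name, the `step` parameter and the toy sequence are assumptions made for illustration, and the window size can be reduced to look for short G+C-rich regions.

```python
from collections import Counter


def sliding_composition(seq, window=500, step=1):
    """Return (start, {base: fraction}) for each window along the sequence.

    RNA input is accepted; U is counted as T. Characters other than
    A, C, G and T are ignored in the reported fractions.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper().replace("U", "T")
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        counts = Counter(seq[start:start + window])
        rows.append((start, {base: counts.get(base, 0) / window for base in "ACGT"}))
    return rows


# Toy "genome" with an abrupt change in composition halfway along,
# invented for illustration only.
toy_genome = "A" * 600 + "C" * 600
for start, fractions in sliding_composition(toy_genome, window=500, step=100):
    print(start, fractions["A"], fractions["C"])
```

Recounting every window from scratch is slow for long genomes with a step of 1, but it keeps the sketch easy to check against the legend.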

The lentiviruses are extremely diverse in the DNA sequences of
their genomes and the proteins encoded by them. Sala and Wain-Hobson
point out that the amino acid sequence distance between the Pol
protease peptides of HIV-1 and HIV-2 is roughly equivalent to the
distance observed in homologous proteins from eubacteria and
eukaryotes which last shared a common ancestor some 2 billion years
ago [29]. The 10^6-fold faster rate of evolution in
lentiviruses compared to those DNA genomes indicates that this
diversity could have accumulated in closer to two thousand years than
to two billion years. Any such speculations about the date of the
most recent common ancestor of the primate lentiviruses are highly
uncertain at this time, for numerous reasons. Korber et al.
have shown that there is some uncertainty in the slope of the DNA
sequence distances vs. time line even within the relatively closely
related HIV-1 M group of primate lentiviruses [30]. They
estimated that the HIV-1 M group of viruses last shared a common
ancestor between 1915 and 1941 with a 95% confidence interval on the
slope. Salemi et al. used a different method and arrived at a
similar date for the origin of the HIV-1 M group. They further
calculated that the common ancestor of the HIV-1 M group and the
SIV-CPZs isolated from Pan troglodytes troglodytes dated to
the late 17th century with a 99% confidence interval for a date
between 1591 and 1761 [38]. The epidemiology of the AIDS
pandemic tends to agree with a 20th century origin of the HIV-1 M
group. The epidemic apparently went unnoticed in central sub-Saharan
Africa for several decades [39]. Both the epidemiology and
the molecular phylogenies of the HIV-1 O group and the HIV-2 viruses
are less well understood than those of the HIV-1 M group. The
nucleotide and amino acid sequence diversity of HIV-2 viruses is
greater than the diversity of the HIV-1 M group viruses, but roughly
equivalent to the diversity within the HIV-1/SIV-CPZ clade (Figure
1). The diverse HIV-2 lineages are thought to have arisen in sooty
mangabeys, with several cross-species transfers from sooty mangabeys
into humans, one for each subtype of HIV-2. The subtypes of HIV-2 are
thus analogous to the groups (M, N and O) of HIV-1, both in terms of
sequence diversity and in terms of the cross-species transfer events
thought to have created them.
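
The scaling argument made earlier in this section (two billion years of divergence in DNA genomes, compressed by a 10^6-fold faster rate) is simple arithmetic, sketched below. It assumes that the time needed to accumulate a given amount of divergence scales inversely with the rate, which is a simplification. The function name is hypothetical, and the 10^7 case is included only because the text quotes a range of 10^6 to 10^7 for the mutation rate.

```python
def rescaled_years(years_at_reference_rate, rate_ratio):
    """Years needed to accumulate the same divergence at a rate that is
    `rate_ratio` times faster than the reference rate."""
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return years_at_reference_rate / rate_ratio


TWO_BILLION_YEARS = 2e9

for ratio in (1e6, 1e7):
    print(f"{ratio:.0e}-fold faster: about {rescaled_years(TWO_BILLION_YEARS, ratio):.0f} years")
```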

Figure 3. Nucleotide Composition of Lentivirus
pol genes. The pol gene alignment used to produce the pol tree in
figure 1 was analyzed for nucleotide composition. Each set of points
(A, C, G and T) is for one viral sequence.

In any estimation of rates of evolution and times of divergence from
common ancestors, an assumption that different lineages of the same
organism evolve at similar rates greatly simplifies the calculations.
A recent paper by Salemi et al. estimated that lineages of
HTLV-II evolve at a rate 150 to 350 times faster when the
virus is transmitted between IV drug users than lineages of the same
virus propagated by mother-to-infant transmission [40]. Many
studies of HIV-1 evolution in IV drug users and other communities
have not turned up any such discrepancy in the rate of evolution of
HIV-1 [41-44]. One study found that the rate of
mutation of the env gene was 62% higher in frequent drug
injectors, compared to those who had not injected drugs in the last 6
months [45]. Salemi et al. postulated that the
increase in viral transmission rate between IV drug users, which can
be many transmissions per year, accounted for the increase in HTLV-II
evolution rate. HTLV-II mother-to-infant transmission would be
expected to occur just once every 14 to 30 years for a given viral
lineage. With HIV-1, both sexual and IV drug user transmissions can
occur several times per year, and long-term chains of mother-to-infant
transmission are not expected, due to the lethality of
HIV infection in infants. In the rapidly expanding IV drug user
epidemics that have been studied, no increase in the evolution rate
of needle-spread HIV-1 compared to sexually transmitted HIV-1 has
been noted [46-48].

Conserved elements in the lentiviral genome

Despite the rapid evolution rate of lentiviruses, many elements in
the lentiviral genome have been conserved over time. One of the most
conserved regions of the lentiviral genome is the Lys-tRNA primer
binding site (PBS), where the host Lysine transfer RNA hybridizes to
the viral RNA genome to serve as a primer for reverse transcription.
The PBS is short, just 15 bases (GAACAGGGACUUGAA), but nearly
perfectly conserved in all lentiviruses. It exists within a secondary
structure element that is conserved in structure, but not sequence
[49]. The polypurine tract, Rev-responsive element, psi
element, and other elements involved in replication and packaging of
the viral genome are also conserved to varying degrees between all
lentiviruses. The protein-coding regions of the genome are also
conserved to varying degrees, with gag and pol being
more conserved overall than tat or env. Within the
proteins, catalytic and/or functional domains are highly conserved.
One such element is the C-X2-C-X4-H-X4-C zinc knuckle domain of the
Gag p7 nucleocapsid peptide that binds to the psi element to
specifically package the viral genome into budding virions
[50]. Whenever two regions of the genome need to coevolve to
retain the interaction between the two elements, such as Tat protein
binding to the TAR element or Rev protein binding to the RRE element,
the evolution rate of both regions is slowed.
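
Two of the conserved elements named above are short enough to search for directly: the 15-base PBS sequence and the C-X2-C-X4-H-X4-C zinc knuckle pattern. The sketch below shows one way to scan sequences for them. The function names and the toy sequences are invented for illustration, and real sequences may carry mismatches in the PBS that an exact search would miss.

```python
import re

# C-X2-C-X4-H-X4-C, where X is any residue, written as a regular expression.
ZINC_KNUCKLE = re.compile(r"C.{2}C.{4}H.{4}C")

# PBS sequence as given in the text (RNA alphabet).
PBS_RNA = "GAACAGGGACUUGAA"


def find_zinc_knuckles(protein):
    """Return (start, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in ZINC_KNUCKLE.finditer(protein.upper())]


def find_pbs(genome):
    """Return start positions of exact PBS matches; DNA or RNA input."""
    rna = genome.upper().replace("T", "U")
    hits = []
    pos = rna.find(PBS_RNA)
    while pos != -1:
        hits.append(pos)
        pos = rna.find(PBS_RNA, pos + 1)
    return hits


# Toy sequences, made up only to exercise the two searches.
toy_protein = "MKK" + "CAACAAAAHAAAAC" + "GG" + "CWWCWWWWHWWWWC" + "K"
toy_genome = "ccc" + "GAACAGGGACTTGAA" + "ggg"

print(find_zinc_knuckles(toy_protein))
print(find_pbs(toy_genome))
```

Positions are zero-based offsets into the input string.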

The gag and pol genes of lentiviruses are conserved
well enough that multiple sequence alignments of this region can be
built with confidence. The env gene is much more variable,
including many insertions and deletions, and even if the cysteines
that form disulfide bonds to create the loop structures of the Env
protein are aligned, one cannot be sure that the alignment is
phylogenetically correct. Although the env genes of the bovine
immunodeficiency and Jembrana disease viruses display an aberrant
nucleotide composition bias in comparison to the other lentiviruses,
there is no evidence that this is due to recombination. A BLAST
search of their env genes against the databases produces other
lentiviral env genes as the most significant matches (Figure 2
and data not shown).

Although the linear sequences of nucleotides in the env
genes and of amino acids in the Env proteins are not highly conserved,
the coiled-coil structure of the fusion domains of lentiviruses is
conserved and shares structural similarity with the fusion domains of
other viral and host proteins [51-54]. Likewise, the
structures of polymerases and proteases are conserved, as are key
residues in catalytic sites [55-66]. A review of HIV
protein structures was published in 1998 [67]. Similarly,
although the lentiviruses display remarkable diversity in linear DNA
sequences, the nucleotide composition bias appears to be constrained
as shown in Figures 2 and 3.

[24] Leroux, C., et al., Genomic heterogeneity of
small ruminant lentiviruses: existence of heterogeneous populations
in sheep and of the same lentiviral genotypes in sheep and goats.
Arch Virol, 1997. 142(6):1125-37.

[38] Salemi, M., et al., Dating the common ancestor
of SIVcpz and HIV-1 group M and the origin of HIV-1 subtypes using a
new method to uncover clock-like molecular evolution. FASEB
J, 2001. 15(2):276-278.

[55] Ding, J., et al., Structure and functional
implications of the polymerase active site region in a complex of
HIV-1 RT with a double-stranded DNA template-primer and an antibody
Fab fragment at 2.8 Å resolution. J Mol Biol, 1998.
284(4):1095-111.