The AIDS pandemic is a new problem for humans, but it is unclear whether the
human immunodeficiency virus (HIV) giving rise to AIDS is also new to humans.
Either (1) HIV has recently infected humans, in which case we have a new virus
and a new disease, or (2) HIV infected humans long ago (being mild and/or
restricted in range until recently), in which case we have an old virus and a
new disease. There are precedents for each scenario among known viruses
causing diseases (see Shope and Evans, 1993). The new virus and old virus
scenarios have profoundly different implications for understanding the
mechanisms of HIV propagation and the etiology of AIDS, for combating AIDS,
and, potentially, for efforts to prevent future epidemics. The terms "new" and
"old" are ambiguous beyond denoting relative age; so, for purposes of this
article we consider a new virus one that has infected its host species within
the past 50 years or so.

The first view, that HIV has only recently infected humans, entails recent
cross-species transmission of a simian immunodeficiency virus (SIV) from one or
more nonhuman primates, and represents the current conventional wisdom
(Dietrich et al., 1989; Doolittle, 1989; Allan et al., 1991; Fox, 1992; Myers
et al., 1992, 1993; Hirsch et al., 1993; Temin, 1993; Myers and Korber, 1994).
However, some have suggested that certain rural African populations of humans
may have been infected with an immunodeficiency virus for many decades,
centuries, or even millennia (Montagnier, 1985; Hahn, 1990; McClure, 1990), and
Ewald (1991, 1994) has described an evolutionary model in which virulent
strains are placed at a selective advantage by higher rates of sexual partner
change.

Understanding HIV origins is of general interest to systematists as well.
Viruses evolve by descent with modification like any other group of organisms,
and systematists will become increasingly involved in attempts to understand
their complex histories as more of their DNA sequences become available.
Systematists working on viruses need to consider distinctive features of viral
evolution, including extremely high rates of molecular sequence evolution,
consequent high levels of within-population sequence variability (variously
described as yielding species swarms or quasispecies), evolutionary rates that
vary depending on the species of host and type of cell infected, potential for
recombination when representatives of different viral lineages infect the same
host cell, and potential biases in the sampling of host species.

Our objectives here are to (1) assess the evidence used in support of the "new
virus" hypothesis, (2) present our own phylogenetic analyses of representative
viral taxa, (3) estimate the most-parsimonious evolution of the character
"virus host" in the study taxa, and (4) comment on methodological issues in the
systematics of viruses.

Although it is not possible to reject either hypothesis, we conclude that any
consensus favoring the "new virus" hypothesis is not justified on the basis of
current evidence, and that the "old virus" hypothesis remains a viable
alternative.

Phylogenetic Analyses of HIV Origins

Previous Studies

HIVs and SIVs are retroviruses, a group characterized by the ability to reverse-transcribe RNA into DNA. HIV and SIV genomes are about 10 kilobases in size and contain at least nine recognizable genes. Previous phylogenetic analyses of primary HIV and SIV lineages have used, variously, the pol and gag genes. The pol gene encodes reverse transcriptase and endonuclease enzymes, whereas gag encodes capsid protein (gag p24), which forms a shell around the viral RNA, and several internal proteins (gag p15, gag p18) functioning in viral reproduction (Stine, 1993; Hahn, 1994). The regions of pol encoding reverse transcriptase and endonuclease are the most slowly evolving regions of the genome (McClure et al., 1988). Previous studies have found HIVs and SIVs to form a monophyletic group, with lentiviruses from the domestic cat (FIV), sheep (VISNA), and horse (EIAV) being closely related to the primate immunodeficiency viruses (Doolittle et al., 1989; Yokoyama, 1991).

Because HIVs are parasites, the question of their origins includes two issues: specifically, (1) the phylogeny of the viruses, and (2) the history of virus transmission among host species. Parsimony may be used in addressing both of these questions. Parsimony is used in estimating the history of host shifts for viral taxa by minimizing the number of ad hoc assumptions of host shift on a phylogenetic tree. This minimization of ad hoc assumptions of host shift is justified on the same logical basis as the minimization of ad hoc assumptions of character convergence (homoplasy) in phylogenetic analyses using a parsimony criterion (see Farris, 1983). Further, most viruses are narrowly host-specific and unable to survive immune system surveillance in new host species. This is also consistent with a parsimony approach to assessing virus transmission history. It is important to note, however, that patterns of the phylogeny for extant viruses and their history of transmission need not be congruent. For example, the appearance of a virus from a particular host species as basal in a phylogeny could be the result of extinction of lineages from the true early host, which "gave" the virus to the current host recently, or a lack of sampling from the true early host species. Also, phylogenetic trees for viral taxa do not indicate whether a particular host species was a virus donor or a virus recipient. The trees simply denote hypothesized lineage-splitting events among viruses. For these reasons, phylogenetic trees for viruses must be interpreted with caution in assessing the history of virus host shifts.

Use of phylogenetic analyses to support the "new virus" hypothesis is common in the literature. For example, Myers et al. (1993:126) present a phylogenetic hypothesis for HIVs and SIVs based on a 648-nucleotide section of the gag p24 gene (Fig. 1) and state that this "phylogenetic tree analysis... strongly supports the hypothesis of the simian origin of AIDS [the 'new virus' hypothesis]." However, we see no such support in the tree itself. Both HIVs and SIVs occur on either side of the first bifurcation in the tree, and various HIVs could have either descended from or given rise to various SIVs. For example, sooty mangabey viruses (ancestral to SIVsmmh4 and SIVsmpbj in Fig. 1) might have given rise to HIV2s (HIV2rod, HIV2d205) or, conversely, an ancestral HIV2 lineage may have given rise to the sooty mangabey viruses. The tree itself is consistent with either scenario. Phylogenetic trees show purported sister relationships among extant lineages, but they do not denote ancestor-descendant relationships among those extant lineages.

Figure 1. Phylogenetic tree topology from Myers et al. (1993) based on
immunodeficiency viral gag p24 sequences for 648 nucleotide positions.
Tree is rooted at the midpoint of the greatest patristic distance. Host
species and abbreviations for viruses are defined in the Appendix.
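The rooting rule in this caption (root at the midpoint of the greatest patristic distance) can be sketched as follows. This is a minimal illustration, not the software Myers et al. used. The adjacency-dict tree representation, the function names, and the four-taxon tree with its branch lengths are all hypothetical. The only input is branch lengths, so the rule implicitly assumes roughly constant rates of change across lineages.

```python
def _distances_from(adj, source):
    """Path length and parent pointer from `source` to every node of an unrooted tree."""
    dist, parent = {source: 0.0}, {source: None}
    stack = [source]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node].items():
            if nbr not in dist:  # in a tree each node is reached by exactly one path
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint_root(adj):
    """Locate the midpoint of the longest tip-to-tip (patristic) path.

    adj: unrooted tree as {node: {neighbor: branch_length}}, listed in both directions.
    Returns (u, v, offset): the root falls on edge u-v, `offset` length units from u.
    """
    tips = [node for node, nbrs in adj.items() if len(nbrs) == 1]
    best = None  # (distance, tip_a, tip_b, parent pointers from tip_a)
    for a in tips:
        dist, parent = _distances_from(adj, a)
        for b in tips:
            if best is None or dist[b] > best[0]:
                best = (dist[b], a, b, parent)
    total, a, b, parent = best
    path = [b]  # walk parent pointers back from b to a
    while path[-1] != a:
        path.append(parent[path[-1]])
    path.reverse()
    half, walked = total / 2.0, 0.0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= half:
            return u, v, half - walked
        walked += adj[u][v]


# Hypothetical four-taxon tree in which tip D has evolved much faster than the others.
adj = {
    "A": {"X": 1.0}, "B": {"X": 1.0}, "C": {"Y": 1.0}, "D": {"Y": 5.0},
    "X": {"A": 1.0, "B": 1.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 1.0, "D": 5.0},
}
print(midpoint_root(adj))  # ('Y', 'D', 1.5)
```

In this toy case the root lands on D's own branch. D then appears as sister to all other taxa purely because its branch is long. If D's lineage had simply evolved faster, the resulting placement would be an artifact of unequal rates.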

Rather, support for the "new virus" hypothesis is non-phylogenetic and
circumstantial, being rooted in unsupported assumptions that (1) surveys of HIV
presence in human blood samples collected before 1980 are based on reliable and
sufficiently large samples, (2) high virulence denotes recency of the infection
of the host species, (3) all HIVs are virulent, giving rise to AIDS, and (4) SIVs
are not virulent in their natural hosts. Regarding the first point,
researchers point out that HIV seropositivity assays for human blood samples
collected prior to the 1980s are largely negative, and that assays for blood
samples collected during the 1980s, in particular, show increasing levels of
positivity (Grmek, 1990; see Myers et al., 1993 and references therein). This
is consistent with the notion of the first human infection occurring during the
middle portion of this century, but does not refute the alternative hypothesis
that HIVs were present in one or more small and possibly isolated human
populations not represented in pre-1980 or even pre-1959 blood samples (1959 is
the date of collection for the earliest known seropositive sample). As
dramatic as the seropositivity surveys are, their obvious geographic and
quantitative sampling limitations compromise their ability to delineate the
timing of infection. In a review of retrospective seropositivity surveys,
Myers et al. (1993b) discuss the available data, which come from just seven
geographic locations across Africa and involve hundreds or thousands of human
blood samples. Not surprisingly, vast regions of Africa with millions of human
inhabitants are not represented in retrospective seropositivity surveys. HIVs
are lentiviruses, which are characterized as a group by their potential for long
periods of latency with no visible effects on hosts. HIVs present, though
perhaps inconspicuous, in small isolated human populations or demes could
readily have been missed by limited pre-1959 (or even pre-1980) blood
samplings. Current high levels of virulence, particularly for HIV1s, may have
been generated by recent changes in host behavior relating to virus
transmission opportunities (Ewald, 1994).

Regarding the other three assumptions listed above, Doolittle (1989:339) expresses
this circumstantial view in pointing out that "primary hosts all seem to be
healthy" whereas the likely secondary hosts are not healthy. Myers et al.
(1992:373) stated,

"given the pervasiveness of SIVs in diverse African monkey
populations, and the relative newness of HIV in human populations, the
hypothesis of a recent simian origin of human AIDS through one or more events
of cross-species transmission has gained widespread acceptance over the past
few years."

However, if the "relative newness of HIV in human populations" is
"given," how can one reach any other conclusion? Often, authors simply note
the sister relationship between viruses isolated from different host species
and presuppose the direction of infection to be from nonhumans to humans. The
assumptions mentioned should not be accepted uncritically and are discussed
below. Other researchers have been more cautious in inferring direction of
infection from phylogenetic analyses. For example, Hirsch et al. (1989:391)
noted that their

"data cannot exclude the possibility that HIV2 from a human
was passed to a sooty mangabey and subsequently evolved as SIVsm... [and that]
sequences of older HIV2 and SIV isolates (from mangabeys or other species) are
required to resolve these issues."

Although direction of infection cannot be inferred from a sister relationship
between two viruses from different host species, accurate tree topology is
crucial in estimating the number of host shifts that have occurred, and the
most-parsimonious sequence of host shift events. Many aspects of HIV and SIV
relationships are poorly resolved, particularly the earliest divergences
involving five lineages: (1) HIV1s/SIVcpz, (2) HIV2s/SIVsm/SIVmac, (3) SIVmnd,
(4) SIVagms, and (5) SIVsyk. HIVs, as currently named, are clearly not
monophyletic. The two primary HIV types, HIV1 and HIV2, each include
representative strains (or taxa) that are more closely related to one or more
SIVs than they are to other HIVs. As a corollary, SIVs are also not
monophyletic. Whether HIV1s and HIV2s are each monophyletic has
been less clear. SIVcpz, previously placed as sister to all HIV1s (Huet et
al., 1990), may belong inside an HIV1 clade when divergent HIV1s are included
in analyses. Similarly, SIVs from sooty mangabeys and macaques are often,
though not always, placed within the HIV2 clade.

Clearly, there have been multiple host species shifts by the viruses, and,
given a nonsister relationship between the HIV1 and HIV2 types, proponents of
the "new virus" hypothesis must invoke at least two recent and independent
human infections with quite different viruses, each capable of causing AIDS.
HIV1s and HIV2s are about 40% different in nucleotide sequence over the entire
genome and differ in presence of some accessory genes. HIV1 and SIVcpz contain
two open reading frames, termed vpr and vpu, whereas
HIV2/SIVmac/SIVsm lack the vpu gene but contain another gene termed
vpx. Rather than requiring at least two recent human infections with
quite different viruses, the "old virus" hypothesis requires only one
cross-species infection of humans.

Phylogenetic analysis was conducted on the aligned, edited sequences using
PAUP 3.1.1 (Swofford, 1993). Given the large number of terminal taxa and the
large size of the data matrices, we used a heuristic search algorithm with the
tree bisection-reconnection branch-swapping procedure. Because a heuristic
search does not explore all possible topologies to find the shortest tree, we
repeated the search 100 times for each analysis. Each search was initiated
from a different, randomly constructed starting topology, reducing the
possibility that the algorithm would settle on a local parsimony optimum rather
than the global optimum for a particular data set.
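The reason for restarting the search many times can be illustrated with a toy, one-dimensional landscape. The strategy is the same: run independent local searches from random starting points and keep the best result. This is a minimal sketch under stated assumptions. The integer states and the `neighbors` function are hypothetical stand-ins for trees one branch-swapping rearrangement apart, and the cost function is invented. It is not PAUP's implementation, which also retains multiple equally short trees.

```python
import random


def hill_climb(start, neighbors, cost):
    """Steepest-descent local search: stop when no neighbor has a lower cost."""
    current = start
    while True:
        best = min(neighbors(current), key=cost)
        if cost(best) >= cost(current):
            return current  # a local optimum (not necessarily the global one)
        current = best


def random_restart_search(random_start, neighbors, cost, n_restarts, seed=0):
    """Run hill climbing from n_restarts random starts and keep the best result."""
    rng = random.Random(seed)
    optima = [hill_climb(random_start(rng), neighbors, cost) for _ in range(n_restarts)]
    return min(optima, key=cost)


# Hypothetical landscape over states 0..20: a local optimum at 3 (cost 5)
# and the global optimum at 15 (cost 0).
def cost(x):
    return min((x - 3) ** 2 + 5, (x - 15) ** 2)


def neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 20]


print(hill_climb(0, neighbors, cost))  # 3: a single search from 0 gets stuck
print(random_restart_search(lambda rng: rng.randint(0, 20), neighbors, cost, 100))
```

Any start in 9 to 20 descends to the global optimum. With 100 random starts, the chance that every start falls in the other basin is negligible.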

Inaccuracies in phylogenetic analyses stem from an inability to discriminate
homologous similarity (due to descent) from homoplasious (convergent or
parallel) similarity. Two steps for molecular systematists in making this
discrimination are choosing genes that are not saturated with change (having
multiple substitutions at individual base positions), and using a
data-set-dependent, a priori weighting scheme to place greater weight on those
characters whose rates of change are relatively slow, as similarities among
such characters will tend to include less homoplasious similarity (Mindell and
Honeycutt, 1990; Hillis et al., 1993). Toward this end, we have calculated the
number of third codon position transition and transversion changes for
pol and gag p24 DNAs between representative HIV and SIV lineage
pairs (Table 1). We expect any homoplasious similarity to be found
particularly in third codon position transitions. Third codon positions tend
to have faster rates of change due to the greater number of synonymous
substitutions that are possible there, relative to first and second codon
positions, and a tendency for transition substitutions to accumulate more
rapidly than transversions has long been known (Brown et al., 1982; Graur,
1985).
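The pairwise tallies described above can be sketched as a count of observed differences at third codon positions, split into transitions (A/G, C/T) and transversions. This is a minimal sketch under several assumptions: the sequences are aligned, in frame, and begin at codon position 1, and gaps and ambiguity codes are skipped. No correction for multiple hits is applied. We do not know whether the Table 1 counts were tallied this way or as changes inferred on a tree. The function names are hypothetical.

```python
# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify(a, b):
    """Return 'ts' (transition), 'tv' (transversion), or None if not a scorable difference."""
    a, b = a.upper(), b.upper()
    if a == b or a not in "ACGT" or b not in "ACGT":
        return None  # identical bases, gaps, or ambiguity codes are not counted
    same_class = (a in PURINES and b in PURINES) or (a in PYRIMIDINES and b in PYRIMIDINES)
    return "ts" if same_class else "tv"


def third_position_counts(seq1, seq2):
    """Count observed transitions and transversions at third codon positions.

    Assumes both sequences are aligned to equal length, in frame, and begin at
    codon position 1, so third positions fall at 0-based indices 2, 5, 8, ...
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    counts = {"ts": 0, "tv": 0}
    for i in range(2, len(seq1), 3):
        kind = classify(seq1[i], seq2[i])
        if kind is not None:
            counts[kind] += 1
    return counts


# Hypothetical two-codon example: the first-position A/G difference is ignored,
# the third-position G/A is a transition, and the third-position A/T is a transversion.
print(third_position_counts("ATGCCA", "GTACCT"))  # {'ts': 1, 'tv': 1}
```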

If DNA sequence characters are saturated with change, the number of inferred changes will not increase as divergence time increases between taxon pairs. That is, the correspondence between time and increasing amounts of sequence divergence will break down. We can make such comparisons among our study taxa by noting that divergences within subsets of HIV1s (excluding HIV1ant70 and HIV1mvp) and within HIV2s (excluding HIV2d205 and HIV2uc1) are more recent than divergences among the primary HIV/SIV lineages (HIV1, HIV2, SIVagm, SIVmnd, SIVsyk), which in turn are more recent than the divergence of their common ancestor from FIV (Doolittle et al., 1989; Yokoyama, 1991). Table 1 indicates that pol and gag p24 third codon position transitions are relatively saturated with change, as pairwise comparisons with FIV show no more changes than do comparisons among the primary HIV/SIV lineages. Conversely, third position transversions are relatively unsaturated with change, as comparisons with FIV consistently show more changes than do other comparisons. Interestingly, third position transition counts for comparisons with FIV are actually smaller than those for more recent divergences among HIV/SIV lineages, as would be expected when more slowly accumulating transversions begin to overwrite transitions. Similar comparisons to those in Table 1 for codon positions one and two (data not shown) indicate nonsaturation for both transitions and transversions at those positions. Thus, we give third position transitions a weight of zero, a priori, in our phylogenetic analyses to reduce the confounding effects of nonhomologous similarity.
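One way to give third position transitions zero weight is a step matrix in which transitions cost 0 and transversions cost 1. Tree length can then be scored site by site with Sankoff's algorithm. The text does not say how the weighting was encoded in PAUP, so this encoding is our assumption. The sketch below is generic, not PAUP's code, and the function names, tree, and site data are hypothetical.

```python
import math

BASES = "ACGT"
PURINES = frozenset("AG")


def third_position_cost(a, b):
    """Hypothetical step matrix: transitions cost 0, transversions cost 1."""
    if a == b:
        return 0
    return 0 if (a in PURINES) == (b in PURINES) else 1


def sankoff_length(tree, tip_states, cost=third_position_cost):
    """Minimum weighted number of steps for one site on a rooted tree (Sankoff algorithm).

    tree: nested tuples of tip names, e.g. (("t1", "t2"), ("t3", "t4")).
    tip_states: dict mapping each tip name to its observed base.
    """
    def cost_vector(node):
        # For each possible base at this node, the cheapest cost of the subtree below it.
        if isinstance(node, str):
            return {s: (0 if s == tip_states[node] else math.inf) for s in BASES}
        children = [cost_vector(child) for child in node]
        return {
            s: sum(min(vec[t] + cost(s, t) for t in BASES) for vec in children)
            for s in BASES
        }

    return min(cost_vector(tree).values())


tree = (("t1", "t2"), ("t3", "t4"))
site = {"t1": "A", "t2": "G", "t3": "C", "t4": "T"}
print(sankoff_length(tree, site))                                 # 1
print(sankoff_length(tree, site, cost=lambda a, b: int(a != b)))  # 3
```

Under the zero-weight step matrix, this hypothetical site costs one step: a single purine/pyrimidine transversion. Unit weighting counts three changes. The difference is the transitions that are being discounted as likely homoplasy.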

The relative support for each node within the minimal-length topology was evaluated using the support index (Bremer, 1988; Källersjö et al., 1992), which denotes the difference in length between the most-parsimonious tree and the shortest tree in which the particular node (clade) is not present. To estimate the support index for a particular clade, we constructed a constraint tree in which the clade is the only resolved relationship among the study taxa and then used 10 replicate heuristic searches, with random stepwise addition of taxa, to find the shortest fully-resolved topology in which that relationship was not present.
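The support index described above can be written as a simple difference. Here, $L(T)$ is the length of tree $T$, which we assume is computed under the same weighting used in the search. $T^{*}$ is the most-parsimonious tree, and $\mathcal{T}_{\neg C}$ is the set of fully resolved trees that do not contain clade $C$:

$$
d_C = \min_{T \in \mathcal{T}_{\neg C}} L(T) - L(T^{*})
$$

If $T^{*}$ is the true optimum and is the single most-parsimonious tree, every tree lacking $C$ is longer, so $d_C > 0$. Larger values mean more extra steps are needed to break the clade apart. The constrained searches are heuristic, so the shortest tree they find that lacks $C$ may be longer than the true minimum. In that case the computed index would overstate support.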

Results.---We found a single most-parsimonious tree in analysis of the
combined pol and gag p24 DNAs, giving third codon position
transitions zero weight, and we consider this our current best estimate of
phylogenetic relationships for the viruses (Fig. 2). Nodes within the tree
vary in their degree of support, based on the indices reported. SIVcpz is
sister to six HIV1s, and two other HIV1s (HIV1ant70, HIV1mvp) are basal to this
group. This placement of SIVcpz inside a larger HIV1 clade suggests that there
was either one viral host shift from humans to chimp or two host shifts from
chimp to humans. Myers et al. (1992:373) have suggested that HIV1ant70 "may
ultimately be interpreted as yet another SIV form." This interpretation might
seem to reduce the likelihood that a human to chimp transfer has occurred, in
that no HIV1 would diverge "prior to" the divergence of SIVcpz. However,
interpretation of HIV1ant70 as an SIV (and an aberrant colonist in humans)
connotes that HIV1ant70 represents a rare divergent viral lineage in humans,
and recent evidence shows this not to be the case. Nkengasong et al. (1993)
found that blood samples from 16 humans from Cameroon and Gabon reacted positively with HIV1ant70 peptides in enzyme-linked immunosorbent assays
(ELISA), indicating HIV1ant70 to be endemic among HIV1-seropositive individuals
in these two countries and, thus, more common than previously thought. Further
indicating endemism, HIV1mvp (also from a Cameroonian) is placed as sister to
HIV1ant70 at a strongly supported node in our parsimony analysis. On the basis
of current evidence, we are left with the indication of human to chimp viral
transfer as more parsimonious than the reverse (one host species shift versus
two; Fig. 3). Publication of sequences from additional chimp SIVs will help
resolve this issue.

Figure 3. Most parsimonious evolution of the character "viral host" based on
our most parsimonious tree topology (presented in Fig. 2). Changes in viral
host are shown (patterns and shadings of branches) invoking the fewest possible
number of shifts. Branches shown as equivocal denote that two or more
character states (viral hosts) are possible without altering the minimum number
of changes invoked. Because species distinctions among African green monkeys
are unclear, we have conservatively listed the four SIVagms shown (here and in
the Appendix) as representing one host species, although we note that some
might recognize three (SIVagm155, SIVagmtyo, SIVagm3) as being from
Cercopithecus pygerythrus and one (SIVagm677) as being from C.
aethiops. Including two distinct African green monkey species here,
however, does not result in any additional changes in the character "viral
host" elsewhere in the tree.

Our analysis shows HIV2s to be non-monophyletic, placing two HIV2s
(HIV2d205, HIV2uc1) as sister to a clade including HIV2s and SIVs from sooty
mangabeys and macaques (Fig. 2), with the latter clade (excluding HIV2d205 and
HIV2uc1) being moderately well supported. As with HIV1s and
SIVcpz, we are left with the more parsimonious scenario indicating virus
transmission from humans (HIV2s) to sooty mangabeys and macaques (Fig. 3). Our
tree places SIVmnd and SIVsyk as basal to the other HIVs and SIVs, although
those nodes have only moderate support. SIVagms are sister to
HIV2s/SIVsms/SIVmacs, in agreement with most previous analyses but differing
from Doolittle (1989). Based on current evidence, our topology is preferable
because it incorporates more character evidence and a data-set-dependent, a
priori weighting scheme that reduces the effects of homoplasious similarity.

Our tree (Fig. 2) differs from that of Myers et al. (1993; Fig. 1) in not
indicating HIV2 monophyly and in the relative placement of SIVmnd and SIVsyk.
Myers et al. used unweighted gag p24 region sequences alone, which can
be seen to include homoplasious similarity based on our pairwise comparisons
(Table 1). Their exclusion of pol sequences weakens their analyses, as
pol includes the most conserved, and hence most phylogenetically
reliable, sequences in the genome. They also used midpoint rooting, which
should be used only as a last resort, when no suitable outgroup is available.
The midpoint method places the basal node, without reference to any outgroup,
at the midpoint of the longest path connecting any pair of taxa. This assumes
constant rates of character change across taxa without justification, and rate
differences sufficient to shift the basal node will change the sister
relationships shown in the tree. Although they acknowledge the basal position
of HIV1ant70 relative to other HIV1s and SIVcpz, they do not include HIV1ant70
(or its sister taxon HIV1mvp) in their analysis. This exclusion and their
diagnosis of HIV2s as monophyletic, despite numerous analyses contradicting
HIV2 monophyly (e.g., Dietrich et al., 1989; Gao et al., 1992; Myers et al.,
1992; Barnett et al., 1993) allow Myers et al. (1993) to consistently favor the
"new virus" hypothesis and ignore the alternative "old virus" hypothesis.

Minimum Evolution of the Character "Virus Host"

We used our tree topology (Fig. 2) to infer the most-parsimonious pathway of cross-species infection within the primate immunodeficiency viruses. The host species of each of the 28 viral isolates was coded as a character state, and the minimum number of changes among alternative states was then mapped onto the tree (Fig. 3; using MacClade; Maddison and Maddison, 1992). As mentioned above, within the HIV1/SIVcpz clade, human is shown as the ancestral host species. Similarly, within the HIV2/SIVsm/SIVmac/SIVagm clade, human is also shown as the ancestral host species. The basal character state for the entire HIV/SIV clade is "equivocal"; that is, two or more different states (virus hosts) could be invoked without altering the number of changes on the overall tree. Recently, Myers and Korber (1994) have been able to include a second SIV from a chimpanzee (cpzant) in their phylogenetic analyses (although this sequence is currently unpublished), and their analysis places cpzant as sister to the clade including the HIV1s and SIVcpz. Even with cpzant added in this position to our tree diagnosing change in the character "virus host" (Fig. 3), human remains the most-parsimonious ancestral host for the HIV1/SIVcpz/cpzant clade as well as for the HIV2/SIVsm/SIVmac/SIVagm clade.
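Mapping "virus host" as an unordered character can be sketched with Fitch's parsimony algorithm. It returns the minimum number of host shifts on a fixed, fully bifurcating tree and the set of optimal states at the root. A root set with more than one state corresponds to an "equivocal" root. This is a generic sketch: we do not claim it reproduces MacClade's internals. The tree, isolate names, and host labels below are hypothetical and are not the data of Fig. 3.

```python
def fitch(tree, states):
    """Fitch parsimony for one unordered character on a rooted, fully bifurcating tree.

    tree: nested 2-tuples of tip names; states: tip name -> state (here, host species).
    Returns (root_state_set, minimum_number_of_changes). A root set with more than
    one state means the ancestral state is equivocal.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children agree on at least one state: no extra change needed
        return shared, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1  # one host shift


# Hypothetical isolates and hosts, for illustration only.
tree = ((("v1", "v2"), "v3"), ("v4", "v5"))
hosts = {"v1": "human", "v2": "chimp", "v3": "human", "v4": "monkey_a", "v5": "monkey_b"}
root_states, shifts = fitch(tree, hosts)
print(sorted(root_states), shifts)  # ['human', 'monkey_a', 'monkey_b'] 3
```

The result depends entirely on which isolates are sampled. Adding or removing tips from a poorly sampled host can change both the count of shifts and the optimal ancestral states.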

Obviously, this analysis does not resolve the sequence of host species shifts,
and we make no such claim. Inference from this analysis is confounded by
sampling bias, as many more viruses have been sequenced from humans than from
any other primate species. Human is shown as the ancestral host for the two
clades mentioned above because two of the five most divergent primate
immunodeficiency viruses are isolated from humans, whereas each of the other
three divergent viral lineages is unique to a different host species. If, for
example, further sampling of SIVs from African green monkeys or mandrills were
to uncover taxa within each of those lineages as divergent as HIV1 and HIV2
types are from each other, that would alter the character-state changes as
inferred in Figure 3. We point out, however, that some such sampling has been
done for African green monkeys from disparate locales in Africa, and
divergences as great as those seen between HIV lineages are not observed.
SIVagms from western Africa (e.g., Senegal) and from eastern Africa (Kenya,
Ethiopia; as included in our study) form a monophyletic group (Allan et al.,
1991), in contrast to HIV1s from eastern and central Africa and HIV2s from
western Africa, which do not form a monophyletic group (Fig. 2). We also note
that changing the tree topology in Figure 3, such that SIVagms are basal to the
entire HIV1/HIV2 clade, does not change the diagnosis of ancestral host species
within either the HIV1 or the HIV2 clade. Our point in presenting this
analysis is simply to show that the current evidence does not support the "new
virus" (new in humans) hypothesis.

Virulent Viruses Are New and Nonvirulent Viruses Are Old: An Oversimplification

It has long been thought that mutualistic associations between parasites and
hosts are more stable evolutionarily than are parasitic or destructive ones
(Smith, 1939; Burnet and White, 1972). Parasites that quickly kill their hosts
will provide little opportunity for their progeny to successfully colonize new
host individuals, and, hence, may go extinct. This observation is reflected in
a widely claimed tendency (particularly in medical texts) for viruses to evolve
toward avirulence, and in the following quote from Dubos (1965): "Given
enough time a state of peaceful coexistence eventually becomes established
between any host and parasite."

This has led to the oversimplified generalization that virulent viruses are new
and that nonvirulent viruses are old. It is becoming increasingly evident,
however, that there can be great variation in the timing and direction of
virulence change. Just as a virus can change in its effects from pathogenic to
benign, it can also change from benign to pathogenic, depending on natural
selection and the effects of changing replication rates on the fitness of the
virus (Ewald, 1994). As described by May (1993:66),

"There is no
generalization [regarding change in virulence for many or most viruses]. The
virus may become less virulent, more virulent, or exhibit unchanging virulence;
the virus may become less transmissible, more transmissible, or show unchanging
transmissibility. All of this depends on the tradeoffs among virulence,
transmissibility, and the cost of resistance, which are also constrained by the
nature of the host-pathogen association."

Examples of viruses whose virulence has increased at particular
times include influenza A (see
Langmuir and Schoenbaum, 1976; Webster, 1993) and myxoma virus (Dwyer et al.,
1990; Fenner and Kerr, 1994). Levin and Pimentel (1981) simulated the
evolution of a simple system with one host species susceptible to two viral
lineages, one of which is more virulent than the other. They found no general
trend toward avirulence, and that increased virulence may be favored when it
increases transmission rate.

In keeping with the older view of virulence, apparent mildness of SIV in sooty
mangabeys and African green monkeys has been attributed to an old virus/host
association. However, one need not invoke an old association to explain
avirulence. The mildness could be attributed to relatively low rates of sexual
partner change. Sooty mangabey females have apparently low rates of sexual
partner change, restricting copulation to a few males during their estrus
period, and not copulating during a prolonged period of maternal care (T.
Butynski, pers. comm.). African green monkey females are sexually receptive
only seasonally and in groups controlled by a single male (Fedigan and Fedigan,
1988). Thus, potential for rapid spread of SIV through these species appears
limited, and viral strains with a rapid replication rate (compromising their
host's immune system and health) will have little selective advantage.

Results of laboratory infections of chimps with HIV1 are also inconsistent
with the supposition that low virulence denotes a long virus/host association.
No AIDS-like disease has been observed among over 100 chimps that have been
experimentally infected with HIV1, nor among the minority that have remained
infected for 5 to 10 years (Fultz, 1993; Johnson et al., 1993). Further, in
chimps in which HIV1s have become established and have increased in numbers,
the capability for successful infection of chimp blood cells has increased
(Gendelman et al., 1991; Watanabe et al., 1991), indicating a potential for
virulence to increase over time.

The avirulence of SIVs in sooty mangabeys, chimps, mandrills, and other
species also remains open to question. A severely ill individual would not
last long in nature, compared to infected but asymptomatic or recovered
individuals that could complete normal life spans. For this reason, snapshot
seropositivity surveys of existing populations may underestimate the frequency
of infections associated with severe illnesses. A highly virulent, molecularly
cloned SIV strain originally from a sooty mangabey (SIVsmpbj; Dewhurst et al.,
1990) causes death in experimentally infected sooty mangabeys and macaques
(Fultz, 1993), whereas the original parental virus caused a chronic AIDS-like
syndrome in macaques and only asymptomatic infection in sooty mangabeys. This
belies the notions that SIVs in their "natural" host species are exclusively
avirulent and that they cannot become more virulent over time. The conditions of
laboratory transmission that favored increased virulence in this SIV variant are
similar to those proposed for HIV: rapidly reproducing and severe variants can
be maintained if rapid reproduction provides them with a fitness advantage
over more slowly reproducing strains.

The "old virus" hypothesis holds that primitive HIVs may have had
low virulence and were maintained in a population that displayed low levels of
sexual partner change, perhaps in a rural area. This leads to the prediction
that some early divergent, low-virulence viral strains could still be extant in
such populations, and such viral isolates have been discovered. HIV2d205 and
HIV2uc1 represent an early divergent lineage within the HIV2/SIVsm clade and were
obtained from asymptomatic individuals from rural Ghana and Ivory Coast,
respectively. HIV2uc1 is entirely noncytopathic and readily neutralized by
sera from HIV2 infected individuals (Barnett et al., 1993).

Issues in Systematics of Viruses

Conflicting Topologies for Viral Gene-trees May or May Not Indicate Recombination Among Viral Lineages

When two or more individual viruses penetrate a particular host cell and begin
nucleic acid replication, the potential exists for recombination among the
viral genomes due to a replicase enzyme slipping from one viral genome template
to another (Coffin, 1979; Hu and Temin, 1990). Recombination among HIV
variants has been found to occur in vitro (Clavel et al., 1989) and has been
inferred or suggested to occur in vivo based on (1) observed viral sequences
having a mixture of components from formerly distinct lineages (Howell et al.,
1991) and (2) conflicting tree topologies based on phylogenetic analyses of
different genes (Li et al., 1988; McClure et al., 1988; Gao et al., 1992; Myers
et al., 1993). Conflicting phylogenies based on different genes, however, may
also stem from differential success in phylogenetic analyses. That is, one
tree might be accurate whereas the other is not, despite absence of any
recombination. In analyzing different genes and different types of
substitutions changing at different rates, systematists often find different
data sets supporting different trees. This may stem from differential success
in distinguishing homologous from homoplasious similarity (distinguishing
signal from noise) in the different data sets (Farris, 1983; Swofford and
Olsen, 1990; Hillis, 1991; Mindell, 1991). In considering recently diverged
taxa, this might also stem from the confounding effects of within-population
variation on analyses among higher level taxa (Neigel and Avise, 1986; Avise,
1989). Within-population variation for retroviruses can be extreme, depending
on which gene regions are considered (Zarling and Temin, 1976; Holmes et al.,
1992). Systematists working on viruses will need to consider these
possibilities prior to invoking recombination to explain such conflicts in gene
tree topologies.
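Conflict between gene trees is commonly quantified by comparing the bipartitions (splits) each tree implies. The Python sketch below computes the Robinson-Foulds distance, the number of splits found in one tree but not the other, for two trees on the same taxa written as nested tuples. The tuple encoding and the two six-taxon topologies are hypothetical, not results from any analysis cited here, and, as argued above, a nonzero distance by itself does not distinguish recombination from a failed analysis of one of the genes.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is a tip label or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of tip labels below a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    result: FrozenSet[str] = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the informative bipartitions of a tree, ignoring the root.

    Each split is recorded as the side that lacks a fixed reference taxon,
    so the same bipartition is encoded identically in every tree.
    """
    all_taxa = leaves(tree)
    ref = min(all_taxa)  # arbitrary but consistent reference taxon
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = clade if ref not in clade else all_taxa - clade
        # Informative splits leave at least two taxa on each side.
        if 2 <= len(side) <= len(all_taxa) - 2:
            found.add(frozenset(side))
        for child in node:
            visit(child)

    visit(tree)
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Count splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


# Hypothetical gene trees for the same six viruses.
gene_a = ((("HIV1", "SIVcpz"), ("HIV2", "SIVsm")), ("SIVagm", "SIVmnd"))
gene_b = ((("HIV1", "SIVcpz"), "SIVagm"), (("HIV2", "SIVsm"), "SIVmnd"))

print(robinson_foulds(gene_a, gene_b))  # 2
```

Storing each split as the side without a reference taxon makes the comparison independent of where each tree happens to be rooted.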

High Extinction Rates and Sampling Problems

Because of their short generation times, large numbers of progeny, and high
mutation rates, viruses have a great capacity for rapid diversification.
Consequently, there is also a great capacity for lineage extinction events, and
this has implications for studies of phylogeny and the history of host shifts.
Inclusion of fossil taxa has been shown to alter inferred phylogenetic
relationships in studies of plants and animals (Doyle and Donoghue, 1987;
Gauthier et al., 1988), and the same can be expected in analyses of
viruses. Obtaining more samples of extant taxa and, where possible, of extinct
taxa (using viral sequences from preserved tissues) will help in
understanding the effects of this taxon inclusion/exclusion problem. Peter
Houde and colleagues (ms in review) at New Mexico State University are working
on amplifying and sequencing SIVs from primate museum study skins and, in the
process, identifying new host species and minimum dates for host species
infection.

Clear distinction must be made between phylogeny of the viruses and the
history of their distribution, as the two need not be congruent. Divergent
lineages such as SIVsyk and SIVmnd may appear basally in a phylogenetic
analysis (as in Fig. 2), without Sykes' monkey or mandrill being old (early)
host species. That basal appearance could be the result of the extinction of
lineages from the true early host, which "gave" the virus to current hosts
relatively recently, or of a lack of sampling from the true early host species.
Just as the true phylogeny for any set of taxa is unknowable (unless directly
observed) and can only be inferred, the true history of viral host-shifts can
also only be inferred. For this reason, attempts to determine the natural or
ancestral host of a virus will always be susceptible to biases from
"unobserved" host shifts, related to high extinction rates for viral lineages
and the inevitably small samples available for analysis.

Rate Heterogeneity

RNA viruses like the primate immunodeficiency viruses, with base substitution
rates averaging 10⁻³ per site per year, often have rates of evolution
exceeding those of their eukaryotic host species by a millionfold or
more (Holland, 1992). This is a result of the high error rate of the
virally encoded reverse transcriptase and the lack of misincorporation repair
mechanisms. Although this rapid rate of viral sequence change is not
qualitatively different from that encountered by systematists working on other
taxa, there are several sources of rate variability among viruses that are not
currently recognized in other taxa. Retroviruses undergo replication involving
three different enzymes with variable error rates. In the viral stage (in the
host cell cytoplasm), retroviral RNA is reverse-transcribed into retroviral DNA
by reverse transcriptase, which has a high error rate, as mentioned above. In the
proviral stage (in the host cell nucleus), retroviral DNA is replicated by the
host cell's DNA polymerase, which is less error prone and is supported by efficient
mutation repair mechanisms. Subsequently, the proviral DNA is transcribed back
into RNA by the host cell's RNA polymerase. The error rate for cellular RNA
polymerase is not well known, though it may be similar to that of reverse
transcriptase (Coffin, 1991). Thus, there is the potential for closely related
viral lineages to differ in their rates of change, due to experiencing
different amounts of high error (reverse transcriptase and cellular RNA
polymerase) and relatively low error (cellular DNA polymerase) replication.
These differences will tend to track virulence: low virulence entails longer
proviral periods and fewer replication cycles, whereas high virulence entails
more low-fidelity reverse transcription. A further
consequence of the proviral stage is the opportunity for recombination with
cellular genes and the possible addition of new sequences into the retrovirus
genome (Bishop and Varmus, 1985).
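How a lineage's rate depends on the way it divides its history among the three replication steps can be made concrete with a simple expected-count model: errors per site equal the sum, over steps, of the number of copies made times the per-copy error rate. In the sketch below every error rate and cycle count is invented for illustration (the host RNA polymerase rate is simply set equal to the reverse transcriptase rate, one possibility raised by Coffin, 1991), and selection and drift are ignored.

```python
from dataclasses import dataclass

# Illustrative per-site, per-copy error rates. These are NOT measured values.
ERROR_RATE = {
    "reverse_transcriptase": 1e-4,  # viral enzyme, high error, no repair
    "host_dna_polymerase": 1e-9,    # low error, backed by host repair
    "host_rna_polymerase": 1e-4,    # assumed similar to reverse transcriptase
}


@dataclass
class ReplicationHistory:
    """Hypothetical number of copies each enzyme makes along one lineage."""
    reverse_transcriptions: int
    dna_replications: int
    rna_transcriptions: int


def expected_errors_per_site(h: ReplicationHistory) -> float:
    """Expected replication errors per site, summed over the three steps."""
    return (h.reverse_transcriptions * ERROR_RATE["reverse_transcriptase"]
            + h.dna_replications * ERROR_RATE["host_dna_polymerase"]
            + h.rna_transcriptions * ERROR_RATE["host_rna_polymerase"])


# A virulent lineage cycles rapidly; a mild one spends long periods as provirus.
virulent = ReplicationHistory(reverse_transcriptions=300, dna_replications=10,
                              rna_transcriptions=300)
mild = ReplicationHistory(reverse_transcriptions=30, dna_replications=1000,
                          rna_transcriptions=30)

print(expected_errors_per_site(virulent))  # about 0.06
print(expected_errors_per_site(mild))      # about 0.006
```

Even with many more DNA replications, the mild lineage accumulates roughly a tenth as many errors, because the low-fidelity steps dominate the total.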

We expect that rates of retroviral change may vary depending upon the
particular host species infected, given that different animal species may show
different rates of molecular sequence evolution (e.g., Britten, 1986; Li and
Tanimura, 1987; Avise et al., 1992; Martin and Palumbi, 1993) and that
retroviruses use the host's replication machinery. Rates might also vary
depending on the particular cell type infected. This follows from observed
correlations between rates of sequence change and metabolic rate (rate of
oxygen metabolism), together with differences in metabolic rate among cell and
tissue types. The correlation with metabolic rate appears to reflect DNA
damage by oxygen-derived free radicals (Joenje, 1989; Shigenaga et al.,
1989). Oxidative damage potentially influences rates of sequence evolution
across all taxa; however, the generally fast rate of retroviral evolution
accentuates these and other effects to a greater degree than is seen in other
organisms.

Viral sequences also show patterns of rate heterogeneity correlated with codon position and with transition/transversion differences, as seen in other organisms (Graur, 1985; Table 1). We have sought to account for some of these effects in our current analyses with an a priori weighting scheme. The effects on phylogeny of the other sources of rate heterogeneity mentioned above (three different replication enzymes, host- and cell-specific effects) are poorly known at present, although potentially significant. Given the fast pace of primary sequence evolution and the resulting low levels of sequence similarity among many viral taxa, the more slowly evolving secondary and tertiary structures of encoded proteins may prove useful for future alignment and phylogenetic analyses (see Johnson et al., 1990; Eickbush, 1994).
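One common form of a priori weighting counts transversions more heavily than transitions and down-weights third codon positions, where synonymous change and homoplasy are most frequent. The sketch below scores a weighted difference between two aligned coding sequences; the particular weights are hypothetical and should not be read as the weighting scheme used in the analyses reported here.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

# Hypothetical weights for illustration only.
TRANSITION_WEIGHT = 1.0
TRANSVERSION_WEIGHT = 2.0
CODON_POSITION_WEIGHT = {1: 1.0, 2: 1.0, 3: 0.5}


def is_transition(a: str, b: str) -> bool:
    """True if both bases are purines or both are pyrimidines."""
    pair = {a, b}
    return pair <= PURINES or pair <= PYRIMIDINES


def weighted_difference(seq1: str, seq2: str) -> float:
    """Weighted count of substitutions between two aligned coding sequences.

    Sequences are assumed to start at the first position of a codon. Gaps and
    ambiguity codes are skipped rather than scored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        kind = TRANSITION_WEIGHT if is_transition(a, b) else TRANSVERSION_WEIGHT
        total += kind * CODON_POSITION_WEIGHT[i % 3 + 1]
    return total


print(weighted_difference("ATGGCA", "ATAGCT"))  # 1.5
```

Here the G/A transition and the A/T transversion both fall at third positions, contributing 0.5 and 1.0 respectively.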

Estimates of lineage divergence times assume rate constancy over time and will
be distorted to the extent that rate heterogeneity exists for the characters
analyzed. Not surprisingly, different researchers have arrived at incongruent
estimates: the divergence time between HIV1s and HIV2s has been placed at
anywhere from 40 years (Smith et al., 1988) to 600-1200 years ago (Eigen and
Nieselt-Struwe, 1990).
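The sensitivity of clock-based dates to the assumed rate is easy to see in the simplest estimator, which divides the distance between two sequences by twice the per-lineage rate, since changes accumulate along both descendant branches. The sketch below assumes a strict clock and an uncorrected distance, so it also ignores multiple substitutions at the same site, which become important at the high rates typical of these viruses; the distance value is hypothetical.

```python
def divergence_time(distance: float, rate: float) -> float:
    """Years since two lineages split, under a strict molecular clock.

    distance: substitutions per site separating the two sequences.
    rate: substitutions per site per year along each lineage, assumed constant.
    Changes accumulate on both branches, hence the factor of 2.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


d = 0.4  # hypothetical pairwise distance, substitutions per site
for rate in (1e-3, 5e-4, 1e-4):
    print(f"rate {rate:g}: {divergence_time(d, rate):.0f} years")
# rate 0.001: 200 years
# rate 0.0005: 400 years
# rate 0.0001: 2000 years
```

A tenfold error in the assumed rate produces a tenfold error in the date, which is one way estimates for the same pair of lineages can differ by more than an order of magnitude.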

Naming Virus Clades Rather Than Grades

Up to this point the names used for primate immunodeficiency virus taxa have been based on the host species in which the viruses have been found. These names therefore represent viral grades based on distribution. Named grades are less desirable than named clades, given that the primary purpose of taxonomy is to communicate the results of evolutionary history (phylogenetic analysis) through a system of names. As more viral taxa become known and are added to phylogenetic analyses, viral taxonomy can be revised to reflect their evolutionary history more accurately. Such a revision can discourage misconceptions or premature conclusions regarding lineage origins. For example, the association by name of HIV1s and HIV2s suggests (to systematists) a common origin for them to the exclusion of other immunodeficiency viruses, and, as discussed above, this appears not to be the case. Similarly, the taxon "SIVs" gives the unsupported impression that all SIVs are more closely related to each other than any of them is to an HIV. de Queiroz and Gauthier (1992) have described useful conventions for naming taxa, the most basic of which is that all names refer to clades.

We can begin by recognizing clades in Figure 2 as taxa. The clade descended
from the hypothetical common ancestor at node A in Figure 2 includes
all the known HIV1s and SIVcpz, and can be called primate immunodeficiency
virus 1 (PIV1). The clade descended from the hypothetical common
ancestor at node B in Figure 2 includes all the known HIV2s and SIVs from sooty
mangabeys and macaques, and can be called PIV2. Other taxa may be recognized
in a similar fashion as the need arises. Members of the taxon PIV1 have an
apparent synapomorphy (shared derived character) in the presence of the
vpu accessory gene, whereas members of PIV2 uniquely possess the
accessory genes vpr and vpx in combination (Gibbs and Desrosiers,
1994). In a nonphylogenetic taxonomy such characters might have been used to
define taxa. However, in our proposed phylogenetic taxonomy, such characters
are used in diagnosing clades, but not in defining them (determining inclusion
or exclusion of species or taxa). Rather, taxon names are defined in terms of
common ancestry and relationship.
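A node-based definition of this kind (de Queiroz and Gauthier, 1992) can be stated operationally: a name refers to the most recent common ancestor of a set of specifier taxa together with all of its descendants. The sketch below applies such a definition to a hypothetical tree encoded as nested tuples; the topology and tip labels are illustrative and are not those of Figure 2. Membership is computed from ancestry alone; a character such as vpu would be checked separately, as a diagnosis.

```python
from typing import FrozenSet, Iterable, Tuple, Union

# A tree is a tip label or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of tip labels below a (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    result: FrozenSet[str] = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def node_based_clade(tree: Tree, specifiers: Iterable[str]) -> FrozenSet[str]:
    """All tips descended from the most recent common ancestor of the specifiers."""
    wanted = frozenset(specifiers)
    if not wanted or not wanted <= leaves(tree):
        raise ValueError("specifiers must be a non-empty subset of the tips")
    node = tree
    while not isinstance(node, str):
        # Move into the child that still contains every specifier, if any.
        child = next((c for c in node if wanted <= leaves(c)), None)
        if child is None:
            break  # specifiers are split among children: node is their MRCA
        node = child
    return leaves(node)


# Hypothetical tree; HIV1_b is not a specifier but falls in PIV1 by ancestry.
tree = ((("HIV1_a", "HIV1_b"), "SIVcpz"),
        ((("HIV2_a", "SIVsm"), "SIVmac"), ("SIVagm", "SIVmnd")))

print(sorted(node_based_clade(tree, ["HIV1_a", "SIVcpz"])))
# ['HIV1_a', 'HIV1_b', 'SIVcpz']
print(sorted(node_based_clade(tree, ["HIV2_a", "SIVmac"])))
# ['HIV2_a', 'SIVmac', 'SIVsm']
```

Because the definition is tied to a node rather than to a list of members, a newly sequenced virus that attaches inside that node joins the named clade automatically.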

Conclusions

Evidence currently available does not support the popular view (the "new virus" hypothesis) that HIVs (or PIVs, to use the term introduced above; Figure 2) have recently colonized humans and that PIVs in humans are recent descendants of one or another of the PIV lineages known from nonhumans. Phylogenetic trees show only sister-group relationships among extant taxa, not ancestor-descendant relationships. Using our phylogenetic hypothesis and a parsimony criterion to estimate the smallest number of host species shifts (that is, to diagnose changes in the character "viral host") indicates that humans are the ancestral host species for a clade including SIVcpz from chimpanzees and for a clade including SIVsms from sooty mangabeys. We specifically do not claim that this analysis resolves the issue of ancestral host, however, in light of potential sampling biases. Our point is to show that current evidence does not support the "new virus" hypothesis. Support for the "new virus" hypothesis then rests on two unjustified assumptions: that pre-1959 human blood samples testing negative for PIVs adequately represent all human populations and demes potentially harboring PIVs, and that new viruses are virulent and old viruses are mild. Small human populations with dormant PIVs may readily have been missed by limited sampling, and the assumption that new viruses are virulent and old viruses mild ignores the ability of natural selection to effect an increase, a decrease, or stasis in virulence over time. Even if the latter assumption were valid, inferred newness of PIV infection of humans is contradicted by the discovery of noncytopathic HIV2uc1 and by the relatively low virulence (longer latency and asymptomatic periods) of PIV2s in rural human populations with relatively low rates of sexual contact among individuals.
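Counting host shifts with parsimony amounts to treating "viral host" as an unordered character and optimizing it on the tree. A minimal sketch of Fitch's algorithm follows; it returns the minimum number of host shifts and the most parsimonious host states at the root. The tree and host assignments are a toy example, not Figure 2, and the output carries the sampling caveats discussed above: an unsampled or extinct host lineage simply cannot appear in the answer.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A binary tree: a tip label or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, host: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Fitch parsimony for one unordered character on a rooted binary tree.

    Returns the preliminary state set for this node and the minimum number
    of state changes (host shifts) in its subtree. At the root, the set is
    exactly the set of most parsimonious root states.
    """
    if isinstance(tree, str):
        return frozenset([host[tree]]), 0
    left, right = tree
    left_states, left_cost = fitch(left, host)
    right_states, right_cost = fitch(right, host)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one change is required on one of the two branches.
    return left_states | right_states, left_cost + right_cost + 1


# Toy example only: hypothetical topology and host assignments.
toy_tree = ((("HIV1_a", "HIV1_b"), "SIVcpz"),
            (("HIV2_a", "HIV2_b"), ("SIVsm", "SIVmac")))
hosts = {
    "HIV1_a": "human", "HIV1_b": "human", "SIVcpz": "chimpanzee",
    "HIV2_a": "human", "HIV2_b": "human",
    "SIVsm": "sooty mangabey", "SIVmac": "macaque",
}

root_states, shifts = fitch(toy_tree, hosts)
print(sorted(root_states), shifts)  # ['human'] 3
```

Adding a single sampled SIV from a new host to this toy tree can change both the count and the inferred root host, which is why such results are read as conditional on sampling.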

Retroviral evolution challenges systematists with a variety of distinctive and
potentially confounding features, including (1) extremely fast rates of
molecular sequence evolution (due to short generation times, large numbers of
progeny, and low-fidelity replication), (2) evolutionary rate heterogeneity
within and among virus sequences (due to potential host-specific and
cell-type-specific rate differences and to variable use of three replication
enzymes with differing error rates), and (3) potential for genetic
recombination among different lineages infecting the same cell, complicating
character homology determinations. Improved understanding of these features
and greater sampling of primate host species will enhance future studies of
immunodeficiency virus phylogeny, and may entail revision of current hypotheses
of relationship.

Acknowledgments

We would like to thank J. J. Bull, M. J. Donoghue, D. M. Hillis, T. D. Kocher,
M. M. Miyamoto, G. Myers, and T. W. Scott for valuable discussion and comments
on various drafts of this manuscript. DPM's work was supported by the National
Science Foundation (BSR-9019669), and JWS's work was supported by an Alfred P.
Sloan Post-doctoral Fellowship.

Fenner, F., and P. J. Kerr. 1994. Evolution of the poxviruses, including the coevolution of virus and host in myxomatosis. Pages 273-292 in The evolutionary biology of viruses (S. S. Morse, ed.). Raven Press, Ltd., New York.