The phylogenetic complexity of the primate lentiviruses is
legendary, and our understanding of its scope has recently been
dramatically expanded by the discovery of a new branch of the HIV-1
cluster, the "N" viruses [Simon et al., 1998]. There are now
two major branches, "N" and "O", in the phylogenetic tree of HIV-1
sequences in addition to the cluster of sequences that form the "M",
or Main group (Figure 1). The HIV-1 M, N and O sequences and the
chimpanzee CPZ sequences cluster somewhat differently depending on
the genomic region compared (Figure 1). This is discussed in detail
in [Gao et al., 1999], in conjunction with a new full-length CPZ
sequence, CPZ-US. (The CPZ-US
sequence became available too late for inclusion in this study, but
is available in the reference set at our website,
http://hiv-web.lanl.gov/ALIGN_99/subtype_alignments.html.) At
present, however, only the viruses in the M group are of significant
public health importance. Genetic diversity within the M group takes
the form of phylogenetic clusters that have been named subtypes.
There are now at least 8 different subtypes of HIV-1, which
circulate to varying extents in populations around the globe. A
variety of factors, including recombination in populations where
several subtypes co-circulate, make the genetic structure of HIV-1
particularly fluid in both time and space. This article describes
our current understanding of the major circulating forms in the
HIV-1 epidemic and provides a subtyping reference set that can be
used as a basis for classifying new sequences.

The number of HIV-1 isolates that have been sequenced in their
entirety has increased dramatically in the past few years [Korber
et al., 1997], as has the number of tools designed to detect the
presence of mosaic genomes [Salminen et al., 1995; Siepel et al.,
1995]. It is important to distinguish newly discovered subtypes
from recombinants, and to identify recombinant forms of epidemic
importance. Now that full-length genomic sequencing is no longer a
major obstacle, we propose that a new genetic lineage should fulfill
the following criteria to be considered a subtype: (1) at least two
isolates should be sequenced in their entirety, (2) they should
resemble each other but no other existing subtype throughout the
genome, and (3) they should have been found in at least two
epidemiologically unlinked individuals. By these criteria, there are
currently 8 subtypes of HIV-1. There are also many mosaic genomes of
HIV-1, some of which are unique or restricted to a single isolated
transmission cluster, and others that are major circulating forms.
Recombinant viruses are more common than previously thought and are
especially prevalent in populations where multiple subtypes
co-circulate. Although they may be of interest for other reasons,
unique recombinant viruses do not play a major role in the global
epidemic. In contrast, mosaic viruses that have spread from one
location to another and have been associated with new outbreaks of
the virus, such as the AE recombinant virus in Southeast Asia, have
established a distinct and recognizable genetic lineage.
We propose that such recombinant viruses be designated
"Circulating Recombinant Forms", or CRFs, to distinguish them from
recombinants that are not known to be in circulation. To be
considered a CRF, a recombinant lineage should fulfill the following
criteria: (1) at least two isolates should be sequenced in their
entirety, (2) they should resemble each other but no other existing
CRF in their subtype structure, and (3) they should have been found
in at least two epidemiologically unlinked individuals. Each CRF is
named by its component subtypes followed by the name of the first
full-length viral sequence of that form. By these criteria, there
are currently 4 CRFs of HIV-1: the AE virus from Southeast Asia,
called "AE(CM240)" [Carr et al., 1996; Gao et al., 1996a;
Gao et al., 1996b]; the AG recombinant from west and central
Africa, called "AG(IbNG)" [Carr et al., 1998]; the AGI
recombinant from Cyprus and Greece, called "AGI(CY032)" [Gao et
al., 1998a; Kostrikis et al., 1995; Nasioulas et al., 1999]; and
the AB recombinant from Russia, called "AB(Kal153)". The Kal153
sequence was provided prior to publication by Mika Salminen and is
representative of the CRF found in the Kaliningrad epidemic among
intravenous drug users (IVDUs) described in [Liitsola et al., 1998].
The 8 subtypes and 4 major circulating recombinant forms make up the
12 major branches of the phylogenetic tree representing the lineage
of the M group of HIV-1 (Figure 1).
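The subtype and CRF criteria proposed above share the same three-part structure, so they can be summarized as a simple checklist. The Python sketch below is illustrative only: the `Isolate` record, its `transmission_cluster` field, and the functions `meets_proposed_criteria` and `crf_name` are hypothetical names, not part of any database tool. The key phylogenetic judgment (criterion 2, whether the isolates resemble each other but no existing subtype or CRF) is assumed to come from a separate analysis and is passed in as a boolean rather than computed. The sketch also assumes that criterion 3 refers to the fully sequenced isolates.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Isolate:
    """Hypothetical record for one sequenced isolate of a candidate lineage."""
    name: str
    full_length: bool           # True if the genome was sequenced in its entirety
    transmission_cluster: str   # hypothetical ID; equal IDs mean epidemiologically linked


def meets_proposed_criteria(isolates: Iterable[Isolate],
                            forms_distinct_cluster: bool) -> bool:
    """Check the three proposed criteria for a new subtype or CRF.

    forms_distinct_cluster is assumed to be the outcome of a separate
    phylogenetic analysis (not performed here): True if the isolates
    resemble each other but no existing subtype (throughout the genome)
    or no existing CRF (in their subtype structure).
    """
    full_length = [iso for iso in isolates if iso.full_length]
    # Criterion 1: at least two isolates sequenced in their entirety.
    if len(full_length) < 2:
        return False
    # Criterion 2: supplied by the caller's phylogenetic comparison.
    if not forms_distinct_cluster:
        return False
    # Criterion 3: the full-length isolates come from at least two
    # epidemiologically unlinked individuals (distinct transmission clusters).
    return len({iso.transmission_cluster for iso in full_length}) >= 2


def crf_name(subtypes: Sequence[str], first_full_length_isolate: str) -> str:
    """Build a CRF label: component subtypes plus the first full-length isolate."""
    return "".join(subtypes) + "(" + first_full_length_isolate + ")"


# Hypothetical candidate: two full-length isolates from one transmission cluster.
candidate = [
    Isolate("X1", full_length=True, transmission_cluster="c1"),
    Isolate("X2", full_length=True, transmission_cluster="c1"),
    Isolate("X3", full_length=False, transmission_cluster="c2"),
]
print(meets_proposed_criteria(candidate, forms_distinct_cluster=True))  # False

# Fully sequencing an isolate from an unlinked individual satisfies criterion 3.
candidate.append(Isolate("X4", full_length=True, transmission_cluster="c3"))
print(meets_proposed_criteria(candidate, forms_distinct_cluster=True))  # True

print(crf_name(["A", "E"], "CM240"))  # AE(CM240)
```

Passing criterion 2 in as a flag keeps the sketch honest about what it does not do: deciding that sequences resemble each other but no other existing subtype requires a phylogenetic comparison of aligned sequences against a reference set such as the one in Table 1.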

The 12 major genetic forms of the HIV-1 M group are listed in Table
1, with selected full-length sequences to use as references. The
subtypes are A, B, C, D, F, G, H and J, and the four CRFs are
AE(CM240), AG(IbNG), AGI(CY032) and AB(Kal153). New to this edition
of the database are the first full-length sequences from subtypes F,
G, H and J, as well as additional isolates from subtypes A, C and D.
In addition, there are new full-length sequences from the CRFs
AGI(CY032), AB(Kal153) and AG(IbNG).

Subtype A, the most prevalent subtype in Africa, has recombined
with many other subtypes in a myriad of combinations. To our
knowledge, however, only four of these recombinants have spread to a
significant extent. The first is a recombinant with subtype E, the
AE virus prevalent in Southeast Asia [Carr et al., 1996; Gao et al.,
1996a; Gao et al., 1996b]. This virus contains a subtype E env and
LTR, but most, if not all, of the remainder of its genome derives
from subtype A. The parental E virus has never been found.
The second and third widely spread A recombinants both involve
subtype G. The first of these is AG(IbNG), named after IbNG, the
first fully sequenced isolate of this CRF, from Ibadan, Nigeria
[Howard et al., 1994; Howard et al., 1996; Gao et al., 1996b]. Other
viruses with the same structure have been fully sequenced from
Djibouti and Ivory Coast [Carr et al., 1998; Carr et al., 1999],
and there are many partial sequences from west or west-central
Africa, all of which cluster with IbNG [Ellenberger et al., 1999;
Takehisa et al., 1998; McCutchan et al., 1999]. The AG(IbNG) virus
is mosaic in pol and LTR, but because both gag and env derive largely
from subtype A, these viruses were initially classified as subtype A
[Howard et al., 1996]. In fact, in both gag and env they form a
significant subcluster within subtype A and can be recognized even
from partial sequences of commonly sequenced regions. The third
widely spread A recombinant is AGI(CY032), a recombinant of subtypes
A and G and possibly a previously unknown subtype, I. As with the
parental E virus, no parental, "pure" I virus is known. Two of the
three viruses of this type were found in epidemiologically unlinked
individuals in Cyprus and Greece [Gao et al., 1998; Nasioulas et
al., 1999]. The last A recombinant to be identified is the
AB(Kal153) virus from the city of Kaliningrad in Russia. Parts of
this recombinant derive from subtype A, but most of the env region
is subtype B. This recombinant has been responsible for an explosive
epidemic among drug users in Kaliningrad [Liitsola et al., 1998].

The genetic structure of the first full-length AG(IbNG) and
subtype G viruses was recently published [Carr et al., 1998]. In
protease, in the accessory gene region, and at the very end of env,
an unusual phylogenetic relationship exists among subtypes A, G,
AE(CM240) and AG(IbNG). Elsewhere in the genome, sequences are either
clearly close to or clearly distant from one another, so each region
can readily be assigned to a given subtype. In these three regions,
however, the four forms show an intermediate relationship, neither as
close nor as distant as elsewhere. Because this phenomenon is
observed among A, G, AE(CM240) and AG(IbNG) but not among the other
subtypes in the same regions, it is not due to a general lack of
information content in these regions or to the analytic approach. It
has been suggested that the G viruses are actually recombinant with
subtype A in these regions [Gao et al., 1998], and while this is
possible, others have been unable to demonstrate convincingly a
recombinant nature for the G viruses [Carr et al., 1998]. At the
moment, the issue is not completely resolved.

A variety of intersubtype recombinants combining segments of A and
C, A and D, B and F, and other subtypes have been described or are
known from as-yet unpublished studies. Each of these unique forms
could potentially spread epidemically, and as new recombinants are
studied it is increasingly important to compare them in detail with
the full spectrum of known recombinant forms. The initial events
leading to the emergence of recombinants may become better understood
in coming years.