Abstract

We have sequenced six overlapping clones from a library of bacterial artificial chromosome (BAC) clones derived from a laboratory strain of the mosquito Anopheles gambiae, the major vector of human malaria in Africa. The resulting uninterrupted 528-kb sequence is from the 8C region of the mosquito 2R chromosome, at or very near the major refractoriness locus associated with melanotic encapsulation of parasites. This sequence, which encompasses 48 genes, provides the first extensive view of mosquito genome structure. Genomic comparison reveals that the majority of the orthologues are found in six microsyntenic clusters in Drosophila melanogaster. A BAC clone that is wholly contained within this region demonstrates a remarkable degree of local polymorphism in this species, which may prove important for its population structure and vectorial capacity.

Diagram of the 528-kb genomic sequence, presented in two segments. Predicted genes are in green (+ strand, left to right and telomere to centromere orientation) and red (− strand). tRNA genes are shown in black. The sequenced BAC clones are identified, in blue for those containing the Consensus Sequence and green for the variant 8N20 BAC. The sequence axis includes red tickmarks representing 51 microsatellites (more than 8 dinucleotide repeats each); two light blue arrowheads represent microsatellites that partially delimit the Pen1 region, of which H175 has not been separated from Pen1 recombinationally. The genes belong to the following functional categories according to the Gene Ontology classification (www.geneontology.org): 7 signal transduction, 6 metabolism, 5 transport/binding, 4 transcription/translation, 3 each structural protein and regulation, 2 nucleic acid metabolism, and 1 each cell adhesion and replication. The figure was produced with the Bio::Graphics library from the Generic Model Organism Database (GMOD) project (http://www.gmod.org).
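
The microsatellite criterion in this legend (more than 8 dinucleotide repeats) can be illustrated with a short sketch. This is not the procedure used to annotate the figure, which the legend does not describe; the function name, the exclusion of homopolymer runs, and the restriction to unambiguous A/C/G/T bases are assumptions made for illustration.

```python
import re

# Assumption: a "dinucleotide repeat" is a unit of two DIFFERENT bases
# (so runs such as AAAA... are not counted), repeated more than 8 times,
# i.e. at least 9 tandem copies in total.
MICROSATELLITE = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{8,}")


def find_dinucleotide_microsatellites(seq):
    """Return (start, unit, copies) for each perfect dinucleotide tract.

    start  -- 0-based offset of the tract in seq
    unit   -- the repeated dinucleotide, e.g. "CA"
    copies -- number of tandem copies of the unit (always > 8)
    """
    seq = seq.upper()
    hits = []
    for match in MICROSATELLITE.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // 2
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the 528-kb region.
    toy = "GG" + "CA" * 9 + "TT"
    print(find_dinucleotide_microsatellites(toy))
```

Only perfect, uninterrupted tracts are reported; tolerating interrupted repeats would need a different approach.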

Syntenic analysis, displaying all protein-coding genes of the Pen1 sequence in two large blocks, 0–280 and 280–528 kb (telomere to centromere orientation). Genes are numbered and colored as in the diagram of the 528-kb genomic sequence. The chromosomal location of this sequence (Ag 2R:8C) is shown at Lower Left. Microsyntenic Drosophila clusters are shown to the left of the mosquito sequence diagrams. Orthologues in both species are connected and shown in bold. C and T indicate the centromeric/telomeric orientation in Drosophila, and CG numbers refer to genes in the consensus fruit fly genome. Of the six microsyntenic clusters, one each maps to 3L and 2R and four to 3R in Drosophila (diagrammed at Lower Right). The Drosophila 69A-B cluster consists of two adjacent orthologues in both species, the 91E cluster shows a simple transposition, and the other four clusters are “hyphenated” by extra genes in either species. Hyphenated and nonclustered genes are not in bold. Their locations in Drosophila, if not in clusters, are indicated by individual cytogenetic locations to the right of the mosquito sequence; x represents the absence of a Drosophila orthologue. Two pairs of duplicated Anopheles genes are shown in brackets.

Sequence variants. (A) Percent identity plot (PIP) analysis of the displayed BAC 8N20 (jagged black line) vs. the Consensus Sequence. Percent sequence identity is shown by colored blocks in pink (>98%), yellow (95–98%), green (90–95%), blue (80–90%), and purple (10–80%). Blocks in white between 1.6–1.7 kb, 88.9–90.1 kb, and 93.8–94.6 kb represent sequences present only in 8N20; the block between 20.1 and 25.8 kb is a retrotransposon that is only present in 8N20. Exons of the indicated genes are shown in black numbered boxes (short gray and white boxes represent CpG islands). (B) Examples of sequence variations with each letter corresponding to 3 nucleotides (amino acids or z for introns). The top sequence is exons 5 and 6 of (8N20.5/22J3.4) interrupted by an intron, and the bottom sequence is the four exons of (8N20.6/22J3.5) interrupted by introns. In each case, the aligned (8N20.5/22J3.4) and (8N20.6/22J3.5) sequences are shown by dots if identical. Coding differences are indicated as amino acid changes, whereas silent changes are indicated by tickmarks, + or §, depending on whether they involve one, two, or three DNA changes in the same triplet, respectively. (C) Phylogenetic tree of conceptually translated confirmed alleles of the complete gene 22J3.4 from susceptible 4Ar/r (red), refractory L3–5 (blue), and PEST laboratory strains. The PEST strains are from BAC 22J3 (Consensus Sequence, yellow) and BAC 8N20 (Variant, green). Bootstrap values are associated with the tree nodes.
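
The color scheme of panel A amounts to binning percent identity into five classes, which can be sketched as follows. The function name and the treatment of values that fall exactly on a class boundary (assigned here to the higher-identity class, except 98%, which the legend places outside pink) are assumptions; the legend gives only the ranges.

```python
def pip_color(percent_identity):
    """Map a percent identity (0-100) to the color class listed in the legend.

    Returns None for values below 10%, which the legend does not assign
    to any color.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity > 98.0:
        return "pink"      # >98%
    if percent_identity >= 95.0:
        return "yellow"    # 95-98%
    if percent_identity >= 90.0:
        return "green"     # 90-95%
    if percent_identity >= 80.0:
        return "blue"      # 80-90%
    if percent_identity >= 10.0:
        return "purple"    # 10-80%
    return None


if __name__ == "__main__":
    # Hypothetical identity values, not measurements from BAC 8N20.
    for value in (99.4, 96.0, 92.5, 85.0, 40.0):
        print(value, pip_color(value))
```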