Abstract

Genes encoding odorant-binding proteins (OBPs) form a large family in insect genomes. Two OBP genes, Obp57d and Obp57e, were previously shown to be involved in host-plant recognition in Drosophila sechellia. Here, by comparing the genomic sequences at the Obp57d/e locus from 27 Drosophila species, we found large differences in gene number between species. Phylogenetic analysis revealed that Obp57d and Obp57e in the D. melanogaster species group arose by duplication of an ancestral OBP gene that remains single-copy in the obscura species group. Further gains and losses of OBP genes were observed in several lineages of the melanogaster group. Site-specific analysis of evolutionary rates suggests that Obp57d and Obp57e have functionally diverged from each other. Thus, there are two classes of gene-number differences in the Obp57d/e region: differences involving genes that have functionally diverged from each other and differences involving genes that appear to be functionally identical. Our analyses demonstrate that these two classes can be distinguished by comparing many genomic sequences from closely related species.

GENES involved in the animal chemosensory system, such as those encoding olfactory and gustatory receptors, tend to form large families in a genome. Size differences in these multigene families among animal species have been explained by differences in the selection pressure that maintains functional genes. For example, the higher proportion of olfactory receptor pseudogenes in some primate lineages was attributed to the acquisition of full trichromatic color vision, which reduced these species' dependence on olfactory cues (Gilad et al. 2004). Likewise, the loss of gustatory receptor functions in primates was suggested to result from changes in the environment and in species-specific food preferences (Go et al. 2005).

With the completion of many genome sequences, however, comparisons of genomic data have raised the question of whether all differences in multigene-family size are consequences of selection. Alternatively, they might merely result from the stochastic gain and loss of genes. Indeed, genome analyses have revealed that size differences in multigene families between species can, at least in part, be explained by neutral evolution (Karev et al. 2003, 2004; Reed and Hughes 2004; Hahn et al. 2005; De Bie et al. 2006; Rudnicki et al. 2006). On the basis of these observations, it was proposed that size differences in multigene families are not a consequence but a cause of evolutionary changes in phenotypes (Nei 2005).

These two views, however, may not address the same phenomenon. Genes generated by duplication are known to undergo two successive but distinct stages of evolution (Lynch and Conery 2000). At the earlier stage, the two genes are functionally identical and tend to be reduced to a single gene by degeneration of either copy. Once they have functionally diverged, however, both genes contribute independently to fitness, and selection maintains both of them stably for a long time. The selection model of gene-family evolution may explain differences arising at the later stage, whereas the stochastic gain-and-loss model may fit events occurring at the earlier stage. It is therefore important to know which stage contributes more to the size difference in the gene family of interest, because the answer influences the conclusions of the analysis.

Genes encoding odorant-binding proteins (OBPs), secreted proteins that function in insect chemosensilla, form a large family in insect genomes. The D. melanogaster genome contains ∼50 OBP genes, a number comparable to those of the odorant receptor and gustatory receptor genes (Galindo and Smith 2001; Graham and Davies 2002; Hekmat-Scafe et al. 2002). As with the chemoreceptor gene families, the size of the OBP gene family varies between species, suggesting that OBP genes are subject to the same kinds of evolutionary mechanisms that determine the sizes of chemoreceptor gene families (Xu et al. 2003; Forêt and Maleszka 2006).

Two OBP genes, Obp57d and Obp57e, have been identified as responsible for the species-specific host-plant preference of D. sechellia (Matsuo et al. 2007). These genes are expected to be under selective pressure imposed by ecological conditions, and their evolution might also have affected behavior in other species. Here, by comparing the genomic sequences at the Obp57d/e locus from 27 Drosophila species, we revealed rapid evolution of these two OBP genes, which has resulted in gene-number differences between species. The differences fall into two classes: differences involving genes that have functionally diverged from each other, and differences involving genes that appear to be at the early stage of evolution after gene duplication and thus have not yet functionally diverged. Our findings demonstrate that comparative analysis of many genomic sequences from closely related species is useful for distinguishing between these two classes of differences.

MATERIALS AND METHODS

Dot-plot analysis:

Dot-plot analysis between D. melanogaster and D. pseudoobscura genomic sequences was carried out using Dotlet 1.5 with the following settings: window size, 59; threshold, 30 (Junier and Pagni 2000).
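The window and threshold settings can be illustrated with a simplified windowed dot plot. This Python sketch is not Dotlet's implementation: it assumes plain identity scoring for DNA and treats the threshold as the minimum number of identical positions within a window, whereas Dotlet scores windows with a substitution matrix and displays the scores in grayscale. The function name `dot_plot` is hypothetical.

```python
def dot_plot(seq_a: str, seq_b: str, window: int, threshold: int) -> list[tuple[int, int]]:
    """Return (i, j) start positions where the windows seq_a[i:i+window] and
    seq_b[j:j+window] share at least `threshold` identical positions.
    Identity scoring is an assumption made for this illustration."""
    a, b = seq_a.upper(), seq_b.upper()
    hits = []
    # Walk each diagonal (j - i == d) and slide the window along it.
    for d in range(-(len(a) - 1), len(b)):
        i0, j0 = max(0, -d), max(0, d)
        length = min(len(a) - i0, len(b) - j0)
        if length < window:
            continue
        match = [a[i0 + k] == b[j0 + k] for k in range(length)]
        score = sum(match[:window])
        for k in range(length - window + 1):
            if k > 0:
                # Update the window score incrementally: add the new position, drop the old one.
                score += match[k + window - 1] - match[k - 1]
            if score >= threshold:
                hits.append((i0 + k, j0 + k))
    return sorted(hits)
```

With the settings above, a call would look like `dot_plot(mel_region, pse_region, window=59, threshold=30)`, where the two sequence arguments are placeholder names. Because window scores are updated incrementally along each diagonal, the cost grows with the product of the two sequence lengths rather than with that product times the window size.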

Fly stocks:

The fly stocks used in this study are listed in Table 1. All stocks are maintained in our laboratory except D. melanogaster and D. pseudoobscura, whose genomic sequences were obtained from FlyBase (releases 5.1 and 2.0, respectively).

Sequencing of the Obp57d/e region:

The Obp57d/e genomic region was amplified by PCR with KOD plus enzyme (Toyobo, Tokyo) using primers P1 (5′-AGCCACAAACTGGAGGACAG-3′) and P2 (5′-GCCTCCAGGCCGTCGAACTC-3′), which recognize regions highly conserved between GA14778 and CG18066 (P1) and between GA15677 and CG30148 (P2). A single band was obtained for every species. The amplified fragment was purified using a QIAquick spin column (QIAGEN, Valencia, CA) and sequenced directly with the primers listed in supplemental Table S1 at http://www.genetics.org/supplemental/. PCR amplification was carried out independently at least three times for each species, and when the replicates were inconsistent, the most frequent result was adopted. The D. simulans, D. sechellia, and D. mauritiana sequences were previously deposited in the DNA Data Bank of Japan (Matsuo et al. 2007).
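The rule of adopting the most frequent result among at least three replicate PCRs can be expressed as a per-position majority vote. The sketch below assumes that the replicate sequences are already aligned to equal length, and it reports ties as `N`; how ties were handled is not stated in the text, so that choice, like the function name, is an assumption.

```python
from collections import Counter


def majority_consensus(replicates: list[str]) -> str:
    """Per-position majority call across aligned replicate sequences.

    Ties between the two most frequent calls are reported as 'N'
    (an illustrative assumption, not a documented rule)."""
    if len({len(r) for r in replicates}) != 1:
        raise ValueError("replicates must be non-empty and aligned to equal length")
    consensus = []
    for column in zip(*replicates):
        ranked = Counter(column).most_common()
        top_base, top_count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_count:
            consensus.append("N")  # no unique majority at this position
        else:
            consensus.append(top_base)
    return "".join(consensus)
```

For example, three replicates reading `ACGT`, `ACGA`, and `ACGT` yield the consensus `ACGT`.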

ORF identification:

The genomic sequences of the Obp57d/e region were searched for second-exon ORFs containing the OBP-signature cysteine motif (C-X10-C-X8-C). First-exon ORFs were then chosen among the possible ORFs by the following criteria: the ORF starts with ATG, is sufficiently long (>50 bp), and has an exon–intron boundary that can be assigned so as to form an in-frame connection with the corresponding second exon. With these criteria, the first exon and the exon–intron boundary could be determined uniquely for every second exon found with the C-X10-C-X8-C motif.
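The motif search for second exons can be approximated by translating the region in all six reading frames and scanning for C-X10-C-X8-C with a regular expression. This is a minimal sketch assuming the standard genetic code; it covers only the motif scan, not the first-exon and splice-boundary criteria, and the function names are hypothetical.

```python
import re

BASES = "TCAG"
# Standard genetic code; codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}
# Zero-width lookahead reports overlapping matches; [^*] excludes stop codons.
CYS_MOTIF = re.compile(r"(?=C[^*]{10}C[^*]{8}C)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; codons containing non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def scan_cys_motif(dna: str) -> list[tuple[str, int, int]]:
    """Return (strand, frame, amino-acid offset) for each C-X10-C-X8-C match."""
    dna = dna.upper()
    strands = (("+", dna), ("-", dna.translate(COMPLEMENT)[::-1]))
    hits = []
    for strand, seq in strands:
        for frame in range(3):
            protein = translate(seq[frame:])
            hits.extend((strand, frame, m.start()) for m in CYS_MOTIF.finditer(protein))
    return hits
```

Each hit marks a candidate second exon; the first-exon criteria (ATG start, length >50 bp, in-frame junction) would then be applied upstream of it.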

Amino acid sequence analyses:

Alignment and phylogenetic analyses of the deduced amino acid sequences were carried out with the MEGA3.1 sequence analysis package (Kumar et al. 2004). Signal peptide sequences were predicted with SignalP 3.0 (Bendtsen et al. 2004). The ancestral amino acid sequence at each internal node was inferred by the maximum-parsimony (MP) method (Fitch 1971). A custom script for the R statistical package was used for ancestral state inference and for counting the number of amino acid substitution events at each site (see supplemental materials at http://www.genetics.org/supplemental/ for the script and a detailed description). Type I and type II functional divergence between Obp57d and Obp57e was examined using DIVERGE 2.0 (Gu and Velden 2002; Gu 2006).
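Counting the minimum number of amino acid substitutions at each site on a fixed tree can be sketched as a small dynamic program. This is not the authors' R script, which is provided in their supplemental material. We swap in Sankoff's algorithm with unit costs, which gives the same minimum number of changes as Fitch's (1971) method on bifurcating trees and remains exact on multifurcating trees such as the one used in this study. The nested-tuple tree encoding is an assumption, gaps are not handled, and the sketch returns counts only, without assigning changes to branches.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def min_substitutions(tree, states: dict[str, str]) -> int:
    """Minimum number of state changes at one site (unit-cost Sankoff).

    tree   -- nested tuples; leaves are taxon names, internal nodes are tuples
              of two or more children (polytomies allowed)
    states -- residue observed at this site for each taxon
    """
    alphabet = sorted(set(AMINO_ACIDS) | set(states.values()))

    def costs(node) -> dict[str, float]:
        if isinstance(node, str):  # leaf: zero cost only for its observed residue
            return {s: 0.0 if s == states[node] else math.inf for s in alphabet}
        child_costs = [costs(child) for child in node]
        # Cost of assigning s here = sum over children of the cheapest child state,
        # plus one substitution whenever the child state differs from s.
        return {
            s: sum(min(c[t] + (s != t) for t in alphabet) for c in child_costs)
            for s in alphabet
        }

    return int(min(costs(tree).values()))


def substitutions_per_site(tree, alignment: dict[str, str]) -> list[int]:
    """Apply min_substitutions to every column of a gap-free alignment."""
    length = len(next(iter(alignment.values())))
    return [
        min_substitutions(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(length)
    ]
```

For example, `substitutions_per_site((("a", "b"), "c", "d"), alignment)` would score a tree with a three-way polytomy at the root; the taxon names are placeholders.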

RESULTS

Gene number difference between species at the Obp57d/e locus:

In the D. melanogaster genome, OBP genes form clusters on each chromosome (Graham and Davies 2002; Hekmat-Scafe et al. 2002). One of the two OBP gene clusters at cytological position 57 of the second chromosome consists of Obp57d and Obp57e (Figure 1A). These two genes are closely flanked by CG18066 and CG30148, forming a gene-dense region. The positions and order of these flanking genes are conserved in the D. pseudoobscura genome, but there is only one OBP gene (GA15675) between GA14778 and GA15677, the orthologs of D. melanogaster CG18066 and CG30148, respectively (Figure 1B). A BLAST homology search with Obp57d, Obp57e, and D. pseudoobscura GA15675 against the D. pseudoobscura genome did not detect any other OBP gene, indicating that D. pseudoobscura has only one gene corresponding to D. melanogaster Obp57d or Obp57e.

Figure 1. Genomic structure around the Obp57d/e region. (A) Genomic structure around two OBP gene clusters at cytological position 57 on the second chromosome of D. melanogaster. (B) Dot-plot analysis of the Obp57d/e genomic region in D. melanogaster and D. pseudoobscura.

To determine whether the observed difference in gene number resulted from gene duplication or gene loss, we sequenced the Obp57d/e region from 25 additional species phylogenetically positioned between D. pseudoobscura and D. melanogaster (Table 1). Most species have two OBP genes in this region, but some have only one. Furthermore, several species (D. takahashii, D. biarmipes, D. ficusphila, and D. elegans) have more than two OBP genes in this region (Figure 2).

Figure 3 shows the minimum-evolution (ME) tree of the deduced amino acid sequences of the OBP genes, with D. melanogaster Obp57a and Obp57b as the outgroup. The OBP genes of the obscura group branched first, and the remaining OBP genes formed two major clades corresponding to D. melanogaster Obp57d and Obp57e. The two clades appear to be equally diverged from the obscura-group OBP genes, indicating that Obp57d and Obp57e were generated by a gene duplication at a very early stage in the evolution of the melanogaster group.

Figure 3. ME tree of deduced amino acid sequences. Obp57d and Obp57e genes form two clades, both of which have diverged equally from the OBP genes in the obscura group. The ME tree was constructed with the Poisson correction using the complete-deletion option of MEGA 3 for the alignment shown in Figure 2. D. melanogaster Obp57a and Obp57b were used as the outgroup. Bootstrap values >80% are shown.
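The Poisson-corrected distances underlying the ME tree can be sketched in a few lines. After complete deletion (removing every alignment column that has a gap in any sequence), the proportion p of differing sites between two sequences is converted to d = -ln(1 - p). The sketch stops at the distance matrix; tree building and bootstrapping were done in MEGA and are not reproduced here, and the function names are hypothetical.

```python
import math
from itertools import combinations


def complete_deletion(alignment: dict[str, str], gap: str = "-") -> dict[str, str]:
    """Drop every column that contains a gap in any sequence."""
    names = list(alignment)
    kept = [col for col in zip(*(alignment[n] for n in names)) if gap not in col]
    if not kept:
        raise ValueError("no gap-free columns remain")
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


def poisson_distance(seq1: str, seq2: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), p = proportion of differing sites."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 1.0:
        return math.inf  # saturated: the correction is undefined
    return -math.log(1.0 - p)


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise Poisson distances after complete deletion."""
    aln = complete_deletion(alignment)
    return {(x, y): poisson_distance(aln[x], aln[y]) for x, y in combinations(aln, 2)}
```

The correction matters more as p grows: at p = 0.25 it gives d ≈ 0.288, and at p = 0.5 it gives d = ln 2 ≈ 0.693.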

Figure 4 shows the genomic structure of the Obp57d/e region in all species, together with the species phylogeny. The differences in OBP gene number between species reveal that multiple duplication and deletion events occurred during the evolution of the melanogaster group. The first duplication occurred at a very early stage of melanogaster-group evolution, generating Obp57d and Obp57e from an ancestral OBP gene that remains single in the obscura group (Figure 4, arrow 1). Immediately after the duplication, Obp57e was lost in the ananassae subgroup (arrow 2). A BLAST search for Obp57e against the D. ananassae genomic sequence did not detect any other OBP gene, indicating that Obp57e was not translocated elsewhere but was completely lost from the D. ananassae genome. Conversely, Obp57d was lost in the auraria–rufa lineage (arrow 3).

Figure 4. Genomic structure of the Obp57d/e region in the analyzed species. The phylogenetic relationships between species are based on Da Lage et al. (2007). Arrows indicate the positions of the three major gene duplication/loss events (see text).

Further duplication of the remaining OBP gene was observed in D. varians and D. constricta. Extra copies of Obp57d were also found in D. takahashii, D. biarmipes, D. ficusphila, and D. elegans. The exact number and timing of these duplication events cannot be determined, because the phylogenetic relationships among these species are not clearly resolved. Nevertheless, the structural similarity among the multiple genes (Figure 3), considered together with the phylogenetic relationships between species (Figure 4), indicates that most of the extra copies arose by independent duplications in each lineage. Indeed, the number of Obp57d genes differs between the two closely related species D. takahashii and D. biarmipes, suggesting that this difference evolved rapidly. We also found that in D. takahashii and D. elegans, one of the duplicated Obp57d genes has degenerated into a pseudogene through frameshift mutations (Figure 2). Because the copy number of such tandem repeats of similar genes is known to fluctuate readily, gene number may also vary within these species.

Flies with a knockout of either Obp57d or Obp57e showed similar changes in behavioral response to octanoic acid, indicating that these two OBP genes share, at least in part, the same function in the perception of octanoic acid (Matsuo et al. 2007). We therefore searched for amino acid sites that are conserved in both Obp57d and Obp57e. For this analysis, we selected species with single copies of both Obp57d and Obp57e, in which the function of each gene is more likely to have been conserved (Figure 5A). Because a bifurcating tree cannot adequately describe the phylogenetic relationships among the selected species, we used a multifurcating tree for ancestral state inference by the MP method and counted substitution events on all branches. The total numbers of substitutions were almost the same for Obp57d and Obp57e, indicating that the overall functional constraints on the two genes are of equivalent strength (Table 2). When the distribution of the number of amino acid substitutions per site was analyzed, 16 sites were conserved, more than expected under a negative binomial distribution (see supplemental Figure S1 at http://www.genetics.org/supplemental/). The amino acids at these sites are shown together with those of D. pseudoobscura and D. obscura (Table 3). In addition to the six OBP-signature cysteines, three sites, at positions 59, 97, and 124, are conserved between the obscura and melanogaster groups. These are candidates for the amino acids that determine the function shared by Obp57d and Obp57e.
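The comparison between the observed number of fully conserved sites and a negative binomial expectation can be sketched as follows. We assume a method-of-moments fit (the authors' fitting procedure is described in their supplemental Figure S1 and may differ): for per-site substitution counts with mean m and variance v > m, p = m/v and r = m^2/(v - m), and the expected fraction of sites with zero substitutions is p^r. Function names are hypothetical.

```python
from statistics import mean, pvariance


def nb_method_of_moments(counts: list[int]) -> tuple[float, float]:
    """Fit a negative binomial (r, p) by matching the sample mean and variance.
    Requires overdispersion (variance > mean)."""
    m, v = mean(counts), pvariance(counts)
    if v <= m:
        raise ValueError("no overdispersion; negative binomial fit is undefined")
    return m * m / (v - m), m / v


def nb_pmf(k: int, r: float, p: float) -> float:
    """P(K = k) for K ~ NB(r, p), using P(k) = P(k-1) * (k - 1 + r) * (1 - p) / k."""
    prob = p ** r
    for j in range(1, k + 1):
        prob *= (j - 1 + r) * (1 - p) / j
    return prob


def conserved_sites_vs_expected(counts: list[int]) -> tuple[int, float]:
    """Observed vs expected number of sites with zero substitutions."""
    r, p = nb_method_of_moments(counts)
    observed = sum(1 for c in counts if c == 0)
    expected = len(counts) * nb_pmf(0, r, p)
    return observed, expected
```

An observed count of invariant sites well above the expected value, as reported for the 16 sites here, points to sites under stronger constraint than the bulk of the protein.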

Figure 5. Phylogenetic relationships between the species selected for site-specific analysis of functional constraint. (A) Phylogenetic tree used for the analysis of amino acid substitutions. (B and C) Phylogenetic trees used for the analysis of functional divergence with DIVERGE. Because the software does not accept multifurcating trees, the actual tree (shown in A) was approximated by bifurcating trees.

Table 3. Summary of conserved amino acids between Obp57d and Obp57e in the melanogaster species group

Functional divergence between Obp57d and Obp57e:

The ME tree of the amino acid sequences supported the two clades representing Obp57d and Obp57e (Figure 3). To examine whether this pattern reflects functional divergence between the two genes, we examined site-specific differences in evolutionary rate between the two clades using DIVERGE (Gu and Velden 2002; Gu 2006). The software tests two types of functional divergence: type I, in which the site-specific evolutionary rate differs between two clades, and type II, in which functionally different amino acids are conserved at corresponding sites in the two clades. In general, type I functional divergence is observed when the functional constraint on particular sites has been lost in one clade and maintained in the other, whereas type II divergence is observed when a novel functional role has been acquired at particular sites in one clade (Gu and Velden 2002; Roth et al. 2007). Because the software accepts only bifurcating trees, we tested two tree shapes that approximate the actual topology (Figure 5, B and C). Because DIVERGE combines maximum-likelihood estimation with ancestral sequence inference to avoid the underestimation caused by multiple substitutions on a single branch, it gave larger estimates of the number of substitutions per site than those shown in Table 2 (Table 4). The coefficients of type I functional divergence (θI) between Obp57d and Obp57e are significantly different from zero for both trees 2 and 3, showing that the site-specific evolutionary rate differs between the two clades. In contrast, the coefficients of type II functional divergence (θII) are not significantly different from zero, showing that there is no site-specific shift in amino acid properties between the two clades.

Table 4. Summary of analysis for functional divergence between Obp57d and Obp57e

We further examined which sites are responsible for the type I functional divergence. Figure 6 plots the number of substitutions and the posterior probability of type I divergence at each site. The highest probabilities were observed at sites where the number of substitutions differed greatly between Obp57d and Obp57e. Among the sites with a posterior probability of type I divergence >0.8, six were completely conserved in one clade and substituted four or more times in the other (Table 5, positions marked with footnote a). These are candidates for the sites responsible for the functional divergence between Obp57d and Obp57e.
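The selection rule for candidate sites (posterior probability of type I divergence >0.8, no substitutions in one clade, and at least four in the other) can be written as a simple filter over per-site records. The record layout and field names below are hypothetical; in practice, the posterior probabilities would come from DIVERGE output and the substitution counts from the parsimony reconstruction.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    position: int
    posterior_type1: float  # posterior probability of type I divergence at this site
    subs_obp57d: int        # substitutions counted within the Obp57d clade
    subs_obp57e: int        # substitutions counted within the Obp57e clade


def candidate_sites(records, min_posterior=0.8, min_subs=4):
    """Positions conserved in one clade but substituted >= min_subs times in the other."""
    selected = []
    for rec in records:
        if rec.posterior_type1 <= min_posterior:
            continue  # not above the posterior threshold
        d, e = rec.subs_obp57d, rec.subs_obp57e
        if (d == 0 and e >= min_subs) or (e == 0 and d >= min_subs):
            selected.append(rec.position)
    return selected
```

A site with one substitution in one clade and many in the other passes the posterior threshold but is not reported, matching the stricter "completely conserved" requirement.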

Figure 6. Site-specific analysis of the difference in functional constraint. (A) The number of amino acid substitutions at each site. Arrows indicate the sites conserved among all Obp57d and Obp57e sequences in the selected species (see Table 3). (B) Posterior probability of type I functional divergence (difference in evolutionary rate) at each site, calculated using DIVERGE. Sites with probabilities >0.8 show a large difference in the number of amino acid substitutions between Obp57d and Obp57e (shown in A, connected by dashed lines).

In subfunctionalization, the ancestral functions are partitioned between the duplicated genes (Roth et al. 2007). In the case of Obp57d and Obp57e, the obscura group retains a single OBP gene that presumably preserves the ancestral functions. We therefore examined whether the clade-specifically conserved amino acids in the melanogaster group are also conserved in the obscura group. Of the 19 sites that are clade-specifically conserved in the melanogaster group, only 7 are conserved in the obscura group (Table 5). This indicates that the evolutionary rate at the other 12 sites decreased after gene duplication. In other words, these 12 sites may reflect functional constraints newly acquired in Obp57d and Obp57e that did not exist in the ancestral OBP gene. However, it is also possible that the obscura-group OBP gene did not completely preserve the ancestral functions and has lost some of them.
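The comparison with the obscura group amounts to a per-site lookup. In this sketch, we assume that a clade-specifically conserved site counts as conserved in the obscura group when every obscura-group sequence carries the same residue as the conserved melanogaster-group clade; this criterion, the data layout, and the function name are illustrative assumptions.

```python
def split_by_obscura(clade_conserved: dict[int, str], obscura: dict[int, list[str]]):
    """Partition clade-specifically conserved sites.

    clade_conserved -- site position -> residue fixed in the Obp57d or Obp57e clade
    obscura         -- site position -> residues at that site in obscura-group genes
    Returns (shared, new): sites also conserved in the obscura group, and sites
    whose constraint appears only after the duplication.
    """
    shared, new = [], []
    for site, residue in sorted(clade_conserved.items()):
        residues = obscura.get(site, [])
        if residues and all(aa == residue for aa in residues):
            shared.append(site)  # constraint likely inherited from the ancestral gene
        else:
            new.append(site)     # candidate newly acquired constraint
    return shared, new
```

Applied to the 19 clade-specific sites described above, this kind of partition yields the 7 shared and 12 newly constrained sites.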

DISCUSSION

Specifically conserved amino acids in Obp57d and Obp57e:

In general, OBP genes evolve rapidly. Comparisons of OBP genes in the D. melanogaster genome revealed that their amino acid sequences differ so greatly, to the point of saturation, that the phylogenetic relationships among them cannot be resolved (Graham and Davies 2002; Hekmat-Scafe et al. 2002). Only three features define an OBP: (1) a signal sequence for secretion, (2) six α-helical domains, and (3) six cysteines at particular intervals that are necessary for proper conformation. Most other sites in OBP genes are not conserved at the amino acid level (Galindo and Smith 2001; Graham and Davies 2002; Hekmat-Scafe et al. 2002).

Because most OBP genes in a genome have presumably diverged in function from one another, comparing the amino acid sequences of OBP genes within a genome is not an effective way to elucidate the relationship between structure and the specific function of each OBP. It is therefore preferable to compare orthologs from closely related species, which are expected to retain the same function (e.g., the same ligand repertoire). By comparing orthologous genes from many species, we found amino acids conserved in both Obp57d and Obp57e; these may be key sites for the specific functions shared by the two genes. We also found sites showing type I functional divergence between the two OBP genes, which may be key amino acids responsible for the specific function of each OBP.

Functional divergence between Obp57d and Obp57e:

There are two models for the functional divergence of duplicated genes: subfunctionalization and neofunctionalization (Roth et al. 2007). Subfunctionalization is a key process in the duplication–degeneration–complementation (DDC) model, in which the functions of an ancestral gene are partitioned between duplicated genes that functionally complement each other. Neofunctionalization, by contrast, results in the acquisition of a novel function by one gene while the other preserves the ancestral function. The ME tree of the amino acid sequences showed that Obp57d and Obp57e have diverged equally from the obscura-group OBP gene, suggesting that neofunctionalization of either gene is unlikely. Type II functional divergence was also not supported, meaning that there was no radical amino acid substitution leading to the acquisition of a novel function. However, not all of the type I diverged sites between Obp57d and Obp57e appear to result from the loss of functional constraints: of the 19 clade-specifically conserved sites, 12 are not conserved in the obscura group, in which the ancestral functions should be preserved (Table 5). The particular properties of OBP genes need to be considered to understand these observations. Because most sites in OBP genes are evolutionarily unconstrained, the acquisition of a novel function after gene duplication might appear as an increase in functional constraint at sites that were free before the duplication. Such site-specific differences in evolutionary rate would be detected as type I divergence, but in this case they would be related to neofunctionalization rather than subfunctionalization. A positional shift of functionally important sites, for example, could cause such changes. Our analysis did not include insertion/deletion variation, which clearly affects the positional relationships between functional amino acids.

Thus, it remains possible that Obp57d and Obp57e each inherited a subset of the ancestral functions (subfunctionalization) and, at the same time, gained a novel function specific to each gene (neofunctionalization). It has been proposed that subfunctionalization serves as a transition state toward neofunctionalization (He and Zhang 2005; Rastogi and Liberles 2005). This possibility should be examined experimentally by in vitro assays as well as by behavioral assays of genetically manipulated flies.

Birth-and-death process and selection:

The ananassae subgroup and the auraria–rufa lineage provide interesting examples in which either Obp57d or Obp57e has been lost. Even under subfunctionalization, duplicated genes require selective pressure to preserve the ancestral functions, and this pressure is also necessary to maintain both subfunctionalized genes. In other words, species lacking either subfunctionalized gene may exhibit defective phenotypes that would have been deleterious during the subfunctionalization process. Thus, the loss of either gene in the ananassae subgroup and the auraria–rufa lineage may indicate a shift in selective pressure, such as a reduction in population size leading to genetic drift, or an environmental change leading to a shift in food availability. Indeed, some of the amino acids conserved in the other species that have both Obp57d and Obp57e (Tables 3 and 5) are not conserved in the ananassae subgroup and the auraria–rufa lineage (Figure 2), indicating that the functional constraint on the remaining gene has changed. It should also be noted that the OBP gene number at the Obp57d/e locus is subject to a particular selection mechanism. In natural populations of D. melanogaster, there is a polymorphism at the Obp57e locus (Takahashi and Takano-Shimizu 2005). The Obp57e null allele has been found worldwide, indicating the existence of balancing selection. This suggests that a selection-based mechanism affects gene number at the Obp57d/e locus, and this mechanism may be an important determinant of OBP gene family size.

In contrast to the examples discussed above, the multiple Obp57d genes in D. takahashii, D. biarmipes, D. ficusphila, D. elegans, and D. varians and the two Obp57e genes in D. constricta appear to be at the early stage of evolution following lineage-specific gene duplications. Although some amino acids conserved in the Obp57d genes of the other species (Tables 3 and 5) are not conserved in some of these extra Obp57d copies (Figure 2), in this case the lack of conservation may reflect gene degeneration rather than functional divergence. Indeed, Obp57d pseudogenes are found in D. takahashii and D. elegans. Analyses of intraspecific variation in each species might reveal selection pressure, if any, on these extra Obp57d genes. The observed differences in gene number at the earlier stage of evolution (e.g., three Obp57d genes in D. biarmipes) are larger than those at the later stage (e.g., loss of Obp57d in D. auraria) and may contribute more to the size difference of the gene family between species; nevertheless, their contribution to phenotypic differences is likely to be smaller than that of number differences in functionally diverged genes.

Functional relationship with receptors:

In the evolution of OBP genes, not only population-level and environmental factors but also local factors at the molecular level are important determinants of selection pressure. For proper function, an OBP must be coexpressed with the functionally corresponding chemoreceptors in the same sensilla (Xu et al. 2005). Changes in the structure (function) and expression pattern of the corresponding receptor are therefore expected to change the selection pressure on OBP genes. More importantly, changes in the expression pattern of the OBP itself also alter its local environment. The evolution of the expression pattern of each OBP may thus affect the selection pressure on that OBP, possibly resulting in gene-number differences between species. It will be necessary to examine whether, in species with an altered number of Obp57d/e genes, these genes are expressed in the same pattern as in D. melanogaster or in different patterns.

Conclusion:

There are two classes of gene number differences in the Obp57d/e region: the difference of the genes that have functionally diverged from each other and the difference of the genes that appear to be functionally identical. Although both of these two classes contribute to the size difference of the gene family between species, their contributions to the phenotypic differences are not equal, and the evolutionary mechanisms underlying them are different. Thus, it is important to distinguish between these two classes in the analysis of the size difference of multigene families among species. Comparisons of many genomic sequences from closely related species are effective for this purpose.

Acknowledgments

This work was supported by a Grant-in-Aid for Young Scientists (B) 19770210 from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

Footnotes

Sequence data from this article have been deposited with the DNA Data Bank of Japan under accession nos. AB370270–AB370291.
