Background & objectives: Comparative genomics and evolutionary analyses of conserved genes have enabled us to understand the complexity of the genomes of closely related species. For example, the β-globin gene, which encodes a component of human hemoglobin, has experienced many genetic changes in many related taxa and has produced more than 600 variants. One of these variants, HbS, causes sickle-cell anemia in humans but offers protection against severe malaria due to Plasmodium falciparum. In the present study, we characterized and performed evolutionary comparative analyses of the β-globin gene in different related and unrelated taxa to obtain a comprehensive view of its evolution. Methods: DNA and protein sequences of the β-globin gene were downloaded from NCBI and characterized in detail in nine eutherian (Homo sapiens, Pan troglodytes, Macaca mulatta, Mus musculus, Rattus norvegicus, Bos taurus, Canis familiaris, Equus caballus, Oryctolagus cuniculus), one dinosaurian (Gallus gallus) and one neopterygii (Danio rerio) taxa. Three more eutherian taxa (Papio anubis, Ovis aries and Sus scrofa) were included in the analysis at the protein level but not at the gene level owing to lack of genomic information. Computational and phylogenetic analyses were performed using an evolutionary comparative approach. Results: Comparative and phylogenetic analyses revealed less conservation of the genetic architecture of β-globin than of its protein architecture in all eutherian taxa. Both the dinosaurian and neopterygii taxa served as outgroups and varied at the gene and protein levels. Interpretation & conclusion: Most remarkably, all primates among the eutherian taxa, including P. anubis, showed differences at only nine codon positions, and the H. sapiens and P. troglodytes sequences were identical. Absolute conservation of the coding region in Equus caballus (horse) was observed. The results are discussed with an inference on the role of evolutionary forces in maintaining such close similarities and variations across closely related taxa. Further, greater use of comparative approaches is needed to understand the evolution of disease-causing genes in closely related taxa. Key words: β-globin gene; evolution; malaria; phylogeny; sickle-cell anemia.

Comparative genomic studies have helped to decipher within- and between-species variation by comparing genes conserved across species with those that have diverged and evolved to perform specific functions1. From basic biology to the highly complex dynamic mechanisms of genes, these studies have helped in identifying and comparing functional sequences on the basis of high levels of evolutionary conservation. Such comparisons have proven successful not only for closely related species, such as human-primate or human-mouse, but also for distant evolutionary comparisons, such as human-fish and human-bird2. The majority of these studies have contributed to a better understanding of highly important human genes related to infectious diseases that are simultaneously evolving in various taxa3-5, and of how natural selection on human genes has provided increased adaptive fitness on exposure to infectious diseases6. Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems, yet little is known about the precise mechanisms and the role of duplication divergence in evolution. Such observations might therefore prove beneficial for inferring the evolution of medically important genes in different taxa.

As agents of natural selection, infectious diseases like malaria have played a major role in the evolution of the human species, as shown by the association between genetic variation in the beta globin gene and protection from severe malaria due to P. falciparum6. This protective effect maintains a balanced polymorphism of the sickle-cell mutation in malaria-endemic regions. Considering this, the human beta globin (HBB) gene, which encodes a component of hemoglobin and is located on the short arm of chromosome 11 at p15.5, is a highly important gene for understanding the complexities of malaria. Hemoglobin is a major blood protein that consists of four polypeptides, two alpha globins and two beta globins, and belongs to the globin gene family, a group of genes involved in oxygen transport7. Structurally, five transcriptionally active β-like globin genes are present within a cluster of 45 kb in the following order8: 5' - ε - Gγ - Aγ - δ - β - 3' (Fig. 1). Additionally, the pseudogene ψβ1 is located between the Aγ and δ genes8 (Fig. 1). The HBB gene provides instructions for making a protein called beta-globin. The ε- and γ-globin genes diverged from the ancient ε/γ ancestral gene about 100 Myrs ago, followed by duplication of the γ-globin gene to form the Gγ and Aγ genes, an event which occurred before the divergence of the Old World and New World monkeys9. The human alpha globin gene (HBA), located on chromosome 16, is also a member of this gene family. Under normal conditions, the alpha and beta genes of hemoglobin are transmitted separately, as they are located on separate chromosomes7. Individuals inherit a copy of the alpha gene and a copy of the beta gene from each of their parents, and the inherited copies together determine the individual genotype.

Altered forms of genes (otherwise called alleles), such as sickle-cell hemoglobin HbS, originate only by mutation, that is, a change in the nucleotide sequence of the DNA10. Sickle-cell hemoglobin results from a single point mutation that substitutes valine (V) for glutamic acid (E) at the sixth amino acid position of beta globin10. The HbS mutation gives rise to three genotypes; when the A hemoglobin and the S hemoglobin alleles (the latter replacing the normal hemoglobin allele) combine, they form: AA homozygous dominant (normal genotype and phenotype), AS heterozygote (sickle-cell genotype but not phenotype: carrier) and SS homozygous recessive7 (abnormal genotype and phenotype: leads to sickle-cell anemia). Different types of variation in the beta globin gene are associated with human diseases such as Hemoglobin SC, Sickle/beta-thalassemia, Hemoglobin E/beta-thalassemia and Alpha thalassemia/Hemoglobin Constant Spring. An advantage of inheriting the HbS allele in the heterozygous state is increased protection against malaria, while the disadvantage is that homozygous recessive (SS) individuals develop sickle-cell anemia7. The HbS allele has been identified on four genetic backgrounds in different African populations, suggesting that the same mutation arose independently several times through convergent evolution11. Since the discovery of HbS in 1949, the number of known hemoglobin variants has kept increasing and more than 600 are known today, notably the HbC and HbE alleles, which arose and spread in Africa and in southeast Asia, respectively12. Different uncommon beta globin variants have also been reported from different parts of India, like HbD Iran (β22 Glu→Gln) and Hb Hofu (β126 Val→Glu), besides unspecified ones such as HbJ, HbK and HbM8. Heterozygosity for two different globin gene mutations has also been recorded from different parts of India and the world, e.g. HbSD, HbSE, HbDK and HbSC8.
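The balance between the heterozygote advantage and the homozygote disadvantage described above can be illustrated with the textbook one-locus model of overdominant selection. The sketch below is only an illustration under stated assumptions: relative fitnesses are taken as AA = 1 - s_aa, AS = 1 and SS = 1 - s_ss, and the selection coefficients used are hypothetical values, not estimates from this study or from the cited literature.

```python
def equilibrium_s_frequency(s_aa: float, s_ss: float) -> float:
    """Equilibrium frequency of the S allele under heterozygote advantage.

    Assumed relative fitnesses: AA = 1 - s_aa, AS = 1, SS = 1 - s_ss.
    Under this model the stable equilibrium is q = s_aa / (s_aa + s_ss).
    """
    if s_aa <= 0 or s_ss <= 0:
        raise ValueError("both selection coefficients must be positive")
    return s_aa / (s_aa + s_ss)


if __name__ == "__main__":
    # Hypothetical coefficients chosen only for illustration:
    # a modest malaria-related cost for AA and a severe cost for SS.
    q_hat = equilibrium_s_frequency(s_aa=0.1, s_ss=0.9)
    print(f"equilibrium S-allele frequency: {q_hat:.3f}")
```

In this model the S allele persists at an intermediate frequency whenever both homozygotes are less fit than the heterozygote, which is what a balanced polymorphism means.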

Indeed, the β-globin gene cluster in humans represents a good model for investigating mechanisms and processes of genome evolution, because it is one of the most intensively studied multigene families from the standpoint of molecular genetics and phylogenetic history13. Apart from humans, the β-globin gene or its variants (present in the cluster; Fig. 1) have been reported in various other mammals such as chimps, rhesus monkeys, baboons, cows, sheep, dogs, rabbits and horses. Chimpanzees are the closest extant relatives of humans, having shared a common ancestor 4-6 million years ago14. However, recent studies on the β-globin gene in chimpanzees have provided neither evidence of mutations that confer resistance to malaria nor evidence of long-term balancing selection at the genetic loci15, unlike in humans16,17, although in rodents a separate study has reported complex signatures of selection and gene conversion in the duplicated globin genes18.

Herein, we report the results of a comparative genomic analysis of the β-globin gene in 11 different taxa: nine eutherian [Homo sapiens (human), Macaca mulatta (rhesus monkey), Pan troglodytes (chimpanzee), Rattus norvegicus (rat), Mus musculus (mouse), Canis familiaris (dog), Bos taurus (cow), Equus caballus (horse), Oryctolagus cuniculus (rabbit)], one dinosaurian (avian) Gallus gallus (chicken) and one neopterygii Danio rerio (fish), describing fundamental similarities and differences among taxa to enable a better evolutionary understanding of the functional β-globin gene (Fig. 2). Since the presence of shared conserved insertions or deletions (indels) in protein sequences is a special type of signature that shows considerable promise for phylogenetic inference19, and since a species' place on an evolutionary tree is a valuable predictor of structure and function20, we inferred the evolutionary relationships among all 11 studied taxa from the β-globin gene through phylogenetic analysis.

METHODS

Nucleotide and protein sequences of the β-globin gene in nine eutherian taxa (taxonomic classification given in Table 1a), Homo sapiens (GenBank Accession No. NC_000011.9), Macaca mulatta (GenBank Accession No. NC_007871.1), Pan troglodytes (GenBank Accession No. NC_006478.2), Mus musculus (GenBank Accession No. NC_095534.1), Rattus norvegicus (GenBank Accession No. NC_005100.2), Bos taurus (GenBank Accession No. NC_007313.3), Equus caballus (GenBank Accession No. NC_009150.2), Canis familiaris (GenBank Accession No. NC_006603.2) and Oryctolagus cuniculus (GenBank Accession No. NC_013669.1), one dinosaurian taxon, Gallus gallus (GenBank Accession No. NC_006088.2), and one neopterygii taxon, Danio rerio (GenBank Accession No. NC_007114.4), were downloaded from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/) during February/March 2010. These sequences were subjected to different computational and statistical analyses for different variables, such as total gene, exon and intron lengths. Coding/non-coding ratios and mean coding percentages were computed in all 11 taxa. The computer software DNASTAR (DNASTAR Inc. Madison, USA, www.dnastar.com) was used to align the nucleotide sequences of the β-globin gene with the Clustal W algorithm. A phylogenetic tree based on the neighbour-joining (NJ) method was constructed to infer the evolutionary relationships among the various taxa at the β-globin gene level using the phylogeny option in DNASTAR and validated with the MEGA21 computer program version 4.0 (http://www.megasoftware.net/). The length of each branch and bootstrap values for each internal node were also estimated. To further establish the relationship of altered β-globin genes (variants) in the 11 taxa, the HomoloGene option of NCBI (http://www.ncbi.nlm.nih.gov/sites/homologene) was used and nucleotide sequences of these variants were downloaded. 
Three more taxa [Sus scrofa (pig) GeneID: 407066, Ovis aries (sheep) GeneID: 100049064 and Papio anubis (baboon) GeneID: 100137310] were also included in the phylogenetic analysis of variants, as the β-globin gene was found to be orthologous in these taxa. However, these taxa were not included in the genetic characterization study owing to non-availability of positional information in their respective genomes at NCBI. Since amino acid sequences were available for these three taxa, these sequences were used for phylogenetic tree reconstruction (a total of 14 taxa were considered for the phylogenetic analysis using amino acid sequences and only 11 taxa for the analysis using nucleotide sequences).
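Distance-based methods such as neighbour-joining start from a matrix of pairwise distances between aligned sequences. The sketch below shows one simple choice, the uncorrected p-distance (proportion of differing sites). It is an assumption made for illustration: the text does not state which distance measure was used in DNASTAR or MEGA, and the three aligned fragments and taxon labels are hypothetical toy data, not sequences from this study.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Alignment columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differing / compared


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances keyed by (taxon_x, taxon_y)."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    toy_alignment = {
        "taxon_A": "ATGGTGCACC",
        "taxon_B": "ATGGTGCATC",
        "taxon_C": "ATGG-GCTTC",
    }
    for pair, dist in distance_matrix(toy_alignment).items():
        print(pair, round(dist, 3))
```

A tree-building program would then cluster taxa from such a matrix; the sketch stops at the distance step and does not reproduce the tree construction or bootstrapping.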

RESULTS & DISCUSSION

Results of the comparative genetic analyses revealed that the β-globin structure (5'- ε - Gγ - Aγ - δ - β - 3') has been subjected to many evolutionary changes and is thus unique in all taxa except for the human and chimp β-globin genes. Variations in the β-globin gene structure of many primates have been reported13. Moreover, the evolutionary history of the beta globin gene in eutherian taxa is quite intriguing and perplexing. For example, it was reported that in higher primates, the γ-globin gene was duplicated before the divergence of Old World and New World monkeys, probably by an unequal homologous crossing over event mediated by LINE elements9, and that after duplication the γ-globin gene became fetally expressed as a direct result of the accumulation of sequence changes in the 5' flanking region such as Gγ and Aγ9. Comparative studies of eutherian and marsupial beta-like globin genes confirmed the hypothesis that a two-gene cluster, containing an embryonic- and an adult-expressed beta-like globin gene, existed in the most recent common ancestor of marsupials and eutherians22. This was evident from the analysis of RNA from embryos and neonates, which indicated a switch from embryonic to adult gene expression occurring at the time of birth, coinciding with the transfer of the marsupial from a uterus to a pouch environment22. Moreover, this cluster of genes was proposed to have arisen by tandem duplication of ancestral beta-globin genes, with the first duplication occurring 200 to 155 MYBP, just prior to the period in mammalian evolution when eutherians and marsupials diverged from a common ancestor22.

The present study reveals several interesting evolutionary observations on the HBB gene among the eutherian, dinosaurian and neopterygii taxa. The β-globin gene in the primate taxa (human, chimp and monkey) was found to differ at the nucleotide sequence level but to be closely similar at the amino acid level. Remarkably, P. troglodytes and M. mulatta shared the same genetic composition (similar sizes of exons 1 and 2 and intron 1 in both taxa), with only 9 bp and 15 bp differences in intron 2 and intron 3, respectively (Table 1b). The first exon and the first and second introns were also found to be similar in human and chimp, suggesting a closer genetic resemblance between these two taxa (Table 1b). The presence of a long extra third intron (14333 bp in P. troglodytes, 22259 bp in M. mulatta) and its absence in H. sapiens at the 3' end suggests that the γ-globin gene in P. troglodytes and M. mulatta is able to accommodate more genetic changes than the human γ-globin gene. The presence of a long intron also contributes to a larger gene size (15098 bp in P. troglodytes and 23030 bp in M. mulatta) and eventually to a larger genome size. It is also probable that unequal homologous crossing over and duplication of the γ-globin gene resulted in this long intron in P. troglodytes and M. mulatta. The exon-intron ratio computed in the three primate taxa differed among H. sapiens (0.638), P. troglodytes (0.053) and M. mulatta (0.034), signifying that introns are generally longer than exons (Table 1b). Furthermore, the observation of a significant positive correlation between intron size and total gene size (data not shown) suggests that gene size has increased across taxa due to the accumulation of non-coding nucleotides11. As in humans, new genes in rodents also originated via multiple recombinational pathways23. In the present study, it was evident that M. musculus exhibits an extra 264 bp long exon and a 654 bp long intron compared to the R. norvegicus gene composition (Table 1b). Organisms belonging to the superorder Laurasiatheria, such as B. taurus, C. familiaris and E. caballus (Table 1a), showed similarity, with conservation in the first intron and second exon (Table 1b). Interestingly, exon 2 was found to be conserved in all 10 taxa except M. musculus (a difference of more than 100 bp). The size of the UTR also varied among taxa (1301 bp in chimp, 1316 bp in monkey, 182 bp in human, 4033 bp in M. musculus, 176 bp in R. norvegicus, 177 bp in cow, 200 bp in horse, 93 bp in dog, 145 bp in rabbit, 93 bp in chicken and 156 bp in fish). UTRs are widely known to play crucial roles in the post-transcriptional regulation of gene expression, including modulation of the transport of mRNAs out of the nucleus and of translation efficiency, subcellular localization and stability24. Thus, the observed variation in UTR size among evolutionarily close taxa signifies that the mechanism of globin gene regulation might differ among taxa. However, if the size of the UTR matters to the efficiency of gene regulation, a more efficient gene regulation mechanism might exist in M. musculus than in the other taxa. The observation of an almost uniform exon-intron ratio over all taxa except the primates signifies evolutionary conservation of the nucleotides belonging to exons and introns in these taxa. The results otherwise indicate that introns are major determinants of gene size. However, in D. rerio, the highest observed exon/intron ratio (3.030) suggests a major contribution of coding nucleotides to gene size in this taxon. Further, the percentage of coding nucleotides in the β-globin gene varied across taxa; the highest was detected in E. caballus (100%) and the lowest in P. troglodytes (59.6%), indicating that the coding region of the β-globin gene in E. caballus is absolutely conserved.
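The exon/intron ratio and the coding percentage discussed above can be computed from exon and intron lengths as in the sketch below. The formulas are assumptions, since the text does not state them: the ratio is taken as total exon length divided by total intron length, and the coding percentage as exon length over exon plus intron length, ignoring UTRs. The lengths in the example are hypothetical and are not values from Table 1b.

```python
def coding_summary(exon_lengths, intron_lengths):
    """Summarise gene composition from exon and intron lengths (in bp).

    Assumed definitions (not stated in the text):
      exon/intron ratio = total exon length / total intron length
      coding percentage = 100 * total exon length / (exons + introns)
    """
    exon_total = sum(exon_lengths)
    intron_total = sum(intron_lengths)
    gene_total = exon_total + intron_total
    if gene_total == 0:
        raise ValueError("gene length must be greater than zero")
    # An intronless gene has no finite exon/intron ratio.
    ratio = exon_total / intron_total if intron_total else float("inf")
    return {
        "exon_total": exon_total,
        "intron_total": intron_total,
        "exon_intron_ratio": ratio,
        "coding_percent": 100.0 * exon_total / gene_total,
    }


if __name__ == "__main__":
    # Hypothetical lengths for a three-exon, two-intron gene.
    summary = coding_summary([100, 200, 150], [130, 850])
    for key, value in summary.items():
        print(key, value)
```

Under these definitions a gene with no annotated introns has a coding percentage of 100, and a ratio above 1 means that exons contribute more to gene length than introns.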

With a view to understanding the phylogenetic inter-relationships among all 11 taxa at the genetic level, an unrooted neighbour-joining (NJ) tree was constructed (Fig. 3a). As expected, all primate taxa fell in one clade and the rodents in another. Human and chimp were closest to each other. Moreover, B. taurus and E. caballus belonged to one clade, with O. cuniculus slightly diverging from these two taxa. Surprisingly, C. familiaris, which belongs to the superorder Laurasiatheria (to which B. taurus and E. caballus also belong), fell separately from both the Euarchontoglires and the Laurasiatheria organisms. Being outgroups, G. gallus and D. rerio fell on separate branches. It has been suggested that a number of events, including gene conversion, have occurred during evolution and have altered the clusters of the β-globin gene in specific ways in different eutherian (mammalian) orders. These include insertion of repeat sequences, change in expression profile, gene duplication, gene fusion and gene loss or inactivation9. In order to verify this hypothesis, such altered genes showing homology to the β-globin gene in different taxa were selected (Table 2a) and a neighbour-joining phylogenetic tree was constructed using their nucleotide sequences (Fig. 3b). The original β-globin gene nucleotide sequences of all taxa were also included in this phylogeny. It was apparent that all altered forms of the β-globin gene in B. taurus formed a separate clade, but the actual β-globin gene in B. taurus was closer to the β-globin gene variant (HBBA) in O. aries. This might be due to the fact that both B. taurus and O. aries belong to the family Bovidae. This result also signifies that H. sapiens and P. troglodytes are very closely related to each other at the HBG1 gene variants. It is already known that these two primate taxa are functionally closer to each other than to any other apes25; hence, sequence similarity between them at the gene and protein levels is no surprise. 
It has already been reported that the δ-globin gene (HBD) of eutherian organisms exhibits a propensity for recombinational exchange with the closely linked β-globin gene and has been independently converted by the β-globin gene in many lineages26. For example, in the African elephant (Loxodonta africana), a chimeric β/δ fusion gene was created by unequal crossing-over between misaligned HBD and HBB paralogs26. Hence, knowledge of β-globin variants is certainly very important for a clear picture of the evolution of this gene in all taxa. Both G. gallus and D. rerio, being outgroups, formed separate clades even with their variants. It has been suggested that major gene duplications have given rise to the paralogous beta globin genes or variants that are associated with significant evolutionary rate variation among gene lineages27. Further, genes arising from more recent gene duplications (e.g. tandem duplications within lineages) do not appear to differ greatly in rate27. Such a pattern also reflects a complex interplay of evolutionary forces, where natural selection for diversifying paralogous functions and lineage-specific effects contribute to rate variation on a long-term basis, while gene conversion tends to increase sequence similarity27. Moreover, gene conversion effects appear to be stronger on recent gene duplicates, as their sequences are highly similar27.

It is known that in Old World monkeys, apes and humans, both the γ1- and γ2-globin genes are functional but the expression of the γ1 gene is three-fold higher than that of the γ2 gene, whereas in New World monkeys only one γ-globin gene is functional, usually γ2(9). This has been explained by different models of gene family evolution that describe the mechanisms whereby gene copies created by gene duplication are maintained and diverge in function28. Two models have been proposed to explain this phenomenon. In the first, nonsynonymous substitutions increase following gene duplication and the duplicates are preserved through positive selection28. The alternative, the duplication-degeneration-complementation (DDC) model, does not explicitly require the action of positive Darwinian selection for the maintenance of duplicated gene copies, although purifying selection is assumed to continue to act on both copies27. Since gene duplication and divergence play a significant role in eukaryotic protein network evolution, pointing to taxonomic differences that are either discontinuous (likely to be adaptive) or continuous (unlikely to be adaptive), inferences at the protein level of the beta locus are as vital as inferences at the gene level29. Therefore, apart from characterization at the gene level, we also considered the protein sequences in all 14 taxa (including sheep, baboon and pig) to ascertain functional-level changes across taxa. Alignment of the β-globin protein sequences (147 codons) from all 14 taxa showed that the human and chimp sequences were identical. The monkey and baboon sequences were found to be similar, with only three substitutions, at codon positions 10, 14 and 53. All four primate taxa differed at only nine codon positions (Table 2b), signifying that a high degree of conservation exists at the β-globin protein level in closely related species, unlike at the genetic level. 
Interestingly, contrary to the traditional belief10, we found that neither glutamic acid (E) nor valine (V) was present at the sixth amino acid position in the human β-globin protein sequence. Instead, proline (P) was observed at the sixth codon position. Similar to the human β-globin protein, proline was also observed at the sixth position for chimp, monkey, baboon, dog and pig (Table 2c). Absolute protein homology was also seen for the rodent taxa. Interestingly, in the bovine group (cow and sheep) the first and second codons were found missing (Table 2c). To further assess the evolutionary relationship among all 14 taxa, an unrooted phylogenetic tree was constructed based on the protein sequences and inferences were drawn. Human and chimp fell in a single clade neighboured by monkey and baboon, with chicken and fish as outgroups (Fig. 3c).
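When a residue is reported by position, the result depends on whether the initiator methionine is counted. The sketch below illustrates this with the first ten residues of the human β-globin precursor as commonly given in protein databases. This fragment is an assumption brought in for illustration and is not taken from Table 2c, and the sketch is offered as one possible reading of the position-six observation above, not as the authors' analysis.

```python
# First ten residues of the human beta-globin precursor as commonly listed in
# protein databases (assumption: not copied from this study's Table 2c).
HBB_N_TERMINUS = "MVHLTPEEKS"


def residue_at(seq: str, position: int, count_initiator_met: bool = True) -> str:
    """Return the residue at a 1-based position.

    If count_initiator_met is False, numbering starts at the residue that
    follows the initiator methionine (mature-protein numbering).
    """
    if position < 1:
        raise ValueError("position is 1-based and must be >= 1")
    offset = 0 if count_initiator_met else 1
    index = position - 1 + offset
    if index >= len(seq):
        raise IndexError("position lies beyond the supplied fragment")
    return seq[index]


if __name__ == "__main__":
    with_met = residue_at(HBB_N_TERMINUS, 6, count_initiator_met=True)
    without_met = residue_at(HBB_N_TERMINUS, 6, count_initiator_met=False)
    print("position 6, initiator Met counted:    ", with_met)
    print("position 6, initiator Met not counted:", without_met)
```

Under this assumption, a 147-codon alignment that includes the initiator methionine places proline at position six and glutamic acid at position seven, whereas numbering of the mature protein places glutamic acid at position six.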

The study therefore provides a comprehensive overview of the β-globin gene and establishes relationships based on this gene not only among evolutionarily closely related taxa but also among unrelated taxa. The absolute conservation between the human and chimp β-globin proteins is certainly a highlight of the study. However, the absence of evidence for a genomic signature of malaria at the β-globin gene in chimpanzee15 challenges the correlation of the β-globin gene with mild and severe malaria in humans, since the two proteins are identical. Either the polymorphisms in the human β-globin gene are no longer under the effect of balancing selection to impart protection against malaria, or the β-globin gene is not so adaptive in chimpanzees since malaria is less detrimental in them15. Alternatively, chimpanzees might utilize mechanisms different from those of humans for protection against malaria15. Not all variants in humans are under the effect of balancing selection, as a recent study suggests that the beta globin recombinational hotspot around HbC reduces the effects of strong selection30. Evidence of natural selection suggests that paralogous copies of beta-globin genes arising from gene duplications evolve under a non-episodic process of functional divergence28. Considering that the β-globin gene variant HbS in humans is responsible for sickle-cell anemia and is highly associated with severe malaria, more in-depth evolutionary and genetic diversity studies on field samples are needed for a better understanding of the intricate relationship between infectious diseases and medically important genes.

In conclusion, the present study clearly indicates that genetic changes have occurred in the globin gene during the evolution of different mammalian and closely related taxa. Although this is controversial31, the gene has been shown to be correlated with P. falciparum malaria in humans32. The evolutionary genetic studies performed here could thus be useful not only in testing the hypothesis that this particular gene has a role in P. falciparum malaria infection, but also in ascertaining, with further studies in a malaria-endemic region, whether this gene is under differential evolutionary pressure in humans in P. falciparum endemic and non-endemic areas.

ACKNOWLEDGEMENTS

Gauri Awasthi is a Senior Research Fellow of the Indian Council of Medical Research (ICMR) and Garima Srivastava spent a summer internship in the EGB Lab. The authors thank the Director Incharge of NIMR for providing facilities. Dr Aparup Das thanks the ICMR for intramural funding.