Abstract

The evolutionary changes in the Drosophila H2A and H2AvD genes, which encode histones, were analyzed using the sequences of 12 Drosophila sp. to understand the evolution of histone replacement and epigenetics. The Ball gene, coding for a histone threonine kinase, was located head-to-head with the H2AvD gene in seven Drosophila sp. A strongly conserved DNA sequence was also found in the region upstream of the H2AvD gene; this sequence is most likely a transcriptional signal, because it was also conserved in four other Drosophila sp. that did not have an upstream Ball gene. The SPARC gene, coding for a protein with a calcium-binding domain, was located tail-to-tail in the region downstream of the H2AvD gene in the 11 Drosophila sp. for which flanking sequences were available. A moderately conserved DNA sequence was found in the H2AvD gene region at the splice site of the first intron. Different codon usages for the H2A and H2AvD genes were found for 11 of 17 amino acids, and codon usages characteristic of replacement histones (H2AvD, H4r, H3.3A and H3.3B) were found for several amino acids. Codon usage was considerably different at several histone modification sites in the H2A gene. These results suggested that, unlike for the H3.3 and H4r genes, not only post-transcriptional control but also transcriptional control played a role in the regulation of the H2AvD gene. In addition to post-transcriptional controls, such as splicing and translation, the development of a control system for transcription must have occurred during the evolution of histone replacement and epigenetic systems.

Keywords

Epigenetics; Histone replacement; H2A; H2AvD; Drosophila

Introduction

Since epigenetics plays an important role in phenotypic expression and many biological phenomena [1-5], the mechanism of phenotypic evolution can be studied by investigating the evolution of epigenetics [6,7]. Histones, which maintain the nucleosome structure, are proteins involved in the packing of DNA [8]. There are two types of histones according to the timing of their expression: replication-dependent (C: canonical type) and replication-independent (R: replacement or variant type) histones [9]. Nucleosome remodelling is thought to be caused by changes in the nucleosome structure due to the modification of histones or the replacement of histones with different modification states [1-3,5]. Gene expression is controlled partly by nucleosome remodelling; that is, phenotypic expression is controlled in part by epigenetics [3,10]. Studying how these two types of histones (C and R) have evolved may enable us to understand the evolution of histone replacement and epigenetics, leading to the elucidation of the evolutionary mechanism of phenotypes [6,7].

Analyses of genetic variability, molecular evolution, concerted evolution, codon bias and GC content have been conducted and reported for histone genes of the C type in Drosophila [11-21]. In addition to H3.3A, H3.3B and H4r, H2AvD is known to be a histone of the R type in Drosophila [22,23]. H2AvD was first reported to be a variant of histone 2A (H2A) [22]. Unlike the H2A gene, the H2AvD gene is a single-copy gene containing three introns [22,23]. Histones are basic proteins that are rich in the basic amino acids lysine and arginine. The H2A and H2AvD histones in Drosophila are small proteins of 123 and 140 amino acids, respectively. Their primary structures are evolutionarily conserved, although not as highly as those of H3 and H4. Together with histone 3 (H3) and histone 4 (H4), H2A and histone 2B (H2B) are assembled into the histone core of the nucleosome [8]. H2AvD takes part in nucleosome remodelling by replacing H2A [22,24,25]. Recently, analyses of the histone genes of the R type, H4r, H3.3A and H3.3B, have been reported in Drosophila [6,7]. In this paper, the H2AvD gene, the remaining R type histone gene in Drosophila, was analyzed.

The codon usage of each amino acid in two histone 2A genes (H2A and H2AvD) was calculated.

The number of occurrences of each codon was summed over the 12 Drosophila sp. Because codon usage for a given gene is known to differ among Drosophila sp., the summed data of the 12 species were analyzed so that the two genes could be compared under the same conditions.
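The summation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`count_codons`, `summed_codon_counts`) and the two toy sequences are hypothetical, and the sketch assumes each coding sequence is supplied in frame with a length that is a multiple of three.

```python
from collections import Counter


def count_codons(cds):
    """Count codons in one in-frame coding sequence (DNA or RNA letters)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def summed_codon_counts(cds_by_species):
    """Sum codon counts for one gene over all species in the mapping."""
    total = Counter()
    for cds in cds_by_species.values():
        total.update(count_codons(cds))
    return total


# Toy sequences for illustration only; these are NOT real H2A or H2AvD data.
toy = {"species_a": "ATGAAGCGT", "species_b": "ATGAAACGC"}
totals = summed_codon_counts(toy)
```

Running the same function once per gene (H2A and H2AvD) would give the two summed count tables that are compared codon by codon.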

Results

Genomic organization of the H2AvD gene in 12 Drosophila sp.

In contrast to the tandem cluster array of C type histone genes [11-13,16,17,19-21,27], H2AvD is a single-copy gene with three introns located at IIIR97D3 in Drosophila melanogaster (Figure 1) [22,23]. The last exon contains a long 3’ untranslated region. Neighbouring genes and their transcriptional orientation may provide information on gene expression control, e.g., whether they are co-regulated or not, as found for the H3-H4 gene pair [28]. The genomic arrangements of the H2AvD gene in the 12 Drosophila sp. whose complete genomes have been sequenced are shown diagrammatically in Figure 2. The genes in the upstream and downstream regions of the H2AvD gene and their transcriptional orientations are also shown. Although the genomic information for the H2AvD gene of D. virilis was available, no information was available for the upstream and downstream regions of the gene (Figure 2).

Figure 1: Gene structure of H2AvD in Drosophila melanogaster. The chromosomal location of the gene is IIIR97D3. The coding region is colored in red.

Figure 2: Gene arrangements of Ball (light green), H2AvD (red), SPARC (blue), DSRB (yellow) and Splicing factor (purple) in Drosophila genomes. The orientation of transcription is indicated by arrows. The phylogenetic relationship between the 12 Drosophila sp. is indicated on the right side of the Figure. The outside regions of the H2AvD gene in D. virilis were not available (dotted regions).

For seven Drosophila sp. (D. simulans, D. sechellia, D. melanogaster, D. yakuba, D. erecta, D. ananassae and D. willistoni), the H2AvD gene was paired with the Ball gene in a head-to-head manner (Figure 2). The Ball gene encodes an enzyme with histone threonine kinase activity, suggesting its functional relatedness to the H2AvD gene in histone modification. Since the H2AvD-Ball gene pairing was not observed in all 11 species, a functional relation arising from the pairing, if any, is restricted to these 7 species. In addition, the DSRB gene, which encodes a protein with a double-stranded RNA-binding domain, was located upstream of the H2AvD gene in 2 species (D. pseudoobscura and D. persimilis). A gene encoding a pre-mRNA splicing Prp18-interacting factor was located upstream of the H2AvD gene in the two remaining species (D. mojavensis and D. grimshawi).

The SPARC gene was located downstream of the H2AvD gene in a tail-to-tail configuration in all 11 species for which the nucleotide sequences were available (that is, all except D. virilis). The SPARC gene encodes an Osteonectin-like protein that contains a calcium-binding domain. The conservation of the H2AvD and SPARC gene pairing in a broad range of Drosophila sp. suggests its involvement in the control of expression.

Conserved nucleotide sequences in the H2AvD gene

DNA sequences with important functions can be detected by investigating their evolutionary conservation between distantly related species. The upstream, downstream and intron regions of the H2AvD gene were investigated in the 12 Drosophila sp. A sequence of 17 base pairs (bp) that was strongly conserved among the species, 5’-CN(C/T)T(G/C)ACGGCGTCT(T/C)(T/A)T-3’, was identified in the upstream region of the H2AvD gene (Figure 3). The DNA sequence was located 13-22 bp upstream of the transcription start site of the H2AvD gene, suggesting that it is an important signal for transcription. The sequence was located between two divergently transcribed genes, Ball and H2AvD, in 7 species. However, co-regulation of the two genes by this sequence is unlikely, because the sequence was also present in four other Drosophila sp. that did not have a Ball gene in the upstream region of the H2AvD gene. No other DNA sequences in the upstream region of the H2AvD gene were found to be conserved among the species.
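As an illustration of how the 17 bp consensus above could be located in an upstream sequence, the following Python sketch translates the consensus into a regular expression (N becomes any base, and each bracketed alternative becomes a character class). The function name `find_motif` and the toy sequence are hypothetical, and only the given strand is scanned; a real search would presumably also examine the reverse complement.

```python
import re

# Consensus 5'-CN(C/T)T(G/C)ACGGCGTCT(T/C)(T/A)T-3' written as a regex.
MOTIF = re.compile(r"C[ACGT][CT]T[GC]ACGGCGTCT[CT][AT]T")


def find_motif(seq):
    """Return (0-based start, matched text) for each hit on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.group()) for m in MOTIF.finditer(seq)]


# Toy sequence for illustration only; it is NOT a real Drosophila upstream region.
toy_upstream = "GGG" + "CACTGACGGCGTCTTTT" + "AAA"
hits = find_motif(toy_upstream)
```

The distance from each hit to an annotated transcription start site could then be computed from the returned start positions.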

Other DNA sequences that were found to be conserved within the gene are shown in Figures 4 and 5. Short DNA sequences (3-9 bp) were intermittently preserved in the region of exon 1 and intron 1 in the H2AvD gene (Figure 4). One of the conserved sites in Figure 4 corresponded to the first splice site, at the end of exon 1, and was considered to be a recognition sequence for a splicing complex or spliceosome. None of the other splice sites showed any conserved sequences, except at the exact donor and acceptor sites. In the regions of intron 2 and the 3’ untranslated region of exon 4 in the H2AvD gene (Figure 5), short DNA sequences (3-8 bp) were intermittently preserved. The functional importance of these sequences is unknown. An inserted DNA fragment of approximately 2 kb in length was found in exon 4 of D. yakuba.

For all 11 species, the H2AvD gene was paired with the SPARC gene in a tail-to-tail manner (Figure 2); however, the distance between the two genes, as measured by the transcriptional region in D. melanogaster, was only 17 bp, and no conserved sequence was recognized in this region.

Evolutionary changes in the coding regions of the H2A and H2AvD genes in Drosophila

The alignment of the amino acid sequences of the histones encoded by the H2A and H2AvD genes of 12 Drosophila sp. is shown in Figure 6. The amino acid sequences for H2A, excluding the start methionine, were 123 residues in length and were completely identical among the 12 species; no amino acid substitution was found. The amino acid sequences of the histones encoded by the H2AvD gene of the 12 Drosophila sp. were 140 amino acid residues in length, and they were also highly similar (Figure 6). Three amino acid substitutions were found in H2AvD among the 12 Drosophila sp.: a D-E substitution at position 123, a T-S substitution at position 124, and a V-P substitution at position 134 (Figure 6). The average pairwise amino acid identity among the H2AvD sequences was 99.3%, and the differences were found near the C-terminal end. As described previously [15,19,20], several amino acid substitutions and indels were found between H2A and H2AvD (Figure 6). The total length of the two histones differed by 17 amino acid residues.

The codon usage for amino acids and the GC content at the third codon position were compared between the two H2A genes (H2A and H2AvD) (Figures 7 and 8). Significant differences in codon bias between the two genes were found for 11 of the 17 amino acids tested (Table 2). The three amino acids that were not tested were Met, Trp and Cys: the former two are each encoded by a single codon, and the latter is not used in H2A. Seven amino acids, including Lys and Arg, showed highly significant differences (Table 2). Of the 11 amino acids, six (Gln, Asn, Ile, Ala, Arg and Lys) were also identified as amino acids with significant differences in previous studies of the H3 and H4 genes [6,7]. The pattern of codon usage for these six amino acids is shown in Figure 9. For three of these six amino acids (that is, excluding Lys, Ile and Asn), the codon usage showed patterns characteristic of the C and R types. For example, at the Arg sites, CGU was predominant in the C type, whereas CGC was predominant in the R type, except for the H3.3B gene. The R type used G or C at the third codon position more frequently than the C type. For the five remaining amino acids (Val, Pro, Tyr, Gly and Ser), a significant difference in codon usage was found for H2A, but not for H3 or H4, as shown in Figure 10. As with the previous pattern, many amino acids, with the exception of Ser, had codon usage patterns that were distinctive for each histone type, C or R. This can also be seen in the comparison of GC content at the third codon position for the two H2A genes (Figure 8). Previous analyses of the H3 and H4 genes showed higher GC content at the third codon position in R-type genes [6,7]. For the H2A genes (Figure 8), all but three species showed a similar tendency. In D. pseudoobscura, D. persimilis and D. mojavensis, the GC content of the two H2A genes was approximately the same.
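The GC content at the third codon position can be computed as sketched below. This is a minimal Python illustration rather than the authors' exact procedure: the function name `gc3_content` and the toy sequences are hypothetical, and the sketch assumes every codon of an in-frame coding sequence is included (whether single-codon amino acids or the stop codon were excluded in the original analysis is not stated here).

```python
def gc3_content(cds):
    """Fraction of codons with G or C at the third position (in-frame CDS)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be non-empty and a multiple of 3")
    third_bases = cds[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Toy sequences for illustration only; these are NOT real histone genes.
gc3_high = gc3_content("ATGAAGCGC")  # third positions: G, G, C
gc3_low = gc3_content("ATGAAACGT")   # third positions: G, A, T
```

Applying the function to the H2A and H2AvD coding sequences of each species would give the paired values that are compared per species.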

Amino acid    χ2          d.f.
Thr           3.88        3
Gly           8.87*       2a
Arg           34.47***    5
Lys           11.07***    1
Leu           6.28        5
Ala           57.47***    3
His           2.35        1
Val           20.38***    3
Asp           0.21        1
Asn           4.91*       1
Ile           7.73*       2
Gln           17.24***    1
Pro           32.68***    3
Ser           24.52***    5
Tyr           4.19*       1
Glu           0.05        1
Phe           0.20        1

Table 2: χ2 tests for codon bias in the H2A and H2AvD genes in 12 Drosophila sp. ***P<0.001, **P<0.01, *P<0.05; d.f.: degrees of freedom; a: when the expected value was smaller than 1, it was combined with the next smallest value.
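The degrees of freedom in Table 2 equal the number of synonymous codons minus one, which is consistent with a χ2 test on a 2 × k table of codon counts (two genes by k synonymous codons). The Python sketch below illustrates that calculation under this assumption; it is not confirmed to be the authors' exact procedure. The function name `chi_square_2xk` and the counts are hypothetical, and the pooling of cells with small expected values described in the table footnote is not implemented.

```python
def chi_square_2xk(counts_a, counts_b):
    """Pearson chi-square statistic and d.f. for two genes' synonymous codon counts."""
    codons = sorted(set(counts_a) | set(counts_b))
    # Keep only codons that are observed in at least one of the two genes.
    columns = [(counts_a.get(c, 0), counts_b.get(c, 0)) for c in codons]
    columns = [(a, b) for a, b in columns if a + b > 0]
    total_a = sum(a for a, _ in columns)
    total_b = sum(b for _, b in columns)
    if total_a == 0 or total_b == 0 or len(columns) < 2:
        raise ValueError("need counts for both genes and at least two codons")
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in columns:
        column_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(columns) - 1


# Made-up Lys codon counts for illustration only; NOT the values behind Table 2.
stat, dof = chi_square_2xk({"AAA": 10, "AAG": 30}, {"AAA": 30, "AAG": 10})
```

The statistic would then be compared with the χ2 distribution for the returned degrees of freedom to obtain the significance levels marked by asterisks.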

Figure 7: Codon usage in the H2A and H2AvD histone genes in Drosophila. Numbers within the parentheses indicate the number of residues of that amino acid in the H2A and H2AvD of D. melanogaster.

Figure 8: GC content at the third codon positions of the H2A (blue) and H2AvD (red) genes in Drosophila.

Figure 9: Comparison of the codon usage in replacement histones and replication-dependent histones of the H2A, H3 and H4 genes in Drosophila. Codon usages of six amino acids that were significantly different in H2A, H3 and H4 are compared.

Figure 10: Comparison of the codon usage in replacement histones and replication-dependent histones of the H2A, H3 and H4 genes in Drosophila. Codon usages of five amino acids that were significantly different in H2A, but not in H3 or H4, are compared.

Codon usage was also analyzed site-by-site to study the relationship between codon usage and histone modification for H2A (Figure 11). Four histone modifications (methylation, acetylation, phosphorylation and ubiquitylation) for four amino acids (Lys, Arg, Thr and Ser) were analyzed [2,29,30]. For the Ser sites, UCU was used most of the time at positions C1 (first site of the C type) and R124 (124th site of the R type). C1 is a phosphorylation site. For the Thr sites, ACC was predominant at positions R103 and C119, unlike in other sites. C119 is a modification site. For the Arg sites, AGA was predominant only at position C76. C76 is a methylation site. For the Lys sites, AAA was predominant at positions R4, C5, C8, R77 and R79. C5 and C8 are acetylation sites.

Figure 11: Codon usage at each amino acid site of the respective histone 2A genes in Drosophila. The locations of histone modifications are indicated by *acetylation, **methylation, ***acetylation and ubiquitylation and #phosphorylation after the position number. The protein length of H2A from human/mouse and Drosophila differed by six amino acids. Indels in human/mouse and Drosophila may have caused gaps in the amino acid numbers defined from the N-terminal end for each species; therefore, caution is needed regarding the position numbers of the modification sites.

These results indicated that amino acid sites showing a different codon bias when compared to other sites are frequently histone modification sites. This is consistent with the results obtained from previous studies on H3 and H4 genes, suggesting a connection between translation and histone modification.

Discussion

Studying how the two H2A genes (H2A and H2AvD) have functionally differentiated and evolved is expected to provide information that will enable us to understand the evolution of histone replacement and epigenetics. The SPARC gene was located downstream of the H2AvD gene in all Drosophila sp. studied. It is not clear why the gene for the Osteonectin-like protein is located tail-to-tail with the H2AvD gene. It is possible that the two genes, H2AvD and SPARC, have to be expressed under similar nucleosome conditions for some reason and are thus located next to each other, or that their close proximity is simply coincidental. The Ball gene was located upstream of the H2AvD gene in seven of the 11 Drosophila sp. studied. Although the Ball gene, which encodes an enzyme with histone threonine kinase activity, is functionally related to the H2AvD gene, it remains unknown why the pairing is restricted to these 7 species. Is it related to a specific feature of these 7 species?

A strongly conserved DNA sequence was found in the region upstream of the H2AvD gene. The conserved sequence was located 13-22 bp upstream of the transcription start site, indicating an important role for it in the transcription of the H2AvD gene. Functional divergence in transcriptional control was found for the H2AvD gene, but not for the H3.3A, H3.3B or H4r genes [6,7]. The genes for histones of the R type contain several introns [23,31,32]. A possibility of expression control via splicing was found for the H3.3B and H2AvD genes from the observation of conserved sequences at the first splice site [6,7]. The reason why such sequences were found only in the first intron remains unknown, but we speculate that the first intron plays some role in the expression of these genes. It is also unclear whether the Splicing factor gene located upstream of the H2AvD gene in 2 species, D. mojavensis and D. grimshawi, is relevant to the splice site in the first intron of the H2AvD gene.

The two H2A proteins, H2A and H2AvD, showed high similarity in amino acid sequence and are therefore supposed to have a similar function [22,29]. However, compared to the differences in amino acids between the C and R types, the differences within the H2A proteins (H2A and H2AvD) seemed to be small. A possible reason for this is that if the two proteins diverged from a common ancestor, they diverged fairly early and well before the divergence of the Drosophila species [33]. Alternatively, the two types of proteins may have evolved as similar, but different proteins under different selection pressures. The differences between the two types of proteins may result in some functional differences [29,34]. In addition to the purifying selection caused by strong constraints on the amino acid sequence as a histone 2A, weak selection on synonymous sites, the force of which is small and depends on the population size [14,18,20,35], operates differently for the C and R types. The difference in codon usage observed for the two types was possibly caused by this weak selection [18,20,36]. The distinctive codon usage for the C or R types could affect the speed and efficiency of translation, leading to different amounts of gene expression. Consistent with previous results for the H3 and H4 genes [6,7], a marked codon bias was found for some amino acid sites that are supposed to be modified. The stronger bias at histone modification sites when compared to other sites indicated a higher selection coefficient at the modification sites. Histone modification is generally thought to occur post-translationally. However, a recent report showed that one of the histone modifications occurs during translation [37]. These results suggested that histones could be modified through specific tRNAs. It is highly likely that the accumulated weak selection acting on many sites enabled the generation of different types of histones.

Changes in gene structure as well as in the control of transcription, splicing and translation may have caused the evolution of epigenetics, leading to the evolution of phenotypes.

Acknowledgement

This research was supported by a Grant-in-Aid for Scientific Research to Y. M. from the Ministry of Education, Culture, Sports, Science and Technology of Japan.