Abstract

Macrosatellite repeats (MSRs) present an extreme example of copy number variation, yet their epigenetic regulation in normal and malignant cells is largely understudied. The CT47 cancer/testis antigen located on human Xq24 is organized as an array of 4.8kb large units. CT47 is expressed in the testis and in certain types of cancer, but not in non-malignant somatic tissue. We used CT47 as a model to study a possible correlation between copy number variation, epigenetic regulation and transcription originating from MSRs in normal and malignant cells. In lymphoblastoid cell lines and primary fibroblasts, CT47 expression was absent, consistent with the observed heterochromatic structure and DNA hypermethylation of the CT47 promoter. Heterochromatinization of CT47 occurs early during development as human embryonic stem cells show high levels of DNA methylation and repressive chromatin modifications in the absence of CT47 expression. In small-cell lung carcinoma cell lines with low levels of CT47 transcripts, we observed reduced levels of histone 3 lysine 9 trimethylation (H3K9me3) and trimethylated lysine 27 of histone H3 (H3K27me3) without concomitant increase in euchromatic histone modifications. DNA methylation levels in the promoter region of CT47 are also significantly reduced in these cells. This supports a model in which during oncogenic transformation, there is a relative loss of repressive chromatin markers resulting in leaky expression of CT47. We conclude that some MSRs, like CT47 and the autosomal MSRs TAF11-Like, PRR20, ZAV and D4Z4, the latter being involved in facioscapulohumeral muscular dystrophy, seem to be governed by common regulatory mechanisms with their abundant expression mostly being restricted to the germ line.

Introduction

Approximately half of the human genome is composed of repetitive DNA and a significant proportion of this DNA is arranged as tandemly repeated units. It is now widely accepted that these elements are an integral part of the genome with structural and regulatory functions. Macrosatellite repeats (MSRs) are a specific class of repeated DNA defined by a minimum unit size of 100bp. Typically, MSR unit sizes are much larger and MSRs usually span hundreds of kilobases of DNA. MSRs can be protein-coding or non-coding, and are often subtelomerically or centromerically located. MSRs have been implicated in human disease,1 including cancer,2 and in epigenetic regulation of the genome, such as X inactivation.3 Recently, a genome-wide survey revealed that MSRs are more abundant than previously anticipated, are distributed over the entire genome and are organized in a higher order structure.4 Owing to their size variability, MSRs thus represent an important source of copy number variation (CNV) that may contribute to the missing heritability observed in some genome-wide association and CNV studies.

The structure and function of MSRs are largely understudied. Probably the best studied MSR is D4Z4, because contractions of this MSR are associated with facioscapulohumeral muscular dystrophy (FSHD).1, 5 Other MSRs that have been studied in varying detail include DXZ4 on the X chromosome, which was recently proposed to have a role in X-chromosome inactivation;3 RS447, which encodes a deubiquitinating enzyme;6 RNU2, encoding a small nuclear RNA;7 TAF11-Like,8, 9 which was recently suggested to be involved in a psychiatric disorder; and, very recently, SST1, PRR20 and ZAV.9

D4Z4 is characterized by bidirectional transcriptional activity, high DNA methylation and the presence of both heterochromatic histone modifications, such as histone 3 lysine 9 trimethylation (H3K9me3), and transcriptionally permissive histone modifications, such as dimethylated lysine 4 of histone H3 (H3K4me2). Upon contraction, as seen in FSHD, D4Z4 becomes hypomethylated,10 shows selective loss of H3K9me3, with secondary losses of heterochromatin protein 1γ and the cohesin complex,11 thus leading to a more open chromatin structure. DXZ4 has chromatin features very similar to D4Z4, but shows opposite chromatin configurations on the active (Xa) and inactive (Xi) X chromosome. On Xa, DXZ4 is organized as constitutive heterochromatin with high levels of H3K9me3, while on the Xi, DXZ4 adopts a more euchromatic structure with high levels of H3K4me2 and H3K9ac. Also, DXZ4 is bidirectionally transcribed with discrete differences in the transcriptional landscape on Xa and Xi.3

From these studies, the picture emerges that CNV, expression profile and chromatin structures are important attributes of MSRs, which when disturbed may have consequences for genome integrity and function, as has recently been demonstrated in Saccharomyces cerevisiae where large rDNA repeats are required for the maintenance of genome integrity.12

Cancer/testis antigens (CTAs) represent a class of genes that are typically only expressed in germ cells of the testis, fetal ovary and placenta, and sometimes in the brain, and which also show expression in various tumor types.13, 14 Epigenetic changes in cancer typically involve hypermethylation of CpG islands and hypomethylation of repetitive DNA elements.15, 16 Hypermethylation can lead to transcriptional silencing of tumor suppressor genes while the effect of hypomethylation in cancer is less clear, but thought to contribute to genomic instability. In accordance, it has been established that many CTAs show expression in malignancies, which is associated with promoter demethylation.17, 18 However, by and large, the chromatin structure of CTAs under normal and malignant conditions has not been studied in great detail.

The cancer/testis gene CT47 on Xq24 is arranged as an MSR with up to 17 copies of a putative protein coding gene ordered head to tail.19 Because CT47 is both a CTA and an MSR, we set out in this study to examine the array length and chromatin structure of this MSR in CT47-expressing and non-expressing cells.

Materials and methods

Cell lines and culturing conditions

HapMap cell lines and the primary fetal lung fibroblast line WI38 were purchased from Coriell Cell Repositories (Camden, NJ, USA) and cultured according to the supplier's instructions. Small-cell lung carcinoma (SCLC) cell lines were purchased from LGC Standards GmbH (LGC Promochem, Teddington, UK) and cultured as described;20 human embryonic stem cell (hESC) lines WA9 and WA14 (ref. 21) were purchased from WiCell Research Institute (Madison, WI, USA). Cells were cultured according to a published protocol.22 hESC and human embryoid body (hEB) samples were analyzed for the presence of the differentiation markers NANOG, OCT4 and XIST as described,23 which showed that the hESCs had differentiated properly.

Primers

Characterization of the CT47 array

CT47 array size was determined by pulsed field gel electrophoresis (PFGE) after digestion of agarose-embedded DNA with EcoRI. Hybridization was performed with radioactively labeled CT47 probe recognizing the CT47 unit sequence (NT_011786.15), generated by PCR.
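As a rough illustration of how a PFGE fragment size translates into a repeat unit count, the following Python sketch divides the array portion of a fragment by the 4.8kb unit size stated in the text. It assumes that EcoRI does not cut within the repeat unit; `flank_kb` is a hypothetical parameter for the non-repeat sequence between the array and the nearest EcoRI sites, which is not given here, and the fragment sizes in the usage note are invented.

```python
UNIT_KB = 4.8  # CT47 repeat unit size stated in the text


def estimate_units(fragment_kb, flank_kb=0.0, unit_kb=UNIT_KB):
    """Estimate the number of repeat units carried by one PFGE fragment.

    flank_kb is a hypothetical value: the amount of non-repeat sequence
    included in the EcoRI fragment, which the text does not specify.
    """
    array_kb = fragment_kb - flank_kb
    if array_kb <= 0:
        raise ValueError("fragment must be larger than its flanking sequence")
    # Round to the nearest whole unit, since gel sizing is approximate.
    return round(array_kb / unit_kb)
```

For example, `estimate_units(57.6)` gives 12 units when no flanking sequence is assumed. In practice the sizing resolution of the gel limits how reliably neighboring unit counts can be distinguished.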

Quantitative methylation analysis of bisulfite-converted DNA

Genomic DNA was treated with sodium bisulfite (EZ DNA Methylation Kit; Zymo Research, Irvine, CA, USA; no. D5002). Bisulfite-converted DNA was PCR amplified in a final volume of 25μl containing 1 × PCR buffer supplied with the enzyme, 25mmol of each dNTP, 10pmol of bis CT47 primer pair and 0.1 units of HotStar Taq DNA polymerase (Qiagen, Venlo, The Netherlands; no. 203203). Primers specific for the CT47 promoter region were designed with MethPrimer to amplify the genomic region 270 base pairs (bp) upstream of exon 1 of CT47. PCR cycling conditions were as follows: a 15min denaturation step at 94°C, followed by 45 cycles of 94°C for 20s, 55°C for 30s and 72°C for 60s, and a final step of 72°C for 5min. PCR products were sequenced and the level of methylation of individual CpGs was measured with the ESME software (Epigenomics AG, Berlin, Germany).24
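The principle behind quantifying methylation from bisulfite-converted DNA can be sketched as follows: after conversion and PCR, a methylated cytosine reads as C and an unmethylated one as T, so the methylation level at a CpG is the C fraction of the combined C and T signal. This Python sketch is only an illustration of that ratio; it does not reproduce the processing that ESME applies to the sequencing traces, and the function names and input values are hypothetical.

```python
def cpg_methylation(c_signal, t_signal):
    """Return the methylated fraction at one CpG as C / (C + T)."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("combined C and T signal must be positive")
    return c_signal / total


def promoter_methylation(signals):
    """Apply cpg_methylation to a list of (C, T) signal pairs, one per CpG."""
    return [cpg_methylation(c, t) for c, t in signals]
```

With invented signals, `cpg_methylation(90, 10)` returns a methylated fraction of 0.9 for that CpG.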

Chromatin immunoprecipitation

Chromatin was prepared from cells fixed with 1% formaldehyde and then quenched with 125mM glycine. The chromatin immunoprecipitation (ChIP) procedure was carried out according to a published protocol.25 Antibodies against H3K4me2 (no. 07-030), H3K9me3 (no. 17-625) and trimethylated lysine 27 of histone H3 (H3K27me3) (no. 17-622) were purchased from Millipore (Billerica, MA, USA). The EZH2 antibody was purchased from Active Motif (La Hulpe, Belgium; no. 39103) and the histone H3 antibody from Abcam (Cambridge, UK; no. ab1791).

To determine the level of nonspecific signals, we used IgG, which was supplied with the antibodies for H3K9me3 and H3K27me3. Immunoprecipitation with histone antibodies and IgG was carried out with chromatin containing 3μg DNA. Immunoprecipitation with EZH2 antibody and IgG was carried out with chromatin containing 6μg DNA. The amount of immunopurified DNA was determined by quantitative PCR with specific primer pairs at three different sites in CT47: promoter primer pair (labeled A in Figure 1), exon 3 primer pair (labeled C in Figure 1) and distal primer pair (labeled D in Figure 1). All primers were designed with the Primer3 program and positioned to minimize sequence variation between individual repeat units at the primer sites. Primers were tested for specificity on a monochromosomal cell panel. A published primer pair was used for the GAPDH promoter.26 Relative abundance of histone modifications was calculated in the following way: enrichment values obtained with IgG were subtracted from the enrichment values obtained with specific antibodies. Normalization for CNV was carried out by correcting the enrichment values from the ChIP by the input values. Input values were obtained by determining the total amount of DNA used in the ChIP with the same specific primer pairs. These results are presented in the figures as relative enrichment to input. Samples were tested by ChIP in two independent experiments and every immunopurified DNA was quantified twice independently.
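The enrichment calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: it takes the qPCR-derived DNA quantities for the specific antibody, the IgG control and the input at one primer pair, subtracts the IgG background and divides by the input so that arrays with different copy numbers become comparable. The function names and the numbers in the usage note are hypothetical.

```python
from statistics import mean


def relative_enrichment(ip_qty, igg_qty, input_qty):
    """Background-corrected ChIP signal relative to input.

    ip_qty, igg_qty and input_qty are DNA quantities measured by qPCR
    with the same primer pair (arbitrary but identical units).
    """
    if input_qty <= 0:
        raise ValueError("input quantity must be positive")
    # Subtract the nonspecific IgG signal, then normalize to input,
    # which also corrects for differences in repeat copy number.
    return (ip_qty - igg_qty) / input_qty


def mean_enrichment(measurements):
    """Average relative enrichment over replicate (ip, igg, input) triples."""
    return mean(relative_enrichment(ip, igg, inp) for ip, igg, inp in measurements)
```

For instance, `relative_enrichment(12.0, 2.0, 100.0)` yields 0.1. Because the input is measured with the same primer pair, a sample with twice as many repeat units contributes twice the input signal, so the ratio is not inflated by copy number alone.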

Schematic representation of CT47 macrosatellite array. The CT47 repeat array is localized on Xq24 at coordinates chrX: 119893246–119948579 according to hg18 assembly. The size of the array varies between 4 and...

cDNA synthesis and quantification of CT47 mRNA

Human testis total RNA was purchased from Ambion (Austin, TX, USA; no. AM7072). Total RNA was isolated from LCL cell lines, hESCs, hEBs from lines WA9 and WA14, and SCLC lines with QIAzol reagent (Qiagen; no. 79306). RNA was purified using the miRNeasy kit (Qiagen; no. 217004). cDNA synthesis was carried out with the Omniscript Reverse Transcription kit (Qiagen; no. 205113) using 500ng total RNA according to the manufacturer's instructions. cDNA synthesis for all samples was carried out in parallel under identical conditions and repeated twice. cDNA was quantified with specific intron-spanning primers for CT47 and SYBR Green mix (BioRad, Veenendaal, The Netherlands; no. 170-8882); measurements were performed with the MyiQ Single-Color Real-Time PCR Detection System and analyzed with the Bio-Rad iQ5 program. Samples were measured twice in duplicate.
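The text does not specify how qPCR readings were converted into expression levels relative to testis. One common approach, shown here purely as an assumption, compares threshold cycle (Ct) values directly: with equal RNA input and an amplification efficiency of 100% (a doubling per cycle), each additional cycle corresponds to a twofold lower template amount. The function name, the `efficiency` parameter and the Ct values in the usage note are hypothetical.

```python
def percent_of_testis(ct_sample, ct_testis, efficiency=2.0):
    """Express a sample's transcript level as a percentage of testis.

    Assumes equal RNA input for both reactions and a constant
    amplification factor per cycle (2.0 means perfect doubling).
    """
    if efficiency <= 1.0:
        raise ValueError("amplification factor per cycle must exceed 1")
    # A higher Ct in the sample means less starting template.
    return 100.0 * efficiency ** (ct_testis - ct_sample)
```

With invented values, a sample that crosses the threshold five cycles later than testis, `percent_of_testis(30, 25)`, corresponds to 3.125% of the testis level.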

Statistical analyses

Statistical analysis of DNA methylation levels of different samples was carried out by Mann–Whitney U-test.
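To make the test concrete, the following Python sketch computes the Mann-Whitney U statistic for two groups of methylation values and a two-sided P value from the normal approximation. It is a simplified illustration using only the standard library: it applies no tie or continuity correction, and for small groups an exact test from a statistics package would be preferable. The methylation values in the usage note are invented.

```python
import math


def mann_whitney_u(a, b):
    """Return (U_a, U_b): counts of pairs in which one group exceeds the other.

    Ties contribute 0.5 to each group.
    """
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


def two_sided_p_normal(a, b):
    """Two-sided P value from the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    u_a, _ = mann_whitney_u(a, b)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For two hypothetical groups such as `[0.90, 0.85, 0.95]` and `[0.30, 0.40, 0.20]`, every value in the first group exceeds every value in the second, so `mann_whitney_u` returns `(9.0, 0.0)`.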

Results

No correlation between CT47 unit number and chromatin structure inside the CT47 array

HapMap cell lines were sized for the CT47 array, located at Xq24, by PFGE analysis of EcoRI-digested genomic DNA and using a probe recognizing the repeat (Supplementary Figure S1). The CT47 repeat sizes of selected cell lines used in this study are listed in Table 1. Taking advantage of the hemizygosity of CT47 in men, to study the repeat-size dependency of the CT47 chromatin structure, four male samples were chosen with short CT47 arrays (4–7 repeat units) and two with long arrays (12 or 14 units). Female samples were selected based on their homozygosity for the CT47 repeat size. The relative abundance of H3K4me2, H3K27me3 and H3K9me3 in CT47 promoter region, exon 3 of the CT47 gene and 3700bp distal of the array (Figure 1) was determined by quantitative ChIP analysis. H3K4me2 is associated with an open chromatin structure, while H3K27me3 marks transcriptionally silenced regions, and H3K9me3 is specific for heterochromatin. The GAPDH promoter region with an open chromatin structure was included as a control (Supplementary Figure S2). The most abundant histone modification at the CT47 promoter in the male cell lines is H3K9me3, suggesting that the chromatin is highly condensed in this region (Figure 2a). We also detected high levels of H3K27me3 and low levels of H3K4me2, further supporting a closed chromatin structure. Cell lines with short or long CT47 repeat arrays showed high inter-individual variation in H3K9me3 and H3K27me3 levels without clear correlation with repeat length. Distal to the array, the relative abundance of H3K27me3 is higher and H3K9me3 levels are lower than in the CT47 promoter, whereas H3K4me2 levels are similarly low (Figure 2b). There is no correlation between CT47 array length and chromatin structure distal to the array.

Relative abundance of histone modifications at the CT47 promoter (a) male, (c) female and distal region (b) in male and female LCL samples from the HapMap panel. Numbers within brackets are corresponding to the number of CT47 units in the sample. (a)...

Similar to the male samples, high levels of H3K9me3 and H3K27me3 and low levels of H3K4me2 were detected in the female cell lines with similarly sized long or short CT47 repeats on both alleles (Figure 2c). The H3K9me3 levels were lower than or close to the H3K27me3 levels, whereas in male samples H3K9me3 was the most abundant modification. Distal to the CT47 array, the relative abundance of these histone modifications in female samples is similar to that within the array (Figure 2d). We also included a female primary fibroblast cell line (5–6 units) to exclude the possibility that immortalization by EBV has significant effects on the chromatin structure of CT47. We did not observe a difference in chromatin structure between female immortalized LCLs and female primary fibroblasts, excluding a large effect of the immortalization process. Thus, in normal somatic cells, the CT47 repeat contains repressive chromatin markers, and within the tested size range, there is no correlation between CT47 repeat array size and chromatin structure.

CT47 is heterochromatic in hESCs and EBs

The male WA14 and female WA9 hESC lines were studied for CT47 chromatin structure in the pluripotent state and when differentiated into hEBs. This showed that H3K9me3 is already abundantly present on this locus in hESCs and hEBs (Figure 3). H3K4me2 and H3K27me3 levels were equally low in these samples, which is different from LCLs or primary fibroblasts, where H3K27me3 was far more abundant than H3K4me2. We did not observe major changes in the relative abundance of the different histone markers during differentiation. We conclude that the CT47 promoter region is highly heterochromatic early in development and remains heterochromatic during early differentiation, suggesting that at these stages CT47 expression does not change.

Relative abundance of different histone modifications at CT47 in human male (WA14) and female (WA9) pluripotent hESCs and in differentiated hEBs. The relative abundance of H3K4me2, associated with transcriptionally permissive chromatin, is increasing...

SCLC cells express CT47 RNA

We selected four commercially available SCLC cell lines to determine CT47 expression in relation to chromatin structure. Two of them had already been reported to express CT47 (NCI-H82 and NCI-H187),19 while the other two SCLC cell lines had not yet been studied (NCI-H69 and H1607). Human testis total RNA was included as a control. By quantitative RT-PCR we detected CT47 expression in all SCLC cell lines, but not in the hESCs, hEBs, LCLs or fibroblast cell cultures studied. In good agreement with previously published data, SCLC cell lines express only 0.1-3% of the CT47 levels found in testis (Figure 4a).19

(a) Expression levels of CT47 mRNA in different SCLC cell lines relative to the expression in human testis. Commercially available human total testis RNA and RNA isolated from SCLC lines were used for cDNA synthesis under identical conditions. CT47 expression...

Chromatin structure of CT47 array in SCLCs

We studied the role in transcription of the chromatin structure of the CT47 promoter region in all 4 SCLC cell lines. Figure 4c shows that the relative abundance of H3K4me2 is lower in SCLCs compared to LCL 385. However, H3K9me3 levels are dramatically reduced in all SCLC cell lines. Of all SCLCs, the highest levels of H3K9me3 are observed in sample NCI-H82, which expresses low levels of CT47. SCLC cell lines NCI-H69, NCI-H187 and H1607 show similar low levels of H3K9me3. H3K27me3 levels do not show a reduction but an increase, except for NCI-H69. H3K27me3 is generated by the catalytic subunit of the PRC2 complex, the histone methyl transferase EZH2. Transcriptionally silenced regions often show the simultaneous presence of H3K27me3 and EZH2. We therefore also tested the presence of EZH2 on the CT47 promoter in LCLs and SCLCs, and observed that EZH2 levels are strongly reduced in SCLCs when compared with LCLs.

The chromatin structure of the SCLCs was also analyzed in the third exon of the CT47 gene and 3700bp distal to the CT47 array (Figure 4b). Chromatin markers showed a very similar picture in the third exon as in the promoter region, except sample NCI-H82, where we detected H3K9me3 levels comparable to the LCL sample. We detected high levels of H3K9me3 and H3K27me3 compared to H3K4me2 in the LCL 385. Interestingly in this region, H3K27me3 was more abundant than H3K9me3. SCLCs showed a general loss of repressive chromatin markers in this region. EZH2 levels corresponding to the H3K27me3 levels were high in LCL 385 and lower in the SCLCs. Thus, in LCLs the CT47 promoter region contains both the transcriptional repressive histone modification H3K27me3 together with EZH2, and the heterochromatic histone modification H3K9me3. In SCLCs, the heterochromatic H3K9me3 marker is dramatically reduced and it seems that the epigenetic regulation of transcription through the PRC2 complex is disturbed: while EZH2 levels are decreased in SCLCs, the histone marker generated by PRC2, H3K27me3 is not reduced. Thus, we observed that transcriptional activity inversely correlates with H3K9me3 levels and EZH2 occupation, rather than directly with H3K4me2 levels.

Expression of CTAs in different tumor samples was previously shown to correlate with DNA hypomethylation.18, 27, 28 Using bisulfite sequencing, we measured the level of DNA methylation in the promoter region of CT47, from 275bp upstream from the first exon, in LCLs and SCLCs (Figure 4e), as well as in hESCs and hEBs (Supplementary Figure S3). The promoter region of CT47 contains seven CpGs. Quantitative methylation analysis of these seven CpGs showed that the promoter region of CT47 is highly methylated in LCLs, hESCs and hEBs. A significant difference was detected in DNA methylation between LCLs and SCLCs, with SCLCs showing decreased methylation levels at all seven CpGs (P<0.01).

Discussion

On human Xq24, highly homologous copies of CT47 are organized into an MSR of tandemly arranged units, each 4.8kb in size. This polymorphic MSR varies from 4 to 17 copies in length (manuscript in preparation and Chen et al19). As CT47 has a very specific expression profile, it is a good model to study the tissue-specific epigenetic regulation of MSRs. CTAs form a large family of protein coding sequences with a specific expression pattern: they are expressed in the testis and certain types of cancers. Our study presents the first detailed analysis of the chromatin structure of a CTA in non-malignant and malignant cells and during early stages of development. Although biological processes leading to cancer should preferably be studied on human tumor material, the established SCLCs represent a valuable and readily available tool for studying genetic29 and epigenetic changes30 in cancer.

Our expression data are in good agreement with earlier studies showing that CT47 expression is restricted to the testis and malignant cells,19 although the expression in SCLCs is only a fraction of that observed in the testis. We cannot exclude the possibility that only a small proportion of SCLC cells in culture express CT47, analogous to the presence of small subpopulations in solid tumors.2

We did not observe expression of CT47 in hESCs and hEBs. The CTAs MAGE-A, GAGE and NY-ESO-1 were previously shown not to be expressed in hESCs,31 whereas MAGE-A2 and MAGE-A6 expression could be detected in 20-day-old hEBs. Although we cannot exclude the possibility that the absence of CT47 transcript in our hESC and hEB samples is the result of the epigenetic fluidity of hESCs,23, 32, 33 our results suggest that CT47 transcription is induced at later stages of development, possibly between the pluripotent stem cell and primordial germ cell stages, from which mature germ cells develop.

Chromatin analysis of non-expressing cells showed high levels of heterochromatic histone modifications H3K27me3 and H3K9me3, and relatively low levels of the euchromatic histone modification H3K4me2 in the promoter region of CT47. We could not detect a correlation between CT47 array length and its chromatin structure in analogy with FSHD, where contractions only below a threshold of 11 units result in D4Z4 chromatin relaxation.11 Rather, we observed a high variation in the abundance of different histone modifications in samples with the same array lengths (Figures 2 and 3). As D4Z4 shows similar variability in levels of histone modifications with repeat lengths >11 units, it is likely that 4 units of CT47 are sufficient to ensure a heterochromatic chromatin environment in non-germline cells. This opens up the possibility that each MSR needs a minimal, but for each MSR unique, number of repeat units to establish its appropriate chromatin structure.

We observed a relatively higher abundance of H3K27me3 than of H3K9me3 in female samples. The observation that females have relatively higher levels of H3K27me3 than males suggests that repression of CT47 is established differently on Xa and Xi. Indeed, Xi in female samples can be more densely populated with H3K27me3 than Xa.34 Nevertheless, this is different from the X-chromosomal MSR DXZ4, showing that two MSRs on Xq can differ markedly in chromatin structure on Xa and Xi.

A unique feature of the pluripotent state is the so-called bivalent chromatin state defined by the presence of permissive (H3K4me3) and repressive (H3K27me3) histone modifications.35 Bivalent states are found at promoter regions of genes expressed at low levels in hESCs that are associated with differentiation. We detected identical low levels of H3K4me2, H3K4me3 and H3K27me3 in hESCs and hEBs at the CT47 promoter region, suggesting that the product of this gene is not necessary for early development.

Repetitive elements were identified as targets of PRC1 and PRC2 complexes in the pluripotent state.36 Our results show that the MSR CT47 has, like D4Z4,11 a heterochromatic structure already at early stages of development, marked by abundant H3K9me3 levels, and that this chromatin structure does not change during differentiation. The high levels of H3K9me3 suggest that repression of MSR is mainly governed by SUV39H1 and SUV39H2 and not so much by PRC complexes.

SCLC cell lines, which do express CT47, show a disturbed chromatin structure of this MSR. In addition, male cell lines NCI-H69 and NCI-H187 harbor three arrays and NCI-H82 two arrays of the CT47 MSR, suggesting that they have become aneuploid for the X chromosome. We detected significantly lower levels of H3K9me3 in these cancer cell lines without a concomitant relative increase in the abundance of the transcriptionally permissive histone modification H3K4me2. This might explain the relatively low levels of CT47 expression in these cells. Rather than transcriptional reactivation, SCLCs seem to show a relative loss of transcriptional repression, resulting in leaky expression of CT47. Owing to technical limitations, we currently cannot exclude the possibility that individual units within an array are organized differently.

In contrast to H3K9me3, H3K27me3 levels are not significantly lower in SCLCs, whereas EZH2 levels are significantly decreased in these samples. This shows that SCLC cells have serious defects in coordinated epigenetic regulation of this MSR, consistent with reports of genome-wide epigenetic changes in other malignancies.16

CTA gene expression is often correlated with DNA promoter hypomethylation. We observed that all of the CpG dinucleotides examined in the promoter region of CT47 show a significant decrease in DNA methylation in SCLCs compared with the otherwise high methylation levels in non-malignant cells. Epigenetic factors like DNA methylation and histone modifications are acting in a coordinated way to establish normal development. Our study reveals that transcriptional regulation/derepression of CT47 is dominated by the loss of repressive markers such as DNA methylation and H3K9me3: SCLCs with significantly lower levels of DNA methylation and H3K9me3 at the promoter region are expressing CT47, despite the absence of open chromatin markers.

We observed striking commonalities between the autosomal MSR D4Z4 and the X-chromosomal MSR CT47. In normal somatic cells, hESCs and hEBs, both MSRs are highly methylated and have high levels of repressive chromatin markers. In the adult body, DUX4,37 encoded by D4Z4, and CT47 transcripts are highly abundant in the testis and ovary. In disease conditions, like FSHD or cancer, there is a loss of DNA methylation and repressive chromatin markers, but no increase in permissive chromatin modifications, leading to low or leaky expression levels of the MSR-encoded genes. Like CT47 expression in SCLCs, evidence was presented that DUX4 is expressed in malignant adult and fetal human rhabdomyosarcoma cell lines,38 and therefore DUX4 also fits the definition of a CTA. The shared features of epigenetic regulation and expression patterns of CT47 and D4Z4 suggest that other MSRs may also be regulated similarly. A recent publication shows that indeed the family of MSRs exclusively expressed in the germ line and testis is expanding and includes new members like TAF11-Like, PRR20 and ZAV.9 Understanding the epigenetic organization of MSRs in health and disease, which cover a high portion of the human genome, may therefore significantly advance our understanding of the epigenetic map of the human genome and of germline biology.

Acknowledgments

This work was supported by institutional funds from the LUMC, the Netherlands Organization for Scientific Research (NWO 917.56.338); Breakthrough Project Grant by the Netherlands Genomics Initiative (NWO 93.51.8001); and the National Institutes of Health (P01NS069539). We thank Peter Thijssen for critical reading of the manuscript and Yvonne Krom for help with the DNA methylation analysis.

Bruce HA, Sachs N, Rudnicki DD, et al. Long tandem repeats as a form of genomic copy number variation: structure and length polymorphism of a chromosome 5p repeat in control and schizophrenia populations. Psychiatr Genet. 2009;19:64–71.