This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The presence of the methylated nucleobase 5MedC in CpG islands is a key factor that determines gene silencing. Aberrant methylation patterns are responsible for deteriorated cellular development and are a hallmark of many cancers. Today, genes can be sequenced for the content of 5MedC only with the help of the bisulfite reagent, which is based exclusively on chemical reactivity differences established by the additional methyl group. Despite intensive optimization of the bisulfite protocol, the method still has specificity problems. Most importantly, ~95% of the DNA analyte is degraded during the analysis procedure. We discovered that the reagent O-allylhydroxylamine is able to discriminate between dC and 5MedC. The reagent, in contrast to bisulfite, does not exploit reactivity differences but directly gives different reaction products. The reagent forms a stable mutagenic adduct with dC, which can exist in two states (E versus Z). In the case of dC, the allylhydroxylamine adduct switches into the E-isomeric form, which generates dC to dT transition mutations that can easily be detected by established methods. Significantly, the 5MedC adduct adopts exclusively the Z-isomeric form, which causes the polymerase to stop. O-allylhydroxylamine thus allows differentiation between dC and 5MedC with high accuracy, leading towards a novel and mild chemistry for methylation analysis.

INTRODUCTION

DNA methylation is an epigenetic mechanism for transcriptional regulation (1–2). The process controls cellular differentiation and is defective in many diseases including cancer (3–5). DNA methylation involves substitution of the H atom at C5 of the cytosine base (dC) by a methyl group to give 5-methylcytosine (5MedC) (6). These methylated 5MedC bases trigger processes that finally lead to gene silencing. Today, the detection of 5MedC bases in a given DNA sequence is possible with restriction enzymes, which selectively cleave unmethylated DNA (7), or with antibodies, which recognize methylated DNA (8–10). The current gold standard, however, is bisulfite sequencing. This reagent selectively deaminates dC to dU but does not affect 5MedC (11, 12). The position of the newly formed dU base is then detected with standard sequencing methods after PCR amplification of the treated DNA material (13). Bisulfite sequencing is currently in widespread use (14–18). However, during bisulfite analysis ~95% of the DNA material is degraded, which limits 5MedC detection if only a small amount of sample is available (19–21). This drawback fuels current research to find alternative 5MedC detection and sequencing methods (22). Research in this direction has shown that Br+ ions add selectively to 5MedC to give cleavable DNA sites (23). OsO4 and reagents derived from it were found to cis-hydroxylate dC bases with some selectivity (24, 25). Furthermore, photochemical methods that allow some discrimination were reported (26, 27), as were compounds that are added during bisulfite sequencing to trap intermediates of the bisulfite reaction (28). All reported approaches target the reactivity differences between dC and 5MedC imposed by the additional methyl group. However, as this difference is small, all new procedures are currently unable to distinguish between dC and 5MedC efficiently, or their general approach interferes with current sequencing techniques.

Here we report a novel concept that does not rely on reactivity differences but in contrast allows detection of 5MedC directly based on the reaction product. Incubation of dC and 5MedC with hydroxylamine derivatives gives E- and Z-configured oxime-type adducts (Z/E-1 and Z/E-2, see Scheme 1) (29,30). The E- and Z-configured products are in equilibrium via the amino isomeric forms A-1 and A-2. Each of these isomers (Z, A and E) features different H-bonding characteristics. Whereas the amino tautomers (A) base pair like dC, the E-isomers are expected to code like dT. The Z-isomers likely cause stalling of polymerases during replication (Scheme 1) because they interfere with normal base pairing. NMR studies have shown that the pyrimidine oximes like 1 and 2 are present mainly in the imino tautomer and that the Z-isomers are the most stable forms (31). In the case of 5MedC, we expected that the isomer E-2 would be disfavored because of the steric strain imposed between the methyl group and the O-allyl chain. The 5MedC-adduct 2 should therefore prefer Z-configuration (Z-2) or exist in the amino tautomeric form (A-2). As such, we expected that this product would prefer base pairing with dG. In the case of the dC adduct 1 steric strain with the hydrogen atom at C5 is much lower and we hoped that this adduct would in contrast adopt at least to some extent the E–configuration, which would allow base pairing with dATP. The resulting dC to dT transition mutation we thought could be exploited as the desired signal that allows dC to be distinguished from 5MedC. As our approach is based on mutagenicity, it is sequence independent and can also detect methylated cytosines that are not in a CpG context as found, for instance, in plant genomes. This is an advantage over restriction enzyme based methods that are limited by the specificity of the enzyme.

Reaction of dC and 5MedC with hydroxylamine derivatives. Depiction of the tautomeric equilibrium and the interconversion process between the Z-isomers and the E-isomers. 1: dC–O-allylhydroxylamine adduct; 2: 5MedC–O-allylhydroxylamine adduct....

MATERIALS AND METHODS

Incubation of DNA strands

In a thermocycler (Eppendorf Mastercycler personal) with a heatable lid, a single-stranded oligodeoxynucleotide (ODN, 10µM) (Metabion, Martinsried, Germany) was incubated with NH2Oallyl (1M, pH 5.2 with HNEt2, Fluka, Buchs, Switzerland) for 4h at 60°C. For incubation kinetics, 200pmol of the DNA were directly injected into an HPLC without prior desalting (for detailed incubation kinetics please refer to the Supplementary Data). If the sample was to be sequenced, it was desalted using BioSpin 6 Tris columns (BioRad, Hercules, CA, USA) that were previously equilibrated three times with 500µl ddH2O. The filtrate was concentrated in a SpeedVac (Christ RVC-2-33 IR), re-dissolved in NH3 (28% in H2O, Fluka, Buchs, Switzerland) and heated to 60°C for 90min in a thermocycler with a heatable lid. The solution was again desalted using BioSpin 6 Tris columns. The concentration of the solution was determined with a NanoDrop photometer (PeqLab ND-1000). The treatment with ammonium hydroxide is optional but proved to be beneficial for the subsequent sequencing process because it converts the bis-hydroxylamine adduct (Supplementary Data), which is formed to a small extent, into the desired compound 1 in a base-promoted elimination reaction.

Hybridization of DNA

An amount of 1.2µM ODN, 1µM primer and a suitable buffer (NEBuffer 2, NEB, Frankfurt am Main, Germany) were mixed and heated to 95°C for 4min in a thermocycler with a heatable lid. The solution was allowed to cool at a rate of 1°C per minute and stored at 4°C in the dark.

Direct sequencing

An amount of 10pmol dsDNA in 25µl Annealing Buffer (Qiagen, Hilden, Germany) was sequenced in a PyroMark Q24 Pyrosequencer. All bases except the lesion generated by incubation were sequenced using standard conditions. Because the incorporation of nucleotides opposite the incubated dC or 5MedC is slower than opposite canonical bases, addition of the nucleotide was repeated 10 times. The data was analyzed by the software provided by the manufacturer. The software sums up the peak heights and determines the ratio of dATPαS/dGTP incorporation. The last methylation site cannot be detected, because the last position in a strand generally yields no results in pyrosequencing. Note that, owing to the pyrosequencing procedure, dATPαS was used instead of dATP.
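The peak-height summation described above can be illustrated with a short sketch. This is not the manufacturer's software: the function name, the made-up peak heights, and the choice to report the result as the fraction A/(A+G) rather than a raw ratio are assumptions made here for illustration only.

```python
def datp_fraction(a_peaks, g_peaks):
    """Fraction of dATP-alpha-S incorporation, A / (A + G), from summed peak heights.

    a_peaks, g_peaks: peak heights recorded for the repeated dispensations
    of dATP-alpha-S and dGTP at one template position (hypothetical inputs).
    """
    a_total = sum(a_peaks)
    g_total = sum(g_peaks)
    total = a_total + g_total
    if total <= 0:
        raise ValueError("no incorporation signal at this position")
    return a_total / total


# Made-up peak heights for ten dispensations of each nucleotide.
a_heights = [4.0, 3.0, 2.0, 1.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
g_heights = [5.0, 3.0, 1.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"dATP-alpha-S fraction: {datp_fraction(a_heights, g_heights):.2f}")
```

Summing over all repeated dispensations, rather than taking only the first peak, reflects the slow incorporation opposite the lesion noted in the text.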

Primer extension based sequencing

An amount of 10pmol of the incubated sample was hybridized with a biotinylated primer. 0.1U/µl Klenow(Exo-) polymerase, 500µM dNTPs and 1µM dsDNA in 20µl 1× NEB Buffer 2 were incubated for 10–60min at 60°C. To the solution were added 2µl Streptavidin Sepharose beads (GE Healthcare, Uppsala, Sweden), 40µl Binding Buffer (Qiagen) and 18µl ddH2O. After agitation at 1400rpm for 15min the beads were captured with a Vacuum Prep Tool (Qiagen) and washed with 70% EtOH, 0.1M NaOH and Washing Buffer (Qiagen). The beads were resuspended in 25µl Annealing Buffer (Qiagen) containing 10pmol sequencing primer (Metabion). Pyrosequencing was performed on a PyroMark Q24 Pyrosequencer using standard conditions (Qiagen). The data was analyzed by the software provided by the manufacturer. The average dTTP incorporation of all experiments was calculated and the data plotted relative to this value. Negative deviations from the average correspond to 5MedC, positive values to dC. The data displayed are the average of three measurements.
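The normalization to the average dTTP incorporation can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the site labels and the numerical values are hypothetical, and the sketch assumes that replicates are first averaged per site before the overall average is taken.

```python
from statistics import mean


def classify_sites(dttp_signal):
    """Classify sites by the sign of their deviation from the overall mean.

    dttp_signal: dict mapping a site label to a list of replicate dTTP
    incorporation values (hypothetical data layout).
    Returns a dict: site -> (deviation, call).
    """
    # Average the replicate measurements for each site.
    site_means = {site: mean(values) for site, values in dttp_signal.items()}
    # Average over all sites, used as the reference value.
    overall = mean(list(site_means.values()))
    calls = {}
    for site, value in site_means.items():
        deviation = value - overall
        if deviation > 0:
            call = "dC"        # positive deviation: dC
        elif deviation < 0:
            call = "5MedC"     # negative deviation: 5MedC
        else:
            call = "undetermined"
        calls[site] = (deviation, call)
    return calls


# Made-up triplicate values for two sites.
example = {"site_A": [30, 32, 31], "site_B": [10, 11, 9]}
for site, (deviation, call) in classify_sites(example).items():
    print(site, deviation, call)
```

Because the reference is the average over all measured sites, the sign-based call is only meaningful when the data set contains both methylated and unmethylated positions.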

Real-time primer extension based quantification

ODN1C and ODN1M were mixed at different ratios, incubated with NH2Oallyl and hybridized with ODN2 as described earlier. The following were mixed per well in a 96-well plate (NUNC, Roskilde, Denmark): 10pmol dsDNA (10µl), 0.15U ATP sulfurylase (0.5µl), 1U Klenow(exo–) polymerase (0.2µl), 500pmol APS (1µl) and 78.3µl ENLITEN luciferase/luciferin reagent (Promega, Madison, WI, USA). In a 96-well reader (TECAN Genios Pro) at 30°C, 10µl dATPαS (20µM) were injected at 200µl/s. The plate was shaken for 5s (amplitude 5mm) and the luminescence measured every 250ms for 400 cycles. The data was collected by the Microsoft Excel macro provided by the manufacturer. The average of four blank measurements (enzymes without DNA) was subtracted from the average of four measurements of the sample. The data for dATPαS incorporation opposite undamaged dC was taken as baseline. In the first 15–20s the curves are unstable due to mixing effects, so the data is only shown from second 20 onwards. The slope of the curves shows the speed of incorporation of dATPαS, which corresponds to the fraction of dC in the sample.
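The data reduction described above (blank subtraction, discarding the unstable first 20s, and reading off a slope) can be sketched as follows. This is an illustration only, not the manufacturer's macro: the function names are hypothetical, the first reading is assumed to be at t = 0, and an ordinary least-squares line is assumed as the way to obtain the slope. The baseline correction against undamaged dC is not included.

```python
def mean_trace(runs):
    """Point-wise average of several luminescence traces of equal length."""
    n = len(runs[0])
    if any(len(run) != n for run in runs):
        raise ValueError("all runs must have the same number of readings")
    return [sum(run[i] for run in runs) / len(runs) for i in range(n)]


def incorporation_slope(sample_runs, blank_runs, interval_s=0.25, start_s=20.0):
    """Least-squares slope of the blank-corrected trace from start_s onwards.

    interval_s: time between readings (250 ms in the protocol).
    start_s: readings before this time are discarded (mixing effects).
    """
    sample = mean_trace(sample_runs)
    blank = mean_trace(blank_runs)
    if len(sample) != len(blank):
        raise ValueError("sample and blank traces differ in length")
    # Subtract the averaged blank and keep only the stable part of the curve.
    points = [
        (i * interval_s, s - b)
        for i, (s, b) in enumerate(zip(sample, blank))
        if i * interval_s >= start_s
    ]
    if len(points) < 2:
        raise ValueError("not enough readings after start_s")
    t_mean = sum(t for t, _ in points) / len(points)
    y_mean = sum(y for _, y in points) / len(points)
    num = sum((t - t_mean) * (y - y_mean) for t, y in points)
    den = sum((t - t_mean) ** 2 for t, _ in points)
    return num / den


# Synthetic example: 400 readings, four sample runs and four blank runs.
times = [i * 0.25 for i in range(400)]
sample_runs = [[2.0 * t + 5.0 for t in times] for _ in range(4)]
blank_runs = [[5.0 for _ in times] for _ in range(4)]
print(incorporation_slope(sample_runs, blank_runs))
```

With 400 readings at 250ms intervals, each trace covers 100s, so discarding the first 20s leaves 80s of data for the fit.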

Protein purification

The polymerase subunit of DNA polymerase I from Geobacillus stearothermophilus (DSM No. 22) was amplified via PCR and subcloned in pENTR-TEV-D-TOPO (Invitrogen, Carlsbad, USA) to introduce an N-terminal TEV-protease recognition site. The fragment was then transferred to the expression plasmid pDEST007 (32), yielding a cleavable N-terminal Strep-tag II. The protein was expressed in Escherichia coli BL21 at 37°C by induction at OD600=1 with 0.2mg/l anhydrotetracycline for 2h. Cells were resuspended in 100mM Tris–Cl pH 7.5, 150mM NaCl, 10mM β-ME and Complete protease inhibitor (Roche), lysed by French press and heated to 50°C for 10min to denature E. coli proteins. 0.1% NP-40 and Tween-20 were added to the lysate prior to heat treatment and maintained during Strep-tag purification. The cleared lysate was loaded on a Streptactin column (IBA) and eluted with desthiobiotin. Subsequently, the tag was cleaved by incubation on ice with 10U/mg AcTEV protease (Invitrogen) overnight. Further purification via heparin affinity chromatography and crystallization was performed as described in (33).

Co-crystallization

For co-crystallization the template containing lesion 1 was annealed to the corresponding primer (for sequences see Supplementary Data) in the protein storage buffer (10mM Na–cacodylate pH 7, 50mM NaCl, 0.5mM EDTA, 10mM MgSO4). Prior to crystallization, protein and DNA were mixed in a 1 to 3 molar ratio. The final concentrations of Pol I and dsDNA were 5mg/ml and 0.5mM, respectively. Crystals were grown by mixing an equal volume of protein-DNA complex with 47.5–51.0% (NH4)2SO4, 3.0–3.5% MPD and 100mM MES pH 5.8, using the hanging-drop vapor diffusion method. The crystallization plates were incubated at 18°C and crystals appeared after 1–2 days. Crystals were frozen in 24% sucrose, 55% (NH4)2SO4, 3.0–3.5% MPD, 100mM MES and stored in liquid nitrogen for data collection. The best crystals were obtained with 49.5% (NH4)2SO4 and 3% MPD.

Data were collected at the beamline PXIII [Swiss Light Source (SLS), Villigen, Switzerland]. The data were processed with the programs XDS (34) and SCALA (35, 36). Structure solution was carried out by molecular replacement with Phaser (37) using the coordinates of 1U45 (38). In order to reduce model bias, the temperature factors were reset to 20 for main chain atoms and 40 for side chain and DNA atoms. Prior to model building in COOT (39), a simulated annealing omit map, removing the area around the lesion, was calculated with PHENIX (40). Restrained refinement was carried out with REFMAC5 (41). Data processing and refinement statistics are summarized in Supplementary Table S1.

RESULTS

What are the coding properties of the DNA lesions 1 and 2?

In order to investigate whether the dC-adduct 1 and 5MedC-adduct 2 have different coding properties we reacted the oligodeoxynucleotide ODN1 (Figure 1A) containing either a dC or a 5MedC (ODN1C and ODN1M) with various hydroxylamine derivatives (10µM DNA, 1M NH2OR, pH=5.2, 4h, 60°C). The best results were finally obtained with O–allylhydroxylamine, which furnished lesions 1 and 2 cleanly and in high yields (Supplementary Data). We observed a small reactivity difference between dC and 5MedC, with the latter reacting three to five times slower (42). Most importantly, no DNA degradation or reaction with other DNA bases was observed.

To investigate the coding properties of the reaction products 1 and 2 we performed primer extension reactions with the primer ODN2, Klenow(exo-) polymerase and ODN1C or ODN1M (Figure 1A1–3). For both oligonucleotides we observed full extension before and after treatment with O-allylhydroxylamine. This proves that the polymerase can indeed read through the hydroxylamine adduct 1. When dGTP was removed from the primer extension assay (Figure 1A3), we observed that after incubation with O–allylhydroxylamine only unmethylated ODN1C but not methylated ODN1M was copied. This shows that adduct 1 is replicatively bypassed and that it is base-paired with a triphosphate other than dGTP. Further studies, depicted in Figure 1B, showed that the polymerase is able to pair 1 with dATP. In order to investigate the relative incorporation efficiency of dATP and dGTP opposite 1 the primer extension studies were repeated with dGTP replaced by ddGTP (Figure 1A2). The data show that Klenow(exo-) pairs 1 with dGTP and dATP with similar efficiencies. Other polymerases tested including low fidelity polymerases such as Pol η and Pol κ showed the same behavior. In the case of ODN1M we observed full extension only when all four triphosphates are present (Figure 1A1). No elongation was detected in the absence of dGTP, showing that adduct 2 either only allows incorporation of dGTP or blocks the polymerase. Further studies, shown in Figure 1C, with ODN3 in which we synthetically inserted the 5MedC adduct 2, confirmed that bypass of adduct 2 is indeed difficult and not possible with dATP.

In summary, the results show that oxime 1 instructs a polymerase to introduce dG and dA into the primer. Adduct 2 in contrast hinders replication and instructs the polymerase to incorporate a dG. Both adducts 1 and 2 consequently differ dramatically regarding their coding characteristics, and only the adduct derived from dC gives rise to dC to dT transition mutations, providing the requested readout for epigenetic sequencing.

What is the structural basis of the different coding properties?

In order to prove that the formation of an E-1:dA base pair is the reason for the mutational bypass, we crystallized a duplex containing the base pair with the protein B.st Pol I (38). The full structure is depicted in Figure 2. It shows the E-1:dA base pair determined at a resolution of 2.9Å. The O–allyl–hydroxylamine-dC adduct 1 is indeed present in the E–configuration and it base pairs with dA via two Watson–Crick type H–bonds, as expected.

Crystal structure of BstPolI in complex with a DNA double strand containing a 1:dATP base pair (PDB: 2xo7). 2Fo–Fc electron density for the base pair is depicted at the 1σ level.

Can the coding properties be exploited for a direct readout of the methylation status?

In order to study whether the coding difference of the dC and 5MedC oxime adducts 1 and 2 can be exploited for epigenetic sequencing, we analyzed the promoter sequence of the cyclin-dependent kinase inhibitor gene p15, known to be aberrantly methylated in acute myeloid leukemia patients (43–45). For the study we used ODN5 (Figure 3A) containing two either methylated or unmethylated CpG sites and one dC site that does not neighbor a dG. After incubation of ODN5 with O–allylhydroxylamine followed by hybridization with a sequencing primer ODN6, the three DNA constructs were directly subjected to pyrosequencing, which visualizes elongation of the primer strand (ODN6) by a luminescence signal (46). The results for positions 23 and 26 are depicted in Figure 3B. As expected, both dC and 5MedC in untreated oligonucleotides direct the addition of dGTP to the primer. The treated DNA strands, in contrast, showed significant incorporation of dATP opposite the dC. Opposite 5MedC, this misincorporation was not detected in any case (<5%, which is the error range of the instrument; Figure 3B). Importantly, the O–allylhydroxylamine-modified dC base 1 instructs incorporation of ~50% dATPαS regardless of the position of the dC base, showing that no sequence effects occur.

(A) Fragment of the p15 promoter used for direct sequencing. (B) Direct sequencing of ODN5 with primer ODN6; Workflow: Incubation with NH2Oallyl converts dC to 1, 5MedC is to a small extent converted to 2. 1 mispairs with dATPαS, while 5MedC pairs...

Does the novel method allow sequencing of DNA fragments usually employed in pyrosequencing?

We extended the new epigenetic sequencing approach to the analysis of a longer promoter sequence. For this experiment we chose an 84-bp fragment of the p15 promoter with three either methylated or unmethylated CpG sites (ODN7, CpG at positions 23, 34 and 57, for sequence see Supplementary Data). Such a promoter sequence is a typical target of bisulfite sequencing. For the experiment (Figure 4A) we first incubated the different 84–mer oligonucleotides with O–allylhydroxylamine and subsequently hybridized the treated strands with the biotinylated primer ODN8 (for sequence see Supplementary Data). We next performed the primer extension reaction with the Klenow(exo-) polymerase and isolated the product DNA strand via streptavidin-coated sepharose beads. Following hybridization with the sequencing primer (ODN9, for sequence see Supplementary Data), the obtained oligonucleotides were finally analyzed via pyrosequencing. The results are depicted in Figure 4B.

Primer extension based sequencing of ODN7 in four different methylation states, (A) Workflow: Incubation with NH2Oallyl converts dC to 1, 5MedC is to a small extent converted to 2. 1 mispairs with dATP, while 5MedC pairs with dGTP and 2 blocks. After...

The sequencing data clearly furnished a dT signal at all original dC positions in the promoter. The signal was in all cases at least 15–20% above background. The same signal intensities were again obtained at all dC positions, showing that sequence effects are negligible. The O–allylhydroxylamine based epigenetic sequencing method in the reported form is consequently limited only by the pyrosequencing method, which currently restricts sequencing to oligonucleotides shorter than 100–150 bases. As the result provides the same final readout as bisulfite sequencing, it is fully compatible with current epigenetic sequencing equipment.

Can the ratio of dC to 5MedC be determined in a sample?

Generally, the extent of the methylation can be determined by comparing the relative heights of the dT and dC signals as done in bisulfite sequencing. Additionally, we aimed to develop an alternative approach that is based on a real time primer extension reaction. By coupling a polymerase to a luciferase, elongation of a DNA strand can be monitored by light. In the case of 5MedC no light upon addition of dATPαS can be detected. If a mixture of dC and 5MedC is present the amount of luminescence represents the fraction of dC in the sample. As shown in Figure 5 the extent of methylation of a specific CpG site can be readily determined after calibration.
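The calibration step mentioned above can be sketched as follows. This is a hedged illustration: it assumes that the incorporation slope depends linearly on the dC fraction of the calibration mixtures, which the text does not state explicitly, and the function names and numerical values are hypothetical.

```python
def fit_calibration(dc_fractions, slopes):
    """Least-squares line slope = gain * dc_fraction + offset.

    dc_fractions: known dC fractions of the calibration mixtures (0 to 1).
    slopes: incorporation slopes measured for those mixtures.
    Linearity is an assumption of this sketch.
    """
    n = len(dc_fractions)
    if n < 2 or n != len(slopes):
        raise ValueError("need at least two matched calibration points")
    x_mean = sum(dc_fractions) / n
    y_mean = sum(slopes) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(dc_fractions, slopes))
    den = sum((x - x_mean) ** 2 for x in dc_fractions)
    if den == 0:
        raise ValueError("calibration mixtures must differ in dC fraction")
    gain = num / den
    offset = y_mean - gain * x_mean
    return gain, offset


def estimate_methylation(slope, gain, offset):
    """Estimated 5MedC fraction (1 - dC fraction) for a measured slope."""
    if gain == 0:
        raise ValueError("calibration has no sensitivity to dC fraction")
    dc_fraction = (slope - offset) / gain
    # Clamp to the physically meaningful range.
    dc_fraction = min(1.0, max(0.0, dc_fraction))
    return 1.0 - dc_fraction


# Made-up calibration series and one unknown sample.
gain, offset = fit_calibration([0.0, 0.25, 0.5, 0.75, 1.0],
                               [0.1, 0.6, 1.1, 1.6, 2.1])
print(round(estimate_methylation(1.1, gain, offset), 3))
```

If the measured response turns out to be non-linear, the same inversion idea applies with a different calibration function.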

Real time primer extension based quantification of the extent of the methylation in ODN1. The slope of the curves shows the speed of incorporation of dATPαS which corresponds to the fraction of dC in the sample.

Can the method distinguish between 5MedC and 5HOMedC?

Recently, 5-hydroxymethylcytosine (5HOMedC) has been found as a second post-replicatively formed DNA base (47, 48). Newly developed methods allow for precise quantification of 5HOMedC levels in tissues, but do not yield sequence-specific information (49). A method that can distinguish 5MedC and 5HOMedC at single-base resolution is highly desirable. Unfortunately, as the detection principle presented in this work is based on sterics and the new base 5HOMedC imposes an even larger steric strain, it is not possible to distinguish 5MedC and 5HOMedC after incubation with O–allylhydroxylamine.

CONCLUSION

In conclusion, we show that the E/Z equilibrium of O–allylhydroxylamine dC/5MedC adducts is influenced by the presence of a methyl group at C5. The lesion generated from dC after O–allylhydroxylamine treatment is bypassed by base pairing with dA and dG. In contrast, after incubation of 5MedC only incorporation of dG can be observed. This allows, for the first time, detection of the methylation status of a cytosine based not only on reaction rates but also on different reaction products. We used this observation to establish the first bisulfite-independent 5MedC sequencing principle. The selectivity of the method is controlled by two chemical filters. First, dC reacts faster with NH2Oallyl, and second, only the dC adduct 1 but not the 5MedC adduct 2 can induce dC to dT transition mutations, which can be detected by standard sequencing methods. As this is the same readout as for the bisulfite method, our system is fully compatible with current sequencing equipment. The next focus in our research will be to couple our novel detection principle to modern sequencing-by-synthesis methods that could allow our method to be routinely applied for the methylation analysis of small amounts of DNA. Alternatively, one could imagine PCR-based approaches like those employed in bisulfite sequencing, or exploiting the mutagenic potential of hydroxylamines by subcloning in bacteria.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online. The crystal structure can be accessed under PDB code 2xo7.

ACKNOWLEDGEMENTS

The authors thank Claudia Szeibert and Andrea Kneuttinger for their help during the project, Sabine Schneider for her help solving the structure and the beamline staff at the SLS for technical support. M. Münzel would like to thank the Verband der chemischen Industrie (VCI) for a Kekulé Fellowship.