Abstract

Modified DNA bases are widespread in biology. 5-Methylcytosine (mC) is a predominant epigenetic marker in higher eukaryotes involved in gene regulation, development, aging, cancer, and disease. Recently, 5-hydroxymethylcytosine (hmC) was identified in mammalian brain tissue and stem cells. However, most of the currently available assays cannot distinguish mC from hmC in DNA fragments. We investigate here the physical properties of DNA with modified cytosines, in efforts to develop a physical tool that distinguishes mC from hmC in DNA fragments. Molecular dynamics simulations reveal that polar cytosine modifications affect internal base pair dynamics, while experimental evidence suggest a correlation between the modified cytosine’s polarity, DNA flexibility, and duplex stability. Based on these physical differences, solid-state nanopores can rapidly discriminate among DNA fragments with mC or hmC modification by sampling a few hundred molecules in the solution. Further, the relative proportion of hmC in the sample can be determined from the electronic signature of the intact DNA fragment.

Introduction

Modified DNA bases (chemical derivatives of A, G, C, or T) are present in all kingdoms of life, and their diversity suggests functional roles in living organisms1. In viruses and bacteria, modified bases have been identified and studied since the 1950’s, and some functions have been elucidated 2. For example, T-even bacteriophage DNA contains glucosylated cytosines, which protect the viral genome from cleavage by most host cell endonucleases3. In higher eukaryotes cytosine methylation is a prominent epigenetic marker, involved in gene expression and development and disease. It has been suggested that the presence of 5′-methylcytosine (mC) in CpG island promoter regions affects the binding of transcription factors, and consequently gene expression4. Mapping the sites of cytosine modification in genomes and comparing methylation patterns among individuals is crucial for understanding the underlying mechanisms by which gene expression is controlled.

In 2009, substantial amounts of 5-hydroxymethylcytosine (hmC) were identified in mammalian brain tissues and embryonic stem cells5, 6, and the Tet enzyme family involved in mC to hmC oxidation was identified6. The presence of hmC in mammalian DNA confirms earlier findings by Penn and co-workers in 19727. Despite these decades old findings, the physiological role of hmC modification remains elusive. It is not known whether hmC is an epigenetic marker or simply an oxidative intermediate in DNA demethylation. A major reason behind this limited knowledge has been the difficulty to distinguish mC from hmC. Bisulfite sequencing, the primary biochemical assay for mapping cytosine methylation, cannot distinguish between mC and hmC bases, as both modifications protect bisulfite-mediated oxidation of cytosine to uracil8–10. Only recently, fluorescence experiments using a single polymerase enzyme have shown that different cytosine modifications in a specific sequence of a template strand have significant effects on the incorporation kinetics of a DNA polymerase11. While a few hmC-specific antibodies have recently been made available, their use for immunoprecipitation-based enrichment remains technically challenging.12 To our knowledge, antibody-free or enzyme-free assays for distinguishing hmC from mC in an intact genomic fragment are not available.

In this work, we explore the physical properties of DNA molecules that contain modified cytosines. Biological and solid-state nanopores are extremely sensitive devices for discriminating among different nucleic acids13–24, and are promising candidates for DNA sequencing25. In this paper, we show that solid-state nanopores are capable of detecting modified cytosines, and that these changes are a result of the DNA mechanical properties. Molecular dynamics (MD) simulations, previously used to describe the physics of DNA translocation through a nanopore26, are employed here to understand the mechanisms that alter the physical properties of the modified DNA.

Results

Figure 1a shows a thin silicon nitride membrane that separates two electrolyte chambers, with a nanopore being the only junction between the two chambers. Application of voltage ΔV across the membrane drives ions through the pore, resulting in a steady-state ion current that is measured using a low-noise current amplifier. When biomolecules pass through the pore, the magnitude of residual ion current is used to report on the structure of the biomolecule. Figure 1b shows continuous time traces of the current through a 4 nm diameter pore in a 20 nm thick membrane before and after the addition of 3 kbp double-stranded DNA to the chamber with the negative electrode. Addition of DNA results in a sequence of current blockade spikes, each corresponding to the transport of a single DNA molecule. The magnitude of each spike is related to the excluded volume of biopolymer that occupies the pore. We define here two quantities (see Figure 1b): 1) ΔI corresponds to the mean current amplitude of an event, and 2) tT corresponds to the transport time, or total duration of the event. The all-point current histogram in Figure 1b shows that DNA occlusion of the pore produces characteristic current amplitude. Characterization of a sample is performed by statistical analysis of the two parameters ΔI and tT.

Nanopore detection of modified cytosines. (a) Scheme of a nanopore in a thin silicon nitride membrane, showing a DNA molecule being driven through it by an applied voltage ΔV. A TEM image of a 4 nm diameter pore in a 20 nm thick SiN membrane is...

Differentiation of cytosine and 5-methylcytosine from 5-hydroxymethylcytosine

We first show that nanopores can discriminate among identical DNA sequences with different cytosine modifications. Toward this goal, PCR conditions that utilize Phusion DNA polymerase (Finnzymes/NEB) were optimized to yield approximately equal incorporation of native and modified cytosine nucleotides into the replicated strands (see SI-1). Using these conditions, we then prepared DNA fragments that exclusively contain cytosine, 5-methylcytosine, or 5-hydroxymethylcytosine, hereafter called C-DNA, mC-DNA, or hmC-DNA, respectively. Sequencing the amplified DNA fragments verified that no specific mutations were introduced during PCR amplification.

Ion-current traces for a 3 kbp fragment with different cytosine modifications reveal larger ΔI and tT values for hmC-DNA than for C-DNA or mC-DNA (Figure 1c). This result is striking, given the small chemical differences between the different cytosines. We consistently observed similar results for at least ten pores of diameters in the range 4.0 ± 0.3 nm, as well as for a set of 400 bp DNA fragments (see SI-2). Our results were also reproduced by measurements of mC-DNA and hmC-DNA in a blind experiment using a different nanopore measurement setup.

To further explore the mechanism that accounts for deeper blockade amplitudes for hmC-DNA, we studied the dependence of ΔI on bath temperature (see Figure 2). To generate the plots, we fit the ΔI distributions to Gaussian functions in order to find the most-likely current amplitudes, ΔImax. Full ΔI distributions for all three DNA molecules at 10°C and 21°C are shown in the inset to the plot. We find that ΔImax values for mC-DNA and C-DNA are similar at all temperatures. For clarity, we therefore focus on comparing ΔImax values for mC-DNA to hmC-DNA; both of which increase with temperature. However, while the increase in ΔImax is expected with temperature because of the increased ion mobility, ΔIhmC clearly exhibits a non-monotonic behavior, increasing more sharply than ΔImC in the range 10–30°C. The ratio ΔIhmC/ΔImC, shown above the plot in Figure 2, systematically shows that hmC-DNA blocks more current amplitude than mC-DNA, by as much as 40% at 30°C. At 21°C, this difference in ΔImax between mC-DNA and hmC-DNA is ~300 pA.

Signal amplitude dependence on bath temperature for mC-DNA and hmC-DNA. Each point in the graph represents the peak ΔI value (ΔImax) obtained from Gaussian fits to ΔI distributions for each experiment. Dashed lines are guides to...

While we have distinguished between DNA that exclusively contains either mC-DNA or hmC-DNA, mammalian DNA contains relatively small amounts of hmC, up to a few percent of total cytosines5, 27. Therefore, to test the sensitivity of nanopores to sparsely modified DNA, we prepared identical sequences that contain different fractions of hmC compared to either C or mC, and confirmed the cytosine ratio using mass spectrometry (see SI-3). In Figure 3 we plot normalized ΔImax values for 3 kbp DNA samples with different hmC:C ratios. The position of ΔInorm increases with the fraction of hmC, both for samples that have mixed hmC:C and hmC:mC nucleotides. A linear regression fit to the data yields a slope of 8 ± 0.5 pA per percent of hmC. Rigorous statistical analysis of our data is shown in SI-6. From the fits to the Gaussian distributions, the certainty in the mean in ΔInorm for a population of >1,000 molecules is 3 pA. Therefore, for an ideal pore that does not fluctuate during the experiment lifetime, our method should in principle quantify hmC in DNA fragments with hmC abundances of ~1% of total cytosines in the fragment. However, statistical analysis of our data (SI-6) shows that the confidence with which we can discriminate among the 0% hmC and 3% hmC is only 91%; these amounts of hmC correspond to 0.75% of total bases, close to the values found in Purkinje neurons.5 To realize the full analytical potential of our discrimination technique, all sources of systematic errors must be overcome.

Normalized ΔImax values for 3 kbp DNA with mixed cytosines (ΔInorm). Different cytosine modification ratios hmC/C and hmC/mC were tested, and ΔImax values were found from Gaussian fits to the ΔI distributions. The values...

Effect of cytosine modifications on DNA structure

The different ΔImax values for mC-DNA and hmC-DNA suggests that transport through the pore is influenced by differences in DNA structure. In Figure 4a, we plot mean transport times tT for the three DNAs as a function of temperature19. In the range of 10–30°C, we note two main observations: First, tT values follow the trend hmC-DNA > C-DNA > mC-DNA. Second, while tT values decrease for C-DNA and mC-DNA with increasing temperatures, tT values increase for hmC-DNA, suggesting a thermally activated process that stalls DNA transport for hmC-DNA. We also note that transport times for C-DNA are significantly slower than for mC-DNA, although the difference is not as pronounced as the difference between mC-DNA and hmC-DNA. The ψ2 values For all of the transport time fits were between 0.8 and 1.7.

Temperature dependence on stability of duplexes with different cytosine modifications. (a) Mean transport times for 3 kbp C-DNA, mC-DNA, and hmC-DNA samples as a function of bath temperature. With increasing temperatures in the range of 10–30°C,...

The anomalous behaviour of hmC-DNA suggests a subtle change of the structure, since the three modifications have identical visible mobilities in agarose gel electrophoresis. Therefore, we turned to thermal annealing experiments for the three duplexes in order to shed light on structural differences, as shown in Figure 4b. Thermal annealing curves were obtained by monitoring the fluorescence of SYBR Green I® while reducing the temperature of the solution from 98°C in decrements of 0.2°C. Annealing is indicated by an inflection point in the raw fluorescence data (see SI-4), which appears as a peak in the differential −dF/dT. The temperature at which a peak in the differential curve is reached is referred to as the annealing or melting temperature Tm. The three molecules have a significant difference in Tm values with hmC-DNA < C-DNA < mC-DNA (see Figure 4a). Also, complete annealing of the duplex occurs over a wider temperature range for hmC-DNA, as shown in the inset of Figure 4b. Similar results were obtained in the six replicates that were performed.

The nanopore transport time data suggest that hmC-DNA stalls for longer times in the nanopore as temperature is increased, as opposed to smoother transport for mC-DNA or C-DNA. Similarly, the lower annealing temperature and lower annealing rate for hmC-DNA suggest that the duplex structure is less energetically stable, as compared to the other two molecules. Despite the temperature gap between the nanopore experiments and the annealing curves, both pieces of evidence suggest hmC-DNA has a more easily perturbed duplex structure, which can promote slower translocation through the pore. Enhanced permeation through small pores was recently observed by RT-PCR in a nanopore permeation study.21

To gain molecular insight, molecular dynamics (MD) simulations were performed on model DNA duplexes in 1 M KCl solution at neutral pH under ambient temperature and pressure (see SI-5 for more details about the computations)28. We computed 0.12 μs-long trajectories for a series of 27 bp d(A*CT)9·d(AGT)9 duplexes, where *C is either C, mC or hmC. Although not directly relevant to mammalian CpG methylation sites, this sequence was chosen to study the dynamics of isolated cytosines (i.e., G-*C bps spaced by two A-T bps).

Throughout the simulations, all duplexes maintained a B-form double helix of nearly identical diameter and axial orientation. While steric effects from the modifications produced very minor changes in the duplex’s average local structure, the most prominent effects of the modifications occur locally within G-*C base pairs. The polarity of the modification governs the strength of cytosine-water interactions. Average interaction energies, i.e., electrostatic and van der Waals energies, between cytosine major groove atoms and water molecules within the first solvation shell were −1.5, −1.0, and −1.8 kcal/mol for C-DNA, mC-DNA, and hmC-DNA, respectively. These energetic differences, along with the structure and volume of the chemical modifications, affect solvation dynamics in the major groove. Figure 5a displays differential water densities for hmC-DNA and mC-DNA relative to C-DNA. The excluded volume of the methyl and hydroxymethyl groups pushes solvent around the cytosine farther away from the duplex, as seen by the magenta bubbles in the Figure. Alternatively, van der Waals attraction between water molecules and the modifications increases solvent density around nearby phosphates and N4 atoms in mC and hmC. However, the largest changes in solvation are observed for hmC, because its hydroxymethyl group extends into the major groove towards the DNA 5′ end, creating polar cavities that increase solvation in the major groove (see cyan bubbles in Figure 5a). These cavities can capture water molecules within the first solvation shell, thereby increasing the probability of water to bind to hmC for long (~100 ps) timescales.

Results of molecular dynamics simulations of duplexes containing modified cytosines. (a) Water density in the major groove of mC (left) and hmC (right) relative to C. Cyan and magenta isosurfaces indicate regions of increased and diminished solvent density,...

Interestingly, the MD simulations also show that the polarity of cytosine modifications dictates local structural fluctuations in G-*C bps. The local geometry of DNA can be described by parameters that specify intra-bp, inter-bp and local helical axis conformations29. Figure 5b displays plots of intra-base pair fluctuation amplitudes for G-mC and G-hmC bps relative to G-C bps. Fluctuations in shear, stretch, stagger and buckle are 3–7% smaller in G-mC, while they are 1–5% larger in G-hmC. Due to steric constraints, rotary motion about the helical axis is impeded in G-mC and G-hmC basepairs, which results in reduced fluctuations in propeller twist and opening. Nevertheless, the overall trend for intra-base pair fluctuations is G-hmC > G-C > G-mC. This result is consistent with the polarity of the chemical modification and can be understood as a solvent-mediated effect; fluctuations enable greater opportunities for contact between DNA and water molecules. Because water has the highest affinity for hmC, G-hmC basepairs experience the largest fluctuations. In contrast, water is less favorable to solvate the hydrophobic methyl group of mC, which increases the rigidity of G-mC basepairs. Solvent effects are also responsible for an interesting conformational change in the amine attached to the cytosine 4 position. Water molecules can collide with this group and cause it to rotate by 180° (Figure 5c). This rotation is accompanied by a temporary disruption of hydrogen bonding in G-*C base pairs. Over the simulation timescales, 31 amine rotations are observed throughout the hmC-DNA duplex, compared to only 12 and 7 rotations in mC- and C-DNA duplexes, respectively.

In general, DNA fluctuations are governed by interplay between the steric effects and polarity of the modification. However, for intra-base pair fluctuations, the trend is clear: increasing the size of the modification tends to increase the local rigidity within basepairs, while increasing the polarity of the modification decreases rigidity. Decreased flexibility of mC-DNA duplexes has been observed in previous MD simulations30 and was attributed to the steric effect and hydrophobicity of the methyl group. A novel aspect to our findings is that the steric constraints of the modification, which increase local rigidity, can be mitigated by introducing a more polar modification. Thus, the hydrophilic hydroxymethyl group destabilizes G-hmC basepairs, while the hydrophobic methyl group stabilizes G-mC basepairs.

To determine how these microscopic fluctuations of modified cytosine bases affect global DNA structure, we used atomic force microscopy (AFM) to image surface-immobilized DNA molecules. Rivetti and co-workers31 have argued that measurements of the contour length (L) and end-to-end distance (R) of DNA molecules deposited on a flat 2D mica substrate yields the mean persistence length (P) of the DNA:

〈R2〉2D=4PL(1−2PL(1−e−L2P)),

Eqn. 1

where the relationship between P and R was derived from a mathematical analysis of worm-like chain bending along its contour. To minimize excluded volume effects from two interacting DNA segments in a long molecule, we synthesized two short DNA fragments that are 410 bp and 1100 bp long, and made the different cytosine variants for each length. For each DNA length, we deposited samples onto freshly cleaved mica under conditions that were optimized to achieve a uniform density of DNA molecules adsorbed on the surface. As indicated by Rivetti and co-workers, a low concentration of Mg2+ ions was used to adsorb the DNA, resulting in diffusion and equilibration of the DNA onto the mica surface31. Following the acquisition of a set of AFM images, we used Gwyddion software to measure R for a set of molecules of each type. In addition, we have measured L for a representative set of molecules in each sample by drawing polyline segments along DNA contours using ImageJ software. For the 410 bp samples, measurement of L values for 30–40 molecules of each kind yielded 137±2 nm, 138±3 nm, and 135±3 nm for C-DNA, mC-DNA, and hmC-DNA, respectively. These mean L values are in very close agreement with the expected contour length based on a helical B-form DNA pitch of 0.34 nm/bp. In contrast, R values were statistically different for each DNA type. By constructing histograms of R and fitting them to Gaussian distributions (see Figure 6), we find that there is a statistical difference between the mean R values, following the trend hmC < C < mC. These differences suggest that there are differences in DNA flexibility. Using Eqn. 1 to compute P for each sample, we find that the range of P values for each modification are (based on <R> values ± uncertainties for both DNA lengths): <PhmC-DNA> = 29–38 nm, <PC- DNA> = 41–49 nm, and <PmC-DNA> = 50–65 nm (see SI-7 for further statistical details). We note that the mean of our calculated P values for C-DNA is somewhat smaller than those found by Rivetti (52 nm). Smaller P values have been ascribed to DNA that adsorbs onto the mica in a partially trapped configuration, and are known to be sensitive to the sample preparation conditions32, 33. Other sources of deviation may be the presence of sequences that inherently contain more unstable basepairing interactions than ordinary DNA sequences, sequences that contain bends (e.g., A-tracts),33 or metrology errors due to tip-mediated artifacts. Therefore, using the same DNA sequence, deposition condition, and AFM tip for each set of molecules ensured that the differences in values of P between samples are trustworthy.

Our compiled data, both computational and experimental, suggest that increasing the polarity of cytosine modifications reduces the rigidity of a DNA fragment. These findings may be important in explaining epigenetic effects of modified DNAs on transcription factor binding or chromatin assembly, as well as revealing information on the packaging efficiency of certain viruses. This also may explain our ability to discriminate among hmC-DNA and mC-DNA using nanopores: in the temperature range of 10–30°C, hmC-DNA undergoes significant local duplex destabilization, which increases both the mean transport time and current amplitude of hmC-DNA. The exact mechanistic details of DNA transport through a nanopore are yet to be fully understood. MD simulations reveal that the dynamics of DNA translocation through a confined nanopore involves a complex interplay of DNA interactions with solvent, nanopore walls, ions as well as intra-DNA interactions26. Structurally, the 5′ position of cytosine faces outwards towards the grooves, leaving them accessible for interactions with the nanopore. In addition, our findings suggest that the relative stability of modified duplexes is an important factor. Therefore, it is plausible that the combination of hydrophilic pore walls, high electric field inside the pore, and duplex stability, contribute to structural perturbations of hmC-DNA in the nanopore. More simply stated, pulling hmC-DNA through our nanopore may stall transport by deforming or locally denaturing the duplex. Interactions with the hydrophilic nanopore interface may be augmented by contact with deformed DNA structures at the pore. Also, local denaturation of hmC-DNA may promote the presence of multiple DNA strands occupying the pore simultaneously (e.g., three denatured single strands). This may explain the deep current blockades, longer transport times, and more complex current signatures that we observe with hmC-DNA.

In conclusion, we have shown that select physical properties of DNA molecules with identical sequences are dependent on the cytosine modification polarity. These properties give rise to different ion current signatures for DNA molecules threaded through nanopores, and nanopore measurements of as little as a few hundred molecules is sufficient to uniquely distinguish mC-DNA from hmC-DNA. Further, different proportions of hmC in fragments containing C-DNA or mC-DNA can be quantified based on the ion current signatures. We used MD simulations to probe the molecular basis of our findings, revealing that polar cytosine modifications increase the flexibility of DNA by promoting solvent-mediated fluctuations in G-C basepairs. Further, AFM revealed that the mean end-to-end distance for the more polar hmC-DNA was significantly shorter than for C-DNA and mC-DNA, indicating an increased flexibility for hmC-DNA. Nanopore-based discrimination among hmC-DNA and mC-DNA is non-destructive, high-throughput, and sensitive. Provided a better understanding of DNA transport through nanopores, this approach may be able to map cytosine modification patterns directly in unamplified DNA fragments from living cells.

Materials and Methods

All DNA molecules in this paper were prepared by PCR using Phusion DNA polymerase (Finnzymes/NEB). The 3kb sequence was amplified from T4 genomic DNA. The 400 bp and 1100 bp samples were amplified from pBR322 plasmid (NEB). To verify that modified cytosines do not introduce mismatches, we sequenced all types of products following PCR amplification. To make DNA samples with mixed cytosine proportions, we have added different cytosine mononucleotide ratios in the PCR mix. Following PCR amplification, the percentage of hmC was qualitatively determined by digestion with a methylation dependent restriction enzyme (MspJI) (see Figure S1). The 3 kbp DNA products were subjected to a 1% agarose gel electrophoresis as shown in Figure S1A below. The PCR products were then incubated with MspJI modification-dependent restriction endonuclease. As demonstrated in Figure S1B, DNA with modified cytosines was digested, and the extent of digestion qualitatively correlates with the fraction of C with respect to hmC in the PCR mix. For further details regarding the preparation and characterization of the DNA molecules please see the Supplementary Information file.

The substrates for nanopore fabrication were 5×5 mm2 Si chips that have a 20 nm thick low-stress silicon nitride (SiN) film deposited on a 5-μm-thick thermally-grown SiO2 layer, used to reduce the electrical noise. Following removal of the underlying oxide layer, solid-state nanopores in the range 3.5 – 4.5 nm were fabricated and imaged in a JEOL 2010FEG TEM. The nanopore devices were cleaned using piranha solution, followed by copious water wash, assembly in the fluoropolymer cell using a homemade quick cure PDMS gasket, and immersion with 1M KCl + 1mM EDTA buffered to pH 8 using 10 mM tris-HCl. Our fluoropolymer cell accommodates volumes of 1–20 μl and features temperature regulation using a thermoelectric device connected to a copper block that houses the cell. Each chamber was equipped with a Ag/AgCl electrode. An Axopatch 200B patch clamp amplifier was used to apply voltage and measure current through the pore. The analog signal output was sampled at 250 kHz sampling rate using a 16-bit DAQ card (NI PCI-6230). Data were collected and analyzed using custom LabVIEW and Igor Pro software.

Supplementary Material

1_si_001

Acknowledgments

We thank Daniel Branton, Richard Roberts, and Philip Nelson for comments on the manuscript, and Drs. Anna Kalota and Alan Gewirtz for use of their RT-PCR instrument. The computations were supported by Teragrid and ICMS resources and the experimental work by NIH Grant R21HG004767 and New England Biolabs.