Abstract

Metazoan genomes are spatially organized at multiple scales, from packaging of DNA around individual nucleosomes to segregation of whole chromosomes into distinct territories. At the intermediate scale of kilobases to megabases, which encompasses the sizes of genes, gene clusters and regulatory domains, the three-dimensional (3D) organization of DNA is implicated in multiple gene regulatory mechanisms, but understanding this organization remains a challenge. At this scale, the genome is partitioned into domains of different epigenetic states that are essential for regulating gene expression. Here we investigate the 3D organization of chromatin in different epigenetic states using super-resolution imaging. We classified genomic domains in Drosophila cells into transcriptionally active, inactive or Polycomb-repressed states, and observed distinct chromatin organizations for each state. All three types of chromatin domains exhibit power-law scaling between their physical sizes in 3D and their domain lengths, but each type has a distinct scaling exponent. Polycomb-repressed domains show the densest packing and most intriguing chromatin folding behaviour, in which chromatin packing density increases with domain length. Distinct from the self-similar organization displayed by transcriptionally active and inactive chromatin, the Polycomb-repressed domains are characterized by a high degree of chromatin intermixing within the domain. Moreover, compared to inactive domains, Polycomb-repressed domains spatially exclude neighbouring active chromatin to a much stronger degree. Computational modelling and knockdown experiments suggest that reversible chromatin interactions mediated by Polycomb-group proteins play an important role in these unique packaging properties of the repressed chromatin. 
Taken together, our super-resolution images reveal distinct chromatin packaging for different epigenetic states at the kilobase-to-megabase scale, a length scale that is directly relevant to genome regulation.

A unique pair of index primers is used in a PCR reaction to selectively amplify the templates for the probe set of interest from a complex pool of custom, array-derived oligonucleotides. These templates are then amplified and converted to RNA in an in vitro transcription reaction. The RNA products are converted back to DNA in a reverse-transcription reaction using a primer labelled with the activator dye Alexa 405, which incorporates the dye into the resulting single-stranded DNA probe. Finally, a 32-nt oligonucleotide conjugated to Alexa 647 is hybridized to all of the probes. The photoswitchable dye Alexa 647 is used for STORM imaging, and the activator dye Alexa 405 facilitates the 405-nm-light-induced reactivation of Alexa 647.

a, Top panels: Example images of DNA in a Kc167 cell, visualized with the live-cell DNA dye Hoechst 33342, both in the live cell before fixation and in the same cell after applying our fixation buffer (osmotically balanced methanol-free formaldehyde in PBS). Bottom panels: Same as the top panels but for a Kc167 cell before and after fixation with methanol, a fixative known to cause shrinkage. b, Quantification of the distances between chromatin features in live and fixed cells. Corresponding chromatin features were identified in the live and fixed images of the same cell through scale-invariant feature transform (SIFT) registration (ref. 40). We measured distances between pairs of identified SIFT features in each cell and calculated, for each cell, the ratio of the median inter-feature distance after fixation to that before fixation. Plotted here are histograms of the ratios determined from many cells for fixation with our osmotically balanced methanol-free formaldehyde fixation buffer (magenta), for a “mock fixed” condition in which the growth medium was replaced with fresh medium without any fixation reagent (cyan), and for fixation with methanol (red); n ≈ 80 cells in each case. The average ratios are 1.009 ± 0.003 and 1.008 ± 0.003 for fixation with our fixation buffer and for mock fixation, respectively, indicating a lack of shrinkage. In contrast, the average ratio for methanol fixation (0.868 ± 0.005) is appreciably less than one, indicating chromosome shrinkage induced by methanol. c, STORM images of TRF1-mMaple3-labelled telomeres in live and fixed HEK293 cells. mMaple3 is a photoactivatable fluorescent protein (ref. 42). Cells were fixed with our osmotically balanced methanol-free formaldehyde in PBS. Two examples of telomere STORM images are shown for each condition. d, Quantification of the radius of gyration of the telomeric domains in live and fixed cells.
We determined the radius of gyration for each telomere structure; plotted here are histograms of the radii of gyration across ~150 telomeres from ~30 cells for live (cyan) and fixed (magenta) cells. The average radius of gyration is 77 ± 3 nm for live cells and 78 ± 2 nm for fixed cells, again indicating no significant chromatin shrinkage upon fixation. The telomere size measurement is not limited by the image resolution achieved with mMaple3 (~30 nm).
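The per-cell shrinkage ratio in panel b can be sketched as follows. This is a minimal illustration, assuming the SIFT matching has already produced two equally ordered lists of feature coordinates for one cell (live and fixed); the function names are hypothetical. The ratio is taken as fixed over live, consistent with a value below one indicating shrinkage.

```python
import math
import statistics
from itertools import combinations


def median_pairwise_distance(points):
    """Median Euclidean distance over all pairs of feature coordinates."""
    return statistics.median(math.dist(p, q) for p, q in combinations(points, 2))


def shrinkage_ratio(features_fixed, features_live):
    """Median inter-feature distance after fixation divided by that before.

    Assumes features_fixed[i] and features_live[i] are the same SIFT
    feature matched between the fixed and live images of one cell.
    """
    if len(features_fixed) != len(features_live) or len(features_live) < 2:
        raise ValueError("need at least two matched features")
    return median_pairwise_distance(features_fixed) / median_pairwise_distance(features_live)
```

Taking the median of all pairwise distances, rather than the mean, keeps the per-cell estimate robust to a few mismatched feature pairs.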

Volume, radius of gyration and other shape characteristics for chromatin domains of various domain lengths in three different epigenetic states

a, Scheme of Drosophila chromosomes (X, 2L, 2R, 3L, and 3R) with the position of the imaged epigenetic domains marked (Red: active domains A-01 to A-23; Black: inactive domains I-01 to I-14; Blue: repressed domains R-01 to R-11). b, Log-log plot of the median domain volume as a function of domain contour length reproduced from but with the domain ID labelled. c, As in but plotted on a linear-linear scale. d, Linear plot of the median radius of gyration as a function of domain contour length. e, Coefficient of variation (CV) in density per voxel for all domains as a function of domain length. CV in density is defined as the ratio of the standard deviation of density to the average density within the domain-occupied volume, which characterizes how uniformly the chromatin is distributed in space within these domains (). f, Ratio of surface area to volume^(2/3) for all domains as a function of domain length. This surface-to-volume parameter characterizes the complexity of the physical shapes taken by the domains in 3D (). Error bars represent 95% confidence intervals derived from resampling.
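The radius of gyration, the density CV and the surface-to-volume^(2/3) parameter can all be computed from a domain's 3D localizations. The sketch below is a minimal illustration under stated assumptions: the domain-occupied volume is taken to be the set of voxels containing at least one localization, and surface area is counted as exposed voxel faces. The paper's actual occupancy criterion and voxel size are not given in this legend, and the function names and `voxel_nm` parameter are hypothetical.

```python
import math
import statistics
from collections import Counter


def voxelize(points, voxel_nm):
    """Count localizations per voxel; keys are integer (i, j, k) voxel indices."""
    return Counter(tuple(math.floor(c / voxel_nm) for c in p) for p in points)


def radius_of_gyration(points):
    """Root-mean-square distance of the 3D points from their centroid."""
    n = len(points)
    centroid = [sum(p[d] for p in points) / n for d in range(3)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in points) / n)


def density_cv(counts):
    """Standard deviation over mean of per-voxel counts in occupied voxels."""
    values = list(counts.values())
    return statistics.pstdev(values) / statistics.mean(values)


def surface_to_volume(counts, voxel_nm):
    """Exposed voxel-face area divided by volume**(2/3); dimensionless.

    ASSUMPTION: a voxel is 'occupied' if it holds any localization.
    """
    occupied = set(counts)
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    faces = sum(
        1
        for (i, j, k) in occupied
        for (di, dj, dk) in neighbours
        if (i + di, j + dj, k + dk) not in occupied
    )
    area = faces * voxel_nm ** 2
    volume = len(occupied) * voxel_nm ** 3
    return area / volume ** (2 / 3)
```

Because the parameter is normalized by volume^(2/3), it is scale-free: any solid cube of voxels gives 6, and elongated or convoluted shapes give larger values.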

Conventional images of chromatin domains and domain volume characterization based on conventional images

a, Blow-up view of the conventional images of chromatin domains shown in . The left column shows the raw conventional, wide-field images, with pixel size defined by our camera. The right column shows the corresponding anti-aliased and de-noised images. b, Quantification of the median domain volume determined from conventional images (foreground symbols), overlaid on the median volume determined from STORM data plotted in (faint background symbols and lines). Error bars represent 95% confidence intervals derived from resampling. Note that conventional imaging can artificially increase the apparent domain size, an effect that is especially severe for domains whose physical sizes are smaller than the image resolution, but it can also lead to an apparent decrease in domain size in some cases, when thin protrusions are too dim to be detected.

Distributions of domain volume and radius of gyration of different epigenetic domains and subdomains over all imaged cells

a, Histograms of domain volume for all imaged cells for each of the domains shown in . Red: active domains; Black: inactive domains; Light blue: repressed domains. The domain IDs are indicated in the upper right corner of each plot. The x-axis (volume) range has been adjusted for each domain to ensure the readability of the histogram. b, Histograms of the radius of gyration for each of the imaged domains in all cells. c, Histograms of the radius of gyration for subdomains of active (red), inactive (black) and repressed (blue) chromatin, shown in and , for all imaged cells. The subdomain IDs are indicated in the upper right corner of each plot.

a, Quantification of relative change in gene expression by qPCR (mean ± s.e.m., n = 3 biological replicates) upon ph-p and ph-d double knockdown. Grey bars: expression fold change of ph-p and ph-d upon the double knockdown. Light blue bars: expression fold change of five Polycomb target genes (Ubx, Abd-B, Dfd, Antp and en). Red bars: expression fold change of three control genes that are not targeted by Polycomb (Act5c, alphatub84b and Gapdh1). Expression fold change was determined as the ratio between the signal detected in ph-p and ph-d double knockdown cells and that detected in wild-type cells. b, Average expression fold change upon ph-p and ph-d double knockdown for all genes in all of the active (red), inactive (black) and repressed (light blue) domains included in our study. The expression fold change is defined as the ratio of the expression level measured in Ph-knockdown cells to that measured in wild-type control cells, determined by next-generation RNA sequencing (mean ± s.e.m., n = 45, 89 and 532 genes for repressed, inactive and active domains, respectively; 2 biological replicates). Expression level was measured in units of read fragments per kilobase per million reads (FPKM). Note that some genes (9 from repressed regions, 11 from inactive regions and 2 from active regions) are excluded from the average expression fold change calculation because they had zero counts in the wild-type control cells. c, Example images of the R-10 domain (Bithorax complex) in wild-type (left) and Ph-knockdown (right) cells. d, Radius of gyration vs. domain length for subdomains of R-10 in wild-type cells (solid green triangles) and Ph-knockdown cells (hollow green triangles).
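The averaging in panel b, including the exclusion of genes with zero wild-type expression, can be sketched as below. This assumes one FPKM value per gene per condition (for example, already averaged over the two replicates); the function name and the gene IDs in the usage are hypothetical.

```python
import math
import statistics


def mean_fold_change(fpkm_knockdown, fpkm_wildtype):
    """Mean and s.e.m. of per-gene fold change (knockdown / wild type).

    Genes with zero wild-type expression are skipped because their ratio is
    undefined; the skipped gene IDs are returned so they can be reported.
    """
    ratios, excluded = [], []
    for gene, wt in fpkm_wildtype.items():
        if wt == 0:
            excluded.append(gene)
            continue
        ratios.append(fpkm_knockdown[gene] / wt)
    mean = statistics.mean(ratios)
    # Standard error of the mean uses the sample (n - 1) standard deviation.
    sem = statistics.stdev(ratios) / math.sqrt(len(ratios)) if len(ratios) > 1 else float("nan")
    return mean, sem, excluded
```

For example, wild-type values `{"geneA": 2.0, "geneB": 4.0, "geneC": 0.0}` with knockdown values `{"geneA": 4.0, "geneB": 4.0, "geneC": 1.0}` give ratios of 2.0 and 1.0, a mean of 1.5 and `geneC` in the excluded list.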

Locus-to-locus variation observed for the three types of epigenetic domains after normalization based on the observed scaling law over domain length

a, Normalized volume for domains of active (left), inactive (middle) and repressed (right) chromatin. Normalized volume is defined as the ratio of the median volume of the domain to the expected volume calculated from the power-law scaling fits shown in . Error bars represent 78% confidence intervals, such that there is a less than 5% chance that domains with non-overlapping error bars are not distinct. b, Volcano plots of the relative differences in volume between all pairs of active domains (left), inactive domains (middle) or repressed domains (right), after the normalization shown in (a). Each data point represents one pair of domains, with the ratio of their normalized volumes plotted on the x-axis and the p-value of their normalized-volume difference plotted on the y-axis. The dashed line is at a p-value of 0.05. All dots above this line represent pairs of domains in which the normalized volume of one domain is statistically distinguishable from that of the other. c, Standard deviation of the normalized volumes for each domain type. Error bars represent 95% confidence intervals.
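The normalization in panel a divides each domain's median volume by the volume predicted from the power-law fit, V ≈ a·L^b. A minimal sketch, assuming an unweighted least-squares fit of log V against log L (the legend does not state the fitting method or any weighting); the function names are hypothetical.

```python
import math


def fit_power_law(lengths, values):
    """Least-squares line in log-log space, so that values ~ a * lengths**b."""
    xs = [math.log(x) for x in lengths]
    ys = [math.log(y) for y in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                # scaling exponent
    a = math.exp(my - b * mx)    # prefactor
    return a, b


def normalized_volumes(lengths, median_volumes):
    """Median volume of each domain divided by the fitted expectation."""
    a, b = fit_power_law(lengths, median_volumes)
    return [v / (a * L ** b) for L, v in zip(lengths, median_volumes)]
```

A domain lying exactly on the trendline has a normalized volume of 1; values above or below 1 indicate domains less or more compact than expected for their length.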

Additional factors correlating with the domain volume after normalizing the effect of domain length for active domains

To normalize for the effect of domain length, we determined the percent deviation of the volume of each active domain from the power-law scaling trendline shown in , hereafter referred to as the percent deviation from trendline. a, Correlation of the percent deviation from trendline with the binding density of the insulator proteins BEAF32 (left) and CTCF (right). Binding density was determined as the number of peaks per kb in Dam-ID data, where peaks were defined as local maxima at least two standard deviations above the mean. b, As in (a) but for correlation with transcription start site (TSS) density (left) and RNA-seq total read density (right). The TSS density is defined as the average number of TSSs per kb in the domain, and the RNA-seq total read density is defined as the total number of RNA-seq reads mapping to the domain divided by the domain length in kb. c, Pearson correlation coefficients and corresponding p-values for the correlation of the percent deviation from trendline with the indicated genomic factors. Average gene expression refers to the average expression value, in FPKM (read fragments per kilobase per million reads), of all genes in the domain. Maximum gene expression refers to the FPKM of the most highly expressed gene in the domain. Su(Hw) is an insulator protein, like BEAF32 and CTCF. We noticed a weak trend in which domains with higher binding densities of the insulators BEAF32 or Su(Hw) are slightly more compact. Although this trend is consistent with the hypothesis that insulator proteins may function as loop-forming factors and that loops may lead to more compact domains, the correlation detected here was not statistically significant. Further analysis with improved sensitivity in detecting BEAF32 or Su(Hw) binding sites might uncover a stronger effect, so our data do not rule out the insulator loop hypothesis.
Similarly, we caution that the positive correlation observed with the density of CTCF binding sites might reflect CTCF's preference for binding open chromatin regions (such as enhancers and promoters) and does not necessarily indicate that CTCF binding induces a more open chromatin state.
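The quantities in this legend (peaks as local maxima at least two standard deviations above the mean, peak density per kb, percent deviation from trendline and the Pearson coefficient) can be sketched as follows. Assumptions, all hypothetical: the Dam-ID signal is a list of per-bin values, its mean and standard deviation are taken over that list, a plateau counts as a peak at its first bin, and peak positions are given in bp. p-values are not computed here.

```python
import math
import statistics


def call_peaks(signal, n_sd=2.0):
    """Indices of local maxima at least n_sd standard deviations above the mean."""
    threshold = statistics.mean(signal) + n_sd * statistics.pstdev(signal)
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] >= threshold and signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
    ]


def peak_density_per_kb(peak_positions_bp, start_bp, end_bp):
    """Number of peaks inside [start_bp, end_bp) divided by the domain length in kb."""
    n = sum(start_bp <= p < end_bp for p in peak_positions_bp)
    return n / ((end_bp - start_bp) / 1000)


def percent_deviation(observed, expected):
    """Percent deviation of an observed volume from the trendline expectation."""
    return 100.0 * (observed - expected) / expected


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In this scheme each active domain contributes one pair (factor density, percent deviation), and `pearson_r` is evaluated across domains.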

Chromatin in different epigenetic states exhibits distinct packaging and power-law scaling

a, Enrichment profile of H3K4me2 (red), H3K27me3 (light blue) and unmodified H3 (black) in three genomic regions, each harbouring an example active, inactive or repressed domain (indicated by brackets). Marker enrichment, as defined in , was determined from ChIP-seq data. b, 3D-STORM images of the three distinct epigenetic domains in (a), labelled by in situ hybridization with DNA probes conjugated to the photoswitchable dye Alexa 647, shown with their corresponding conventional images in the inset. Each epigenetic domain appears as a single region in nearly all cells due to homologous pairing in the tetraploid Kc167 cells. c, Log-log plot of the median domain volume as a function of domain length for active (red solid circles), inactive (black solid circles) and repressed (light blue solid circles) domains, as well as for repressed domains in Ph-knockdown cells (light blue hollow circles). Error bars represent 95% confidence intervals derived from resampling (n ≈ 50 cells). The lines indicate power-law fits, with the scaling exponent b shown in the legend. d, As in (c) but for the radius of gyration as a function of domain length, with the scaling exponent c shown in the legend.
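The error bars in (c) and (d) are 95% confidence intervals derived from resampling over cells. One common way to obtain such an interval for a median is the percentile bootstrap, sketched below; whether the authors used this exact variant is an assumption, as are the `n_boot` and `seed` defaults and the function name.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=10000, level=0.95, seed=0):
    """Sample median and a percentile-bootstrap confidence interval for it.

    values: one measurement (e.g. domain volume) per imaged cell.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample cells with replacement and record the median of each resample.
    medians = sorted(statistics.median(rng.choices(values, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return statistics.median(values), medians[lo_idx], medians[hi_idx]
```

Resampling whole cells, rather than individual localizations, treats each cell as the independent unit, which matches the n ≈ 50 cells quoted in the legend.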

a, Marker enrichment profile of three genomic regions with the example epigenetic domains marked by brackets and imaged subdomains marked by green and magenta lines. b, Linear plot of the radius of gyration as a function of subdomain length (green symbols), compared to the whole-domain data (red, black or light blue circles), for active (left panel), inactive (middle panel) and repressed chromatin (right panel). Different green symbols (triangles and squares) represent subdomains of two different parent domains. Power-law fits of subdomains (green solid lines) and whole domains (red, black and light blue dashed lines) are shown, with the scaling exponent c given in the legends. The green lines in the right panel are to guide the eye. c, Two-colour 3D-STORM images of example pairs of subdomains within active (left), inactive (middle) and repressed (right) domains. Portions of the two subdomains that overlap in 3D are shown in white. The two subdomains are labelled with Alexa 647- and Alexa 750-tagged DNA probes, respectively. d, Quantification of overlap fraction between the subdomains for active (red), inactive (black) or repressed (light blue) chromatin (). Error bars represent 95% confidence intervals derived from resampling (n ≈ 50 cells).
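The overlap fraction in panel d is defined in the paper's Methods, which are not reproduced in this legend. As a placeholder, the sketch below assumes one plausible voxel-based definition: the number of voxels occupied by both subdomains divided by the voxel count of the smaller subdomain. Both the definition and the `voxel_nm` parameter are assumptions, and the function names are hypothetical.

```python
import math


def occupied_voxels(points, voxel_nm):
    """Set of (i, j, k) voxel indices containing at least one localization."""
    return {tuple(math.floor(c / voxel_nm) for c in p) for p in points}


def overlap_fraction(voxels_a, voxels_b):
    """Shared occupied voxels divided by the smaller subdomain's voxel count.

    ASSUMPTION: normalizing by the smaller volume is a placeholder choice;
    dividing by len(a | b) instead would give a Jaccard-style index.
    """
    a, b = set(voxels_a), set(voxels_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

Under this definition, 0 means the two subdomains occupy disjoint voxels and 1 means the smaller subdomain lies entirely within the larger one.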

Computational modelling of chromatin packaging for inactive and repressed domains

a, Radius of gyration of polymer domains as a function of domain length for simulated polymers confined in a small volume to emulate inactive chromatin. Error bars indicate 95% confidence intervals derived from resampling (n ≈ 20 simulations). The line indicates the power-law fit with the scaling exponent c = 0.33. b, Radius of gyration of subdomains of a parent inactive domain (green triangles) compared to the whole-domain scaling data (black circles), with power-law fits and scaling exponents shown. c, Radius of gyration as a function of domain length for the simulated sticky polymer domain, embedded in a non-sticky, confined polymer, to emulate repressed chromatin. The light blue line represents the power-law fit with the scaling exponent c = 0.22. The black dotted line indicates the point at which the closest possible packing of monomers is reached, causing c to deviate from 0.22 and approach 0.33 beyond this point. c also deviates from 0.22 at small monomer numbers because such short polymer chains cannot bend sufficiently. d, Radius of gyration of the subdomains (green triangles) of the parent sticky domain in comparison with the whole-domain scaling data (light blue circles). e, Snapshots of simulations showing adjacent subdomains in the inactive (non-sticky) chromatin, adjacent subdomains of the repressed (sticky) chromatin, and adjacent repressed and inactive chromatin domains. f, Quantification of the overlap fraction () between the adjacent polymer regions illustrated in (e).
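Why closest packing pushes c toward 0.33: once monomers fill space at a fixed density, N ∝ R³, so Rg ∝ N^(1/3). The sketch below checks this numerically by fitting the log-log slope of Rg against N for solid cubes of lattice sites. It is a geometric illustration only, not the sticky-polymer simulation used in the figure, and the helper names are hypothetical.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centroid."""
    n = len(points)
    centroid = [sum(p[d] for p in points) / n for d in range(3)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in points) / n)


def compact_cube(m):
    """All m**3 sites of a simple cubic lattice: a fully space-filling 'domain'."""
    return [(i, j, k) for i in range(m) for j in range(m) for k in range(m)]


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs), i.e. the scaling exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)


sizes = range(4, 16)
ns = [m ** 3 for m in sizes]
rgs = [radius_of_gyration(compact_cube(m)) for m in sizes]
c = loglog_slope(ns, rgs)  # slightly above 1/3 for small cubes, approaching 1/3 as m grows
```

For a cube of m sites per side the exact result is Rg = sqrt((m² − 1)/4), so the fitted exponent exceeds 1/3 only through the finite-size term and converges to 1/3 for large m.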