Abstract

Nucleosome organization has been suggested to affect local mutation rates in the genome. However, the lack of de novo mutation and high-resolution nucleosome data has limited the investigation of this hypothesis. Additionally, analyses using indirect mutation rate measurements have yielded contradictory and potentially confounding results. Here, we combine data on >300,000 human de novo mutations with high-resolution nucleosome maps and find substantially elevated mutation rates around translationally stable ('strong') nucleosomes. We show that the mutational mechanisms affected by strong nucleosomes are low-fidelity replication, insufficient mismatch repair and increased double-strand breaks. Strong nucleosomes preferentially locate within young SINE/LINE transposons, suggesting that when subject to increased mutation rates, transposons are then more rapidly inactivated. Depletion of strong nucleosomes in older transposons suggests frequent positioning changes during evolution. The findings have important implications for human genetics and genome evolution.

a Schematic diagram describing two nucleosome positioning-related variables (dmean and dvar) relative to a given genomic position. b Classifying the genome into five equal portions by dvar and calculating the SNV densities. c, d Independent statistical significance of potential contributing factors to mutation rate variation, having controlled for other factors; c for SNVs and d for INDELs. Tests for SNVs were performed separately at A/T and C/G sites (non-CpG and CpG contexts, respectively). Vertical red lines indicate the threshold for statistical significance (0.05). The p values were from the likelihood-ratio tests and were adjusted for multiple testing with Benjamini–Hochberg correction. us upstream; ds downstream. Source data are provided as a Source Data file.
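The Benjamini–Hochberg adjustment mentioned in the caption can be illustrated with a minimal Python sketch. The function name and the raw p values below are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' own implementation is not described here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four candidate factors (illustrative only).
raw = [0.001, 0.02, 0.03, 0.5]
adj = benjamini_hochberg(raw)
# Factors whose adjusted p value falls below the 0.05 threshold.
significant = [p < 0.05 for p in adj]
```

In this toy input the first three factors pass the 0.05 threshold after adjustment and the fourth does not.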

a Frequencies of 96 mutation types among de novo SNVs; six nucleotide substitutions in the context of the bases immediately 5′ and 3′ of the mutated site. SNVs are grouped into those overlapping strong nucleosomes and those elsewhere, and among the former into those overlapping with different classes of repeat elements. ↑ and ↓ indicate mutation types showing statistically significant differences relative to the genomic background SNV set (adjusted p < 0.05, two-sided Fisher’s exact test). b Percentage contribution of COSMIC mutational signatures among different groups of SNVs; only signatures with nonzero values are shown. Asterisks indicate mutational signatures displaying >1% increase relative to the genomic background SNV set. Brief summaries of the aetiologies of affected signatures are shown on the right (descriptions taken from the COSMIC website). Source data are provided as a Source Data file.
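The count of 96 mutation types follows from six substitution classes combined with four possible 5′ bases and four possible 3′ bases. The sketch below enumerates them and maps an SNV to its type. It assumes the common convention of reporting the substitution from the pyrimidine strand; the label format and function names are illustrative choices, not the authors' code.

```python
from itertools import product

BASES = "ACGT"
# Six substitution classes, written with the pyrimidine (C or T) as reference.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutation_types():
    """Enumerate all 6 x 4 x 4 = 96 trinucleotide mutation types."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def classify(five, ref, alt, three):
    """Map an SNV and its flanking bases to one of the 96 types."""
    if ref in "AG":
        # Reverse-complement so that the reference base is a pyrimidine;
        # the 5' and 3' neighbours swap places on the opposite strand.
        five, ref, alt, three = (
            three.translate(COMPLEMENT),
            ref.translate(COMPLEMENT),
            alt.translate(COMPLEMENT),
            five.translate(COMPLEMENT),
        )
    return f"{five}[{ref}>{alt}]{three}"
```

For example, a G>A change in the context 5′-AGT-3′ is reported as `A[C>T]T`, because the reverse complement of AGT is ACT.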

a Mutation density profiles relative to strong-nucleosome dyads in cancer genomes harboring driver mutations in the POLE and MMR pathway genes. Numbers of mutations used are indicated in the brackets. The MMR escape ratio compares the mutation densities in the MMR-proficient and MMR-deficient genomes. b Mutation density profiles relative to strong-nucleosome dyads for bMMRD cancer genomes with different driver mutation statuses in the POLE and POLD1 genes. The escape ratios compare the mutation densities for Pol ε-deficient and Pol δ-deficient cancers with the proficient ones. c END-seq signal indicating the density of DSBs relative to strong-nucleosome dyads. HU hydroxyurea. Two-sided Fisher’s exact test was used for testing the association of strong nucleosomal regions (dyad ± 95 bp) with differential MMR/polymerase performance. Source data are provided as a Source Data file.
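A mutation density profile relative to nucleosome dyads can be sketched as a count of mutations at each offset around every dyad. The example below is a simplified illustration for a single chromosome: it uses the dyad ± 95 bp window given in the caption, ignores strand, and applies no normalization. The function name and input layout are assumptions, and the study's exact normalization and escape-ratio calculation are not reproduced.

```python
from bisect import bisect_left, bisect_right


def dyad_profile(dyads, mutations, flank=95):
    """Count mutations at each offset in [-flank, +flank] around dyads.

    dyads and mutations are integer positions on the same chromosome.
    Index 0 of the result is offset -flank; index flank is the dyad itself.
    """
    muts = sorted(mutations)
    counts = [0] * (2 * flank + 1)
    for dyad in dyads:
        # Binary search for the mutations inside the window of this dyad.
        lo = bisect_left(muts, dyad - flank)
        hi = bisect_right(muts, dyad + flank)
        for pos in muts[lo:hi]:
            counts[pos - dyad + flank] += 1
    return counts
```

Dividing each count by the number of dyads would give an average per-dyad profile, which is one plausible way to compare groups of genomes with different numbers of mutations.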

Fig. 5. Strong nucleosomes are enriched for evolutionarily young LINE and SINE elements.

a Fold enrichment of strong-nucleosome occurrence in L1 subfamilies. The top 30 abundant subfamilies are shown ordered by evolutionary age. Dot sizes depict the numbers of strong nucleosomes and color-scale indicates the subfamily age. b Same as a but for Alu elements (only major subfamilies with age information presented here). c Densities of strong-nucleosome dyads, de novo SNVs, and de novo INDELs along the Alu sequences and flanking regions, grouped by Alu subfamilies of different ages. Bar plots show the average densities for all Alus of different subfamilies on the right, with dots representing the values of Alu bins in the left meta-profile panels. The bottom panel shows the average DNA deformation energies along Alu sequences estimated using nuScore. Profiles were plotted using Alu elements ≥250 bp and all elements were scaled up to a 300 bp region in the plots. Source data are provided as a Source Data file.
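Scaling elements of different lengths onto a common 300 bp axis can be sketched as a proportional mapping of each position to one of 300 bins. The example below is a minimal illustration that applies the ≥250 bp length filter from the caption; it ignores element strand and flanking regions, and the function names and half-open interval convention are assumptions rather than the authors' procedure.

```python
def scaled_bin(pos, start, end, n_bins=300):
    """Map a position inside the half-open interval [start, end) to a bin."""
    if not start <= pos < end:
        raise ValueError("position lies outside the element")
    # Proportional rescaling of the offset to the range 0 .. n_bins - 1.
    return (pos - start) * n_bins // (end - start)


def meta_profile(elements, events, n_bins=300, min_len=250):
    """Average event count per scaled bin across elements of sufficient length.

    elements is a list of (start, end) intervals and events is a list of
    positions (for example dyads or de novo mutations) on one chromosome.
    """
    counts = [0] * n_bins
    used = 0
    for start, end in elements:
        if end - start < min_len:
            continue  # skip elements shorter than the length cut-off
        used += 1
        for pos in events:
            if start <= pos < end:
                counts[scaled_bin(pos, start, end, n_bins)] += 1
    return [c / used for c in counts] if used else [0.0] * n_bins
```

With this mapping, the first base of a 300 bp element falls in bin 0 and its last base in bin 299, while shorter elements are stretched across the same 300 bins.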