Affiliation: Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

AUTOMATICALLY GENERATED EXCERPT

The distribution of CNVs in mammalian genomes is nonrandom, and several sequence features have been associated with CNV breakpoints and regions of high structural mutability... Based on an analysis of DNA methylation patterns in human sperm, Li et al. recently reported a significant relationship between CNVs and hypomethylation in the male germline, leading to the suggestion that DNA hypomethylation plays a causative role in the generation of structural variation... Given the potentially profound implications of this report for the study of human disease, we read the findings of Li et al. with great interest... However, after systematically reanalyzing the relationship between CNVs and DNA methylation patterns in sperm, we have identified several cryptic confounders in the data that we believe seriously undermine the conclusions of Li et al... They then applied two independent methods to estimate germline DNA methylation within each window: (i) directly using published whole genome 15× bisulfite sequencing of sperm DNA and a second low coverage 2.5× dataset, and (ii) indirectly by calculating a Methylation Index (MI) based on the relative occurrence of C>T SNPs defined by the HapMap project... Indeed, after removing all 100 kb windows that contain satellites or are >99th percentile by LINE, SINE, LTR, or total repeat content, we observed that in every dataset analyzed, enrichments for CNVs in windows with the lowest 1% mean methylation either significantly diminished or disappeared completely (Figure 1b)... We next considered the influence of problems associated with mapping reduced-complexity bisulfite reads in duplicated regions of the genome... Thus, by measuring the relative occurrence of C>T SNPs within CpG dinucleotides (termed “mSNPs”), it is possible to draw inferences about the ancestral methylation state of a region...
However, SNP-based studies of structural variation are often compromised because many CNV regions show significantly reduced SNP density compared to the genome average (median density of HapMap SNPs within HapMap CNVs is 1 per 1,087 bp, compared to 1 per 738 bp genome-wide)... This stems largely from the fact that ∼98% of HapMap SNP assays map uniquely within the genome, resulting in markedly reduced SNP density in duplicated portions of the genome, precisely those regions that are also enriched for CNVs... As a result, there is a strong confounding relationship between CNV regions and low SNP density that renders the use of a SNP-based MI inherently flawed for studies of structural variation... Taking a more direct approach, we used published 15× sperm bisulfite sequencing data to calculate mean methylation per base both within and flanking 5,360 nonredundant HapMap CNVs <20 kb in size (mean CNV size 3,789 bp) (Figure 2a)... Although we observed a small decrease in methylation levels within CNVs compared to flanking regions, overall CNV regions have consistently high levels of methylation (mean 69%) that are only slightly lower than the genome average (70%)... In summary, we identify multiple strong confounders in the study of Li et al. that in our opinion cast serious doubt on the notion that germline hypomethylation is causally related to structural mutability.
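
The repeat-content filter mentioned in the excerpt (dropping 100 kb windows that contain satellites or exceed the 99th percentile for LINE, SINE, LTR, or total repeat content) might be sketched as follows. This is an illustration only, not the authors' code: the dictionary keys, the nearest-rank percentile rule, and the function names are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

Window = Dict[str, float]  # hypothetical per-window repeat fractions


def percentile_threshold(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile (an assumed convention; others exist)."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, at least 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def filter_windows(
    windows: List[Window],
    features: Sequence[str] = ("line", "sine", "ltr", "total_repeat"),
    pct: int = 99,
) -> List[Window]:
    """Keep windows with no satellite content and no feature above
    the pct-th percentile computed across all windows."""
    thresholds = {
        f: percentile_threshold([w[f] for w in windows], pct) for f in features
    }
    return [
        w
        for w in windows
        if w["satellite"] == 0 and all(w[f] <= thresholds[f] for f in features)
    ]
```

Enrichment for CNVs in the least-methylated windows would then be recomputed on the filtered list, which is the comparison the excerpt reports in Figure 1b.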

pgen-1003332-g002: Global assessment of methylation levels and confounders contributing to hypomethylation in common CNV regions. (a) Mean methylation levels and (b) mean CpG density per base within and flanking 5,360 nonredundant HapMap CNVs. To directly assess the relationship between DNA methylation and structural variation, we used published 15× bisulfite sequencing data [10] to calculate mean methylation per base both within and flanking a high-quality set of HapMap CNVs [7]. We first merged 8,599 CNVs defined by Conrad et al. into 6,142 nonredundant regions, and then removed those >20 kb in size to form a filtered set of 5,360 nonredundant regions (mean size, 3,789 bp). A 100 kb window was then centered on the midpoint of each CNV, and mean methylation levels and CpG count per base in these 100 kb windows were calculated using 15× sperm bisulfite sequencing data [10]. Each plot shows a 100 bp moving average. Although a small decrease in methylation level is evident within CNVs compared to flanking regions, overall mean methylation levels within CNV regions (69%) are very similar to the genome average (70%). Furthermore, this dip in methylation corresponds precisely with an increase in CpG density and an enrichment for CGIs within CNVs. As most CGIs are unmethylated in sperm [10], [17], this fact likely accounts for the small overall reduction in methylation levels associated with CNVs.
(c) Regions classified as “methylation deserts” by Li et al. represent an extremely nonrandom subset of the genome that is highly enriched for common repeats and preferential mapping of bisulfite reads to CpG islands. We classified all 100 kb windows defined by Li et al. based on their content of common repeats and fraction of CpGs assayed that map within ±2 kb of CGIs. One hundred and eighty-three of the 285 (64%) windows that were classified as “methylation deserts” by Li et al. are >95th percentile based on satellite, LINE, or LTR content and/or >99th percentile based on total repeat content. A further 80 windows (28%) are >95th percentile based on the fraction of CpGs assayed within them that map to CGIs or shores. Overall, only 22 of 285 (8%) windows defined by Li et al. as “methylation deserts” do not show extremes of repeat content or highly biased sampling of CpG islands. In contrast, in the rest of the genome, 84% of windows do not overlap any of these categories. Furthermore, windows that overlap a high-quality dataset of HapMap CNVs [7] show a repeat content and proportion of reads mapping to CGIs similar to the genome average. Thus, the set of regions defined as “methylation deserts” by Li et al. represents an extreme fraction of the genome that is likely to be highly enriched for unusual epigenetic and structural features.
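
The region preparation described in the caption (merging overlapping CNV calls into nonredundant regions, keeping regions under 20 kb, and centering a 100 kb window on each midpoint) can be illustrated with a minimal sketch. The tuple layout, the half-open coordinate convention, and the function names are assumptions made for this example and do not come from the original analysis.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed layout.
Interval = Tuple[str, int, int]


def merge_intervals(cnvs: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching CNV calls into nonredundant regions."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(cnvs):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def filter_by_size(regions: List[Interval], max_size: int = 20_000) -> List[Interval]:
    """Keep only regions strictly shorter than max_size base pairs."""
    return [r for r in regions if r[2] - r[1] < max_size]


def centered_window(region: Interval, width: int = 100_000) -> Interval:
    """Window of the given width centered on the region midpoint,
    clipped at coordinate 0 (chromosome ends are not handled here)."""
    chrom, start, end = region
    mid = (start + end) // 2
    half = width // 2
    return (chrom, max(0, mid - half), mid + half)
```

For instance, `filter_by_size(merge_intervals(calls))` applied to the 8,599 calls would correspond to the step that yields the filtered nonredundant set, assuming the same overlap rule as the authors used.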

Mentions:
Finally, we believe that the approach used by Li et al., in which the genome was first partitioned into 100 kb intervals before associating windows containing CNVs with average methylation levels, is poorly suited to address the question at hand, suffering from low resolution and an increased susceptibility to artifacts. Taking a more direct approach, we used published 15× sperm bisulfite sequencing data [10] to calculate mean methylation per base both within and flanking 5,360 nonredundant HapMap CNVs <20 kb in size (mean CNV size 3,789 bp) (Figure 2a) [7]. Although we observed a small decrease in methylation levels within CNVs compared to flanking regions, overall CNV regions have consistently high levels of methylation (mean 69%) that are only slightly lower than the genome average (70%). Furthermore, this slight dip in CNV methylation corresponds precisely with an increase in CpG density and an enrichment for CGIs within CNVs (CGIs comprise 1.1% of CNVs compared to 0.75% genome-wide, a 1.4-fold difference; Figure 2b). As most CGIs are unmethylated in sperm [10], [17], this fact alone is likely to account for the small overall reduction in methylation levels associated with CNVs.
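
The direct approach described above, averaging per-base methylation at each offset from the CNV midpoints and then smoothing, could look roughly like the sketch below. It assumes a single chromosome, a hypothetical dictionary mapping CpG positions to methylation fractions, and a simple centered moving average; none of these details are taken from the original analysis.

```python
from typing import Dict, List, Optional


def aggregate_profile(
    meth: Dict[int, float], midpoints: List[int], half_width: int
) -> List[Optional[float]]:
    """Mean methylation at each offset in [-half_width, +half_width]
    relative to the midpoints; None where no CpG was assayed."""
    size = 2 * half_width + 1
    sums = [0.0] * size
    counts = [0] * size
    for mid in midpoints:
        for offset in range(-half_width, half_width + 1):
            value = meth.get(mid + offset)
            if value is not None:
                sums[offset + half_width] += value
                counts[offset + half_width] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def moving_average(
    values: List[Optional[float]], window: int
) -> List[Optional[float]]:
    """Centered moving average that skips positions without data."""
    half = window // 2
    smoothed: List[Optional[float]] = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        smoothed.append(sum(chunk) / len(chunk) if chunk else None)
    return smoothed
```

With `half_width=50_000` and `window=100`, this would mirror the 100 kb windows and 100 bp moving average mentioned in the Figure 2 caption, and the same aggregation applied to a CpG indicator track would give the CpG density profile of panel (b).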
