Principles and Workflow of Whole Genome Bisulfite Sequencing

Principles of whole genome bisulfite sequencing

Epigenetic studies have shown that DNA methylation of specific gene regions plays an important role in chromosome conformation and the regulation of gene expression. Methylation of cytosine residues at the C5 position (5meC) is a common epigenetic mark in many eukaryotes and is widely found in CpG or CpHpG (H = A, T, C) contexts. Methylation analysis based on next-generation sequencing (NGS) relies mainly on three approaches: endonuclease digestion, affinity enrichment, and bisulfite conversion (Table 1). Almost all sequence-specific DNA methylation analysis approaches require a methylation-dependent treatment before amplification or hybridization, so that the methylation information is not lost. Various molecular biology techniques, such as NGS, are subsequently performed to detect 5meC residues.

Table 1. Main principles of NGS-based methylation analysis.

Enzyme digestion

Affinity enrichment

Sodium bisulfite

Principles

Some restriction enzymes, such as HpaII and SmaI, are inhibited by 5meC within the CpG of their recognition sites.

Bisulfite conversion spurred a revolution in genome methylation analysis in the 1990s. Bisulfite converts unmethylated cytosines in the genome into uracils, which are then replaced by thymines during PCR amplification, whereas methylated cytosines remain unchanged. Methylated and unmethylated cytosines can therefore be distinguished by counting the cytosines and thymines observed at each position after sequencing (Figure 1). Whole genome bisulfite sequencing (WGBS), a method of great significance in this field, combines bisulfite treatment with next- or third-generation sequencing technologies (mostly shotgun sequencing) to study DNA methylation at the genomic level.
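The counting principle can be illustrated with a minimal Python sketch. It is a toy model rather than a real pipeline: the function names, the 8-base reference, and the assumption that every read is already aligned, on the same strand, and as long as the reference are all hypothetical simplifications.

```python
from collections import Counter


def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment plus PCR on one strand.

    Unmethylated C is read as T; a C whose 0-based index is in
    `methylated_positions` is protected and stays C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def methylation_by_position(reference, reads):
    """For each reference C, return the fraction of reads that still show C.

    Assumes (for illustration only) that every read is pre-aligned and
    spans the whole reference.
    """
    levels = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        counts = Counter(read[i] for read in reads)
        covered = counts["C"] + counts["T"]  # only C and T are informative
        if covered:
            levels[i] = counts["C"] / covered
    return levels


reference = "ACGTCCGA"
# Hypothetical sample: the C at index 1 is methylated in 3 of 4 molecules,
# the Cs at indices 4 and 5 are never methylated.
reads = [
    bisulfite_convert(reference, {1}),
    bisulfite_convert(reference, {1}),
    bisulfite_convert(reference, {1}),
    bisulfite_convert(reference, set()),
]
levels = methylation_by_position(reference, reads)
print(levels)
```

In this toy data set, the C at index 1 is reported as 75% methylated, while the two unprotected cytosines are read as T in every molecule.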

First, approximately 1-5 mg of tissue collected from humans, animals, plants or microorganisms is prepared for DNA extraction. In general, samples for whole-genome bisulfite sequencing need to meet the following four requirements.

i. Eukaryotes;

ii. Hypomethylation (as shown in Figure 3, studies have shown that as the number of CpG sites in a region increases, the amount of WGBS sequencing data obtained for that region begins to decrease);

iii. A reference genome assembled to at least the scaffold level;

iv. Relatively complete genome annotations.

A suitable kit is then used to extract high-purity, high-molecular-weight DNA. The extracted DNA should have a total mass of at least 5 μg, a concentration of at least 50 ng/μL, and an OD260/280 ratio of 1.8 to 2.0.
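The DNA quality thresholds above can be checked with a few lines of Python. This is only a sketch: the `DnaSample` class, its field names, and the example values are hypothetical, and total mass is assumed to be derived from concentration and elution volume.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MIN_MASS_UG = 5.0
MIN_CONC_NG_PER_UL = 50.0
OD_RATIO_RANGE = (1.8, 2.0)


@dataclass
class DnaSample:
    name: str
    concentration_ng_per_ul: float
    volume_ul: float
    od260_280: float

    @property
    def mass_ug(self):
        # ng/uL * uL gives ng; divide by 1000 to get micrograms.
        return self.concentration_ng_per_ul * self.volume_ul / 1000.0


def qc_failures(sample):
    """Return the names of the criteria that the sample does not meet."""
    failures = []
    if sample.mass_ug < MIN_MASS_UG:
        failures.append("mass")
    if sample.concentration_ng_per_ul < MIN_CONC_NG_PER_UL:
        failures.append("concentration")
    low, high = OD_RATIO_RANGE
    if not (low <= sample.od260_280 <= high):
        failures.append("od260_280")
    return failures


good = DnaSample("sample_A", concentration_ng_per_ul=80.0, volume_ul=100.0, od260_280=1.85)
bad = DnaSample("sample_B", concentration_ng_per_ul=40.0, volume_ul=100.0, od260_280=2.10)
print(good.name, qc_failures(good))
print(bad.name, qc_failures(bad))
```

An empty list means the sample passes all three criteria; otherwise the list names the criteria that need attention before library preparation.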

Bisulfite conversion is considered the “gold standard” for DNA methylation analysis; its principle is shown in Figure 4. However, bisulfite-induced DNA degradation may deplete genomic regions enriched for unmethylated cytosines. It is therefore important to assess the amount of DNA degradation under the chosen reaction conditions and to consider how it affects the desired amplicon. Olova et al. (2018) found that DNA degradation is pronounced in bisulfite conversion protocols that use strong denaturation conditions or high bisulfite molarity. Several kits are commercially available (Table 2).

Taking the EpiGnome™ Methyl-Seq Kit (Epicentre) as an example (Figure 5), bisulfite-treated single-stranded DNA is random-primed using a polymerase capable of reading uracil nucleotides to synthesize DNA containing a specific sequence tag. The 3′ end of the newly synthesized DNA strand is then selectively tagged with a second specific sequence, yielding a di-tagged DNA molecule with known sequence tags at its 5′ and 3′ ends. Illumina P7 and P5 adapters are subsequently added by PCR at the 5′ and 3′ ends prior to sequencing.

Figure 5. Workflow for the EpiGnome™ Methyl-Seq Kit.

Sequencing

Illumina HiSeq, a sequencing platform based on sequencing-by-synthesis (SBS), is widely applied for WGBS. Library fragments are immobilized as a single-molecule array on a flow cell and amplified by bridge amplification. Because reversible terminator chemistry allows only one fluorescently labeled base to be incorporated per cycle, a laser is used to excite the fluorophore after each incorporation, and the emitted fluorescence is captured to read the base. A paired-end 150 bp strategy is typically employed in WGBS to sequence bisulfite-treated DNA libraries with 250-300 bp inserts. In addition to Illumina HiSeq, PacBio SMRT, Nanopore, Roche 454, and other Illumina platforms have also been used for this purpose.
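As a small worked example of the read geometry, the sketch below computes how many bases of a fragment are sequenced by both mates under a paired-end 150 bp strategy. It assumes that "insert" means the full DNA fragment read from both ends; the function names are hypothetical. The overlap matters because bases covered twice by the same fragment are usually not counted as two independent observations of methylation.

```python
def mate_overlap(read_length, insert_size):
    """Number of fragment bases sequenced by both mates of a read pair."""
    # A mate cannot read more fragment bases than the fragment contains.
    effective = min(read_length, insert_size)
    return max(0, 2 * effective - insert_size)


def unique_bases(read_length, insert_size):
    """Number of distinct fragment bases covered by at least one mate."""
    return min(insert_size, 2 * read_length)


for insert in (250, 275, 300):
    print(insert, mate_overlap(150, insert), unique_bases(150, insert))
```

With 150 bp reads, a 250 bp insert gives a 50 bp overlap between mates, while a 300 bp insert is covered end to end with no overlap.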

Data Analysis

A series of analyses can be performed on the sequencing results. The five main types of analysis are listed in Table 3. In addition, methylation density analysis, differentially methylated region (DMR) analysis, DMR annotation and enrichment analysis (GO/KEGG), and clustering analysis can also be performed. Common bioinformatic resources for WGBS include BDPC, CpGcluster, CpGFinder, Epinexus, MethTools, mPod, QUMA, and TCGA Data Portal.

Table 3. Main types of WGBS data analysis.

Type

Details

Alignment against reference genome

Tools such as SOAP are used to align the reads to the reference genome sequence, and only the aligned reads are used for the analysis of methylation information. Because unmethylated cytosines are read as thymines, the alignment allows a reference C to pair with either C (match) or T (bisulfite-induced mismatch) in the read.

mC calling

Determine mC positions throughout the genome. mC ratios are computed by taking read quality and multi-locus mapping probabilities into account. Low-probability alignments, which are unreliable, are discarded.

Sequence depth and coverage analysis

A plot of the relationship between coverage and sequencing depth indicates whether methylation can be called with a given degree of confidence at specific base positions.

Methylation level analysis

The methylation level of each methylated C base is calculated as follows: 100 × (number of reads supporting methylation at the site) / (total number of reads covering the site). The genome-wide average methylation level reflects the overall characteristics of the genomic methylation profile.

Global trends of methylome

The distribution ratio of CG, CHG and CHH contexts among methylated C bases reflects, to some extent, the characteristics of the whole-genome methylation map of a specific species.
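Three of the rules in Table 3 can be sketched in Python: the bisulfite-aware mismatch rule used during alignment, the per-site methylation level, and the CG/CHG/CHH context ratio. This is an illustrative sketch under stated assumptions, not the method of SOAP or any other tool: all function names and the example sequence are hypothetical, only the forward strand is handled (the reverse strand would show G-to-A changes instead), and the list of mC positions is assumed to have been called already.

```python
from collections import Counter


def bisulfite_mismatches(read, ref_window):
    """Count mismatches, tolerating read T opposite reference C."""
    mismatches = 0
    for read_base, ref_base in zip(read, ref_window):
        if read_base == ref_base:
            continue
        if ref_base == "C" and read_base == "T":
            continue  # expected result of bisulfite conversion
        mismatches += 1
    return mismatches


def methylation_level(methylated_reads, total_reads):
    """Methylation level of one site in percent."""
    return 100.0 * methylated_reads / total_reads


def cytosine_context(seq, i):
    """Classify the C at index i as CG, CHG or CHH (H = A, C or T).

    Returns None if seq[i] is not C or the context cannot be determined.
    """
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    second = seq[i + 1]
    if second == "G":
        return "CG"
    if second not in "ACT" or i + 2 >= len(seq):
        return None
    third = seq[i + 2]
    if third == "G":
        return "CHG"
    if third in "ACT":
        return "CHH"
    return None


def context_distribution(seq, mc_positions):
    """Fraction of called mC sites that fall in each sequence context."""
    counts = Counter(cytosine_context(seq, i) for i in mc_positions)
    counts.pop(None, None)  # ignore sites without a determinable context
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ctx: counts[ctx] / total for ctx in ("CG", "CHG", "CHH")}


genome = "ACGTCAGCTTCG"
mc_positions = [1, 4, 7, 10]  # hypothetical mC calls
distribution = context_distribution(genome, mc_positions)
print(bisulfite_mismatches("ATGTTTGA", "ACGTCCGA"))
print(methylation_level(18, 20))
print(distribution)
```

Note that the mismatch rule is asymmetric: a read T opposite a reference C is tolerated, but a read C opposite a reference T still counts as a mismatch.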