Revision as of 00:39, 5 March 2016

Example of percent methylation data tracks displayed against other genomic features

CoGe can analyze bisulfite sequencing data and visualize percent methylation at single-cytosine resolution. Additionally, output filetypes can be easily converted to appropriate formats for automated differential methylation analysis. This analysis pipeline is a work in progress being developed by Jeffrey Grover and Matt Bomhoff at the University of Arizona.

See the LoadExperiment tool to use the new pipeline. Select BAM/FASTQ input files to make the pipeline available.

Two separate analysis pipelines are currently available: one based on Bismark and the other on bwameth.

What Is This For?

The tools in this pipeline are intended for use with whole genome bisulfite sequencing (WGBS). In theory many of them are also appropriate for reduced representation bisulfite sequencing (RRBS), but this is untested. Reads must be in fastq format with quality encoding on either the phred 33 or phred 64 scale, and may be single- or paired-end. There must also be an appropriate genome to align against.
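If it is unclear which quality encoding a fastq file uses, the lowest quality character in the file gives a useful hint. The following is a minimal sketch, not part of CoGe; the function name `guess_phred_offset` is hypothetical, and the result is only a guess because high-quality phred 33 data can look like phred 64 data.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from an iterable of FASTQ quality strings."""
    lowest = min(
        (ord(ch) for line in quality_lines for ch in line.strip()),
        default=None,
    )
    if lowest is None:
        raise ValueError("no quality characters supplied")
    if lowest < 33:
        raise ValueError("found a character below '!', invalid in both encodings")
    # Characters below '@' (ASCII 64) cannot occur in Phred+64 data.
    if lowest < 64:
        return 33
    # Only a guess: uniformly high-quality Phred+33 reads would also land here.
    return 64


# Example: '#' (ASCII 35) can only appear in Phred+33 data.
print(guess_phred_offset(["IIIIHHG#", "IIIIIIII"]))
```

Scanning a few thousand reads is usually enough for this check; a file whose lowest character is `@` or above is most likely phred 64, but this should be confirmed against the sequencing platform's documentation.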

What Is This Not For?

Any other sequencing-based methylation analysis, including but not limited to MeDIP-seq (methylated DNA immunoprecipitation sequencing), and array-based methylation analyses. Also unsupported are fastq files with quality encoding other than phred 33/64 (Illumina format) and, generally, anything not mentioned in the preceding section. In some cases, non-Illumina-formatted reads can be converted to an appropriate format.

Note:

It is common for fastq files, especially ones downloaded from certain public sources or opened by users, to have formatting abnormalities. If the pipeline fails, please check for common formatting problems such as extra newlines or special characters. If sequencing reads are contained in multiple files, they should be concatenated before loading. Paired-end file names must end with the identifiers "R1" and "R2" respectively (e.g., sample1_R1.fastq and sample1_R2.fastq).
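The formatting problems mentioned above can be checked before loading. The sketch below is an illustration only and is not the validation CoGe performs; `check_fastq` is a hypothetical helper that assumes the usual four-line fastq record layout (header, sequence, `+` separator, quality).

```python
def check_fastq(lines):
    """Return a list of human-readable problems found in FASTQ lines."""
    problems = []
    records = [line.rstrip("\r\n") for line in lines]
    if len(records) % 4 != 0:
        problems.append(f"line count {len(records)} is not a multiple of 4")
    usable = len(records) - len(records) % 4
    for start in range(0, usable, 4):
        header, seq, plus, qual = records[start:start + 4]
        line_no = start + 1  # 1-based line number of the record header
        if not header.startswith("@"):
            problems.append(f"line {line_no}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"line {line_no + 2}: separator does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"line {line_no}: sequence and quality lengths differ")
        if not all(field.isascii() for field in (header, seq, plus, qual)):
            problems.append(f"line {line_no}: record contains non-ASCII characters")
    return problems


# Example: a stray blank line shifts every following record.
sample = ["@read1\n", "ACGT\n", "\n", "+\n", "IIII\n"]
for problem in check_fastq(sample):
    print(problem)
```

For a real file, pass an open file handle, for example `check_fastq(open("sample1_R1.fastq"))`; the file name here is the one from the example above.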

Workflow

Fastq files can either be loaded from a local machine or the iPlant datastore (recommended).

Reads should be trimmed if this has not already been done. Trim Galore!, a wrapper for FastQC and Cutadapt available in CoGe, performs adapter, quality-based, and length-based trimming. It attempts to detect the adapter type automatically from the input file, but this can be overridden with a nucleotide string if desired.

The reference genome is indexed and in-silico bisulfite converted with either bismark_genome_preparation or bwameth's index functionality.

Depending on the selected pipeline, either Bismark (using Bowtie2) or bwameth is used to align bisulfite-converted reads.

Methylation status is extracted with either bismark_methylation_extractor or PileOMeth.

Methylation summaries are then reformatted to .csv to comply with quantitative data loading in CoGe and are filtered by the desired read depth (default: 5). The result is loaded into the genome browser as a viewable quantitative track. Top strand methylation is represented as positive numbers (0% to 100%, as a decimal) and bottom strand methylation as negative numbers (same scale). When mousing over these quantitative tracks, the decimal-formatted methylation % and the read depth for that position are displayed, in that order.
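The depth filter and the strand-signed track values described above can be sketched as follows. This is a hedged illustration rather than CoGe's own code: the function names are hypothetical, and the accepted strand labels (`+`/`-` or `1`/`-1`) are an assumption.

```python
def signed_track_value(strand, methylation):
    """Return methylation (0-1) as positive for top strand, negative for bottom."""
    if not 0.0 <= methylation <= 1.0:
        raise ValueError("methylation must be a decimal between 0 and 1")
    if strand in ("+", "1", 1):      # assumed labels for the top strand
        return methylation
    if strand in ("-", "-1", -1):    # assumed labels for the bottom strand
        return -methylation
    raise ValueError(f"unrecognised strand: {strand!r}")


def filter_by_depth(calls, min_depth=5):
    """Keep (chrom, position, strand, methylation, depth) calls with enough reads."""
    return [call for call in calls if call[4] >= min_depth]


# Example with made-up calls: the second one is dropped at the default depth of 5.
calls = [("1", 100, "+", 0.8, 12), ("1", 101, "-", 0.5, 3), ("1", 150, "-", 0.25, 9)]
for chrom, pos, strand, meth, depth in filter_by_depth(calls):
    print(chrom, pos, signed_track_value(strand, meth), depth)
```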

Options

More information is available in the individual tools' documentation. This is a summary of the options visible to users through CoGe.

--OT: inclusion regions for reads corresponding to the original top strand. Format A,B,C,D means include calls from position A to B on read 1 and C to D on read 2. (default: 0,0,0,0 - whole read)

--OB: inclusion regions for reads corresponding to the original bottom strand. Format A,B,C,D means include calls from position A to B on read 1 and C to D on read 2. (default: 0,0,0,0 - whole read)

These options are useful to remove methylation bias at the end of reads if necessary.
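To make the A,B,C,D format concrete, the sketch below interprets an inclusion string for a single position on a read. It assumes positions are 1-based and that a 0 means "start or end of the read, as appropriate", which matches the whole-read default of 0,0,0,0; the helper names are hypothetical and this is not the tools' own implementation.

```python
def parse_bounds(option):
    """Parse an 'A,B,C,D' inclusion string into a tuple of four non-negative ints."""
    parts = [int(value) for value in option.split(",")]
    if len(parts) != 4 or any(value < 0 for value in parts):
        raise ValueError("expected four non-negative integers, e.g. 0,0,0,0")
    return tuple(parts)


def is_included(position, read_number, read_length, bounds):
    """Report whether a 1-based position on read 1 or read 2 falls inside the bounds."""
    first, last = bounds[0:2] if read_number == 1 else bounds[2:4]
    first = first or 1            # 0 is taken to mean the start of the read
    last = last or read_length    # 0 is taken to mean the end of the read
    return first <= position <= last


# Example: ignore the first 4 bases of read 1 and keep all of read 2.
bounds = parse_bounds("5,0,0,0")
print(is_included(4, 1, 100, bounds), is_included(5, 1, 100, bounds))
```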

Deduplicate: Run bismark_deduplicate (Bismark pipeline) or Picard tools (bwameth pipeline) to remove PCR duplicates. (default: off, but should be used in most cases)

Minimum Coverage: Minimum read depth required to report and visualize a methylation percentage. (default: 5, which should be reasonable for most applications, but the choice is up to the user)

Output Files

In addition to direct visualization, files can be downloaded for downstream analyses (differentially methylated region detection, etc.). A separate .csv file for each sequence context is available from ExperimentView. These files comply with CoGe's LoadExperiment format for quantitative data and contain the following information:

#CHR,POSITION,POSITION,STRAND,METHYLATION(0-1),DEPTH

Chromosome IDs match those of the genome to which reads were aligned, position is that of each cytosine, percent methylation is expressed as a decimal between 0 and 1, and read depth at each methylation call is shown as an integer (filtered by the minimum coverage specified for the analysis pipeline).
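As a starting point for downstream analysis, the downloaded .csv can be read with a few lines of Python. The sketch below is only an illustration under stated assumptions: it relies on the six-column layout shown above, treats lines starting with `#` as headers, and uses hypothetical names (`MethylationCall`, `read_calls`, `weighted_mean_by_chromosome`). The sample rows are made up.

```python
import csv
from collections import defaultdict
from typing import NamedTuple


class MethylationCall(NamedTuple):
    chrom: str
    position: int
    strand: str
    methylation: float  # decimal between 0 and 1
    depth: int


def read_calls(lines):
    """Parse CHR,POSITION,POSITION,STRAND,METHYLATION,DEPTH rows, skipping '#' lines."""
    calls = []
    for row in csv.reader(lines):
        if not row or row[0].startswith("#"):
            continue
        chrom, start, _stop, strand, meth, depth = row[:6]
        calls.append(MethylationCall(chrom, int(start), strand, float(meth), int(depth)))
    return calls


def weighted_mean_by_chromosome(calls):
    """Return the depth-weighted mean methylation for each chromosome."""
    weighted = defaultdict(float)
    total_depth = defaultdict(int)
    for call in calls:
        weighted[call.chrom] += call.methylation * call.depth
        total_depth[call.chrom] += call.depth
    return {c: weighted[c] / total_depth[c] for c in total_depth if total_depth[c] > 0}


# Example with made-up rows; a real file would be passed as an open file handle.
rows = [
    "#CHR,POSITION,POSITION,STRAND,METHYLATION(0-1),DEPTH",
    "1,100,100,1,0.5,10",
    "1,200,200,-1,1.0,10",
]
print(weighted_mean_by_chromosome(read_calls(rows)))
```

Weighting by depth keeps low-coverage cytosines from dominating the summary; dedicated differential methylation tools will apply their own statistics to the same columns.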

The BAM alignment can be downloaded in ExperimentView. Sequence and quantitative information for the visible region can also be downloaded while browsing the genome.