Restriction digestion of eukaryotic genomes in R

[This article was first published on Chitka, and kindly contributed to R-bloggers.]

Several desktop tools (BioEdit, EMBOSS, and other basic bioinformatics tools) and browser-based tools (NEBcutter, biotools, In silico restriction digestion, and others) can perform restriction digestion of small prokaryotic genomes or of individual, smaller eukaryotic chromosomes. However, no free desktop or browser-based tool performs restriction digestion on the whole genome of eukaryotes with larger genomes. Tools like EMBOSS can handle this sort of task, but only in a primitive way whose output is not immediately useful for downstream analysis.

I have been working on methylation analysis using the RRBS (reduced representation bisulfite sequencing) method, which relies on the restriction digestion pattern of the enzyme MspI across the whole genome. I wanted to perform an in silico MspI digestion of the mouse genome (mm10) to see the digestion pattern. After searching the web, I settled on the Bioconductor package “Biostrings”, which let me achieve this. Here, I give the code to perform this task. Because Biostrings is an R package, it also let me run a variety of downstream analyses quickly.

This method relies on the ability of the “Biostrings” package to locate the MspI restriction site (CCGG) in the mouse genome (the BSgenome.Mmusculus.UCSC.mm10 Bioconductor package, loaded into R). The script below performs the following tasks:

Load the needed Bioconductor and R packages

Identify the MspI restriction sites (genomic co-ordinates) per chromosome in the genome.

Extract the start and end co-ordinates of the DNA fragments resulting from the genomic digestion (using gaps)

Create a dataframe of the genomic co-ordinates of the digested fragments for each chromosome for easier downstream analysis
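
Before running this on a whole genome, the four steps can be tried on a toy input. The following is a minimal sketch, not the script from this post: the two short sequences (`chrA`, `chrB`) and the helper `digest_one()` are made up for illustration, and the site search uses `matchPattern()` from Biostrings with `gaps()` applied to the hits as IRanges. Note that `gaps()` returns the intervals between recognition sites, so the four CCGG bases themselves are excluded. MspI actually cuts within the site (C^CGG), so exact fragment ends would need a small adjustment.

```r
# Step 1: load the package (Biostrings is a Bioconductor package and
# attaches IRanges as a dependency).
library(Biostrings)

# Toy stand-in for a genome: two short, made-up "chromosomes".
toy_genome <- DNAStringSet(c(
  chrA = "AACCGGTTTTCCGGAAACCGGT",
  chrB = "GGGGCCGGAAAA"
))

site <- "CCGG"  # MspI recognition site

# Hypothetical helper: digest the i-th sequence and return its fragments.
digest_one <- function(i) {
  chr_seq <- toy_genome[[i]]

  # Step 2: locate every occurrence of the restriction site.
  hits <- matchPattern(site, chr_seq)
  hit_ranges <- IRanges(start = start(hits), end = end(hits))

  # Step 3: the gaps between sites, bounded by the sequence ends,
  # approximate the digested fragments.
  frags <- gaps(hit_ranges, start = 1L, end = length(chr_seq))

  data.frame(
    chr   = names(toy_genome)[i],
    start = start(frags),
    end   = end(frags),
    width = width(frags)
  )
}

# Step 4: one data frame with the fragments from all sequences.
fragments <- do.call(rbind, lapply(seq_along(toy_genome), digest_one))
print(fragments)
```

The same loop should, in principle, work when `toy_genome[[i]]` is replaced by a chromosome taken from the BSgenome.Mmusculus.UCSC.mm10 object. That package is not loaded in this sketch, and running the loop on full chromosomes takes considerably more time and memory.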