In this lesson

Week One

The class will cover how to install and use Bioconductor software. We will discuss common data structures, including ExpressionSets, SummarizedExperiment and GRanges, which are used across several types of analyses.

Taught by

Kasper Daniel Hansen, PhD

Assistant Professor, Biostatistics and Genetic Medicine

Transcript

Hi and welcome to an overview of GRanges. In this and the following videos, we are going to discuss GRanges, which is a data structure for storing genomic intervals in R. The key thing about GRanges is that they're fast and efficient. And I think it's fair to say that they have completely transformed my own work. These are data structures that I use almost every day at work with genomic data. In my opinion, every R user dealing with genomic data needs to master these data structures, and the functionality provided by them, in order to facilitate their own work.

So, the key insight is that many entities in genomics can be thought of as intervals, or perhaps sets of intervals, of integers. So here's a screenshot from the UCSC Genome Browser. It's a somewhat randomly chosen gene. And we can see that there are genes, which are intervals. There are DNase clusters. There are SNPs. There are RepeatMasker regions of the genome. And basically many, many, many entities in genomics can be thought of as intervals: promoters, genes, single-nucleotide polymorphisms (which are really intervals, but consist of only a single base), CpG islands. But also data, such as next-generation sequencing reads after they've been mapped. Once a read has been mapped to the genome, it's an interval. The same goes for sequencing data that has been processed. Perhaps you have done a ChIP-seq experiment and you've done peak calling, and you end up with peaks that are often described as intervals with some score associated with them. So many, many objects can be thought of as intervals.

Many tasks in genomics involve relating sets of intervals to each other. For example, questions such as: Which promoters contain SNPs? Which transcription factor binding sites overlap a promoter? Which genes are covered by sequencing reads? These are all tasks that we do again and again. And conceptually, they are about relating different sets of intervals to each other.
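To make the idea of relating sets of intervals concrete, here is a minimal sketch in plain Python. It is not the Bioconductor API: the `Interval` class, the `overlaps` and `find_overlaps` helpers, and all coordinates are made up for illustration. Intervals are assumed to be closed, 1-based integer ranges on a named chromosome, and strand is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed, 1-based interval on a chromosome (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_overlaps(query, subject):
    """Return (query_index, subject_index) for every overlapping pair.

    Brute force over all pairs; fine for a toy example, too slow for real data.
    """
    return [
        (i, j)
        for i, q in enumerate(query)
        for j, s in enumerate(subject)
        if overlaps(q, s)
    ]


# Made-up coordinates: two promoters and three SNPs (single-base intervals).
promoters = [Interval("chr1", 100, 200), Interval("chr1", 500, 600)]
snps = [
    Interval("chr1", 150, 150),
    Interval("chr1", 300, 300),
    Interval("chr2", 550, 550),  # same position range, different chromosome
]

hits = find_overlaps(promoters, snps)

# "Which promoters contain SNPs?" becomes: which query indices appear in the hits.
promoters_with_snps = sorted({i for i, _ in hits})

print(hits)
print(promoters_with_snps)
```

The other questions from the video (binding sites overlapping promoters, genes covered by reads) have the same shape: only the two interval sets passed in change.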
And this is the kind of functionality that's provided by the framework we're discussing here. So here's a little visual depiction of some R output of a GRanges. This particular GRanges, as you can see here, consists of three genomic intervals. The three intervals are all on chromosome one. They have a strand associated with them, which in this case is plus, minus, and plus. And they have some ranges. The first interval contains the bases one, two, and three. And the second one contains three, four, and five. The GRanges has names associated with it in this particular instance; that's optional. The names are A1, A2, and A3. And there's also some information about the genome, which is not very helpful in this case. Here, the software has inferred that there's a single chromosome, and it doesn't know how long the chromosome is. But usually, when we work with human data, we know exactly how long the different chromosomes are, and many people like to store this information in the object as well.

So, GRanges are defined, and the functionality is provided, by two fairly complicated packages: GenomicRanges and IRanges. As I said before, these packages are fast and efficient. But at first glance, they can appear very complicated, with many different classes and a lot of different functions. What we are going to try to do in the following videos is simplify it a little bit to make it easier to process.

GRanges, or GenomicRanges, as a software package is described in this excellent paper from Mike Lawrence and other authors in PLoS Computational Biology. And this concept of computing on intervals is functionality that is also provided by a command-line tool called bedtools, which is an excellent tool. It's not really something we are going to discuss in this series of lectures.
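Returning to the three-interval object described earlier in this video, the following plain-Python sketch shows the pieces of information such an object carries. It is only an illustration, not the GenomicRanges API: the `GenomicInterval` class and its field names are hypothetical, the label `chr1` is an assumed spelling of "chromosome one", and the third range (5 to 7) is an assumption, since the video only spells out the first two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicInterval:
    """One named, stranded, closed 1-based interval (hypothetical type)."""
    name: str
    seqname: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def width(self) -> int:
        # Closed interval, so both endpoints count.
        return self.end - self.start + 1

    def bases(self):
        """List the integer positions covered by this interval."""
        return list(range(self.start, self.end + 1))


ranges = [
    GenomicInterval("A1", "chr1", 1, 3, "+"),
    GenomicInterval("A2", "chr1", 3, 5, "-"),
    GenomicInterval("A3", "chr1", 5, 7, "+"),  # assumed coordinates
]

# Genome information inferred from the data alone: which chromosomes occur,
# with unknown (None) lengths, mirroring the situation described in the video.
seqlengths = {r.seqname: None for r in ranges}

for r in ranges:
    print(r.name, r.seqname, r.start, r.end, r.strand, r.width)
print(seqlengths)
```

With real human data, the `None` placeholder would typically be replaced by the known chromosome length, which is the information the video says many people like to store in the object.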