We will learn computational methods -- algorithms and data structures -- for analyzing DNA sequencing data. We will learn a little about DNA, genomics, and how DNA sequencing is used. We will use Python to implement key algorithms and data structures and to analyze real genomes and DNA sequencing datasets.

Taught By

Ben Langmead, PhD

Assistant Professor

Jacob Pritt

Transcript

Back in the first few lectures, we distinguished between two different computational problems that we need to solve when analyzing sequencing data. Both of these problems arise when we want to take a large collection of sequencing reads and figure out where each of them came from. You can think of these reads as being like puzzle pieces, and you can think of the problem we're trying to solve as being like putting together a puzzle.

We just spent a while discussing the first version of the problem, called read alignment. Read alignment is analogous to the situation where we're trying to put together a puzzle with the help of the picture of the completed puzzle from the lid of the puzzle box. The picture of the completed puzzle is the reference genome that we would use, such as the human reference genome, for example. And that sequence came from the Human Genome Project.

Now, we're going to discuss another version of the problem, called assembly for short. It also goes by longer names, such as de novo assembly, or de novo shotgun assembly. De novo just means from scratch, and shotgun just refers to the fact that the reads come randomly from all over the genome. In this version of the problem, we do not have the benefit of being able to see the picture of the completed puzzle. We might be studying a species, for example, that has never been sequenced before. In fact, when the Human Genome Project was conducted, it was the first time a human genome was being completely sequenced. So, that project had to solve the de novo assembly problem for the human genome. That was a very, very big problem.

And as we'll see, assembly is a fundamentally hard problem. It's definitely more computationally intensive than the corresponding read alignment problem, but it's also just fundamentally difficult. Some of the things that make it hard are things that we can sidestep or deal with. But some of the things that make it hard are things we just can't avoid, so we're going to have to live with certain unfortunate realities. That said, we will also encounter a few really good ideas, some of them very profound, that form the basis of our modern tools for solving these problems. And furthermore, there's reason to hope that the assembly problem is one that technology will help us get even better at in the future. So, we'll start to tackle this problem in the following lecture.
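To make the "shotgun" idea from the transcript concrete, here is a minimal Python sketch of sampling reads at random positions from a genome. It is only an illustration, not code from the lecture: the function name `shotgun_reads`, its parameters, and the toy genome string are all made up for this example, and it ignores sequencing errors and the reverse strand.

```python
import random

def shotgun_reads(genome, num_reads, read_len, seed=0):
    """Sample substrings of length read_len from random positions in genome.

    Hypothetical helper for illustration; assumes error-free reads taken
    from the forward strand only.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    reads = []
    for _ in range(num_reads):
        # randint is inclusive at both ends, so every start leaves room
        # for a full-length read
        start = rng.randint(0, len(genome) - read_len)
        reads.append(genome[start:start + read_len])
    return reads

# Toy "genome" (made up); real genomes are millions to billions of bases long
genome = "ACGTTGCATGCCGATAGGCTAACG"
reads = shotgun_reads(genome, num_reads=10, read_len=6)
```

In read alignment we would have both `reads` and `genome`; in de novo assembly we would be handed only `reads` and asked to reconstruct `genome`.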
