Space-Efficient Sequence Alignment

Course video 10 of 20

<p>Welcome to Week 3 of the class!</p>
<p>Last week, we saw how a variety of different applications of sequence alignment can all be reduced to finding the longest path in a Manhattan-like graph.</p>
<p>This week, we will conclude the current chapter by considering a few advanced topics in sequence alignment. For example, if we need to align long strings, our current algorithm will consume a huge amount of memory. Can we find a more memory-efficient approach? And what should we do when we move from aligning just two strings at a time to aligning many strings?</p>

Once we have sequenced genomes in the previous course, we would like to compare them to determine how species have evolved and what makes them different.
In the first half of the course, we will compare two short biological sequences, such as genes (i.e., short sequences of DNA) or proteins. We will encounter a powerful algorithmic tool called dynamic programming that will help us determine the number of mutations that have separated the two genes/proteins.
In the second half of the course, we will "zoom out" to compare entire genomes, where we see large scale mutations called genome rearrangements, seismic events that have heaved around large blocks of DNA over millions of years of evolution. Looking at the human and mouse genomes, we will ask ourselves: just as earthquakes are much more likely to occur along fault lines, are there locations in our genome that are "fragile" and more susceptible to be broken as part of genome rearrangements? We will see how combinatorial algorithms will help us answer this question.
Finally, you will learn how to apply popular bioinformatics software tools to solve problems in sequence alignment, including BLAST.