In the previous course in the Specialization, we learned how to compare genes, proteins, and genomes. One way to use these methods is to construct a "Tree of Life" showing how a large collection of related organisms has evolved over time.
In the first half of the course, we will discuss approaches for evolutionary tree construction that have been the subject of some of the most cited scientific papers of all time, and show how they can resolve quandaries from finding the origin of a deadly virus to locating the birthplace of modern humans.
In the second half of the course, we will shift gears and examine the old claim that birds evolved from dinosaurs. How can we prove this? In particular, we will examine a result claiming that peptides harvested from a T. rex fossil closely matched peptides found in chickens. We will use methods from computational proteomics to ask how we could assess whether this result is valid or due to some form of contamination.
Finally, you will learn how to apply popular bioinformatics software tools to reconstruct an evolutionary tree of ebolaviruses and identify the source of the recent Ebola epidemic that made global headlines.

Reviews

PO

Good course for improving algorithmic skills and keep learning something new

ZX

Jul 21, 2019

Rating: 5 out of 5 stars

In depth and comprehensive coverage of the topics in genetic data analysis.

From the lesson

Week 2: More Algorithms for Constructing Trees from Distance Matrices

Welcome to Week 2 of class!

Last week, we started to see how evolutionary trees can be constructed from distance matrices. This week, we will encounter additional algorithms for this purpose, including the neighbor-joining algorithm, whose original paper has become one of the ten most cited papers in all of science since its publication three decades ago.

Taught by

Pavel Pevzner

Phillip Compeau

Transcript

Now, by the very definition of a non-additive matrix, we know that no tree can fit it exactly. But maybe we can construct a tree that is, in some sense, the best approximate solution for a given distance matrix. For example, consider the non-additive distance matrix that we encountered earlier in the talk, shown below, along with the following tree.
To measure how close this tree is to the distance matrix, we need to compare each entry of the matrix, upper case D, with the distance between the corresponding leaves in the tree, lower case d. The matrices are symmetric, so we only need to consider the entries above the main diagonal. For this example, notice that the red values disagree between the matrix and the tree. The distance from i to l in the matrix is 3, not the 4 that the tree indicates, and the distance from j to l is 5, not the 4 that the tree indicates.
So in general, we can measure how well a tree fits a distance matrix by computing the sum of squared errors between the entries of the distance matrix, upper case D, and the distances between the corresponding leaves in the tree, lower case d. We denote this sum of squared errors by Discrepancy(T, D). For this particular tree, there are only two non-zero errors, each equal to 1, so the discrepancy is 1² + 1² = 2.
However, this is not the best we can do for this tree. Why don't you take a moment to see if you can find edge weights that give a smaller discrepancy? Can you find a provably best assignment of edge weights for this tree?
In general, there is a polynomial-time algorithm that minimizes the sum of squared errors if we are given the structure of the tree T in advance. In practice, though, we will not know the structure of the tree in advance, so we need to minimize the sum of squared errors over all possible tree structures.
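To make the definition of Discrepancy(T, D) concrete, here is a minimal Python sketch. The slide with the actual matrix and tree is not reproduced in this transcript, so the values of `D` and the edge weights below are assumptions: they are chosen only to agree with the two mismatches mentioned above (3 versus 4 for i and l, 5 versus 4 for j and l). The internal node names `a` and `b` and all function names are hypothetical.

```python
from itertools import combinations


def build_tree(edges):
    """Turn (node, node, weight) triples into an undirected adjacency map."""
    tree = {}
    for u, v, w in edges:
        tree.setdefault(u, {})[v] = w
        tree.setdefault(v, {})[u] = w
    return tree


def path_lengths_from(tree, source):
    """Length of the unique tree path from source to every other node."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbor, weight in tree[node].items():
            if neighbor not in dist:
                dist[neighbor] = dist[node] + weight
                stack.append(neighbor)
    return dist


def discrepancy(tree, D, leaves):
    """Sum of squared errors over leaf pairs above the main diagonal."""
    total = 0.0
    for a, b in combinations(leaves, 2):
        d_ab = path_lengths_from(tree, a)[b]  # lower case d: distance in the tree
        total += (d_ab - D[a][b]) ** 2        # upper case D: entry in the matrix
    return total


LEAVES = ["i", "j", "k", "l"]

# Assumed matrix (upper triangle only), consistent with the mismatches above.
D = {
    "i": {"j": 3, "k": 4, "l": 3},
    "j": {"k": 4, "l": 5},
    "k": {"l": 2},
}

# Assumed tree: i and j hang off internal node a, k and l off internal node b.
TREE = build_tree([
    ("i", "a", 1.5), ("j", "a", 1.5), ("a", "b", 1.5),
    ("k", "b", 1.0), ("l", "b", 1.0),
])

# Same structure, different edge weights, to show the fit can improve.
TREE_REWEIGHTED = build_tree([
    ("i", "a", 1.0), ("j", "a", 2.0), ("a", "b", 1.5),
    ("k", "b", 1.0), ("l", "b", 1.0),
])

print(discrepancy(TREE, D, LEAVES))             # 2.0
print(discrepancy(TREE_REWEIGHTED, D, LEAVES))  # 1.0
```

Under these assumed values, the first tree reproduces the discrepancy of 2 from the lecture, and the reweighted tree is one hand-picked assignment on the same structure that scores lower. The sketch only evaluates a given assignment; it does not search for the best one.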
The number of tree structures grows exponentially with the number of leaves, so trying every structure is not feasible. Unfortunately, this least squares distance-based phylogeny problem is NP-complete, so it does not give us a practical way to fit a tree to a non-additive matrix.
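To get a feel for how quickly the search space grows, the short sketch below counts tree structures under one specific assumption that the lecture does not spell out: only unrooted binary trees with distinctly labeled leaves are counted. For n ≥ 3 leaves, that count is the product of the odd numbers 1 · 3 · 5 · … · (2n − 5). The function name is hypothetical.

```python
def count_unrooted_binary_trees(n_leaves):
    """Number of unrooted binary tree structures on n labeled leaves.

    Assumes n_leaves >= 3; the count is 1 * 3 * 5 * ... * (2n - 5).
    """
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product when n = 3).
    for odd in range(3, 2 * n_leaves - 4, 2):
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_binary_trees(n))
```

With 4 leaves there are only 3 structures and with 5 there are 15, but by 10 leaves the count already exceeds two million, which is why trying every structure stops being an option for realistic datasets.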