Experienced Computer Scientists analyze and solve computational problems at a level of abstraction that is beyond that of any particular programming language. This two-part class is designed to train students in the mathematical concepts and process of "Algorithmic Thinking", allowing them to build simpler, more efficient solutions to computational problems.
In part 2 of this course, we will study advanced algorithmic techniques such as divide-and-conquer and dynamic programming. As the central part of the course, students will implement several algorithms in Python that incorporate these techniques and then use these algorithms to analyze two large real-world data sets. The main focus of these tasks is to understand interaction between the algorithms and the structure of the data sets being analyzed by these algorithms.
Once students have completed this class, they will have both the mathematical and programming skills to analyze, design, and program solutions to a wide range of computational problems. While this class will use Python as its vehicle of choice to practice Algorithmic Thinking, the concepts that you will learn in this class transcend any particular programming language.

Enseigné par

Luay Nakhleh

Associate Professor

Scott Rixner

Professor

Joe Warren

Professor

Transcription

So now we have, we have reasoned about how to solve the [INAUDIBLE] structure problem. And we got to our recursive formula, which we said, we can easily just now turn it into a recursive, recursive algorithm. How do recursive algorithms work? Again, recursive, the recursive algorithm is going to start from opt of 1N, and then it's going to start breaking the problem into smaller problem, and solving, and solving, and solving. If you think about it, if you try that algorithm and you look at it carefully, you will see that there will be many times where you for example opt of IJ will be solved over and over and over again for a same I and J. Okay? So the, one of the issue with recursive algorithms in these kind of problems. Is that there are some problems that are going to appear multiple times when we break the problem into sub-problems. And these sub-problems are going to be solved again, over and over and over. Now instead of doing that, how about when we solve a sub-problem, we store the value. So when we solve opt of I J, we store the value of opt I J. And then next time we see, next time we encounter opt of I J, instead of solving that again, which will take time, we can just look up the, the value of that I J in a matrix, I by J. Right? And this is actually the, the, the technique that we, I'm going to show now, or the implementation of that, of that algorithm. That instead of doing recursive implementation is going to store values in a table and every time we need, we need a value instead of re-computing it we'll just look it up in a table. And this the, this is the implementation here of the algorithm I just described. Again, I'm going to, I'm going to start. Instead of, of starting opt of 1 N, I will starting from opt of, of I J from smaller values of I J. From 1, 1, 1, 2, and so on, and go all the way and build towards opt of 1 N. So, wha, first thing is I, I can op, initialize opt of IJ to zero for every case where I is greater than or equal to J minus 4. because remember, the no sharp turns say that i has to be smaller than J minus 4. So, for every case where I is greater than or equal to minus 4. This can't, this has to be zero. I cannot have any pairs, in the, in the set. So, now that, that this is the case, I can initialize k from 5 and onwards, and I start with the sequence with the onward sequence from position 1 to, all the way to N minus K. When I look at position 1, in my a, in my RNA sequence. If this is the position, I look at I equals 1, then J has to be incremented by value K, again here by 5, so that it doesn't violate the, the, the no shorts conditions. I can start now solving between I and J, then I increment J, solve this case, then J is here. I solve this case. When I'm done with every possible value of J, I increment I and I repeat the same exercise. At the end I will have the value of I all the way between one and N. Okay? So this is the algorithm that is not a recursive implementation. So here OPT of I J is not, OPT is not the name of a function, it's a name of a matrix. Suppose OPT, assume that OPT is basically a, a matrix where I am storing values. Okay? So, OPT is a matrix that is, that is indexed by I and j and every time I come to this, what I am saying is OPT of I, J is the max of OPT I, J minus 1. So I lookup the value. In the table, at, put, at, in the row I, column J minus 1. Max again, vary the value of T and look up these values. Since I'm building the matrix from smaller I and J towards bigger ones, every time I come to I and J, these values have already been computed. And I can just look them up in the table okay? So the difference again, between our recursive implementation and this sort of implementation, is in that recursive implementation I would have called the function OPT, and I will pass parameters i, n, j in addition to the, to the RNA string. And it would start from OPT of 1N, and starts going down and building, and trying to solve sub-problems, sub-problems. And when it solves all the sub problems, it goes back. And, and compute the solution to the main problem. This implementation, instead is going to start from the simpler cases. I'm going to say, yes you want opt of one n. I'm going to start from opt one and six for example. One of 7, 1 of 8 and 1 of 9, and start building my solution toward opt of 1N. and this is basically the algorithm. Again these are all, keep in mind that OPT here is the name of a matrix, and when I say return OPT of 1, n, I'm saying look up the entry in row 1, column n, in this matrix and just return. Okay? So this is an algorithm that is not recursive. And it stores values in a table, and every time it wants to, to, to compute a new value, it's just based on looking up values in a table. So lets, lets illustrate this algorithm with this string here. So if I have a, an, an RNA string of 9, of 9 symbols. And I have this algorithm, again the initialization is that it's for every I and J where I is greater than or equal to J minus 4. We are going to fill this matrix here with zeros, okay. So for example, if I look at 4 and 6, when I is, is 4 and J is 6. This is going to be zero because 4 is greater than or equal to 6 minus 4. So in particular, we will see that this is going to be ze, zero, this is going to be, these are all going to be zeros here. Okay? So here 1 and 6, 1. Is great 1 6 minus for is 2, so this one, actually is not going to be 0's here, just these one's, right? So when i is 1 and this is 1 is not greater than or equal to 6 minus 4 which is 2, so this is fine here, okay? So these. These three rows here are going to be, these three diagonals are going to be all zeroes, all right. Based on that initialization. Now we go to the more interesting cases. I say let K equal 5. So now let's think about k equal five. If K equal 5, and then I'm going to go with i, it's going to go from 1 to 2, to all the way to N minus K. N is 9, K is 5, so it's going to go all the way to 9 minus 5, which is 4. Okay? So I is going to traverse these values. For each I, J is going to be I plus K. So 1 plus 5 is 6. 2 plus 5 is 7, 8, 9 okay? So this algorithm is going to compute the values in the matrix for these pairs of indices. So it's going to compute pt of 1, 6, opt of 2, 7, opt of 3, 8, opt of 4, 9. Okay? So if we star with 1, 6. Opt of 1,6. [SOUND] Opt of 1,6 it can be the max. Let's look at how this computation would be. It's either opt of 1, 5, this 1, or the max over sum T. Of 1 plus opt of 1 for 40 minus 1, and opt of T plus 1 and G minus 1, which is five. Okay? Now, of course, here when I add one, I'm basically assuming that 1,6 can go together. 1 and 6, A and U, can go together in this case. For example, when I go to 1, 7, when I'm if I'm looking at 1, 7, 1 and 7 cannot go together so the t wouldn't count there. OKay? So here opt of 1, 5. Opt of 1, 5 is going to be, where is that here? Opt of 1,5 is zero. And because it, because of the initialization. And this one here is going to be, the value is going to be one. So we'll end up at, actually the, the value will be end up computing here is going to be one. And if I repeat this exercise for K equal 5 and now looking at, at 2, 7 and then 3, 8 and then 4, 9, I'll end up with 1, 0 here and 1 and 0. Okay? And repeat now the exercise. When we are done with this now, we increase K to 6. But notice the interesting thing that's happening here is that every time I want to compute these values, all these entries are already in the matrix, okay? There are already in the matrix, okay? So, now I, I increase K to 6. If, when K is set, is, is a 6, this is going to go, I is going to from 1, 2 to 3 and J is going to be I plus K. Okay? So, now we're going to be, the way this algorithm is, is working is that, it will, it completed this diagonal. Then next time, it's going to do this diagonal here. Then the, then this, and then, finally, this one. Okay? So if we look at this here, and we do the computation right, you will notice that it will do, it will get values ones here. Then we come to this one, it will compute these values again, just using this this relation. And notice that every time when I come to compute this one, it's going to be based on values that have already been computed. It was never a need, when I compute this, it will not need this value, which hasn't been computed. Then at the end it will compute this value which is two. Okay? So, the important thing to keep in mind here is that this algorithm. It's building a matrix that's called opt, here. It is indexed by this I N J and then at the end of the day what I'm really interested really in, is this here. This is of opt of 1,9, which is optimal number of positions that are matched together in, when I'm doing the RNA structure and the entire sequence. And the other thing that is very important to note is that every time I'm filling and entering the matrix I am using entries that have already been filled. And the last one to notice is that the way it is working again to ensure that I can make use of values that have been computed, it is working in this fashion. It is filling this diagonal, then this, then this, then this and it terminates here when we are done OPT of 1 N. Okay? So this is the, the, an implementation. Of this algorithm, and this implementation belongs to a class of algorithms that we call dynamic programming. Okay? This is called dynamic programming, which is different from recursive algorithms because a recursive algorithm would have implemented this algorithm recursively without storing values in the matrix, whereas here we are storing values in the matrix. And every time we will see an instance of the problem, that we had solved already, we will just look up the value of that instance from the matrix.