What makes WiFi faster at home than at a coffee shop? How does Google order its search results from the trillions of webpages on the Internet? Why does Verizon charge $15 for every GB of data we use? Is it really true that we are connected in six social steps or less?
These are just a few of the many intriguing questions we can ask about the social and technical networks that form integral parts of our daily lives. This course is about exploring the answers, using a language that anyone can understand. We will focus on fundamental principles like “sharing is hard”, “crowds are wise”, and “network of networks” that have guided the design and sustainability of today’s networks, and summarize the theories behind everything from the social connections we make on platforms like Facebook to the technology upon which these websites run.
Unlike other networking courses, the mathematics included here is no more complicated than adding and multiplying numbers. While mathematical details are necessary to fully specify the algorithms and systems we investigate, they are not required to understand the main ideas. We use illustrations, analogies, and anecdotes about networks as pedagogical tools in lieu of detailed equations.
All the features of this course are available for free. It does not offer a certificate upon completion.

Reviews


4.4 (47 ratings)

5 stars: 27 ratings
4 stars: 13 ratings
3 stars: 6 ratings
2 stars: 1 rating

From the lesson

Movie Recommendation on Netflix

One of the perks of having a Netflix subscription is getting recommendations of movies to watch. Behind the scenes, Netflix uses powerful algorithms to determine which movies will be suggested to each specific person. In this lesson, we will take a look at the main ideas behind these algorithms.

Taught by:

Christopher Brinton

Lecturer

Mung Chiang

Professor

Transcript

So now we need to leverage these similarity values, and to do that we need the three tables we found previously. The first table is just the original ratings, which I'm showing here. In addition, we need the baseline predictor, we need the baseline errors, which again we get by subtracting the baseline predictions from the actual ratings, and we need the cosine similarities, which we just found. Once we have these three things, we can actually leverage the similarity values. So we're going to build what's called the neighborhood predictor now, and it's very simple once we have these three tables: we just have to look up a few values. Let's look at an example first, and then we'll write out the general case and see how to apply it in a general sense. Alright, so let's take C five, that is, user C, movie five. We won't really be using the ratings table; we'll just reference it if we need to. For C five, the baseline predictor gave us 2.60, and we know the true value is 3, so the baseline prediction was too low. Okay, so we have 2.60 as the baseline prediction. Now we need to do something with the neighbor, because that's the whole idea here: we want to use the neighbor. Since these are movie neighbors, we'll look up movie five and find movie five's neighbor. Movie five's closest neighbor is movie two, as we've said, so we come down and find movie two. Now the next question is: did user C rate movie two? And the answer is yes, and it's not part of the test set, because if it were, we couldn't use the value. Since it's part of the training data, it's still useful. Okay, so user C did rate movie two. You can see right here the baseline prediction there was 1.33, and that 1.33 was lower than the actual rating of 2, right?
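The baseline errors and cosine similarities described above can be sketched in a few lines. The numbers below are hypothetical stand-ins, not the lecture's actual tables, and `None` marks entries withheld in the test set; the point is only the mechanics of comparing two movies' error vectors over the users who rated both.

```python
import math

# Hypothetical baseline-error values (actual rating minus baseline
# prediction) for two movies, one entry per user; None marks an entry
# withheld in the test set, which we must not use.
movie_a_errors = [0.50, -0.33, 0.67, None]
movie_b_errors = [-0.25, 0.40, 0.10, 0.55]

def cosine_similarity(a, b):
    """Cosine similarity between two movies' baseline-error vectors,
    computed only over users with both entries available."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b)

sim = cosine_similarity(movie_a_errors, movie_b_errors)
```

A value near +1 means the two movies' predictions tend to err in the same direction for the same users; a negative value means they err in opposite directions, which is exactly the sign information the neighborhood predictor uses.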
Which is why this error value is positive. It's indicating to us that we needed to increase the prediction to get it where it needs to be. So now what we want to do is combine the baseline prediction for C five with the error we got for C two, because two is the neighbor of five. We have to use other values in the training set; we can't use the value for C five directly, or else that would just be reverse engineering. If we simply added C five's own error back and got a perfect prediction, that wouldn't be helpful when we move to the values we actually need to predict, the ones in the test set or the ones we don't have, because then there would be nothing to add. We wouldn't know what to add. That's the idea again: we can't reverse engineer the test set; we just have to use other structure within the table. So we find the nearest neighbor, which is two, we see that user C has rated movie two, and the error there is 0.67. Now we add 0.67 to 2.60, and that gives us 3.27. Sometimes, though, we don't want to add this neighborhood error value; sometimes we want to subtract it. And I bet you can probably guess when we want to add and when we want to subtract: it depends on whether we have a positive or a negative correlation. So really we don't care about the absolute values in the similarity table. We would if we were doing two, three, or four neighbors, but since we're just doing one neighbor, all we care about is whether the values are positive or negative.
So you could really just replace each similarity value with a minus or a plus sign, and it wouldn't make a difference for these purposes. Okay, now let's consider another example: user B, movie four. For B four, we have a baseline value of 3.50, and that entry is in the test set, so this is a case where we couldn't add its own error, because we don't know what the error is. So we have 3.50. Now we look at movie four: four's nearest neighbor is also two. The question now is, has B rated movie two? The answer is yes, and it is part of the training set, so we can use this value, which is 0.17. What that's indicating is that the predictor was too low by 0.17 for B's rating of movie two. Now here's the kicker: two and four have a negative correlation. Since the predictor was too low for movie two, we think it was probably too high for movie four. So rather than adding 0.17, we subtract it. Statistically speaking, the probability is high that the true rating is lower than the baseline; that's the idea here. So we subtract 0.17 and get 3.33. That was C five, and this is B four. So now we can write out a simple general equation: the neighborhood predictor is the baseline plus or minus the neighbor's error, where we add if it's a positive correlation and subtract if it's a negative correlation. That's it. That's all we need. We just need these three tables, and then we can do all these calculations.
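The general rule just stated can be written as a tiny function. This is a minimal sketch of the one-neighbor rule, using the lecture's worked numbers; the similarity values passed in (0.9 and -0.8) are made up, since with one neighbor only the sign matters.

```python
def neighborhood_predict(baseline, neighbor_error, similarity):
    """One-neighbor rule: baseline plus the neighbor movie's error
    if the movies are positively correlated, minus it if negative."""
    if similarity >= 0:
        return baseline + neighbor_error
    return baseline - neighbor_error

# (C, 5): baseline 2.60, neighbor error +0.67, positive correlation
pred_c5 = round(neighborhood_predict(2.60, 0.67, 0.9), 2)   # 3.27

# (B, 4): baseline 3.50, neighbor error +0.17, negative correlation
pred_b4 = round(neighborhood_predict(3.50, 0.17, -0.8), 2)  # 3.33
```

With more neighbors we would instead weight each neighbor's error by its similarity magnitude, which is why the absolute values would start to matter.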
So let's try another example really quickly, because I want to illustrate a point. With C five, we know the true value is 3, and we went from 2.60 to 3.27, which overshoots the value, but it's still closer: 3.27 is closer to 3 than 2.60 is. Same thing with B four: the real value is 3, B four was 3.50 before, and now we subtract and get 3.33, so we're getting closer. But in general we're not always going to get closer; we just hope that overall the RMSE gets lower. Sometimes it's going to be a mistake and go in the wrong direction. As an example of that, let's take user F, movie five. The true value there is a 4, and the baseline of 3.85 is already pretty close to 4. If we look at five's nearest neighbor, again it's two, so we go into the table for two. F has rated movie two, and the correlation is positive, so we add the error. But the error here is a negative value, meaning the predictor was too high there, so adding it actually lowers the prediction, and we get something like 3.27, which is clearly the wrong direction: 3.85 was closer, so we're getting farther away. It's not always going to give us something better, and there's nothing we can do about that; we just hope it gets better overall. And the final point is that if the neighbor has not rated the movie, we just don't use it. For instance, take movie five again, whose neighbor is two. If we were trying to do D five, we could not use D two, because even though two is the neighbor, and we want to see what D rated movie two in order to augment the prediction for five, that entry is part of the test set, so we couldn't use it.
So we would just keep the baseline predictor as it was, rather than trying to add anything on at all.
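That fallback behavior is easy to fold into the rule from before. This is a sketch, not the course's reference code; the baseline value 3.10 for D five is a hypothetical placeholder, and `None` stands for a neighbor rating that sits in the test set and so cannot be used.

```python
def predict_or_baseline(baseline, neighbor_error, similarity):
    """One-neighbor predictor with a fallback: if the neighbor's
    rating (and hence its error) is unavailable, e.g. because it is
    in the test set, keep the baseline prediction unchanged."""
    if neighbor_error is None:
        return baseline
    sign = 1.0 if similarity >= 0 else -1.0
    return baseline + sign * neighbor_error

# (D, 5): movie five's neighbor is two, but D's rating of movie two
# is in the test set, so the prediction stays at the baseline.
pred_d5 = predict_or_baseline(3.10, None, 0.9)
```

In other words, the neighborhood step only ever refines the baseline when it has legitimate training-set evidence; otherwise it does nothing.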