In this segment, we're going to talk about linear regression, and we're going to derive the expressions for the linear regression model. What regression is all about is that you are trying to best fit some given data. Let me write this down first, and then I will show you graphically. You are given certain data points, let's suppose (x1, y1), (x2, y2), all the way up to (xn, yn). So you're given, let's suppose, n data points, and what you want to be able to do is best fit a straight line to them. Let's call that straight line y = a0 + a1 x; that's the general form of a straight line, where a0 is the intercept and a1 is the slope. So somebody is giving us these data points, and they want us to best fit a straight line to them. So let's suppose this is the straight line you are drawing, y = a0 + a1 x, and on the same graph you show all n data points: this one is (x1, y1), and this one is some (xi, yi), the ith data point. You are trying to draw a straight line that best fits those data points. Eventually, what you want to do is minimize the difference between the observed values, which are the ones given to you, and what the straight line predicts. But you have differences at several points, at all the data points: you'll have a difference here, a difference here, a difference here, so everywhere there's a difference between what you are observing and what you are predicting. The cross is what you are predicting, and the dot is what you are observing. The predicted value at some point xi will simply be a0 + a1 xi, because that's the straight line you are going to find, and if you put x equal to xi, that's the predicted value, while yi is the observed value. So at each and every data point there is a residual, as it's called (some people call it error), which is the observed value minus the predicted value from the straight line: yi − a0 − a1 xi. That is the residual you have at each data point.
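As a quick illustration of that residual formula, here is a minimal Python sketch; the data points and the trial values of a0 and a1 below are made up for illustration, not taken from the lecture:

```python
# Hypothetical data points (xi, yi) and trial coefficients, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.2, 3.8]
a0, a1 = 0.5, 0.8  # a trial intercept and slope

# Residual at each data point: observed value minus predicted value.
residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
print(residuals)
```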
Now, what you want to be able to do is somehow make these residuals small everywhere, and one of the criteria used is to take each residual, square it, and add up all of those squares. That sum is called Sr, where Sr stands for the sum of the squares of the residuals, and that's what you want to make as small as possible, because the goal is to minimize the residuals. This criterion is called the least squares method of finding the regression model: least squares means that you are squaring the errors, adding them all up, and trying to minimize that sum, and that's why it's called least squares regression. So Sr is the summation, from i equal to 1 to n, of (yi − a0 − a1 xi) squared, and that's the summation you want to minimize.
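Here is a minimal sketch of Sr as a function of the two constants, again with made-up data; the point is that different choices of a0 and a1 give different values of Sr:

```python
def sum_square_residuals(a0, a1, x, y):
    """Sr = sum over i of (yi - a0 - a1*xi)**2."""
    return sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))

# Made-up data points, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.2, 3.8]

# Two different choices of (a0, a1) give two different values of Sr;
# least squares picks the pair that makes Sr as small as possible.
print(sum_square_residuals(0.5, 0.8, x, y))
print(sum_square_residuals(0.3, 0.9, x, y))
```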
Now, you can understand that when somebody is telling you to do regression, the yi are the observed values, a0 + a1 xi is what you are predicting, and the only control you have is over the constants of the regression model. Depending on what kind of regression model you are drawing (in this case we are doing linear regression), there are certain constants; here a0 and a1 are the two constants of the regression model, and you want to be able to find them. Those are what is in your control, and you want to choose them in such a fashion that this whole summation becomes as small as possible. You cannot make it exactly equal to 0: if you made it exactly equal to 0, the straight line would have to go through all the data points, which is not going to be the case when you are doing regression. So you want to make it as small as possible. What that means is that we have to take the derivatives with respect to a0 and a1 and put those equal to 0, going back to your differential calculus class, to minimize this summation. So what I'll have to do is take the derivative of the sum of the squares of the residuals with respect to a0 and put that equal to 0, and take the derivative of Sr with respect to a1 and put that equal to 0.
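Written out, this is the standard least-squares step the lecture is describing, stated here for reference:

```latex
\frac{\partial S_r}{\partial a_0}
  = \sum_{i=1}^{n} 2\,(y_i - a_0 - a_1 x_i)(-1) = 0,
\qquad
\frac{\partial S_r}{\partial a_1}
  = \sum_{i=1}^{n} 2\,(y_i - a_0 - a_1 x_i)(-x_i) = 0
```

Dividing by 2 and rearranging gives the two so-called normal equations:

```latex
n\,a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i,
\qquad
a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
```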
Once I put those derivatives equal to 0, I'll get two equations in two unknowns, and once I have two equations in two unknowns, I'll be able to find what a0 and a1 are.
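To see where the derivation lands, here is a minimal Python sketch that solves those two equations in closed form; the formulas follow from the normal equations above, and the data points are made up purely for illustration:

```python
def fit_line(x, y):
    """Least-squares fit of y = a0 + a1*x.

    Solves the two equations obtained by setting dSr/da0 = 0 and
    dSr/da1 = 0 for the intercept a0 and the slope a1.
    """
    n = len(x)
    sx = sum(x)                                 # sum of xi
    sy = sum(y)                                 # sum of yi
    sxx = sum(xi * xi for xi in x)              # sum of xi^2
    sxy = sum(xi * yi for xi, yi in zip(x, y))  # sum of xi*yi
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a0 = (sy - a1 * sx) / n
    return a0, a1

# Made-up data points, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.2, 3.8]
a0, a1 = fit_line(x, y)
print(a0, a1)  # roughly 0.25 and 0.91 for these points
```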