WEBVTT
00:00:00.000 --> 00:00:01.466 align:middle line:90%
00:00:01.466 --> 00:00:03.840 align:middle line:84%
All right, now we're looking
at the second learning graph
00:00:03.840 --> 00:00:04.770 align:middle line:90%
example.
00:00:04.770 --> 00:00:07.140 align:middle line:84%
Now, what we're
going to do here is very
00:00:07.140 --> 00:00:10.450 align:middle line:84%
similar to what we did in the
first learning graph example,
00:00:10.450 --> 00:00:13.250 align:middle line:84%
but we're going to be doing
it with a bunch more models.
00:00:13.250 --> 00:00:15.690 align:middle line:84%
And we're going to
see the differences
00:00:15.690 --> 00:00:19.080 align:middle line:84%
in the learning
graphs
00:00:19.080 --> 00:00:21.060 align:middle line:84%
for the different
types of models,
00:00:21.060 --> 00:00:23.550 align:middle line:84%
and in particular
00:00:23.550 --> 00:00:26.550 align:middle line:84%
for basic models compared
00:00:26.550 --> 00:00:28.650 align:middle line:90%
to complex models.
00:00:28.650 --> 00:00:30.600 align:middle line:84%
First of all, let's
generate the data.
00:00:30.600 --> 00:00:32.700 align:middle line:84%
Here, I'm just going
to use synthetic data.
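The notebook code itself isn't reproduced in the transcript, but data like that described (500 rows with an x column and a y column, where y is later said to come from a fourth-order polynomial) could be generated along these lines. The coefficients, x range, and noise level below are illustrative guesses, not the values used in the video.

```python
import numpy as np
import pandas as pd

# Sketch of synthetic training data like that described in the
# video: 500 rows, an x column and a y column, with y following
# a fourth-order polynomial of x plus Gaussian noise.  The
# coefficients, x range, and noise level are illustrative guesses.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 500)
y = 2 * x**4 - 5 * x**2 + 3 * x + rng.normal(0, 75, 500)

train = pd.DataFrame({"x": x, "y": y})
print(train.shape)  # (500, 2)
```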
00:00:32.700 --> 00:00:37.420 align:middle line:90%
00:00:37.420 --> 00:00:39.730 align:middle line:84%
And once again,
we're going to use
00:00:39.730 --> 00:00:42.252 align:middle line:84%
this synthetic
data-- we can have
00:00:42.252 --> 00:00:43.460 align:middle line:90%
a little bit of a look at it.
00:00:43.460 --> 00:00:45.960 align:middle line:84%
I called it train,
because as I said,
00:00:45.960 --> 00:00:48.180 align:middle line:84%
you'd be doing learning
graphs on the training data,
00:00:48.180 --> 00:00:49.640 align:middle line:90%
not on the complete data.
00:00:49.640 --> 00:00:51.050 align:middle line:90%
It's got 500 rows.
00:00:51.050 --> 00:00:53.570 align:middle line:84%
It's got an x
column and a y column.
00:00:53.570 --> 00:00:56.530 align:middle line:84%
We'll be estimating
the y based on the x.
00:00:56.530 --> 00:01:01.520 align:middle line:84%
So as before, we're going to
be generating models from 10%,
00:01:01.520 --> 00:01:04.690 align:middle line:84%
20%, up to 100% of
the training data,
00:01:04.690 --> 00:01:09.430 align:middle line:84%
and estimating their in-sample
and out-of-sample mean squared
00:01:09.430 --> 00:01:10.280 align:middle line:90%
error.
00:01:10.280 --> 00:01:14.140 align:middle line:84%
The out-of-sample error will be
estimated by 10-fold cross-validation.
00:01:14.140 --> 00:01:16.810 align:middle line:84%
But rather than doing it for
a single ordinary least squares
00:01:16.810 --> 00:01:19.240 align:middle line:84%
model, we're going to
be doing it for six
00:01:19.240 --> 00:01:21.310 align:middle line:90%
polynomial regression models.
00:01:21.310 --> 00:01:25.240 align:middle line:84%
And the orders are going to be
order one, two, three, four,
00:01:25.240 --> 00:01:26.780 align:middle line:90%
five, and 20.
00:01:26.780 --> 00:01:29.290 align:middle line:84%
Now, of course polynomial
regression of order one
00:01:29.290 --> 00:01:32.470 align:middle line:90%
is just linear regression.
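The loop just described (fitting each model order on 10%, 20%, up to 100% of the training data, and recording the in-sample MSE alongside a 10-fold cross-validation estimate of the out-of-sample MSE) might look something like this scikit-learn sketch. The data generation here is an illustrative stand-in, and the actual code in the video may differ.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Illustrative stand-in for the video's synthetic training data.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 500)
y = 2 * x**4 - 5 * x**2 + 3 * x + rng.normal(0, 75, 500)
train = pd.DataFrame({"x": x, "y": y})

orders = [1, 2, 3, 4, 5, 20]
# 10%, 20%, ..., 100% of the training rows.
sizes = [len(train) * i // 10 for i in range(1, 11)]

rows = []
for order in orders:
    model = make_pipeline(PolynomialFeatures(order), LinearRegression())
    for n in sizes:
        X, Y = train[["x"]].head(n), train["y"].head(n)
        model.fit(X, Y)
        in_mse = float(np.mean((model.predict(X) - Y) ** 2))
        # Out-of-sample MSE estimated by 10-fold cross-validation.
        out_mse = -cross_val_score(
            model, X, Y, cv=10, scoring="neg_mean_squared_error").mean()
        rows.append({"order": order, "n": n,
                     "in_sample": in_mse, "out_of_sample": out_mse})

results = pd.DataFrame(rows)
```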
00:01:32.470 --> 00:01:37.600 align:middle line:84%
So you can look over these
lines and my comments
00:01:37.600 --> 00:01:39.850 align:middle line:84%
a little bit more
slowly in your own time,
00:01:39.850 --> 00:01:41.560 align:middle line:84%
but it's doing
exactly the same thing
00:01:41.560 --> 00:01:45.670 align:middle line:84%
that we did in the last exercise,
only now for these six models
00:01:45.670 --> 00:01:47.170 align:middle line:90%
rather than a single one.
00:01:47.170 --> 00:01:51.130 align:middle line:90%
00:01:51.130 --> 00:01:53.440 align:middle line:84%
Once we've got
these results, we're
00:01:53.440 --> 00:01:57.940 align:middle line:84%
going to graph them,
just like before.
00:01:57.940 --> 00:01:59.895 align:middle line:84%
Except we're now going
to get six graphs.
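The six learning graphs could be drawn with matplotlib along these lines. The results table below is filled with made-up numbers purely so the sketch is self-contained; the real curves come from the computation described above.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the computed results: one row per
# (polynomial order, training size) with both MSE estimates.
orders = [1, 2, 3, 4, 5, 20]
sizes = range(50, 501, 50)
results = pd.DataFrame(
    [{"order": o, "n": n,
      "in_sample": 6000.0 - 1000.0 / n,
      "out_of_sample": 6000.0 + o * 10000.0 / n}
     for o in orders for n in sizes])

# One panel per model order, in-sample and out-of-sample lines.
fig, axes = plt.subplots(2, 3, figsize=(12, 7), sharex=True)
for ax, order in zip(axes.flat, orders):
    sub = results[results["order"] == order]
    ax.plot(sub["n"], sub["in_sample"], label="in-sample MSE")
    ax.plot(sub["n"], sub["out_of_sample"], label="out-of-sample MSE")
    ax.set_title(f"order {order}")
    ax.set_xlabel("training rows")
axes.flat[0].legend()
fig.tight_layout()
fig.savefig("learning_graphs.png")
```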
00:01:59.895 --> 00:02:06.480 align:middle line:90%
00:02:06.480 --> 00:02:09.419 align:middle line:84%
And notice that they
look quite different
00:02:09.419 --> 00:02:10.615 align:middle line:90%
for the different models.
00:02:10.615 --> 00:02:13.200 align:middle line:90%
00:02:13.200 --> 00:02:18.720 align:middle line:84%
The low-order polynomial
models, order one, order two,
00:02:18.720 --> 00:02:22.710 align:middle line:84%
notice that the lines are very
close together from the start.
00:02:22.710 --> 00:02:26.280 align:middle line:84%
They're not getting a great deal
of improvement from more data.
00:02:26.280 --> 00:02:29.530 align:middle line:90%
00:02:29.530 --> 00:02:34.500 align:middle line:84%
Now, as we go to the higher
order, three, but particularly
00:02:34.500 --> 00:02:38.190 align:middle line:84%
four and five, there is a
bit of a gap at the start,
00:02:38.190 --> 00:02:42.930 align:middle line:84%
but that's disappeared pretty
much by 150 to 200 or so.
00:02:42.930 --> 00:02:49.290 align:middle line:84%
Notice also though, that these
models are converging together
00:02:49.290 --> 00:02:52.470 align:middle line:84%
at a much lower level
than the first order one
00:02:52.470 --> 00:02:54.120 align:middle line:90%
and order two models.
00:02:54.120 --> 00:02:57.510 align:middle line:84%
These more complex
models are basically
00:02:57.510 --> 00:03:03.049 align:middle line:84%
zooming in on a mean squared
error of around about 6,000,
00:03:03.049 --> 00:03:04.590 align:middle line:84%
whereas the order
one and two models,
00:03:04.590 --> 00:03:07.290 align:middle line:84%
were zooming in on a mean
squared error of up around
00:03:07.290 --> 00:03:11.100 align:middle line:90%
10,000 or 11,000.
00:03:11.100 --> 00:03:15.470 align:middle line:84%
So clearly, the more complex
models of polynomial regression
00:03:15.470 --> 00:03:20.150 align:middle line:84%
models for orders four or
five are performing better
00:03:20.150 --> 00:03:23.270 align:middle line:84%
and they appear
to need maybe 200,
00:03:23.270 --> 00:03:27.830 align:middle line:84%
but certainly, say, 150
rows of training data.
00:03:27.830 --> 00:03:32.450 align:middle line:84%
Let's jump to the really
complex model, order 20.
00:03:32.450 --> 00:03:37.190 align:middle line:84%
Now, this too is zooming in
on about the same, 5,000.
00:03:37.190 --> 00:03:40.850 align:middle line:84%
But clearly, it still needs
a lot more data to be able
00:03:40.850 --> 00:03:46.840 align:middle line:84%
to really maximise
its potential.
00:03:46.840 --> 00:03:51.610 align:middle line:84%
It doesn't perform at all
well up until about 150.
00:03:51.610 --> 00:03:53.350 align:middle line:90%
It's off this graph.
00:03:53.350 --> 00:03:56.140 align:middle line:84%
But the lines are
still clearly separate
00:03:56.140 --> 00:04:00.540 align:middle line:84%
up at the maximum amount
of training data, 500.
00:04:00.540 --> 00:04:02.510 align:middle line:84%
Now, this is
actually exactly what
00:04:02.510 --> 00:04:09.850 align:middle line:84%
we would expect from a model
that is overly complex.
00:04:09.850 --> 00:04:14.200 align:middle line:84%
If you give it enough data,
this overly complex model
00:04:14.200 --> 00:04:19.120 align:middle line:84%
will end up performing as
well as an optimal model.
00:04:19.120 --> 00:04:22.180 align:middle line:84%
If you give it less than that
amount, it's going to overfit.
00:04:22.180 --> 00:04:24.700 align:middle line:84%
So the order 20
model, if you give it
00:04:24.700 --> 00:04:26.800 align:middle line:84%
sufficient amount
of data, it will
00:04:26.800 --> 00:04:29.390 align:middle line:84%
converge to the same sort of
performance as the order four
00:04:29.390 --> 00:04:30.940 align:middle line:90%
and order five models.
00:04:30.940 --> 00:04:34.330 align:middle line:84%
But it needs that extra
data to be able to do so.
00:04:34.330 --> 00:04:37.730 align:middle line:84%
If it doesn't get that extra
data, it's going to overfit.
00:04:37.730 --> 00:04:40.240 align:middle line:84%
The order four and
order five models,
00:04:40.240 --> 00:04:42.455 align:middle line:84%
they appear to be roughly
the right complexity
00:04:42.455 --> 00:04:44.560 align:middle line:90%
to model this function.
00:04:44.560 --> 00:04:47.950 align:middle line:84%
If you actually look at how we
generated the synthetic data,
00:04:47.950 --> 00:04:50.620 align:middle line:84%
you'll find I think that it
is a fourth order polynomial.
00:04:50.620 --> 00:04:53.290 align:middle line:90%
So that's no surprise.
00:04:53.290 --> 00:04:57.910 align:middle line:84%
They managed to converge to
performing very well, not much
00:04:57.910 --> 00:05:00.520 align:middle line:84%
of a difference between the
two lines, very quickly,
00:05:00.520 --> 00:05:03.470 align:middle line:90%
needing about 150 to 200 data points.
00:05:03.470 --> 00:05:06.310 align:middle line:84%
The simple models,
order one and order two,
00:05:06.310 --> 00:05:10.030 align:middle line:84%
they converged to doing as
well as they can very quickly.
00:05:10.030 --> 00:05:12.190 align:middle line:84%
The two lines come
together very quickly.
00:05:12.190 --> 00:05:14.980 align:middle line:84%
But they're simply
too unsophisticated.
00:05:14.980 --> 00:05:19.300 align:middle line:84%
They're too simple to be able
to model the data generating
00:05:19.300 --> 00:05:20.710 align:middle line:90%
function well.
00:05:20.710 --> 00:05:22.840 align:middle line:84%
And so, although they
converged to doing
00:05:22.840 --> 00:05:25.180 align:middle line:84%
as well as they can
quickly, they actually
00:05:25.180 --> 00:05:26.490 align:middle line:90%
can't do very well.
00:05:26.490 --> 00:05:29.700 align:middle line:84%
Their mean squared error
is up at about 11,000,
00:05:29.700 --> 00:05:34.590 align:middle line:84%
or almost twice what the
order four and five models do.