Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

Sometimes backprop will define new hidden features that are not explicit in the input representation x, but which capture properties of the input instances that are most relevant to learning the target function t(x)

However, if we have four points, we can find a labeling such that the linear classifier fails to be perfect

We can see that 3 is the critical number

The VC-dimension of a linear classifier in a 2D space is 3 because, if we have 3 points in the training set, perfect classification is always possible irrespective of the labeling, whereas for 4 points, perfect classification can be impossible

A fancy term, but it simply means: we should find a classifier that minimizes the sum of training error (empirical risk) and a term that is a function of the flexibility of the classifier (model complexity)

Recall the concept of confidence interval (CI)

For example, we are 99% confident that the population mean lies in the 99% CI estimated from a sample

We can also construct a CI for the generalization error (error on the test set)