Friday, 18 March 2016

THE BIAS-VARIANCE TRADEOFF

Hello, welcome to my blog. In this post, I want to talk about
the bias-variance trade-off, which is a very important topic in Machine
Learning. Before I do that, let me lay the foundation. In my post on Linear
Regression, I said that the goal of a linear regression model is to find
parameters for our regression line that minimize the error between our
predictions and the actual observations.

This goal is not unique to linear
regression; in fact, it is the goal of the majority of machine learning
techniques. We want to minimize the error between what we predict and what we
actually observe. This leads us to the three sources of error:

Irreducible error

Bias error

Variance error
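
For squared-error loss, these three sources are commonly written as a single
decomposition. The notation below is my own assumption for illustration and
is not taken from any particular library: $f$ is the unknown target function,
$\hat{f}$ is the model fitted on a randomly drawn training set, and
$y = f(x) + \varepsilon$ is a new observation at a point $x$, where the noise
$\varepsilon$ has mean zero and variance $\sigma^2$.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The expectations are taken over training sets and over the noise in the new
observation. Only the first two terms depend on the model we choose.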

Irreducible error refers to error that exists due to noise in
the data. This kind of error cannot be eliminated no matter how we try to
optimize the parameters of our machine learning algorithm. This is because the
output usually depends on things that the input variable(s) do not capture,
such as measurement error or unobserved factors, so even the true (target)
function, which is unknown, would not predict every observation exactly.
Therefore, we are always bound to have some error in our
predictions. The next two types of error will be the major focus of this post.
First, I will define two important terms:

BIAS

Bias refers to the simplifying assumptions made by an algorithm in
order to make the target function easier to learn. Machine Learning (ML)
algorithms with high bias are inflexible and simple, and may have few parameters.
ML algorithms with low bias are flexible, can be complex and may
have lots of parameters. Examples of high-bias ML algorithms are linear
regression and logistic regression.

VARIANCE

Variance refers to the amount by which the estimate of the target
function would change if a different dataset were used for training. Some
change from one dataset to another is expected, but we do not want it to be
large: a small change suggests that the model has captured the underlying
pattern between the input and output variables rather than the quirks of one
particular dataset. ML algorithms with high variance tend to be very flexible
and are able to capture subtle relationships in the training data. Examples of
high-variance ML algorithms are decision trees, k-nearest neighbours and support
vector machines.

HOW DO THESE LEAD TO ERROR?

Bias Error: If a model has high bias, it will
make prediction errors because it fits the relationship between the input and
output variables using a simple estimate of the target function, and the true
relationship is rarely that simple in real life. This simple estimate is bound
to make errors if it is used for prediction. This problem is called underfitting.
I like to think of this as a student who does not prepare well for an examination.
Clearly, this student is bound to fail the examination.

Variance Error: A model that has high variance might
perform well on the training data because it is flexible enough to capture the
relationship between the input and output variables. Good, right? Not really.
If the model fits the training data too well, it also fits the noise in that
data and may perform poorly on future data, which is obviously a problem.
This problem is called overfitting. Using the same student
analogy, a model with high variance is like a student who reads the study
material to the point of cramming it. While he may pass a question taken
directly from the study material (training data), he will fail questions that
do not originate from the study material, which are likely in an examination
(future data). Like the previous student, he is also bound to fail the examination.
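
To make underfitting and overfitting concrete, here is a minimal sketch that
fits polynomials of different degrees to a small noisy dataset and compares
the error on the training data with the error on fresh data. Everything in it
is an assumption chosen for illustration: the "true" function sin(pi * x), the
noise level, the sample sizes and the degrees 1, 3 and 12. It uses NumPy's
polyfit and polyval.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Hypothetical "true" function; in a real problem it is unknown.
    return np.sin(np.pi * x)

def sample(n):
    # Draw n noisy observations of the target on the interval [-1, 1].
    x = rng.uniform(-1.0, 1.0, size=n)
    y = target(x) + rng.normal(0.0, 0.3, size=n)
    return x, y

x_train, y_train = sample(20)     # small training set
x_test, y_test = sample(1000)     # stands in for "future data"

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on the points (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The straight line (degree 1) is the unprepared student: its error stays high
on both sets. The degree 12 polynomial is the crammer: its training error is
the lowest of the three, but there is a clear gap between that and its error
on the fresh data.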

BIAS-VARIANCE TRADE-OFF

We can see a pattern here: algorithms with high bias tend to have low
variance, while those with low bias tend to have high variance. In general, as
variance increases, bias decreases, and vice versa. This is why it is called a
trade-off: you can rarely have the best of both worlds, which would be low bias
and low variance.
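
The trade-off can be estimated numerically when the target function is known,
which is only the case in a simulation. The sketch below is one possible way
to do it, under assumptions of my own (a sin(pi * x) target, Gaussian noise,
polynomial models of degree 1, 3 and 7). It repeatedly draws a new training
set, fits a polynomial, and records the predictions on a fixed grid. The
squared bias is how far the average prediction is from the target, and the
variance is how much the predictions spread around their own average.

```python
import numpy as np

rng = np.random.default_rng(1)
NOISE_STD = 0.3

def target(x):
    # Hypothetical "true" function, known here only because we simulate.
    return np.sin(np.pi * x)

# Fixed points at which every fitted model is evaluated.
x_grid = np.linspace(-0.8, 0.8, 50)

def bias2_and_variance(degree, n_datasets=300, n_points=30):
    preds = np.empty((n_datasets, x_grid.size))
    for i in range(n_datasets):
        # A fresh training set each time: new inputs and new noise.
        x = rng.uniform(-1.0, 1.0, size=n_points)
        y = target(x) + rng.normal(0.0, NOISE_STD, size=n_points)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_grid)
    mean_pred = preds.mean(axis=0)
    # Squared gap between the average fit and the target, averaged over the grid.
    bias2 = float(np.mean((mean_pred - target(x_grid)) ** 2))
    # Spread of the individual fits around the average fit, averaged over the grid.
    variance = float(np.mean(preds.var(axis=0)))
    return bias2, variance

estimates = {d: bias2_and_variance(d) for d in (1, 3, 7)}
for degree, (bias2, variance) in estimates.items():
    print(f"degree={degree}  bias^2={bias2:.4f}  variance={variance:.4f}")
```

With these settings the straight line has by far the largest squared bias and
the smallest variance, and the degree 7 polynomial shows the opposite.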

CONTROLLING BIAS AND VARIANCE

How do we control bias and variance? Below are some ways of
controlling bias and variance for some ML algorithms.

Linear regression & Logistic regression:
These algorithms have high bias and low variance, but the trade-off can be
altered using a technique called regularization.
This is done by adding a penalty term, weighted by a tuning parameter λ, to
their respective cost functions.
Increasing λ increases the bias (and reduces the variance), while decreasing λ
reduces the bias (and increases the variance) of the model.
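
As an illustration of the role of λ, here is a minimal sketch of one common
form of regularization for linear regression, usually called ridge or L2
regularization, solved in closed form with NumPy. The data, the weights in
true_w and the helper name ridge_fit are all hypothetical, and the sketch
leaves out the intercept to keep it short. It covers linear regression only,
not logistic regression.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 40 samples, 5 features, known weights plus noise.
X = rng.normal(size=(40, 5))
true_w = np.array([3.0, -2.0, 0.0, 1.0, 0.5])
y = X @ true_w + rng.normal(0.0, 1.0, size=40)

def ridge_fit(X, y, lam):
    # Minimise ||y - Xw||^2 + lam * ||w||^2 by solving
    # (X^T X + lam * I) w = X^T y.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:6.1f}  ||w||={norm:.3f}")
```

With λ = 0 this is ordinary least squares. As λ grows, the weights are pulled
towards zero, so the model becomes less sensitive to the particular training
set (lower variance) at the cost of a more restricted fit (higher bias).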

K-Nearest Neighbours: This algorithm has low bias and
high variance. This can be adjusted by tuning k, which dictates how many
neighbours contribute to our prediction. Increasing k increases
the bias and reduces the variance of the model.
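
The effect of k is easy to see at the two extremes. The sketch below is a
bare-bones k-nearest-neighbours regressor for one input variable, written
with NumPy. The data and the function name knn_predict are made up for
illustration, and it is not meant as an efficient implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training data: a noisy sine wave on [0, 1].
x_train = rng.uniform(0.0, 1.0, size=30)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, size=30)

def knn_predict(x_query, k):
    # Average the targets of the k training points closest to each query.
    x_query = np.atleast_1d(x_query)
    dists = np.abs(x_query[:, None] - x_train[None, :])
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

for k in (1, 5, 30):
    train_mse = float(np.mean((knn_predict(x_train, k) - y_train) ** 2))
    print(f"k={k:2d}  train MSE={train_mse:.3f}")
```

With k = 1 every training point is its own nearest neighbour, so the model
reproduces the training data exactly, noise included (low bias, high
variance). With k equal to the size of the training set, every prediction is
just the mean of all the targets, whatever the input (high bias, low
variance).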

Support Vector Machines: Like k-nearest neighbours, this
algorithm also has low bias and high variance. This can be adjusted by tuning the
C parameter, which influences the number of violations of the margin allowed
in the training data. Allowing more violations increases the bias but
decreases the variance.
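
Here is a rough sketch of that idea for a linear soft-margin SVM. It is not a
production solver: it uses plain subgradient descent on the regularized hinge
loss, and the data, the function names and the parameter lam are assumptions
of mine. A larger lam plays the role of allowing more margin violations, as
described above. Be aware that some libraries, scikit-learn for example,
define their C inversely, so that a smaller C means stronger regularization.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two overlapping Gaussian blobs with labels -1 and +1 (hypothetical data).
n = 100
X = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)),
               rng.normal(+1.0, 1.0, size=(n, 2))])
y = np.concatenate([-np.ones(n), np.ones(n)])

def fit_linear_svm(X, y, lam, lr=0.05, n_iter=3000):
    # Subgradient descent on  lam / 2 * ||w||^2 + mean(hinge loss).
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        violators = margins < 1.0   # points inside the margin or misclassified
        grad_w = lam * w - (y[violators][:, None] * X[violators]).sum(axis=0) / len(y)
        grad_b = -y[violators].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def summarize(lam):
    # Return the weight norm and the number of margin violations after training.
    w, b = fit_linear_svm(X, y, lam)
    violations = int(np.sum(y * (X @ w + b) < 1.0))
    return float(np.linalg.norm(w)), violations

norm_weak, viol_weak = summarize(lam=0.001)    # weak regularization
norm_strong, viol_strong = summarize(lam=1.0)  # strong regularization
print(f"lam=0.001  ||w||={norm_weak:.2f}  margin violations={viol_weak}")
print(f"lam=1.000  ||w||={norm_strong:.2f}  margin violations={viol_strong}")
```

The margin width is 2 / ||w||, so the smaller weight norm under strong
regularization means a wider margin with more training points inside it. That
is the higher-bias, lower-variance end of the trade-off.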

CONCLUSION

Now you know what the bias-variance trade-off is. For every
problem you encounter in machine learning you will need to find a balance
between these two. If you are interested in learning more about this I suggest reading
Section 2.2.2 of the book An Introduction
to Statistical Learning.

Hope you enjoyed reading this post. If you have any questions
or suggestions, please leave a comment and I will be happy to attend to you. That
is all for now. Have a wonderful weekend. Cheers!!!