Friday, 22 November 2013

machine learning--04 : Normal equation

Besides gradient descent, we can also use the normal equation to find the optimal hypothesis.

As Equation 1 shows, the normal equation, theta = (X^T X)^(-1) X^T y, is far easier to implement than gradient descent. If the matrix X^T X is singular, we can either remove redundant features or use the SVD-based pseudo-inverse to approximate the inverse.
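A minimal NumPy sketch of the normal equation (the function name and toy data are illustrative): `np.linalg.pinv` computes the pseudo-inverse via the SVD, so it still returns a sensible answer when X^T X is singular.

```python
import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^-1 X^T y; pinv uses the SVD, so it also
    # handles the singular case instead of raising an error.
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Toy data for y = 1 + 2*x, with a bias column of ones prepended to X.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = normal_equation(X, y)  # close to [1.0, 2.0]
```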

The pros and cons of gradient descent vs. the normal equation:

Gradient Descent

Needs to choose alpha (the learning rate)

Needs many iterations

Works well even when n (the number of features) is large

Normal Equation

No need to choose alpha

No need to iterate

Needs to compute the inverse of X^T X

Slow if n (the number of features) is very large

Computing the inverse matrix costs roughly O(n^3), which becomes unacceptable when n is large (around 10,000 or more, depending on your machine).