Loss function of Linear Regression:
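With $m$ training examples $(x^{(i)}, y^{(i)})$ and hypothesis $h_\theta(x) = \theta^T x$, one common form of the squared-error cost is:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$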

Regularization:

Regularization is an important technique in machine learning for preventing overfitting. Mathematically, it adds a regularization term to the loss so that the coefficients cannot fit the training data so perfectly that they overfit. The difference between L1 and L2 is that L2 penalizes the sum of the squares of the weights, while L1 penalizes the sum of their absolute values. As follows:

L1 Regularization:
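Adding an L1 penalty on the weights to the squared-error cost above (one common convention; the exact scaling of $\lambda$ varies between texts):

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\left|\theta_j\right|$$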

Linear regression with L1 regularization is also known as Lasso Regression.

L2 Regularization:
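Here the penalty is on the squared weights instead, folded into the same $\frac{1}{2m}$ factor (matching the gradient-descent section below):

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$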

Linear regression with L2 regularization is known as Ridge Regression.
And the $\frac{1}{2}$ here is simply to simplify the computation: it cancels the factor of 2 produced when the squared term is differentiated.
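Both variants are available off the shelf; here is a minimal sketch using scikit-learn, where the regularization strength is called `alpha` (playing the role of $\lambda$ above):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: y depends only on the first feature; the second is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1: tends to drive irrelevant weights to exactly 0
ridge = Ridge(alpha=0.1).fit(X, y)  # L2: shrinks all weights toward 0
print(lasso.coef_)
print(ridge.coef_)
```

Note the qualitative difference: L1 tends to produce sparse coefficients (exact zeros), while L2 only shrinks them.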

Gradient descent:

Gradient descent is a method that works like a climber looking down from a hill peak and walking down step by step. Of course, the step size can be tuned.
Repeat until convergence:
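For the unregularized cost above, this means simultaneously updating every parameter $\theta_j$:

$$\theta_j := \theta_j - \alpha\frac{\partial J(\theta)}{\partial \theta_j} = \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$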

Learning rate $\alpha$:

If $\alpha$ is small, each update is a tiny baby step and convergence is slow.
If $\alpha$ is large, each update is a large step that may overshoot the minimum and fail to converge.
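The whole loop fits in a few lines of NumPy; a minimal sketch (assuming `X` already contains a leading column of ones for the bias term):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.5, n_iters=1000):
    """Plain batch gradient descent for linear regression."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        error = X @ theta - y        # h_theta(x^(i)) - y^(i) for every example
        grad = (X.T @ error) / m     # dJ/dtheta_j
        theta -= alpha * grad        # one step downhill
    return theta

# Example: recover y = 1 + 3x
X = np.c_[np.ones(50), np.linspace(0, 1, 50)]
y = 1 + 3 * X[:, 1]
print(gradient_descent(X, y))  # approximately [1. 3.]
```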

Gradient descent with regularization:

The cost function looks like this, and then we take the derivative of the cost function:
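Using the Ridge cost from above (by convention the bias $\theta_0$ is not regularized, so the formulas below apply to $j \ge 1$):

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j$$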

Then gradient descent repeats:
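Substituting the derivative into the update rule and grouping the $\theta_j$ terms:

$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$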

Therefore, introducing $\lambda$ shrinks the value of each $\theta_j$ on every update: the factor $\left(1 - \alpha\frac{\lambda}{m}\right)$ is slightly less than 1, so an extra small amount is subtracted from $\theta_j$ at each step, keeping the weights small. And this is the main purpose of regularization.
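In code, the regularized version only changes the gradient line of the earlier sketch (same assumptions; the bias weight `theta[0]` is left unregularized):

```python
import numpy as np

def ridge_gradient_descent(X, y, alpha=0.5, lam=1.0, n_iters=1000):
    """Batch gradient descent for L2-regularized (Ridge) linear regression."""
    m, n = X.shape
    theta = np.zeros(n)
    reg = (lam / m) * np.ones(n)
    reg[0] = 0.0  # do not regularize the bias term theta_0
    for _ in range(n_iters):
        error = X @ theta - y
        grad = (X.T @ error) / m + reg * theta  # extra (lambda/m) * theta_j term
        theta -= alpha * grad  # same as shrinking theta_j by (1 - alpha*lambda/m), then stepping
    return theta
```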