What, When & Why of Regularization in Machine Learning?

In this post, we will try and understand some of the following in relation to regularizing the regression machine learning models to achieve higher accuracy and stable models:

Background

What is regularization?

Why & when does one need to adopt/apply the regularization technique?

Background

At times, when one is building a multi-linear regression model, one uses the least squares method for estimating the coefficients of determination or parameters for features. As a result, some of the following happens:

Often, the regression model fails to generalize on unseen data. This could happen when the model tries to accommodate for all kind of changes in the data including those belonging to both the actual pattern and, also the noise. As a result, the model ends up becoming a complex model having significantly high variance due to overfitting thereby impacting the model performance (accuracy, precision, recall, etc) on unseen data. The diagram below represents the high variance regression model

The goal is to reduce the variance while making sure that the model does not become biased (underfitting). After applying the regularization technique, the following model could be obtained.

Fig 2. The regression model after regularization is applied

A large number of features and the related coefficients (at times large enough) result in computationally intensive models.

The above problems could be tackled using regularization techniques which are described in later sections.

What is Regularization?

Regularization techniques are used to calibrate the coefficients of determination of multi-linear regression models in order to minimize the adjusted loss function (a component added to least squares method). Primarily, the idea is that the loss of the regression model is compensated using the penalty calculated as a function of adjusting coefficients based on different regularization techniques.

References

Summary

In this post, you learned about the regularization techniques, and, why and when are they applied. Primarily, if you have come across the scenario that your regression models are failing to generalize on unseen or new data or the regression model is computationally intensive, you may try and apply regularization techniques. Applying regularization techniques make sure that unimportant features are dropped (leading to a reduction of overfitting) and also, multicollinearity is reduced.

Ajitesh has been recently working in the area of AI and machine learning. Currently, his research area includes Safe & Quality AI. In addition, he is also passionate about various different technologies including programming languages such as Java/JEE, Javascript and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data etc.