Linear Regression Explained with Real Life Example

In this post, the concept of linear regression in machine learning is explained with real-life examples. Both types of linear regression (simple and multiple) are considered when citing examples. If you are a machine learning or data science beginner, you may find this post helpful. The following topics are covered in this post:

What is linear regression?

Simple linear regression example

Multiple linear regression example

Before we look at some examples of regression, let's quickly get up to speed on the concepts related to linear regression.

What is Linear Regression?

Linear regression is a machine learning technique used to build or train models (a mathematical structure or equation) for solving supervised learning problems in which the value to be predicted is numerical (regression). Predicting a categorical value is a classification problem, for which linear regression is not the usual choice. Supervised learning problems are the class of problems where the values (data) of the independent or predictor variables (features) and of the dependent or response variable are already known. The known values of the dependent and independent variable(s) are used to come up with a mathematical formula, which is later used to determine (predict) the output given the values of the input features.

The linear regression formula expresses the output (dependent variable) as a weighted sum of the input features (independent variables) plus a constant term. The training data is used to determine the optimal values of the coefficients (weights) of the independent variables. For two input features, the formula looks like this:

Y = b1*X1 + b2*X2 + c

In the above equation, the values of Y, X1 and X2 are known during the model training phase. As part of training the model, the optimal values of the coefficients b1 and b2 and of the intercept c are determined.
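To make the weighted-sum idea concrete, here is a minimal sketch of how a prediction could be computed once the coefficients are known. The function name `predict` and the coefficient values are hypothetical and chosen only for illustration.

```python
def predict(features, coefficients, intercept):
    """Return the weighted sum of the features plus the intercept."""
    return sum(b * x for b, x in zip(coefficients, features)) + intercept


# Hypothetical values: b1 = 2.0, b2 = 0.5, c = 1.0, X1 = 3.0, X2 = 4.0
y = predict([3.0, 4.0], [2.0, 0.5], 1.0)
print(y)  # 2.0*3.0 + 0.5*4.0 + 1.0 = 9.0
```

Training a model amounts to finding the coefficient and intercept values that make such predictions as close as possible to the known values of Y.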

There are two different types of linear regression problems. They are the following:

Simple linear regression: The following represents the simple linear regression where there is just one independent variable, X, which is used to predict the dependent variable Y.

Fig 1. Simple linear regression

Multiple linear regression: The following represents the multiple linear regression where there are two or more independent variables (X1, X2) which are used for predicting the dependent variable Y.

Fig 2. Multiple linear regression

Simple Linear Regression Example

As shown above, a simple linear regression model comprises one input feature (independent variable), which is used to predict the value of the output (dependent) variable. The following mathematical formula represents the regression model:

Y = b*X + c

Let's take an example with one input variable used to predict the output variable. In real life, however, it can be difficult to find a supervised learning problem that can be modeled well using simple linear regression.

Simple Linear Model for Predicting Marks

Let's consider the problem of predicting the marks of a student based on the number of hours he/she puts into preparation. Although at the outset it may look like a problem that can be modeled using simple linear regression, it could turn out to be a multiple linear regression problem that depends on several input features. Alternatively, it may turn out to be a non-linear problem. For the sake of example, however, let's treat it as a simple linear regression problem.

Let's assume that the marks of a student (M) depend only on the number of hours (H) he/she puts into preparation. The following formula can represent the model:

Marks = function (No. of hours)

=> Marks = m*Hours + c

A good way to check whether simple linear regression is suitable is to plot Marks vs Hours. If the plot looks like the one below, with the points lying roughly along a straight line, it may be inferred that a linear model can be used for this problem.

Fig 3. Plot representing a simple linear model for predicting marks

The data represented in the above plot would be used to find a best-fit line such as the following. The slope of the best-fit line would be the value of “m”, and the point where the line crosses the Marks axis would be the value of “c”.

Fig 4. Plot representing a simple linear model with a regression line
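The following is a minimal sketch of how such a best-fit line could be computed, assuming the standard closed-form least-squares formulas for the slope and intercept. The hours and marks values are made up for illustration, and the function name `fit_simple_linear_regression` is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (m, c) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Made-up training data: hours of preparation and marks obtained.
hours = [1, 2, 3, 4, 5]
marks = [35, 45, 50, 65, 70]

m, c = fit_simple_linear_regression(hours, marks)
print(m, c)       # 9.0 26.0
print(m * 6 + c)  # predicted marks for 6 hours: 80.0
```

With this toy data, each extra hour of preparation adds 9 marks to the prediction. Real data would rarely be this tidy.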

The values of m (slope of the line) and c (intercept) can be determined by minimizing an objective function. In general, an objective function is a combination of a loss function and an optional regularization term. For simple linear regression, the objective function is the Mean Squared Error (MSE) with no regularization term. MSE is the average of the squared differences between the actual values of the target variable (actual marks) and the predicted values (marks calculated using the above equation). The best-fit line is obtained by minimizing this objective function.
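As a rough illustration of the objective function, the sketch below computes the mean squared error of two candidate lines on a small made-up data set. The data values and the helper names `mean_squared_error` and `mse_for_line` are assumptions introduced for this example.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up training data: hours of preparation and marks obtained.
hours = [1, 2, 3, 4, 5]
marks = [35, 45, 50, 65, 70]


def mse_for_line(m, c):
    """MSE of the line Marks = m*Hours + c on the data above."""
    predictions = [m * h + c for h in hours]
    return mean_squared_error(marks, predictions)


print(mse_for_line(9, 26))  # 4.0
print(mse_for_line(8, 26))  # 15.0
```

The first candidate has the lower error and is therefore the better fit of the two. For this data it happens to be the least-squares line, so no other values of m and c give a smaller MSE.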

Multiple Linear Regression Example

Multiple linear regression can be used to model the supervised learning problems where there are two or more input (independent) features which are used to predict the output variable. The following formula can be used to represent a typical multiple regression model:

Y = b1*X1 + b2*X2 + b3*X3 + … + bn*Xn + c

In the above equation, Y represents the response/dependent variable and X1, X2, …, Xn represent the input features. The model (mathematical formula) is trained using training data to find the optimum values of b1, b2, …, bn and c which minimize the objective function (mean squared error).

Multiple Linear Regression Model for Predicting Weight Reduction

The problem of predicting weight reduction, in terms of the number of kilograms lost, could hypothetically depend upon input features such as the age, height and weight of the person and the time spent on exercise.

Weight Reduction = Function(Age, Height, Weight, TimeOnExercise)

=> Weight Reduction = b1*Height + b2*Weight + b3*Age + b4*TimeOnExercise + c

As part of training the above model, the goal would be to find the values of b1, b2, b3, b4 and c which minimize the objective function. The objective function would be the mean squared error, which is the average of the squared differences between the actual and the predicted weight reduction across the training examples, each with its own values of age, height, weight and timeOnExercise.
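The sketch below shows one way the coefficients of such a model could be found, assuming the normal equations of least squares solved with plain Gaussian elimination. Everything here is illustrative: the data rows are made up, the "true" coefficients used to generate noise-free targets are hypothetical, and the function names are not from any library.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix a is square and non-singular.
    """
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_multiple_linear_regression(rows, targets):
    """Return ([b1, ..., bn], c) minimizing the mean squared error."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    target_mean = sum(targets) / n
    # Centering the data lets us solve for the coefficients first
    # and recover the intercept afterwards.
    xc = [[row[j] - means[j] for j in range(p)] for row in rows]
    yc = [t - target_mean for t in targets]
    xtx = [[sum(r[i] * r[j] for r in xc) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(xc, yc)) for i in range(p)]
    coefficients = solve_linear_system(xtx, xty)
    intercept = target_mean - sum(b * m for b, m in zip(coefficients, means))
    return coefficients, intercept


# Made-up rows: (height in cm, weight in kg, age in years, exercise hours per week)
people = [
    (170, 80, 30, 5), (160, 70, 40, 3), (180, 90, 25, 6),
    (175, 85, 35, 2), (165, 95, 45, 4), (172, 78, 50, 7),
]
# Hypothetical coefficients used only to generate noise-free targets.
true_b, true_c = [0.01, 0.05, -0.02, 0.5], 1.0
weight_reduction = [sum(b * x for b, x in zip(true_b, row)) + true_c for row in people]

b, c = fit_multiple_linear_regression(people, weight_reduction)
print([round(v, 4) for v in b], round(c, 4))
```

Because the targets were generated without noise, the fitted values should closely match the hypothetical coefficients. With real measurements the fit would only approximate the underlying relationship, and a numerical library would normally be used instead of a hand-written solver.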


Summary

In this post, you learned about linear regression, its two types, and an example of each. A supervised learning problem in which the output variable depends linearly on the input features can be solved using linear regression models. These models are trained using the simple or multiple linear regression algorithm, which represents the output variable as a weighted sum of the input features plus a constant.

Ajitesh has been recently working in the area of AI and machine learning. Currently, his research area includes Safe & Quality AI. In addition, he is also passionate about various different technologies including programming languages such as Java/JEE, Javascript and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data etc.