Linear Regression

Introduction:

This blog helps beginners get started with the basics of linear regression so that they can easily build their first linear regression model. The focus of this blog is the modeling aspect of linear regression.

Linear Regression is one of the most fundamental and widely used Machine Learning Algorithms. It’s usually among the first few topics which people pick while learning predictive modeling. Linear Regression models the relationship between a dependent variable (Y) and one or more independent variables (X) using a best fit straight line (also known as regression line). The dependent variable is continuous. The independent variable(s) can be continuous or discrete, and the nature of the relationship is linear.


Linear relationships can either be positive or negative. A positive relationship between two variables basically means that an increase in the value of one variable also implies an increase in the value of the other variable. A negative relationship between two variables means that an increase in the value of one variable implies a decrease in the value of the other variable.
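The sign of the Pearson correlation coefficient captures exactly this distinction. A minimal sketch, using illustrative made-up data:

```python
import numpy as np

# Illustrative data: y_pos rises with x (positive relationship),
# y_neg falls with x (negative relationship)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = 2.0 * x + 1.0    # increases as x increases
y_neg = -2.0 * x + 11.0  # decreases as x increases

# Pearson correlation: +1 for a perfect positive linear relationship,
# -1 for a perfect negative one
print(np.corrcoef(x, y_pos)[0, 1])  # 1.0
print(np.corrcoef(x, y_neg)[0, 1])  # -1.0
```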

Mathematical Explanation:

A simple linear regression has one independent variable. Mathematically, the line representing a simple linear regression is expressed through a basic equation:

Y = mX + b + e

Here:
m is the slope
X is the predictor variable
b is the intercept/bias term
Y is the predicted target variable
e is the error term
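As a quick worked example, a least-squares fit should recover m and b from data generated by this equation. A minimal sketch using NumPy (the data here is made up for illustration):

```python
import numpy as np

# Generate noiseless data with m = 3, b = 2 (so e = 0 and the fit is exact)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

# Least-squares fit of a degree-1 polynomial: returns (slope, intercept)
m, b = np.polyfit(x, y, deg=1)
print(m, b)  # ≈ 3.0, 2.0
```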

Do some initial visual inspection of the relationship between the predictor and the target variable.

import matplotlib.pyplot as plt

# So let's plot some of the data
# - this gives some core routines to experiment with different parameters
plt.title('Relationship between predictor and target variable')
plt.scatter(x_training_set, y_training_set, color='black')
plt.show()

8. Accuracy report with test data:

Let’s visualize the goodness of fit, with the predictions drawn as a line over the test data.

# So let's run the model against the test data
y_predicted = lm.predict(x_test_set)
plt.title('Comparison of Y values in test and the predicted values')
plt.xlabel('X (test set)')
plt.ylabel('Y')
plt.scatter(x_test_set, y_test_set, color='black')  # actual test values
plt.plot(x_test_set, y_predicted, color='blue', linewidth=3)  # fitted line
plt.xticks(())
plt.yticks(())
plt.show()
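Beyond the visual check, the accuracy report can be made quantitative with R² and mean squared error. A self-contained sketch (the tiny training data and stand-in model below are illustrative, mirroring the `lm`, `x_test_set`, `y_predicted` names used above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Stand-in model trained on data following y = 2x + 1 (illustrative)
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])
lm = LinearRegression().fit(x_train, y_train)

# Held-out test data from the same relationship
x_test_set = np.array([[4.0], [5.0]])
y_test_set = np.array([9.0, 11.0])
y_predicted = lm.predict(x_test_set)

# R^2 close to 1 and MSE close to 0 indicate a good fit
print("R^2:", r2_score(y_test_set, y_predicted))
print("MSE:", mean_squared_error(y_test_set, y_predicted))
```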

9. Prediction:
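Once fitted, the model can predict Y for unseen X values. A minimal sketch (the tiny stand-in model below is illustrative; in the blog's flow `lm` would be the model fitted on the training set):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in model trained on data following y = 2x + 1 (illustrative)
lm = LinearRegression().fit(np.array([[0.0], [1.0], [2.0]]),
                            np.array([1.0, 3.0, 5.0]))

# Predict for a new, unseen X value
print(lm.predict(np.array([[10.0]])))  # ≈ [21.0]
```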

Algorithm Advantages:

Extremely simple method

Shows optimal results when the relationship between the independent variables and the dependent variable is almost linear.

Very easy and intuitive to use and understand

Even when it doesn’t fit the data exactly, we can use it to find the nature of the relationship between the two variables.

Algorithm Disadvantages:

Linear regression is limited to predicting numeric (continuous) output.

Very sensitive to anomalies (outliers) in the data.

If we have more parameters than the number of samples available, then the model starts to model the noise rather than the relationship between the variables (overfitting).
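This last point is easy to demonstrate. In the sketch below (with made-up random data), ordinary least squares fits pure noise perfectly when there are more features than samples, which is memorization rather than a real relationship:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))  # 5 samples, 10 features: more parameters than samples
y = rng.normal(size=5)        # targets are pure noise, no real signal

lm = LinearRegression().fit(X, y)
print(lm.score(X, y))  # training R^2 is 1.0 despite there being no relationship
```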


Abhay Kumar, lead Data Scientist – Computer Vision in a startup, is an experienced data scientist specializing in Deep Learning in Computer vision and has worked with a variety of programming languages like Python, Java, Pig, Hive, R, Shell, Javascript and with frameworks like Tensorflow, MXNet, Hadoop, Spark, MapReduce, Numpy, Scikit-learn, and pandas.