In spite of being different, they have the commonality that they can both be imagined to be essential parts of the overall linear regression process — we expect a linear regression to fit some training data, and then be able to predict for future unseen data.

We also expect the linear regression model to give us some indication of how good the fit was, generally in the form of a single numeric score called the coefficient of determination, or R².

As expected, we see a function score, which returns exactly that R² number, also hanging around fit and predict.

Neat and clean, isn’t it? Data, functions, and parameters are cohabitating inside a single logical unit.

How was it made possible? It was possible because we rose above the individual differences and thought about the linear regression as a high-level process and decided what essential actions it should serve and what critical parameters it should inform its users about.

We made a high-level class called LinearRegression under which all those apparently disparate functions can be grouped together for easy book-keeping and enhanced usability.

Once we imported this class from the library, we just had to create an instance of the class — we called it lm.

That’s it.

All the functions, grouped under the class, became accessible to us through that newly defined instance lm.
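As a rough illustration of that workflow, here is a minimal sketch using the Scikit-learn LinearRegression class. The toy data and the variable names other than lm are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (made up for illustration): y = 1 + 2*x, with no noise
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = 1.0 + 2.0 * X_train.ravel()

lm = LinearRegression()          # create an instance of the class
lm.fit(X_train, y_train)         # fit the training data

X_new = np.array([[4.0], [5.0]])
y_new = lm.predict(X_new)        # predict for unseen data
r2 = lm.score(X_train, y_train)  # R² of the fit on the training data

# The fitted parameters live on the same object as the methods
print(lm.coef_, lm.intercept_)
print(y_new, r2)
```

Everything we need (fit, predict, score, and the fitted parameters) is reached through the single instance lm.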

If we are not satisfied with some of the internal implementation of the functions, we can work on them and re-attach them to the main class after modification.

Only the code of the internal function changes, nothing else.

See how logical and scalable it sounds?

Create your own ML estimator

Traditional introductions to OOP have plenty of examples using classes such as animals, sports, and geometric shapes.

But for data scientists, why not illustrate the concepts using the example of an object they use every day in their code: a machine learning estimator?

Just like the lm object from the Scikit-learn library, shown in the picture above.

A good old linear regression estimator, with a twist

In this GitHub repo, I have shown, step-by-step, how to build a simple linear regression (single or multivariate) estimator class following the OOP paradigm.

Yes, it is the good old linear regression class.

It has the usual fit and predict methods as in the LinearRegression class from Scikit-learn.

But it has more functionalities.

Here is a sneak peek…

Yes, this estimator is richer than the Scikit-learn estimator in the sense that it has, in addition to the standard fit, predict, and R² score functions, a host of other utilities which are essential for a linear regression modeling task.

This matters especially for data scientists and statistical modeling folks, who not only want to predict but would also like to measure the goodness of fit, verify the assumptions of linear regression, check for multicollinearity in the data, or detect outliers.

Related article: How do you check the quality of your regression model in Python? (towardsdatascience.com)

How do you start building the class?

We start with a simple code snippet to define the class.

We name it — MyLinearRegression.

Here, self denotes the object itself and __init__ is a special function which is invoked when an instance of the class is created somewhere in the code.

As the name suggests, __init__ can be used to initialize the class with necessary parameters (if any).
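A minimal sketch of such a class definition might look like the following. The attribute names coef_ and intercept_ mirror the Scikit-learn convention, while the fit_intercept option and the wording of the description string are assumptions made for this illustration, not necessarily what the repo uses.

```python
class MyLinearRegression:
    """A simple linear regression estimator (illustrative sketch)."""

    def __init__(self, fit_intercept=True):
        # Regression parameters are unknown until the model is fitted
        self.coef_ = None
        self.intercept_ = None
        # Assumed option: whether fit should estimate an intercept term
        self._fit_intercept = fit_intercept
```

The docstring right under the class statement plays the role of the description string; it is available later as MyLinearRegression.__doc__.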

We can add a simple description string to keep it honest :-)

We add the core fit method next.

Note the docstring describing the purpose of the method, what it does and what type of data it expects.

All of these are part of good OOP principles.
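One way to write such a fit method is sketched below. It solves the least-squares problem with np.linalg.lstsq, which is a choice made for this example; the repo may use a different solver. The fit_intercept option is likewise an assumption.

```python
import numpy as np


class MyLinearRegression:
    """A simple linear regression estimator (illustrative sketch)."""

    def __init__(self, fit_intercept=True):
        self.coef_ = None
        self.intercept_ = None
        self._fit_intercept = fit_intercept  # assumed option

    def fit(self, X, y):
        """
        Fit the model coefficients by ordinary least squares.

        Arguments:
        X: 1D or 2D numpy array of features, shape (n_samples,) or
           (n_samples, n_features)
        y: 1D numpy array of targets, shape (n_samples,)
        """
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Treat a 1D input as a single feature column
        if X.ndim == 1:
            X = X.reshape(-1, 1)

        # Prepend a column of ones so the first solved weight is the intercept
        if self._fit_intercept:
            X_design = np.column_stack([np.ones(X.shape[0]), X])
        else:
            X_design = X

        # Least-squares solution of X_design @ w = y
        w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

        # Store the fitted parameters as attributes of the instance
        if self._fit_intercept:
            self.intercept_ = w[0]
            self.coef_ = w[1:]
        else:
            self.intercept_ = 0.0
            self.coef_ = w
```

The docstring states what the method does and what shapes it expects, so a user can read it with help(MyLinearRegression.fit) without opening the source.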

We can generate some random data to test our code so far.

We create a linear function of two variables.
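For instance, the test data could be generated as follows. The coefficients, the noise level, the sample size, and the seed are all arbitrary values picked for this sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # fixed seed so the data is reproducible
n_samples = 100

# Two features drawn uniformly at random (range chosen arbitrarily)
X = rng.uniform(low=0.0, high=10.0, size=(n_samples, 2))

# Hypothetical ground truth: y = 5 + 2*x1 + 3*x2 + Gaussian noise
true_intercept = 5.0
true_coef = np.array([2.0, 3.0])
noise = rng.normal(loc=0.0, scale=1.0, size=n_samples)
y = true_intercept + X @ true_coef + noise

print(X.shape, y.shape)
```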

Here are the scatter plots of the data.

Now, we can create an instance of the class MyLinearRegression called mlr.

What happens if we try to print the regression parameters? Because self.coef_ was set to None, we get None when we print mlr.coef_.

Note how self became synonymous with the instance of the class, mlr, once it is created.

But the definition of fit includes setting the attributes once the fitting is done.

Therefore, we can just call mlr.fit() and print out the fitted regression parameters.
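The before-and-after behaviour can be sketched as follows. The class here is a compact stand-in (least squares with an intercept) so that the snippet runs on its own, and the toy data is made up and noise-free.

```python
import numpy as np


class MyLinearRegression:
    """Compact copy of the class sketched so far, so this snippet runs alone."""

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        """Least-squares fit; X has shape (n_samples, n_features)."""
        X_design = np.column_stack([np.ones(len(X)), X])
        w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        self.intercept_, self.coef_ = w[0], w[1:]


# Noise-free toy data (made up): y = 5 + 2*x1 + 3*x2
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 5 + 2 * X[:, 0] + 3 * X[:, 1]

mlr = MyLinearRegression()
before_fit = mlr.coef_
print(before_fit)                  # None: nothing has been fitted yet

mlr.fit(X, y)                      # inside fit, self refers to mlr
print(mlr.intercept_, mlr.coef_)   # close to 5 and [2, 3]
```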

The quintessential predict method

After fitting comes prediction.

We can add that method easily to our regression class.
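A possible predict method is sketched below, again on a compact stand-in for the class. Raising an error when predict is called before fit is a design choice made for this example.

```python
import numpy as np


class MyLinearRegression:
    """Compact copy of the class, extended with a predict method."""

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        """Least-squares fit; X has shape (n_samples, n_features)."""
        X_design = np.column_stack([np.ones(len(X)), X])
        w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        self.intercept_, self.coef_ = w[0], w[1:]

    def predict(self, X):
        """
        Return predictions for new data.

        Arguments:
        X: 1D or 2D numpy array of features, same columns as used in fit
        """
        if self.coef_ is None:
            raise RuntimeError("Call fit before predict.")
        X = np.asarray(X, dtype=float)
        # Treat a 1D input as a single feature column
        if X.ndim == 1:
            X = X.reshape(-1, 1)
        # Linear model: intercept plus weighted sum of the features
        return self.intercept_ + X @ self.coef_
```

Because predict reads self.coef_ and self.intercept_, it automatically uses whatever the most recent call to fit stored on the instance.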

What if we want to add a plotting utility function (or a few)?

At this point, we start expanding our regression class and add stuff which is not even present in the standard scikit-learn class! For example, we always want to see how the fitted values compare to the ground truth.

It is easy to create a function for that.

We will call it plot_fitted.

Note that a method is like a normal function.

It can take additional arguments.

Here, we have an argument reference_line (default set to False) which draws a 45-degree reference line on the fitted vs. true plot.

Also, note the docstring description.
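A sketch of such a method, using Matplotlib, could look like this. It assumes that fit keeps the training targets and the fitted values on the instance (the attribute names target_ and fitted_ are chosen for this example), and it returns the Matplotlib axes so the caller can tweak or save the figure.

```python
import matplotlib.pyplot as plt
import numpy as np


class MyLinearRegression:
    """Compact copy of the class, extended with one plotting method."""

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        """Least-squares fit; X has shape (n_samples, n_features)."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        X_design = np.column_stack([np.ones(len(X)), X])
        w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        self.intercept_, self.coef_ = w[0], w[1:]
        # Assumption: keep the targets and fitted values for later plotting
        self.target_ = y
        self.fitted_ = self.intercept_ + X @ self.coef_

    def plot_fitted(self, reference_line=False):
        """
        Plot fitted values against the true targets of the training set.

        Arguments:
        reference_line: if True, also draw the 45-degree line (fitted == true)
        """
        _, ax = plt.subplots()
        ax.scatter(self.target_, self.fitted_, edgecolors="k")
        if reference_line:
            low = min(self.target_.min(), self.fitted_.min())
            high = max(self.target_.max(), self.fitted_.max())
            ax.plot([low, high], [low, high], "r--")
        ax.set_xlabel("True values")
        ax.set_ylabel("Fitted values")
        ax.set_title("True vs. fitted values")
        ax.grid(True)
        return ax
```

In a notebook the figure is displayed automatically; in a plain script, call plt.show() after the method.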

We can test the method plot_fitted by simply doing the following:

    m = MyLinearRegression()
    m.fit(X, y)
    m.plot_fitted()

Or, we can opt to draw the reference line:

    m.plot_fitted(reference_line=True)

We get the following plots!

Once we understand that we can add any useful methods to work on the same data (a training set), related to the same purpose (linear regression), there is no bound to our imagination! How about we add the following plots to our class?

- Pairplots (plots of the pairwise relation between all features and outputs, much like the pairs function in R)
- Fitted vs. residuals plot (to check the validity of the fundamental assumptions)
- Histogram and the quantile-quantile (Q-Q) plot of the residuals (this checks the assumption of normality of the error distribution)

Inheritance: don’t overburden your main class

As we enthusiastically plan utility methods to add to the class, we recognize that this approach may make the code of the main class very long and difficult to debug.

To solve the conundrum, we can make use of another beautiful principle of OOP — inheritance.

Related article: Inheritance in Python (GeeksforGeeks, www.geeksforgeeks.org). Inheritance is the capability of one class to derive or inherit the properties of another class.

We further recognize that not all plots are of the same type.

Pairplots and fitted vs. true data plots are of a similar nature, as they can be derived from the data only.

Other plots are related to the goodness-of-fit and residuals.

Therefore, we can create two separate classes with those plotting functions — Data_plots and Diagnostic_plots.

And guess what! We can define our main MyLinearRegression class in terms of these utility classes.
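The arrangement can be sketched as follows, with one example method per utility class. The method bodies and the attribute names target_, fitted_, and resid_ are assumptions made for this illustration; the actual classes in the repo hold more plots.

```python
import matplotlib.pyplot as plt
import numpy as np


class Data_plots:
    """Plots that need only the data and the fitted values."""

    def plot_fitted(self):
        _, ax = plt.subplots()
        ax.scatter(self.target_, self.fitted_, edgecolors="k")
        ax.set_xlabel("True values")
        ax.set_ylabel("Fitted values")
        return ax


class Diagnostic_plots:
    """Plots that examine the residuals of a fitted model."""

    def histogram_resid(self):
        _, ax = plt.subplots()
        ax.hist(self.resid_, bins=20, edgecolor="k")
        ax.set_xlabel("Residuals")
        ax.set_ylabel("Count")
        return ax


class MyLinearRegression(Diagnostic_plots, Data_plots):
    """Main estimator: core logic lives here, plotting is inherited."""

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        """Least-squares fit; X has shape (n_samples, n_features)."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        X_design = np.column_stack([np.ones(len(X)), X])
        w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        self.intercept_, self.coef_ = w[0], w[1:]
        # Stored so that the inherited plotting methods can use them
        self.target_ = y
        self.fitted_ = self.intercept_ + X @ self.coef_
        self.resid_ = self.target_ - self.fitted_
```

The main class stays short, yet an instance m still offers m.plot_fitted() and m.histogram_resid(), because both methods are inherited.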

Add syntactic sugar by creating grouped utilities

Once you have inherited other classes, they behave just like the usual Python module you are familiar with.

So, you can add utility methods to the main class to execute multiple methods from a parent class together.

For example, the following method runs all the usual diagnostic checks at once.
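A minimal sketch of such a grouped utility is shown here. It uses the class name Diagnostic_plots introduced earlier and only two diagnostic plots; the method fitted_vs_residual and the attributes fitted_ and resid_ are names assumed for this example, and a Q-Q plot is left out to keep the dependencies to NumPy and Matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np


class Diagnostic_plots:
    """Residual-based plots (two examples only)."""

    def fitted_vs_residual(self):
        _, ax = plt.subplots()
        ax.scatter(self.fitted_, self.resid_, edgecolors="k")
        ax.axhline(0.0, color="r", linestyle="--")  # zero-residual reference
        ax.set_xlabel("Fitted values")
        ax.set_ylabel("Residuals")
        return ax

    def histogram_resid(self):
        _, ax = plt.subplots()
        ax.hist(self.resid_, bins=20, edgecolor="k")
        ax.set_xlabel("Residuals")
        ax.set_ylabel("Count")
        return ax


class MyLinearRegression(Diagnostic_plots):
    """Main estimator with a grouped diagnostics utility."""

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X, y):
        """Least-squares fit; X has shape (n_samples, n_features)."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        X_design = np.column_stack([np.ones(len(X)), X])
        w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        self.intercept_, self.coef_ = w[0], w[1:]
        self.fitted_ = self.intercept_ + X @ self.coef_
        self.resid_ = y - self.fitted_

    def run_diagnostics(self):
        """Run all the diagnostic plots in one call and return their axes."""
        # Each plot is reached through the parent class name with a dot
        return [
            Diagnostic_plots.fitted_vs_residual(self),
            Diagnostic_plots.histogram_resid(self),
        ]
```

Calling the plots through the class name makes it obvious at a glance which group each method belongs to; self.histogram_resid() would work equally well, since the methods are inherited.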

Note how we are accessing the plot methods with a simple dot, i.e. Diagnostics_plot.histogram_resid, just like accessing a function from the Pandas or NumPy library!

(Listing: run_diagnostics method in the main class)

With this, we can run all the diagnostics with a single line of code after fitting data.

    m = MyLinearRegression()  # A brand new model instance
    m.fit(X, y)               # Fit the model with some data
    m.run_diagnostics()

Similarly, you can add all the outlier plots in a single utility method.

Modularization: import the class as a module

Although not a canonical OOP principle, the essential advantage of following the OOP paradigm is to be able to modularize your code.

You can experiment and develop all this code in a standard Jupyter notebook.

But for maximum modularity, consider converting the Notebook into a standalone executable Python script (with a .py extension).

As a good practice, remove all the unnecessary comments and test code from this file and keep only the classes together.

Here is the link to the script I put together for this article.

Once you do that, you can import the MyLinearRegression class from a completely different Notebook.

This is often the preferred way of testing your code as this does not touch the core model but only tests it with various data samples and functional parameters.
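To see the mechanics in a self-contained way, the sketch below writes a tiny stand-in module to a temporary folder and then imports the class from it. The file name my_linear_regression.py is hypothetical, and in practice the file would simply be the converted script sitting next to your Notebook, so only the import line would be needed.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

import numpy as np

# Stand-in for the converted script (hypothetical file name and contents)
module_source = textwrap.dedent(
    '''
    import numpy as np


    class MyLinearRegression:
        """Minimal stand-in for the full class kept in the script."""

        def __init__(self):
            self.coef_ = None
            self.intercept_ = None

        def fit(self, X, y):
            X_design = np.column_stack([np.ones(len(X)), X])
            w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
            self.intercept_, self.coef_ = w[0], w[1:]
    '''
)
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_linear_regression.py").write_text(module_source)

# Make the folder importable, as if the script sat next to the Notebook
sys.path.insert(0, str(module_dir))

from my_linear_regression import MyLinearRegression

# Test the imported class on made-up data without touching its source
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 1.0 + 2.0 * X.ravel()
m = MyLinearRegression()
m.fit(X, y)
print(m.intercept_, m.coef_)
```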

At this point, you can consider putting this Python script on GitHub, creating a setup.py file, creating the proper directory structure, and releasing it as a standalone linear regression package which does fitting, prediction, plotting, diagnostics, and more.

Of course, you have to add a lot of docstring description, examples of usage of a function, assertion checks, and unit tests to make it a good package.

But as a data scientist, now you have added a significant skill to your repertoire – software development following OOP principles.

It was not so difficult, was it?

Epilogue

To write this post, I was inspired by this fantastic article, which drills down into the concept of OOP in Python in more detail, in the context of machine learning.

I wrote a similar article, touching even more basic approaches, in the context of deep learning.

Check it out here: How a simple mix of object-oriented programming can sharpen your deep learning prototype (towardsdatascience.com)

If you have any questions or ideas to share, please contact the author at tirthajyoti[AT]gmail.com.

Also, you can check the author’s GitHub repositories for other fun code snippets in Python, R, or MATLAB and machine learning resources.

If you are, like me, passionate about machine learning/data science, please feel free to add me on LinkedIn or follow me on Twitter.