Caution: If you’re expecting details of either multiple regression or backward elimination, you may be disappointed.

This article is purely focused on the implementation of backward elimination.

Still, I would like to give a short overview of backward elimination.

It is one of the dimension reduction techniques.

A common piece of advice from ML experts is to keep your model as simple as possible.

Don’t fall for building complex models with all available data, and I agree with that.

Including all available data may turn into garbage in, garbage out.

You need to be careful about the data you provide for building the model.

It is necessary to identify redundant data and drop it.

There are a few techniques which help achieve this, referred to as dimension reduction techniques.

Backward elimination is one of them: we eliminate nonsignificant features one by one by analyzing each feature’s impact on the target variable.

Let’s dive into the detailed implementation process. I am using the Boston Housing Dataset, available at https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

The dataset has 506 cases.

The dataset contains 13 features which may impact housing prices.

The details of the features are as follows.

[Details of Features/Attributes of Dataset]

Features 1 to 13 are the predictors, and the 14th feature, MEDV, is the target variable: housing prices.

Data Processing

Starting with processing the dataset and preparing it for model building.

There are no missing values in the dataset.

Therefore, I am not dropping any rows or columns.
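The missing-value check can be done with pandas; a minimal sketch, where `df` is a small stand-in frame (in the article, the loaded Boston data would be used instead):

```python
import pandas as pd

# Tiny stand-in frame; in practice this would be the loaded Boston dataset.
df = pd.DataFrame({
    "CRIM": [0.006, 0.027, 0.027],
    "MEDV": [24.0, 21.6, 34.7],
})

# Count missing values per column; all zeros means nothing needs dropping.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```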

The dataset is small and does not require normalization or standardization.

There are no categorical variables except ‘CHAS’, and that one is already encoded.

It looks like the dataset does not need much processing.

Let’s visualize some features to get good insights into the data.

The histogram of the target variable (housing prices) looks as below.

[Histogram of Housing Prices]

The conclusion drawn from this histogram is that the values are fairly distributed, except at the end: there seems to be abnormal behavior at 50 (pointed out by the blue mark).

It looks like housing prices are censored at 50.

To avoid biasing the model, I chose to drop the observations whose price equals $50,000.
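Dropping the censored observations is a one-line filter in pandas; a minimal sketch with a small stand-in frame (MEDV is in $1000s, so the $50,000 ceiling is the value 50.0):

```python
import pandas as pd

# Stand-in frame; MEDV is the median home value in $1000s,
# so 50.0 corresponds to the censored $50,000 ceiling.
df = pd.DataFrame({"LSTAT": [4.98, 9.14, 7.00], "MEDV": [24.0, 50.0, 33.4]})

# Keep only the rows whose target is below the censoring point.
df = df[df["MEDV"] != 50.0].reset_index(drop=True)
print(len(df))
```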

I have analyzed the behavior of each feature against pricing and came to the conclusion of using a multiple regression model.

Almost every attribute shows a weak or strong correlation with the target variable.

I am not putting all the scatter plots here, as the details of each feature and the analysis of its relation with the target variable would lengthen the article, and I consider it slightly off topic here.

Let’s get a view of the correlations with a heatmap.

[Heatmap — Correlation]

The conclusions of the heatmap:

“CHAS”, a dummy variable which is 1 if the tract bounds the Charles River, has the lowest correlation with housing prices.

“LSTAT”, the percentage of lower-status population, has the highest correlation. The negative coefficient indicates that housing prices are lower in areas with a higher percentage of lower-status population.

A few feature pairs (RAD and TAX, DIS and NOX, INDUS and NOX) have high correlation with each other, referred to as multicollinearity.

There is another dimension reduction technique in which we keep the features highly correlated with the target variable and drop the features with low correlation; likewise, when two features are highly correlated with each other (multicollinearity), we select only one of them.

But here we will not use these results for dimension reduction; instead, we will use them to check that backward feature selection gives similar results.
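The correlation matrix behind such a heatmap comes straight from pandas; a minimal sketch with a small stand-in frame (the article uses the full 13-feature Boston frame, and `seaborn.heatmap(corr)` would draw the figure):

```python
import pandas as pd

# Stand-in data; in the article the full 13-feature Boston frame is used.
df = pd.DataFrame({
    "LSTAT": [4.98, 9.14, 4.03, 2.94, 5.33],
    "RM":    [6.575, 6.421, 7.185, 6.998, 7.147],
    "MEDV":  [24.0, 21.6, 34.7, 33.4, 36.2],
})

# Pearson correlation matrix; seaborn's heatmap(corr) would visualize this.
corr = df.corr()

# Correlations of each predictor with the target, sorted by strength.
print(corr["MEDV"].drop("MEDV").abs().sort_values(ascending=False))
```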

I hope we have got enough of an idea about the dataset, so I will go further with building the model and implementing backward selection.

Let’s fit the model and calculate its performance, so we can compare the model before and after elimination.

We start with the process of backward feature selection to identify nonsignificant features.

The steps followed for backward feature selection are as below; it is a cyclic process that repeats until we get the desired results.

[Process Flow for Backward Feature Elimination]

Before going further, I will briefly explain the P value.

The P value is one of the performance measures.

The P value signifies the impact of a feature on the target variable.

A high P value indicates that the feature is nonsignificant and has little impact on the target variable; a low P value indicates that the feature is significant and has a large impact on the target variable.

The standard P value of 0.05 is considered the threshold for deciding whether to keep a feature in model building.

Along with the P value, adjusted R² can be used as a performance parameter in backward elimination.

If there is an improvement in adjusted R² after eliminating a feature, that confirms the feature was nonsignificant.

The Stepwise Backward Elimination — Model Fitting

Use all features to build the model in the beginning.

I am using all 13 features and storing them in the X_opt array.

This will be the optimal array: in the end, we want an optimal array of significant features stored in X_opt.

import statsmodels.