This tutorial covers how to interpret two of the most fundamental measures reported for regression models: R Squared and Adjusted R Squared. We will try to give clear guidelines for interpreting both.

Once we have fitted a model to data using regression, we need to find out how well the model fits the data. R reports many goodness-of-fit statistics out of the box when we create a model. In this tutorial we will discuss an important one called R-Squared (R²). We will also try to bust the myths that low R Squared values are always bad and high R Squared values are always good.

So R² = 67% means that the regression equation explains 67% of the variation of the observed values around their mean.
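To make "variation explained around the mean" concrete, here is a minimal Python sketch. It assumes the usual definition R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the observed values from their mean. The function name `r_squared` and the toy numbers are made up for illustration; they are not from any dataset in this tutorial.

```python
def r_squared(observed, fitted):
    """Return R² = 1 - SS_res / SS_tot for observed values and model fits."""
    mean_obs = sum(observed) / len(observed)
    # Variation the model leaves unexplained (squared residuals).
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    # Total variation of the observed values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and the fits some model produced for them.
observed = [3.0, 5.0, 7.0, 9.0]
fitted = [3.5, 4.5, 7.5, 8.5]

r2 = r_squared(observed, fitted)
print(round(r2, 4))
```

Here SS_tot is 20 and SS_res is 1, so the fits explain about 95% of the variation around the mean.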

When you add more predictor variables to a regression equation, R² never goes down, and it rises whenever the new variables explain any additional variance. Does this mean that when we compare two models on the same data, the model with the higher R² is always better than the model with the lower R²?

The answer is no, not always. More predictor variables mean more complexity, which can lead to overfitting. So R² on its own is not a very reliable measure for comparing models. We need a measure that tells us whether a new variable explains enough additional variance to be worth the added complexity.

It is for this reason that we use Adjusted R².
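The weakness of plain R² can be seen in a small experiment. The Python sketch below is only an illustration under stated assumptions: it fits ordinary least squares with an intercept through the normal equations, using simulated data and helper names (`solve_linear_system`, `ols_r_squared`) invented for this example. It fits the same response twice, once with one real predictor and once with an extra predictor that is pure noise, and compares the two R² values.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_r_squared(columns, y):
    """Fit y on the predictor columns plus an intercept and return R²."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(design[i][a] * design[i][b] for i in range(n))
            for b in range(k)] for a in range(k)]
    xty = [sum(design[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


random.seed(0)
n = 30
x1 = [random.gauss(0, 1) for _ in range(n)]
junk = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
y = [2.0 + 3.0 * a + random.gauss(0, 1) for a in x1]

r2_one = ols_r_squared([x1], y)
r2_two = ols_r_squared([x1, junk], y)
print(r2_one, r2_two)
```

The second R² should come out at least as large as the first even though the extra predictor carries no information about y, which is exactly why a higher R² alone does not prove a better model.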

What is Adjusted R² ?

Adjusted R² is a measure derived from R² that penalizes each added variable for the complexity it introduces. It is commonly written as:

Adjusted R² = 1 − (1 − R²)(N − 1) / (N − p − 1)

N = Sample Size

p = number of predictors

Note that p appears in the denominator as (N − p − 1). Increasing p shrinks the denominator, so Adjusted R² decreases unless R² increases enough to compensate, with everything else held constant.
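The penalty is easiest to see with numbers. The sketch below assumes the standard formula Adjusted R² = 1 − (1 − R²)(N − 1) / (N − p − 1). The function name `adjusted_r_squared` and the R² values are hypothetical, chosen only to show a case where an extra predictor raises R² slightly but lowers Adjusted R².

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R² for a model with p predictors fitted on n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for Adjusted R² to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


n = 30  # hypothetical sample size

# Model A: 3 predictors, R² = 0.670.
adj_a = adjusted_r_squared(0.670, n, p=3)

# Model B: one more predictor, R² creeps up to 0.672.
adj_b = adjusted_r_squared(0.672, n, p=4)

print(round(adj_a, 4), round(adj_b, 4))
```

With these made-up inputs, Model A comes out near 0.632 and Model B near 0.620: the fourth predictor did not raise R² enough to pay for itself.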

Is Low R² always bad ?

No. The desirable range of R² is highly domain dependent. Any model that attempts to predict human behavior is seldom very precise, so a lower R² is expected. In models for medicine and pharma, on the other hand, R² values above 90% are very common.