The common case in data science or machine learning applications, different features or predictors manifest them in different scales. This could bring difficulty in interpreting the resulting coefficients of linear regression, such as one feature having very large or small values compare to other predictors and being in different units first of all. The common approach to overcome this to use z-score‘s for each features, centring and scaling.

This approach would allow us to interpret the effects. A possible question however is how could we map regression coefficients obtained with the scaled data back to original coefficients. In the context of ridge regression, this question is posed byMark Seeto in the R mailing list and provided a solution for two predictor case with an R code. In this post we formalize his approach. Note that, in the case of how to scale, Professor Gelman suggests diving them by two standard deviations. In this post we won’t cover that approach and use usual approach.

Algebraic Solution: No error term

An arbitrary linear regression for $n$ variable reads as follows$$y=(Sigma_{i=1}^n beta_{i} x_{i}) + beta_{0}$$here, $y$ is being response variable, $x_{i}$ are the predictors, $n=1,..,n$. Let’s use primes for the scaled regression equation for $n$ variable.$$y’=(Sigma_{i=1}^n beta_{i}’ x_{i}’) + beta_{0}’$$We would like to express $beta_{i}$ by only using $beta_{i}’$ and two statistic from the data, namely mean and standard deviations, $mu_{x_{i}}$, $mu_{y}$, $sigma_{x_{i}}$ and $sigma_{y}$.

The following transformation can be shown by using the z-scores and some algebra,

There are many packages and tools in R to perform ridge regression. One of the prominent one is glmnet. Following Mark Seeto’s example, here we extent that in to many variate case with a helper function scale.back.lm from R1magic package. Function provides a transform utility for $n$-variate case. Here we demo this using 6 predictors, also available as gist,

Multiple regression is used by many practitioners. In this post we have shown how to scale continuous predictors and transform back the regression coefficients to original scale. Scaled coefficients would help us to better interpret the results. The question of when to standardize the data is a different issue.