Because my dependent variables in a multiple linear regression model are in different units, some coefficients are > 10,000 and others less than 0. I need to standardize them using the scale() function in R, but because some of my data are negative, I can't take the log of some of the variables, which I need to do before standardizing. Is there any other step I can take? Can I standardize without a log transformation first?

This question came from our site for professional and enthusiast programmers.

I suggest visually inspecting scatterplots of each independent variable versus the dependent variable to see if there is any obvious transformation such as exp or log that would help in the regression. This is usually fast and easy to do.
– James Phillips, Feb 1 at 12:26

@AdamWheeler thank you!! I just have a quick question: if I were to transform one dependent variable with negative values in this way, would I do the exact same transformation to the other dependent variables and independent variable?
– user10831611, Feb 1 at 13:12

I'm not clear on the motivation for a log transformation. I also wonder if you mean "standardize them" or simply "rescale them". Rescaling and standardizing are two different operations. But unless I'm missing the point, I think the answer is yes. You can definitely standardize variables without ever log-transforming them.
– Brent Hutto, Feb 1 at 14:04

@BrentHutto I'm not sure I understand the difference between the two. Is rescaling the same as the scale function in R? If so, then my understanding is that log transformation is necessary beforehand (i.e. #6 on stats.stackexchange.com/questions/156791/…)
– user10831611, Feb 1 at 14:30

2 Answers

Sometimes people take the natural log (or better yet, the base-two log!) of a dependent variable because they feel the distribution of the log-transformed variable has better properties than that of the untransformed variable, relative to whatever model they are estimating. One drawback is that your regression parameters cannot be interpreted in terms of the original variable, only in terms of the log of the DV. And don't be tricked into "back-transforming" regression parameters with an inverse log (exponential) function; that is NOT valid.

The explanation you linked to is saying that IF you are going to log-transform a variable, then you ought to do that before standardization. It is not saying you should always log-transform. In my opinion, people sometimes get into the habit of log-transforming when it isn't strictly necessary.
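One way to see why the log has to come first: a standardized variable is centered at zero, so it always contains negative values, and the real-valued log is undefined for those. The sketch below illustrates this in Python with made-up numbers; the `standardize` helper and the `raw` data are hypothetical, chosen only to show the ordering.

```python
import math
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample SD (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical, strictly positive, right-skewed data.
raw = [1.0, 10.0, 100.0, 1000.0]

# Log first, then standardize: every step is defined.
log_then_z = standardize([math.log(v) for v in raw])

# Standardize first: the result is centered at zero, so some values are
# negative, and math.log would raise ValueError on them.
z_first = standardize(raw)
has_negative = any(z < 0 for z in z_first)
```

The same ordering applies in R: take `log()` of the raw variable first, then pass the result to `scale()`.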

Whether on a plain old dependent variable or a log-transformed dependent variable, the R function scale() will let you alter the centering or the scale of a variable. And if you use an estimate of the variable's mean and variability to do the rescaling, then you're said to have standardized the variable (i.e. made it into a de facto Z-score).

Sometimes you only center a variable. Subtract the variable's overall mean from each value and get a centered version that has mean zero. If you center only, the standard deviation will be unaffected. This is not "standardized".

Other times you may or may not center a variable, but you don't like the units it is expressed in for some reason. Maybe your DV values are like 15,000g or 12,800g because they are expressed in grams and you'd rather work in kilograms. So you use the scale() function to divide each value by 1,000 and get numbers like 15.0kg or 12.8kg. Again, this is not standardization. It is just rescaling.

So you can mix and match centering (or not) and rescaling (or not), and you can do it with or without converting to a standardized scale. And all of those choices can be made with or without a log transform. It all depends on what you're trying to accomplish and why.
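The three operations described above can be sketched side by side. This is a minimal illustration in Python with hypothetical weights, not a reproduction of R's scale(); in R, as far as I understand the defaults, `scale(x)` corresponds to the standardized version, `scale(x, scale = FALSE)` to centering only, and `scale(x, center = FALSE, scale = 1000)` to the unit change.

```python
import statistics

# Hypothetical weights in grams.
grams = [15000.0, 12800.0, 14100.0, 13300.0]

mean_g = statistics.mean(grams)
sd_g = statistics.stdev(grams)  # sample standard deviation (n - 1 denominator)

# Center only: the mean becomes 0, the standard deviation is unchanged.
centered = [g - mean_g for g in grams]

# Rescale only: grams to kilograms. A change of units, not standardization.
kilograms = [g / 1000 for g in grams]

# Standardize: center, then divide by the standard deviation (a z-score).
z_scores = [(g - mean_g) / sd_g for g in grams]
```

Only `z_scores` has both mean 0 and standard deviation 1; `centered` keeps the original spread and `kilograms` keeps the original mean (in new units).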

P.S. I myself have not used the scale() function. It is a general-purpose utility that can do all sorts of interesting things, not just the very simple operations I've described. For instance, I center variables all the time but do it by simply using "cDV=DV-mean(DV)" to create a centered version very simply and quickly. So I may have misrepresented something about the scale() function...but I hope not!
– Brent Hutto, Feb 1 at 15:01

... because some of my data is negative, it's impossible to take log of some of the variables which I need to do before standardizing.

Your question is similar to another question here, so I'll give the same advice. Before you worry about the mechanics of using logarithms, you need to step back and think about the coherence of the underlying measurement you are asking for. The logarithmic transformation takes positive rate values and puts them on a scale where multiplicative changes become additive changes. It can be extended to deal with negative numbers, but the logarithm of a negative number is a complex value (i.e., with an "imaginary" part). It is extremely unlikely that you will need to take a logarithmic transformation of negative data values (and therefore move your data into complex analysis) in any standard statistical practice.
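To make the point about complex values concrete, here is a small Python sketch (Python is used only for illustration; the variable names are hypothetical). The real-valued log rejects a negative input, while the complex log returns a value with an imaginary part of $\pi$ on the principal branch.

```python
import cmath
import math

# The real-valued log rejects negative input.
try:
    math.log(-1.0)
    real_log_failed = False
except ValueError:
    real_log_failed = True

# The complex log is defined: for x > 0, log(-x) = log(x) + i*pi
# on the principal branch, so log(-1) is purely imaginary.
complex_log = cmath.log(-1.0)
```

A regression on values like `complex_log` is not something ordinary statistical software is set up to handle, which is why the measurement question comes first.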

I suggest you start by looking at this dependent variable and asking what the negative values actually represent, and how they were formed. You will want to ask two questions: (1) why do I have negative values; and (2) why do the values have such heavy positive skew. If the values were formed by subtracting a constant $y_* >0$ from some underlying positive set of values that could reasonably be presented on a log-scale, then a natural transformation might be to take $\tilde{y}_i = \ln (y_i + y_*)$. If something else happened then a different transformation might be appropriate.
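The shifted transformation $\tilde{y}_i = \ln (y_i + y_*)$ can be sketched as follows. This is a minimal Python illustration under the assumption that the offset is known from how the data were formed; the helper `shifted_log`, the value `y_star = 50`, and the data are all hypothetical.

```python
import math


def shifted_log(values, offset):
    """Compute ln(y + offset); requires y + offset > 0 for every y."""
    if any(y + offset <= 0 for y in values):
        raise ValueError("offset too small: y + offset must be positive")
    return [math.log(y + offset) for y in values]


# Hypothetical data: underlying positive values had y_star = 50 subtracted.
y_star = 50.0
y = [-49.0, -40.0, 50.0, 950.0]

# Adding y_star back recovers 1, 10, 100, 1000 before taking logs.
y_tilde = shifted_log(y, y_star)
```

The offset here comes from knowledge of the measurement, not from picking whatever constant makes the minimum positive; an arbitrary offset would change the shape of the transformed variable.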