I'm not sure what your goal is, but if the weights must always be powers of ten, then a natural choice would be to use $10^{-n}$ for the column whose average is between $10^n$ and $10^{n+1}$. That is, $n = \lfloor \log_{10} a \rfloor$, where $a$ is the average value of that column.
–
Shaun Ault Oct 19 '11 at 2:36

Thank you! Actually, this is for a program I'm writing for practice. :) I tried to make my own ranking algorithm, but while I could work out the concept on paper, I didn't know how to find the appropriate weights programmatically.
–
kurisukun Oct 19 '11 at 2:50

2 Answers

If the weights must always be powers of ten, then a natural choice would be to use $10^{-n}$ for the column whose average is between $10^n$ and $10^{n+1}$. That is, $n=\lfloor \log_{10} a \rfloor$, where $a$ is the average value of the column. Multiplying each column by its weight then puts every column's average in $[1, 10)$.
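A minimal Python sketch of this rule, assuming each column is a non-empty list of numbers with a positive average (the logarithm is undefined otherwise). The function name `power_of_ten_weights` is hypothetical, not something from the thread.

```python
import math

def power_of_ten_weights(columns):
    """Return one weight 10**(-n) per column, with n = floor(log10(average)).

    Assumes every column is a non-empty list of numbers with a positive average.
    """
    weights = []
    for col in columns:
        avg = sum(col) / len(col)
        if avg <= 0:
            raise ValueError("log10 needs a positive column average")
        n = math.floor(math.log10(avg))  # average lies in [10**n, 10**(n+1))
        weights.append(10.0 ** (-n))
    return weights

print(power_of_ten_weights([[200, 300], [3, 4], [0.03, 0.05]]))
```

Here the columns have averages 250, 3.5 and roughly 0.04, so they get weights 0.01, 1 and 100, which moves every weighted average into $[1, 10)$.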

You could subtract the average of each column from every data point in that column. Now each column has average $0$. If you also want each column to have the same standard deviation ($\sigma$), compute the standard deviation of each column and divide that column's centered values by it. What is special about powers of $10$? That is an artifact of our notation.
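One way to sketch this centering and scaling in Python, assuming the data is a list of rows of equal length. It uses the population standard deviation (`statistics.pstdev`); the sample version (`statistics.stdev`) would rescale every column by the same factor, so it would not change a ranking. The name `standardize_columns` is hypothetical.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Center each column at 0 and scale it to standard deviation 1.

    rows: list of equal-length lists of numbers, one list per data row.
    A column with zero spread is only centered, to avoid dividing by zero.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    stats = [(mean(col), pstdev(col)) for col in columns]
    result = []
    for row in rows:
        new_row = []
        for x, (mu, sigma) in zip(row, stats):
            centered = x - mu  # the column now has average 0
            new_row.append(centered / sigma if sigma > 0 else centered)
        result.append(new_row)
    return result

rows = [[1, 100], [2, 200], [3, 300]]
print(standardize_columns(rows))
```

In this example the two columns come out with the same values (up to rounding), even though the second is a hundred times larger, so neither dominates a sum across a row.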

My goal is to make each row of equal "importance", so I wanted to bring each column down to the same power of ten. I'm not sure this actually makes sense, but this is how I approached it. Would using the standard deviation make more sense?
–
kurisukun Oct 19 '11 at 3:54

My point is that 10, or powers of 10, aren't special. They are an artifact of working in base 10. If you love base 10, Shaun Ault's answer works fine. But then if one column's values lie between 1 and 2 and another's between 1 and 10, both columns get weight $1$, so the second, with its much wider spread, will have more influence on the ranking. Now it is a question of what you want, not of mathematics.
–
Ross Millikan Oct 19 '11 at 4:05
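To make that comparison concrete, here is a small Python check with made-up values for the two columns. It applies the power-of-ten rule from the first answer and measures each weighted column's spread with the population standard deviation; the data and the helper name `power_of_ten_weight` are illustrative assumptions.

```python
import math
from statistics import mean, pstdev

# Hypothetical data: column A stays between 1 and 2,
# column B ranges from 1 to 10.
col_a = [1.0, 1.5, 2.0]
col_b = [1.0, 5.5, 10.0]

def power_of_ten_weight(col):
    # Weight 10**(-n) with n = floor(log10(average of the column)).
    return 10.0 ** (-math.floor(math.log10(mean(col))))

w_a = power_of_ten_weight(col_a)  # average 1.5 -> n = 0 -> weight 1.0
w_b = power_of_ten_weight(col_b)  # average 5.5 -> n = 0 -> weight 1.0

# After weighting, each column's spread drives the differences between rows;
# here column B's spread is nine times column A's.
spread_a = pstdev([w_a * x for x in col_a])
spread_b = pstdev([w_b * x for x in col_b])
print(w_a, w_b, spread_b / spread_a)
```

Both columns receive the same weight, yet column B varies nine times as much, so it dominates any sum of weighted values.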

I see, thank you for the explanation! I suppose my logic was mistaken, then. I would like each column to carry equal weight...
–
kurisukun Oct 19 '11 at 4:47

@kurisukun: Maybe you want standard scores, as in en.wikipedia.org/wiki/Standard_score. Each data point is scored by how many standard deviations it lies from the average, so every column ends up with average $0$ and standard deviation $1$. The bad news is that this overweights points far from the mean if the tails are heavier than a Gaussian's (which is the usual case for real-world data).
–
Ross Millikan Oct 19 '11 at 4:51
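A small Python illustration of the heavy-tail caveat, on made-up data: one well-behaved column and one column with a single extreme value. The helper `standard_scores` is a hypothetical name, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

def standard_scores(values):
    """Standard score of each value: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical data: a well-behaved column and a heavy-tailed one
# with a single extreme value in the first row.
steady = [10, 11, 12, 13, 14]
heavy = [100, 2, 3, 4, 5]

z_steady = standard_scores(steady)
z_heavy = standard_scores(heavy)

# The extreme value gets a large score, while the other four rows are
# squeezed into a narrow band, so this column barely separates them.
print([round(z, 2) for z in z_steady])
print([round(z, 2) for z in z_heavy])
```

In the heavy-tailed column the extreme value scores about 2, while the other four rows land within about 0.08 of each other, compared with a range of about 2.8 in the well-behaved column.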