This provides a "best-fit" solution for $x$: if I were to re-compute $b' = Ax$, I would get $b' \ne b$.

This "best-fit" notion is based on the root-mean-square deviation $\sqrt{\frac{1}{n}\sum_i (b_i-b_i')^2}$.

Question: if I wanted certain values within $b$ to contribute less than others, I would want some "weight" associated with them. However, I do not want to simply evaluate a weighted RMSD value; I want to compute a "weighted" left inverse. How can this be done?

So we're picturing $A$ having many more rows than columns. Then $M = (A^T A)^{-1}A^T$ is a left inverse of $A$ in the sense that $MA=\text{a small identity matrix}$, where "small" means having only as many rows and columns as $A$ has columns. Instead of speaking of minimizing the root-mean-square deviation, why not keep it simple and speak of minimizing the sum of the squares of the deviations? Anyway, "weighted" least squares is done similarly with certain formulas involving the weights on the diagonal. More on this later, unless someone beats me to it.....
– Michael Hardy, Sep 9 '11 at 19:26
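The left inverse described above is easy to check numerically. A minimal NumPy sketch (the matrix below is made-up illustrative data) showing that $M = (A^T A)^{-1}A^T$ satisfies $MA = I$ of the "small" size, while $AM$ is generally not the identity:

```python
import numpy as np

# Hypothetical tall matrix: 5 rows, 2 columns (many more rows than columns).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0],
              [9.0, 1.0]])

# Left inverse M = (A^T A)^{-1} A^T.
M = np.linalg.inv(A.T @ A) @ A.T

# M A is the "small" 2x2 identity...
print(np.allclose(M @ A, np.eye(2)))   # True
# ...but A M is a 5x5 projection onto the column space, not the identity.
print(np.allclose(A @ M, np.eye(5)))   # False
```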

There's also "generalized" least squares, where one uses some other inner product than the usual one and its matrix is not necessarily diagonal.
– Michael Hardy, Sep 9 '11 at 19:27

If you have a non-standard inner product (e.g., a weighted inner product), you want to consider $A$ as a linear transformation on the space, but compute its coordinate matrix relative to an orthonormal basis with respect to that inner product; then use the pseudo-inverse of that matrix to obtain a best-fit solution (or a minimal solution if the system is consistent) in terms of the orthonormal basis. Finally, translate back to standard coordinates.
– Arturo Magidin, Sep 9 '11 at 19:27

It's not very intuitive (I'm not a math person) but I'll give it a try. Is there a way to perform this weighted derivation with SVD too?
– Mikhail, Sep 9 '11 at 20:33

Basically this is the same as the unweighted version if you replace $A$ by $W^{1/2} A$ and $b$ by $W^{1/2} b$. So if you know how to do unweighted least squares using SVD, you can use that with this adjusted matrix and vector.
– Robert Israel, Sep 9 '11 at 21:37
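This row-scaling trick can be sketched in NumPy. The data and weights below are made-up; the point is that solving ordinary least squares on $W^{1/2}A$ and $W^{1/2}b$ via the SVD agrees with solving the weighted normal equations $A^T W A x = A^T W b$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))      # tall system, synthetic data
b = rng.normal(size=10)
w = np.array([1.0] * 5 + [0.1] * 5)  # down-weight the last 5 rows of b

# Scale rows by sqrt(w), then do ordinary least squares via the thin SVD.
sw = np.sqrt(w)
As, bs = A * sw[:, None], b * sw

U, s, Vt = np.linalg.svd(As, full_matrices=False)
x = Vt.T @ ((U.T @ bs) / s)       # pseudo-inverse applied to scaled b

# Agrees with solving the weighted normal equations A^T W A x = A^T W b.
x_ne = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * b))
print(np.allclose(x, x_ne))       # True
```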

Here is a reformulation of the previous answers and comments which I hope will be somewhat helpful to the OP.

A. The problem you are interested in is the following: given an inner product $\langle \cdot, \cdot \rangle$ find $x$ such that $$\langle b - Ax, b - Ax \rangle$$ is minimized.

When $\langle \cdot, \cdot \rangle$ is the ordinary inner product, this is the ordinary least squares solution. When $\langle x, y \rangle = x^T W y$ where $W$ is some positive diagonal matrix, this is the weighted case you are interested in.

B. The solution will satisfy the following optimality criterion: the error must be orthogonal to the column space of $A$.

If we decompose $$ b- Ax = x_{R(A)} + x_{R(A)^\perp}$$ where $x_{R(A)}$ is the projection onto the range of $A$ and $x_{R(A)^\perp}$ is the projection onto its orthogonal complement (with respect to $\langle \cdot, \cdot \rangle$), and $x_{R(A)} \neq 0$, then we could pick a different $x$ to get a smaller error. Indeed, $$ \langle b - Ax, b-Ax \rangle = \langle x_{R(A)}, x_{R(A)} \rangle + \langle x_{R(A)^\perp}, x_{R(A)^\perp} \rangle $$ by the Pythagorean theorem. Now if $x_{R(A)} = Ay$, then $$ \langle b-A(x+y), b-A(x+y) \rangle = \langle x_{R(A)^\perp}, x_{R(A)^\perp} \rangle$$ which is smaller.

C. For the case of the ordinary inner product, the above optimality principle can be restated as $$ A^T (b-Ax^*) = 0$$ which immediately gives you your least-squares solution; and for the case of the weighted inner product, it can be restated as
$$ A^T W (b-Ax^*)=0$$ which immediately gives you the weighted solution.
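The weighted optimality criterion can be verified directly. A short sketch (with made-up data and weights) that solves $A^T W A x = A^T W b$ and checks that the residual is $W$-orthogonal to the column space of $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 2))       # synthetic tall system
b = rng.normal(size=8)
W = np.diag([2.0, 1.0, 1.0, 0.5, 0.5, 0.25, 0.25, 0.1])  # made-up weights

# Weighted solution from the normal equations A^T W A x = A^T W b.
x = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# Optimality: the residual b - Ax is orthogonal to the columns of A
# in the weighted inner product, i.e. A^T W (b - Ax) = 0.
print(np.allclose(A.T @ W @ (b - A @ x), 0))   # True
```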