1 Answer

Since the penalty multiplier $\lambda$ is finite, no, we're not requiring $b_j$ to be zero. However, as we keep increasing $\lambda$, the coefficients are pushed closer to zero, since the sum $\sum_j b_j^2$ is penalized. The idea of ridge regression is to limit the magnitude of the parameters.
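To see the shrinkage concretely, here is a minimal numpy sketch (the data and the $\lambda$ values are made up for illustration) that computes the closed-form ridge solution $\hat b = (X^\top X + \lambda I)^{-1} X^\top y$ and shows the coefficients moving toward zero, without ever reaching it, as $\lambda$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (made up for illustration): 50 observations, 3 predictors
X = rng.normal(size=(50, 3))
true_b = np.array([2.0, -1.0, 0.5])
y = X @ true_b + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    """Closed-form ridge solution b = (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# As lam grows, the coefficients shrink toward zero but stay nonzero
# for any finite lam.
for lam in [0.0, 1.0, 10.0, 100.0]:
    b = ridge(X, y, lam)
    print(f"lambda={lam:>6}: b={np.round(b, 3)}, sum b_j^2 = {b @ b:.3f}")
```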

If you were instead to use a penalty such as $\lambda\sum_j(b_j-c)^2$ for some nonzero $c$, the implication is that the coefficients would be pulled toward $c$ rather than toward zero, so their magnitude would not be controlled and $Xb$ would be more likely to overfit. By forcing the $b_j$ to be small we limit the ability of $Xb$ to overfit the data.
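For reference, a sketch of how the penalized form relates to the constrained problem behind it, with $c$ the constraint level:

$$\min_b \|y - Xb\|^2 \quad\text{subject to}\quad \sum_j b_j^2 \le c,$$

$$\mathcal{L}(b, \lambda) = \|y - Xb\|^2 + \lambda\Big(\sum_j b_j^2 - c\Big).$$

Since the term $-\lambda c$ does not depend on $b$, minimizing the Lagrangian over $b$ gives the same $\hat b$ as minimizing $\|y - Xb\|^2 + \lambda\sum_j b_j^2$, which is why the $-\lambda c$ term is usually dropped. The equality $\sum_j b_j^2 = c$ holds only when the constraint is active, i.e., when $\lambda > 0$.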

I am confused as to why we don't write the penalty function as $\lambda\left(\sum_j b_j^2 - c\right)$, since that is how Lagrange multipliers are usually written. Also, the solution must satisfy $\sum_j b_j^2 = c$, because that is how we learned Lagrange multipliers.
– gbd, Apr 24 at 3:10