The constraint given in the question, $X^t X w = X^t y$, is the result we derived from the original optimization problem: $\operatorname{argmin}_a \lVert y - Xa \rVert_2^2$.
So taking the gradient again on the constraint doesn't make any sense. Am I correct?
I think the constraint should be the original loss function (the squared error)…

No, the constraint is correct.
They are saying that, among all possible solutions to the original problem, they want the one with minimal norm.
So you minimize the norm of $w$ subject to $w$ still being a valid solution of the original minimization, and $X^t X w = X^t y$ is exactly the condition that characterizes those solutions.
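
To make this concrete, here is a small NumPy sketch. The data and the rank-deficient design are made up for illustration (the third column of `X` is a copy of the first, so $X^t X$ is singular and the original problem has many solutions). It checks that two different vectors both satisfy $X^t X w = X^t y$, and that the pseudoinverse solution has the smaller norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-deficient design: column 3 duplicates column 1,
# so the least-squares problem has infinitely many minimizers.
A = rng.normal(size=(6, 2))
X = np.column_stack([A, A[:, 0]])
y = rng.normal(size=6)

# Minimum-norm least-squares solution via the Moore-Penrose pseudoinverse.
# rcond is set explicitly so the (numerically) zero singular value is dropped.
w_min = np.linalg.pinv(X, rcond=1e-10) @ y

# Another minimizer: shift along a null-space direction of X.
null_vec = np.array([1.0, 0.0, -1.0])  # X @ null_vec == 0 by construction
w_other = w_min + 0.5 * null_vec


def satisfies_normal_equations(w):
    """Check the constraint X^t X w = X^t y up to floating-point tolerance."""
    return np.allclose(X.T @ X @ w, X.T @ y)


both_feasible = satisfies_normal_equations(w_min) and satisfies_normal_equations(w_other)
min_is_smaller = np.linalg.norm(w_min) < np.linalg.norm(w_other)

print("both satisfy the constraint:", both_feasible)
print("pinv solution has smaller norm:", min_is_smaller)
```

Both vectors give the same residual $\lVert y - Xw \rVert_2^2$, so the original loss alone cannot tell them apart; the norm objective is what picks one.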