Eigenvalues and eigenvectors of a matrix $\boldsymbol A$ tell us a lot about the matrix. Conversely, if we know that $\boldsymbol A$ is special in some way (say symmetric), that tells us something about what its eigenvalues and eigenvectors look like.
Let us begin with a definition. Given a matrix $\boldsymbol A$, a nonzero vector $x$ is an eigenvector of $\boldsymbol A$ with corresponding eigenvalue $\lambda$ if

$$ \boldsymbol A x = \lambda x. $$
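As a quick sanity check, the definition can be verified numerically; here is a small sketch (the example matrix is my own choice, not from the text) using NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # a small symmetric example matrix

# Each column of `eigenvectors` is an eigenvector of A.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Verify the definition A x = lambda x for every pair.
for lam, x in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ x, lam * x)
```

For this symmetric matrix the eigenvalues come out real (3 and 1), which is one of the special properties the article alludes to.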
In the previous article we derived the least squares formula using vector derivatives; this time we'll derive it purely from the properties of linear transformations and the four fundamental subspaces of linear algebra. These are:

- Kernel $Ker(A)$: the set of all solutions to $Ax = 0$. The kernel is also called the nullspace, written $N(A)$.
- Image $Im(A)$: the set of all right-hand sides $b$ for which $Ax = b$ has a solution.
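These two subspaces are easy to explore numerically; the sketch below (the rank-deficient example matrix is mine) uses NumPy's SVD, whose right singular vectors for zero singular values span the kernel:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # rank 1: the second row is twice the first

# Kernel: rows of Vt belonging to (near-)zero singular values span
# the set of all x with A x = 0.
_, s, Vt = np.linalg.svd(A)
kernel = Vt[np.sum(s > 1e-10):]  # here: a single basis vector
assert np.allclose(A @ kernel.T, 0)

# Image: every b = A x lies in the span of the columns of A; here the
# columns are multiples of (1, 2), so any such b has b[1] == 2 * b[0].
b = A @ np.array([3.0, -1.0])
assert np.isclose(b[1], 2 * b[0])
```

Since $A$ has rank 1, the kernel here is one-dimensional, matching the rank–nullity count $2 = 1 + 1$.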

This article is a draft and as such there might be typos and other inaccuracies.
In this article we’ll derive the matrix inversion lemma, also known as the Sherman-Morrison-Woodbury formula. At first it might seem like a very boring piece of linear algebra, but it has a few nifty uses, as we’ll see in one of the follow-up articles.
Let’s start with the following block matrix:
$$ M = \begin{bmatrix} A & U \\ V & C \end{bmatrix} $$
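Before working through the derivation, we can already sanity-check the identity numerically. The sketch below assumes the formula's standard form, $(A + UCV)^{-1} = A^{-1} - A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1}$, with shapes and test matrices of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # invertible n x n
U = rng.standard_normal((n, k))
C = np.eye(k)                                      # invertible k x k
V = rng.standard_normal((k, n))

inv = np.linalg.inv
# Left side: invert the full (possibly large) n x n matrix directly.
lhs = inv(A + U @ C @ V)
# Right side: the Woodbury formula, which only inverts a k x k matrix
# beyond A itself -- the source of its practical usefulness.
rhs = inv(A) - inv(A) @ U @ inv(inv(C) + V @ inv(A) @ U) @ V @ inv(A)
assert np.allclose(lhs, rhs)
```

The payoff is on the right-hand side: if $A^{-1}$ is already known, updating it after a low-rank change $UCV$ only requires inverting a small $k \times k$ matrix.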

There are multiple ways one can arrive at the least squares solution to linear regression. I’ve always seen the one using orthogonality, but there is another way which I’d say is even simpler, especially if you’ve done any calculus. Let’s define the problem first.
Given an \(N \times M\) matrix \(X\) of inputs and a vector \(y\) of length \(N\) containing the outputs, the goal is to find a weight vector \(w\) of length \(M\) such that: