Our aim is to define a linear model that explains the relation between a set of $m$ given variables $x_1,\ldots,x_m$ (the inputs) and a dependent variable $y$ (the output). To this end we assume that there exists a set of weights $w\in \mathbb{R}^{m}$ such that ideally

$$
y = x^T w,
$$

where $x=(x_1,\ldots,x_m)$.

In practice, we are given $n$ observations of the relation that links $x$ and $y$. We store them row-wise in a matrix $X\in \mathbb{R}^{n\times m}$. The corresponding outputs $y=(y_1,\ldots,y_n)\in\mathbb{R}^{n}$ are known as well. If the relation we want to describe is truly linear, we can hope to solve the linear system

$$
X w =y.
$$
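When the relation really is linear and the data are noise-free, the system is consistent and the weights can be recovered exactly. A minimal NumPy sketch on synthetic data (all names and values here are illustrative assumptions, not from the original text):

```python
import numpy as np

# Synthetic, truly linear data (illustrative assumption): y = X w_true exactly.
rng = np.random.default_rng(0)
n, m = 5, 3
X = rng.standard_normal((n, m))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# The overdetermined system X w = y is consistent here, so a
# least-squares solve recovers w_true up to floating-point error.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w, w_true))  # → True
```

With real data the system is almost never consistent, which is exactly the point made next.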

But usually this is not the case, because:

- in the real world, perfectly linear models are fairly rare;
- the $y_i$'s may be corrupted or noisy;
- we may want to impose conditions on $w$, i.e. ask that $w\in W$ for some well-defined convex set $W$.

Since our main interest is in conic optimization, we assume $W$ to be a conic representable set.
For the sake of simplicity we consider $W=\mathbb{R}^m$, i.e. the unconstrained case.

What we can do is reduce the error that the linear model introduces. We define

$$
r = Xw - y,
$$

as the vector of residuals. Our aim is thus to minimize some error function $\phi(\cdot)$ of $r$, i.e.

$$
\min_{w\in W} \; \phi(Xw - y).
$$

```
10 loops, best of 3: 76.9 ms per loop
10 loops, best of 3: 29.2 ms per loop
10 loops, best of 3: 33.3 ms per loop
10 loops, best of 3: 124 ms per loop
10 loops, best of 3: 239 ms per loop
10 loops, best of 3: 78.4 ms per loop
```

As expected, the running time is roughly the same for all formulations.
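As a concrete instance of the residual-minimization problem above, the common least-squares choice $\phi(r)=\|r\|_2^2$ can be sketched with NumPy. The data below are synthetic and all names are illustrative assumptions:

```python
import numpy as np

# Synthetic, noisy observations (illustrative assumption).
rng = np.random.default_rng(1)
n, m = 200, 4
X = rng.standard_normal((n, m))
w_true = rng.standard_normal(m)
y = X @ w_true + 0.05 * rng.standard_normal(n)

# Least-squares choice phi(r) = ||r||_2^2: minimize ||X w - y||_2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual vector r = X w - y; different choices of phi weigh it differently.
r = X @ w - y
print(np.linalg.norm(r, 1), np.linalg.norm(r, 2), np.linalg.norm(r, np.inf))
```

Other choices of $\phi$ (e.g. the $\ell_1$ or $\ell_\infty$ norm) lead to different conic formulations of the same residual-minimization problem.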