Regression belongs to the basic toolbox of statistics, finance, machine learning, biology and many other disciplines that involve constructing approximate models of data. In this notebook we show how to quickly construct and solve some standard regression models using the Fusion interface for Python.

A general linear regression model approximates the relation between a vector of $d$ independent variables (features) $x=(x_1,\ldots,x_d)$ and a dependent variable $y$. (To allow for an intercept we will often assume $x_1=1$.) Suppose we have $n$ input vectors $x$ arranged in the rows of a matrix $\mathbf{X}\in\mathbb{R}^{n\times d}$ and $n$ corresponding observations $y$ arranged in a vector $\mathbf{y}\in\mathbb{R}^n$. The goal is to find a vector of weights $\mathbf{w}\in\mathbb{R}^{d}$ so that in some approximate sense we can write

$$\mathbf{y}\approx \mathbf{X}\mathbf{w}.$$

We call $\mathbf{r}=\mathbf{y}-\mathbf{X}\mathbf{w}$ the residual and we aim to minimize some penalty function $\phi(\mathbf{r})$.
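For concreteness, here is a tiny numerical illustration (the numbers are made up) of the residual and the simplest penalty $\phi(\mathbf{r})=\|\mathbf{r}\|_2^2$, computed with NumPy:

```python
import numpy as np

# Hypothetical toy data: n=3 observations, d=2 features (first column is the intercept)
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5]])
y = np.array([1.0, 2.0, 2.5])
w = np.array([0.6, 0.8])      # some candidate weights

r = y - X @ w                 # the residual
print(np.sum(r**2))           # the quadratic penalty ||r||_2^2
```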

In least squares (LSE) regression we minimize $\phi(\mathbf{r})=\|\mathbf{r}\|_2$, that is we solve $\mathrm{min}_{\mathbf{w}\in\mathbb{R}^d}\|\mathbf{y}-\mathbf{X}\mathbf{w}\|_2$. In conic form this means minimizing a scalar $t$ subject to $(t,\mathbf{y}-\mathbf{X}\mathbf{w})$ belonging to the quadratic cone, i.e. $t\geq\|\mathbf{y}-\mathbf{X}\mathbf{w}\|_2$.

```python
from mosek.fusion import *

# Least squares regression
def lse(X, y):
    n, d = len(X), len(X[0])
    M = Model("LSE")
    # The regression coefficients
    w = M.variable("w", d)
    # The bound on the norm of the residual
    t = M.variable("t")
    r = Expr.sub(y, Expr.mul(X, w))
    # t \geq |r|_2
    M.constraint(Expr.vstack(t, r), Domain.inQCone())
    M.objective(ObjectiveSense.Minimize, t)
    return M
```
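To use it we build the model, solve it, and query the solution; a minimal sketch on made-up data (the numbers below are illustrative only):

```python
import numpy as np

# Synthetic data: fit y ~ w_1 + w_2*x; the first column of X models the intercept.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])

M = lse(X, y)
M.solve()
print(M.getVariable("w").level())   # fitted regression coefficients
```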

Note that the unconstrained model leaves open numerous possibilities. It would be just as easy to phrase the problem without an intercept, add additional linear or conic constraints on $\mathbf{w}$, compute various statistics of the solution, and so on.
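For instance, since `lse` returns the `Model` object, extra conditions can be imposed on the weights before solving. A small sketch (the particular constraints are illustrative only):

```python
# Constrain an existing LSE model before solving it.
M = lse(X, y)                                          # X, y as in the example above
w = M.getVariable("w")
M.constraint(w, Domain.greaterThan(0.0))               # linear: w >= 0 (nonnegative weights)
M.constraint(Expr.vstack(1.0, w), Domain.inQCone())    # conic: ||w||_2 <= 1
M.solve()
```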

Regularisation is a technique that helps stabilize computational results, which tend to be sensitive to small perturbations when the matrix $\mathbf{X}$ has nearly dependent columns. In machine learning, regularisation is a major technique for avoiding overfitting. A general regularized least squares regression problem is usually written in the form

$$\mathrm{min}_{\mathbf{w}\in\mathbb{R}^d} \|\mathbf{y}-\mathbf{X}\mathbf{w}\|_2^2 + \lambda\phi'(T\mathbf{w}),$$
where $T$ is a linear operator, for example $T\mathbf{w}=\mathbf{w}$ or $T\mathbf{w}=\mathbf{w}-\mathbf{w}_0$, $\phi'$ is an additional penalty function and $\lambda$ controls the tradeoff between regularisation and fit. The two main regularisation terms occurring in practice are:

* quadratic, also known as ridge regression, leading to the problem $$\mathrm{min}_{\mathbf{w}\in\mathbb{R}^d} \|\mathbf{y}-\mathbf{X}\mathbf{w}\|_2^2 + \lambda\|T\mathbf{w}\|_2^2$$

* linear, also known as lasso (least absolute shrinkage and selection operator), leading to the problem $$\mathrm{min}_{\mathbf{w}\in\mathbb{R}^d} \|\mathbf{y}-\mathbf{X}\mathbf{w}\|_2^2 + \lambda\|T\mathbf{w}\|_1.$$ When $T=\mathrm{id}$, lasso regression tends to give preference to sparser solutions.

```python
# Implement the lasso part of the constraints, and return the bounding variable
def lassoVar(M, w, d):
    p = M.variable("p", d)
    plasso = M.variable("plasso")
    # plasso >= sum(p_i)
    M.constraint(Expr.sub(plasso, Expr.sum(p)), Domain.greaterThan(0.0))
    # p_i = |w_i|
    M.constraint(Expr.add(p, w), Domain.greaterThan(0.0))
    M.constraint(Expr.sub(p, w), Domain.greaterThan(0.0))
    return plasso

# Implement the ridge part of the constraints, and return the bounding variable
def ridgeVar(M, w):
    pridge = M.variable("pridge")
    M.constraint(Expr.vstack(0.5, pridge, w), Domain.inRotatedQCone())
    return pridge

# Regularized least-squares regression
def lseReg(X, y, lambda1=0, lambda2=0):
    n, d = len(X), len(X[0])
    M = Model("LSE-REG")
    # The regression coefficients
    w = M.variable("w", d)
    # The bound on the norm of the residual
    t = M.variable("t")
    r = Expr.sub(y, Expr.mul(X, w))
    # t \geq |r|_2^2
    M.constraint(Expr.vstack(0.5, t, r), Domain.inRotatedQCone())
    # The objective, add regularization terms as required
    objExpr = t.asExpr()
    if lambda1 != 0:
        objExpr = Expr.add(objExpr, Expr.mul(lambda1, lassoVar(M, w, d)))
    if lambda2 != 0:
        objExpr = Expr.add(objExpr, Expr.mul(lambda2, ridgeVar(M, w)))
    M.objective(ObjectiveSense.Minimize, objExpr)
    return M
```

We consider the same dataset we had previously, but this time we try to fit a polynomial of very high degree. This will lead to overfitting, which can then be reduced by varying the regularization parameters $\lambda$.
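A sketch of that experiment, assuming the one-dimensional inputs `x` and observations `y` from the earlier part of the notebook (replaced here by synthetic placeholders); the monomial degree and the $\lambda$ values are illustrative choices:

```python
import numpy as np

# Synthetic stand-in for the earlier dataset
np.random.seed(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(6 * x) + 0.1 * np.random.randn(30)

# Monomial features x^0, x^1, ..., x^degree (x^0 acts as the intercept)
degree = 15
X = np.array([[xi**k for k in range(degree + 1)] for xi in x])

# Increase the lasso weight and watch the coefficients shrink towards a sparser fit
for lambda1 in [0.0, 0.01, 0.1, 1.0]:
    M = lseReg(X, y, lambda1=lambda1)
    M.solve()
    print(lambda1, np.round(M.getVariable("w").level(), 3))
```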

The penalty function $\phi(x)=x^2$ grows quite rapidly for large values of $x$, making the least-squares approximation overly sensitive to outliers. One solution to this issue in the realm of convex optimization is to replace it with the penalty defined as:
$$
\phi_{\mathrm{Huber}}(x) =
\begin{cases}
x^2 & |x|\leq M, \\
M(2|x|-M) & |x|\geq M,
\end{cases}
$$
for some value of $M$. This Huber loss function agrees with the quadratic loss for small residuals ($|x|\leq M$), and otherwise it is the slowest-growing convex function that agrees with the quadratic loss on $[-M,M]$. Thus it assigns much lower loss to distant outliers than pure quadratic regression, making it more robust. The robust regression with Huber loss function is now the problem

$$\mathrm{min}_{\mathbf{w}\in\mathbb{R}^d} \sum_{i=1}^n \phi_{\mathrm{Huber}}(r_i), \quad \mathbf{r}=\mathbf{y}-\mathbf{X}\mathbf{w}.$$

It can be brought into conic form by bounding each residual as $|r_i|\leq u_i+v_i$ with $0\leq u_i\leq M$, $v_i\geq 0$, and minimizing $\|\mathbf{u}\|_2^2+2M\sum_i v_i$, which is what the code below does.

```python
# Robust regression with Huber loss function
def huber(X, y, MConst):
    n, d = len(X), len(X[0])
    M = Model("LSE-HUBER")
    # The regression coefficients and other variables
    w = M.variable("w", d)
    u = M.variable(n, Domain.inRange(0.0, MConst))
    v = M.variable(n, Domain.greaterThan(0.0))
    # The residual and bounds on its absolute value (coordinatewise)
    r = Expr.sub(y, Expr.mul(X, w))
    ab = Expr.add(u, v)
    M.constraint(Expr.add(ab, r), Domain.greaterThan(0.0))
    M.constraint(Expr.sub(ab, r), Domain.greaterThan(0.0))
    # t >= |u|_2^2 and the objective
    t = M.variable()
    M.constraint(Expr.vstack(0.5, t, u), Domain.inRotatedQCone())
    M.objective(ObjectiveSense.Minimize, Expr.add(t, Expr.mul(2*MConst, Expr.sum(v))))
    return M
```
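As with the previous models, the Huber model can be solved and compared with plain least squares; a minimal sketch on synthetic data with a single gross outlier (the choice of `MConst` is illustrative):

```python
import numpy as np

# Synthetic line y = 1 + 2*x with one corrupted observation
X = np.array([[1.0, float(i)] for i in range(10)])
y = np.array([1.0 + 2.0 * i for i in range(10)])
y[5] += 50.0                                     # the outlier

for model in (lse(X, y), huber(X, y, 2.0)):
    model.solve()
    print(model.getVariable("w").level())        # the Huber weights stay close to [1, 2]
```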