# ELEGANT Majorization
The ELEGANT algorithm uses *augmentation*. Suppose we want to minimize a function $f:X\rightarrow\mathbb{R}$, and the problem is too difficult or cumbersome to attack directly. It helps if we can find an *augmented function* $g:X\otimes Y\rightarrow\mathbb{R}$ such that $f(x)=\min_{y\in Y}g(x,y)$ for all $x\in X$, and such that the subproblems of minimizing $g$ over $x\in X$ for fixed $y\in Y$ and over $y\in Y$ for fixed $x\in X$ are both easy. We can then alternate the two subproblems to update the iterates $(x^{(k)},y^{(k)})$ by
\begin{align*}
y^{(k+1)}&=\mathop{\mathbf{argmin}}_{y\in Y} g(x^{(k)}, y),\\
x^{(k+1)}&=\mathop{\mathbf{argmin}}_{x\in X} g(x, y^{(k+1)}),
\end{align*}
and under weak conditions accumulation points of the sequence $x^{(k)}$ will be stationary points of $f$ (see @deleeuw_C_94c).
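As a toy illustration of augmentation (a hypothetical example, not part of ELEGANT itself), take $g(x,y)=(x-y)^2+(y-1)^2$, so that $f(x)=\min_{y}g(x,y)=\frac12(x-1)^2$ with minimizer $x=1$. Both subproblems have closed-form solutions, and alternating them converges linearly to the minimizer:

```{r augmentation_toy}
## Toy augmentation: g(x, y) = (x - y)^2 + (y - 1)^2, so f(x) = (x - 1)^2 / 2.
## Alternate the two closed-form subproblems.
x <- 0
for (k in 1:60) {
  y <- (x + 1) / 2  # argmin over y for fixed x: minimize (x - y)^2 + (y - 1)^2
  x <- y            # argmin over x for fixed y: minimize (x - y)^2
}
x                   # approaches 1, the minimizer of f
```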
In ELEGANT the *augmentation variables* $y$ are the off-diagonal elements of the four-dimensional array $\mathcal{E}=\{\epsilon_{ijk\ell}\}$.
*Majorization algorithms* (nowadays usually called *MM algorithms*; see @lange_16) are augmentation algorithms that use a *majorization function*.
Again $f:X\rightarrow\mathbb{R}$, but now $g:X\otimes X\rightarrow\mathbb{R}$ satisfies $g(x,y)\geq f(x)$ for all $x,y\in X$ and $g(x,x)=f(x)$ for all
$x\in X$. This implies $f(x)=\min_{y\in X}g(x,y)$ and $x^{(k)}=\mathop{\mathbf{argmin}}_{y\in X} g(x^{(k)}, y)$. Thus the majorization algorithm is simply
$$
x^{(k+1)}=\mathop{\mathbf{argmin}}_{x\in X} g(x, x^{(k)}).
$$
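For a concrete (hypothetical) example of a majorization function, the arithmetic-geometric mean inequality gives $2|x|\leq x^2/|y|+|y|$ for $y\neq 0$, with equality at $x=y$. To minimize $f(x)=(x-3)^2+2|x|$ we can majorize the absolute value term by this quadratic, so each subproblem is an explicit quadratic minimization:

```{r majorization_toy}
## Majorize f(x) = (x - 3)^2 + 2|x| using 2|x| <= x^2 / |y| + |y| for y != 0,
## so g(x, y) = (x - 3)^2 + x^2 / abs(y) + abs(y), with g(y, y) = f(y).
x <- 1
for (k in 1:60) {
  x <- 3 * abs(x) / (abs(x) + 1)  # closed-form argmin of the quadratic g(., x)
}
x                                 # approaches 2, where 2(x - 3) + 2 = 0
```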
One particular form of majorization we will use in this paper is the quadratic approximation method of @vosz_eckhardt_80 and @bohning_lindsay_88, also known as the quadratic upper bound method (@lange_16, section 4.6). If $f$ is twice differentiable and $\mathcal{D}^2f(x)\lesssim B$ for all $x\in X$, with $B$ positive definite, then
$$
g(x,y)=f(y)+(x-y)'\mathcal{D}f(y)+\frac12(x-y)'B(x-y)
$$
is a majorization function. The majorization algorithm is
$$
x^{(k+1)}=x^{(k)}-B^{-1}\mathcal{D}f(x^{(k)}).
$$
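As a hypothetical one-dimensional illustration of the quadratic upper bound, take $f(x)=\cos(x)$. Then $\mathcal{D}^2f(x)=-\cos(x)\leq 1$, so we can take $B=1$, and the update becomes $x^{(k+1)}=x^{(k)}+\sin(x^{(k)})$:

```{r quadratic_bound_toy}
## Quadratic upper bound for f(x) = cos(x): D^2 f(x) = -cos(x) <= 1 = B,
## so x <- x - B^{-1} f'(x) becomes x <- x + sin(x).
x <- 1
for (k in 1:25) {
  x <- x + sin(x)
}
x  # approaches pi, the minimizer of cos on (0, 2 * pi)
```

Starting from $x^{(0)}=1$ the iterates increase monotonically to $\pi$, the minimizer of $\cos$ on $(0,2\pi)$.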
In order to get a quadratic upper bound for sstress we prove a useful inequality.
**`r lemma_nums("inequality", display = "f")`**
$$
\sum_{i=1}^n\sum_{j=1}^n w_{ij}(A_{ij}\otimes A_{ij})\lesssim
\left(\sum_{i=1}^n\sum_{j=1}^n w_{ij}^\frac12 A_{ij}\right)\otimes
\left(\sum_{i=1}^n\sum_{j=1}^n w_{ij}^\frac12 A_{ij}\right)=V\otimes V.
$$
**Proof:** Suppose $X$ and $Y$ are arbitrary $n\times p$ matrices, and define $C=XY'$ and $c=\mathbf{vec}(C)$. Also let
$e_{ijk\ell}(X,Y)=(x_i-x_j)'(y_k-y_\ell)$ and $e_{ij}(X,Y)=(x_i-x_j)'(y_i-y_j)$. Then
\begin{equation}
\sum_{i=1}^n\sum_{j=1}^n w_{ij}e_{ij}^2(X,Y)=\sum_{i=1}^n\sum_{j=1}^n w_{ij}\ \mathbf{tr}\ A_{ij}C'A_{ij}C=\sum_{i=1}^n\sum_{j=1}^n w_{ij}\ c'(A_{ij}\otimes A_{ij})c,
\end{equation}
and
\begin{equation}
\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{\ell=1}^n w_{ijk\ell}e_{ijk\ell}^2(X,Y)=\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{\ell=1}^n w_{ij}^\frac12 w_{k\ell}^\frac12\mathbf{tr}\ A_{k\ell}C'A_{ij}C=\mathbf{tr}\ VC'VC=c'(V\otimes V)c.
\end{equation}
Each term $w_{ij}^\frac12 w_{k\ell}^\frac12 e_{ijk\ell}^2(X,Y)$ of the quadruple sum is nonnegative, and the terms with $(k,\ell)=(i,j)$ are exactly the terms of the double sum, so the quadruple sum is at least as large as the double sum. Thus
$$
c'\left\{\sum_{i=1}^n\sum_{j=1}^n w_{ij}\ (A_{ij}\otimes A_{ij})\right\}c\leq c'(V\otimes V)c
$$
for all $C$ of the form $XY'$, and hence for all $C$, because any $n\times n$ matrix can be written as $XY'$ with $p=n$.
**QED**
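The lemma is also easy to spot-check numerically. The following self-contained sketch (all names hypothetical) draws random symmetric nonnegative weights, builds $A_{ij}=(e_i-e_j)(e_i-e_j)'$, and verifies that $V\otimes V-\sum_{i,j}w_{ij}(A_{ij}\otimes A_{ij})$ is positive semi-definite up to rounding:

```{r lemma_spot_check}
## Numerical spot-check of the lemma for random weights (n = 4).
set.seed(13)
n <- 4
W <- matrix(runif(n * n), n, n)
W <- (W + t(W)) / 2            # symmetric nonnegative weights
diag(W) <- 0
Aij <- function(i, j) {        # A_ij = (e_i - e_j)(e_i - e_j)'
  u <- rep(0, n); u[i] <- 1; u[j] <- u[j] - 1
  tcrossprod(u)
}
S <- matrix(0, n^2, n^2)
V <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) {
  S <- S + W[i, j] * kronecker(Aij(i, j), Aij(i, j))
  V <- V + sqrt(W[i, j]) * Aij(i, j)
}
min(eigen(kronecker(V, V) - S, symmetric = TRUE)$values)  # >= 0 up to rounding
```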
```{r check_inequality, echo = FALSE, eval = FALSE}
checkInequality
```