\documentclass[12pt,]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\hypersetup{unicode=true,
pdftitle={An Alternating Least Squares Approach to Squared Distance Scaling},
pdfauthor={Jan de Leeuw, Patrick Groenen, and Raoul Pietersz},
pdfborder={0 0 0},
breaklinks=true}
\urlstyle{same} % don't use monospace font for urls
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\usepackage{framed}
\definecolor{shadecolor}{RGB}{248,248,248}
\newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{{#1}}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{{#1}}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{{#1}}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{{#1}}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{{#1}}}
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{{#1}}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{{#1}}}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{{#1}}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{{#1}}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{{#1}}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{{#1}}}
\newcommand{\ImportTok}[1]{{#1}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{{#1}}}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{{#1}}}}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{{#1}}}}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{{#1}}}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{{#1}}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{{#1}}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{{#1}}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{{#1}}}}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.81,0.36,0.00}{\textbf{{#1}}}}
\newcommand{\BuiltInTok}[1]{{#1}}
\newcommand{\ExtensionTok}[1]{{#1}}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{{#1}}}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.77,0.63,0.00}{{#1}}}
\newcommand{\RegionMarkerTok}[1]{{#1}}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{{#1}}}}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{{#1}}}}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{0.94,0.16,0.16}{{#1}}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.64,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\NormalTok}[1]{{#1}}
\usepackage{graphicx,grffile}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{5}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
%%% Use protect on footnotes to avoid problems with footnotes in titles
\let\rmarkdownfootnote\footnote%
\def\footnote{\protect\rmarkdownfootnote}
%%% Change title format to be more compact
\usepackage{titling}
% Create subtitle command for use in maketitle
\newcommand{\subtitle}[1]{
\posttitle{
\begin{center}\large#1\end{center}
}
}
\setlength{\droptitle}{-2em}
\title{An Alternating Least Squares Approach to Squared Distance Scaling}
\pretitle{\vspace{\droptitle}\centering\huge}
\posttitle{\par}
\author{Jan de Leeuw, Patrick Groenen, and Raoul Pietersz}
\preauthor{\centering\large\emph}
\postauthor{\par}
\predate{\centering\large\emph}
\postdate{\par}
\date{Version 011, November 10, 2016}
\begin{document}
\maketitle
\begin{abstract}
We reproduce the 1975 derivation of the alternating least squares
algorithm for squared distance scaling, from an internal report that got
lost in the folds of time. In addition, we present a derivation and a
substantial speed improvement based on majorization.
\end{abstract}
{
\setcounter{tocdepth}{3}
\tableofcontents
}
Note: This is a working paper which will be expanded/updated frequently.
All suggestions for improvement are welcome. The directory
\href{http://gifi.stat.ucla.edu/lost}{gifi.stat.ucla.edu/lost} has a pdf
version, the complete Rmd file with all code chunks, the bib file, and
the R source code.
\section{Introduction}\label{introduction}
Volume 73 of the \emph{Journal of Statistical Software} is a festschrift
for Jan de Leeuw (me). It contains a very nice article by Yoshio Takane,
discussing our early interactions and some of my ``lost papers'' (Takane
(2016)). One of these lost papers is De Leeuw (1975), which discusses
the ELEGANT algorithm, an augmentation method for squared distance
scaling. The algorithm has been discussed in the past by Takane (1977),
Takane (1980), Browne (1987), De Leeuw, Groenen, and Pietersz (2004),
and now again by Takane (2016), but with different derivations than
those in the original paper. In this paper we reconstruct the original
derivation, slighty generalized to include weights. Note that De Leeuw,
Groenen, and Pietersz (2004) also use weights, and also describe a close
approximation of the original algorithm.
Both Takane (1977) and Browne (1987) find the ELEGANT algorithm to be
prohibitively slow to converge. After discussing the original algorithm
and analyzing its convergence rate we use quadratic majorization to
improve the convergence speed. The fact that the original ELEGANT
algorithm can also be derived by using quadratic majorization was first
observed by De Leeuw, Groenen, and Pietersz (2004), using a different
derivation.
\section{The ELEGANT Algorithm}\label{the-elegant-algorithm}
The paper De Leeuw (1975) deals with the problem of minimizing the
\emph{sstress} loss function, defined on the space of \(n\times p\)
\emph{configurations} as
\begin{equation}\label{E:sstress}
\sigma(X):=\frac12\sum_{i=1}^n\sum_{j=1}^n w_{ij}(\epsilon_{ij}-e_{ij}(X))^2.
\end{equation}
Here the \(w_{ij}\) are given non-negative \emph{weights} and the
\(\epsilon_{ij}\) are given non-negative \emph{dissimilarities}. Both
weights and dissimilarties are \emph{symmetric} and \emph{hollow},
i.e.~have a zero diagonal. The \(e_{ij}(X)\) are \emph{squared Euclidean
distances}, defined by \[
e_{ij}(X):=(x_i-x_j)'(x_i-x_j)=(e_i-e_j)'XX'(e_i-e_j)=\mathbf{tr}\ X'A_{ij}X,
\] where \(A_{ij}:=(e_i-e_j)(e_i-e_j)'\) and the \(e_i\) are unit
vectors with a \(1\) in position \(i\) and \(0\) elsewhere. I am pretty
sure the original formulation of ELEGANT did not include weights, but we
include them here for the reasons simlar to those reviewed by Groenen
and Van de Velden (2016), section 6. The alternative derivations by
Takane (1977) and Browne (1987) did not consider weights either, but De
Leeuw, Groenen, and Pietersz (2004) did include weights.
At that time we were trying to generalize the \emph{alternating least
squares} algorithms used for fitting non-metric linear and bilinear
models to multidimensional scaling, i.e.~to the least squares fitting of
distances and squared distances. One way to attack this problem is the
approach chosen in ALSCAL (Takane, Young, and De Leeuw (1977)), where
the problem of fitting the optimal configuration is attacked by block
coordinate descent. The approach in De Leeuw (1975) is quite different,
because it does not partition the variables into blocks, but instead
uses \emph{augmentation} (De Leeuw (1994)). Note that augmentation was
applied to multidimensional scaling shortly before \emph{majorization}
became the preferred approach. In this context majorization methods are
another way to extend alternating least squares. They were first
announced in the proceedings of the 1975 \emph{US-Japan Seminar on
Multidimensional Scaling and Related Techniques}.
To explain the original formulation we introduce, for each quadruple
\((i,j,k,l)\), \[
e_{ijkl}:=(x_i-x_j)'(x_k-x_\ell)=(e_i-e_j)'XX'(e_k-e_\ell)=\mathbf{tr}\ XX'(e_i-e_j)(e_k-e_\ell)'.
\] Also define
\begin{equation}\label{E:disp}
\epsilon_{ijk\ell}:=\begin{cases}\epsilon_{ij}&\text{ if }(i,j)=(k,\ell),\\
\text{arbitrary}&\text{ otherwise},\end{cases}
\end{equation}
and \(w_{ijk\ell}:=\sqrt{w_{ij}w_{kl}}\).
We now define an augmented version of sstress, which is a function of
both the configuration \(X\) and the \emph{disparities}
\(\epsilon_{ijk\ell}\).
\begin{equation}
\sigma_\star(X,E):=\frac12\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{\ell=1}^n w_{ijk\ell}(\epsilon_{ijk\ell}-e_{ijk\ell}(X))^2,
\end{equation}
The relation with sstress from \(\eqref{E:sstress}\) is \[
\sigma(X)=\min_{E\in\mathcal{E}}\sigma_\star(X,E),
\] where \(\mathcal{E}\) is the set of disparities satisfying
\(\eqref{E:disp}\). The ELEGANT algorithm in De Leeuw (1975) uses
alternating least squares to minimize \(\sigma_\star\). Thus we
alternate minimization over \(X\) keeping \(E\) fixed with minimization
over \(E\in\mathcal{E}\) keeping \(X\) fixed.
Minimizing \(\sigma_\star\) over \(E\in\mathcal{E}\) for fixed \(X\) is
trivial. We simply set
\begin{equation}\label{E:eps}
\epsilon_{ijk\ell}:=\begin{cases}\epsilon_{ij}&\text{ if }(i,j)=(k,\ell),\\
e_{ijk\ell}(X)&\text{ otherwise}.\end{cases}
\end{equation}
The more complicated and interesting problem is to minimize
\(\sigma_\star\) over \(X\) for fixed \(E\). Observe first \[
\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{\ell=1}^n w_{ijk\ell}\epsilon_{ijk\ell}e_{ijk\ell}=
\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{\ell=1}^n w_{ijk\ell} \epsilon_{ijk\ell}(e_i-e_j)'XX'(e_k-e_\ell)=
\mathbf{tr}\ X'B(X)X,
\] with
\begin{equation}\label{E:C}
B(X):=\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{\ell=1}^nw_{ijk\ell}\epsilon_{ijk\ell}(e_i-e_j)(e_k-e_\ell)'.
\end{equation}
Next \[
\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{\ell=1}^n w_{ijk\ell}e_{ijk\ell}^2=
\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{\ell=1}^n w_{ij}^\frac12 w_{k\ell}^\frac12(e_i-e_j)'XX'(e_k-e_\ell)(e_k-e_\ell)'XX'(e_i-e_j)=\mathbf{tr}\ (X'VX)^2,
\] with
\begin{equation}\label{E:V}
V:=\sum_{i=1}^n\sum_{j=1}^n w_{ij}^\frac12 A_{ij}.
\end{equation}
It follows that the \(X\) minimizing \(\sigma_\star\) for fixed \(E\)
must satisfy the stationary equations \(B(X)X=VX(X'VX)\). If \(Z\) are
the normalized eigenvectors corresponding with the \(p\) largest
eigenvalues of the generalized eigen problem \(B(X)Z=VZ\Lambda\), with
\(Z'VZ=I\), then \(X=Z\Lambda^\frac12\).
Working with four-dimensional arrays, and computing \(e_{ijkl}(X)\) for
all quadruples, is unattractive. It can be avoided, however. Write
\(\eqref{E:eps}\) in the form \[
\epsilon_{ijk\ell}=\delta^{ik}\delta^{j\ell}\epsilon_{ij}+(1-\delta^{ik}\delta^{j\ell})e_{ijk\ell}(X)=\delta^{ik}\delta^{j\ell}(\epsilon_{ij}-e_{ij})+e_{ijkl}(X)
\] and substitute this into \(\eqref{E:C}\). After some simplification
we find the value of \(B(X)\) at this \(\epsilon_{ijkl}\) to be
\begin{equation}\label{E:C2}
B(X)=R(X)+VXX'V,
\end{equation}
with
\begin{equation}\label{E:B}
R(X):=\sum_{i=1}^n\sum_{j=1}^n w_{ij}(\epsilon_{ij}-e_{ij}(X))A_{ij}.
\end{equation}
Note that \(\eqref{E:V}\) says that \(v_{ij}=-2\sqrt{w_{ij}}\) for
\(i\not= j\), and the diagonal is filled in such that rows and columns
add up to zero. In the same way \(\eqref{E:B}\) says
\(r_{ij}(X)=-w_{ij}(\epsilon_{ij}-e_{ij}(X))\) for \(i\not= j\), and the
diagonal elements again make rows and columns sum to zero.
The algorithm is now simply to compute the update \(X^{(k+1)}\) by
solving the generalized eigen problem defined by \(B(X^{k})\) and \(V\).
Note that the fixed point equations for this algorithm
\((R(X)+VXX'V)X=VX(X'VX)\) are equivalent to \(R(X)X=0\), which are the
stationary equations found by setting the derivatives of \(\sigma\)
equal to zero. It is easy to understand why, in my youthful enthusiasm,
I baptized the algorithm ELEGANT. Note that if all off-diagonal weights
are equal to one then \(VX=2nX\), and thus \(B(X)=R(X)+4n^2XX'\) and we
must compute eigenvalues and eigenvectors of \(XX'+\frac{1}{4n^2}R(X)\).
\section{Ekman example}\label{ekman-example}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{data}\NormalTok{(ekman, }\DataTypeTok{package=}\StringTok{"smacof"}\NormalTok{)}
\NormalTok{lbs