Abstract

This paper considers regularized block multiconvex optimization, where the feasible set and objective
function are generally nonconvex but convex in each block of variables. It also accepts nonconvex
blocks and requires these blocks to be updated by proximal minimization. We review some interesting
applications and propose a generalized block coordinate descent method. Under certain
conditions, we show that any limit point satisfies the Nash equilibrium conditions. Furthermore, we
establish global convergence and estimate the asymptotic convergence rate of the method by assuming
a property based on the Kurdyka–Łojasiewicz inequality. The proposed algorithms are tested on
nonnegative matrix and tensor factorization, as well as matrix and tensor recovery from incomplete
observations. The tests include synthetic data and hyperspectral data, as well as image sets from
the CBCL and ORL databases. Compared to the existing state-of-the-art algorithms, the proposed
algorithms demonstrate superior performance in both speed and solution quality. The MATLAB
code for nonnegative matrix/tensor decomposition and completion, along with a few demos, is
accessible from the authors' homepages.
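
As an illustration of the block coordinate scheme summarized above, the following is a minimal Python/NumPy sketch (not the authors' released MATLAB code) of alternating prox-linear updates for nonnegative matrix factorization, minimizing 0.5*||M - XY||_F^2 over X, Y >= 0. The random initialization, the iteration count, and the per-block step sizes 1/Lx and 1/Ly are illustrative assumptions.

```python
# Block coordinate descent with prox-linear (projected gradient) block updates
# for nonnegative matrix factorization (a sketch, not the released code).
import numpy as np

def bcd_nmf(M, r, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = rng.random((m, r))
    Y = rng.random((r, n))
    for _ in range(iters):
        # X-block: one prox-linear step with Y fixed; the prox of the
        # nonnegativity indicator is entrywise projection onto [0, inf).
        Lx = max(np.linalg.norm(Y @ Y.T, 2), 1e-12)   # block Lipschitz constant
        gradX = (X @ Y - M) @ Y.T
        X = np.maximum(X - gradX / Lx, 0.0)
        # Y-block: the same update with the roles of X and Y swapped.
        Ly = max(np.linalg.norm(X.T @ X, 2), 1e-12)
        gradY = X.T @ (X @ Y - M)
        Y = np.maximum(Y - gradY / Ly, 0.0)
    return X, Y
```

Each block subproblem is convex even though the joint problem is not, which is exactly the multiconvex structure the analysis exploits.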


Abstract

The stochastic gradient (SG) method can quickly solve a problem with a large number
of components in the objective, or a stochastic optimization problem, to a moderate accuracy. The
block coordinate descent/update (BCD) method, on the other hand, can quickly solve problems with
multiple (blocks of) variables. This paper introduces a method that combines the strengths of
SG and BCD for problems with many components in the objective and with multiple (blocks of)
variables, proposing a block SG (BSG) method for both convex and nonconvex programs.
BSG generalizes SG by updating all the blocks of variables in a Gauss–Seidel fashion (the update of the
current block depends on the previously updated blocks), in either a fixed or randomly shuffled order.
Although BSG requires slightly more work at each iteration, it typically outperforms SG because of
BSG’s Gauss–Seidel updates and larger step sizes, the latter of which are determined by the smaller
per-block Lipschitz constants. The convergence of BSG is established for both convex and nonconvex
cases. In the convex case, BSG has the same order of convergence rate as SG. In the nonconvex
case, its convergence is established in terms of the expected violation of a first-order optimality
condition. In both cases our analysis is nontrivial since the typical unbiasedness assumption no
longer holds. BSG is numerically evaluated on the following problems: stochastic least squares and
logistic regression, which are convex, and low-rank tensor recovery and bilinear logistic regression,
which are nonconvex. On the convex problems, BSG performed significantly better than SG. On the
nonconvex problems, BSG significantly outperformed the deterministic BCD method because the
latter tends to stagnate early near local minimizers. Overall, BSG inherits the benefits of both SG
approximation and block coordinate updates and is especially useful for solving large-scale nonconvex
problems.
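
A minimal sketch of the Gauss–Seidel block stochastic gradient pattern described above, applied to a stochastic least squares instance. The mini-batch sampling, the diminishing step size, and the block partition are illustrative assumptions rather than the paper's exact algorithm or experimental setup.

```python
# Block stochastic gradient (BSG) sketch for least squares: each iteration
# samples a mini-batch and then updates the blocks one by one, recomputing the
# partial gradient with the blocks already updated in this pass (Gauss-Seidel).
import numpy as np

def bsg_least_squares(A, b, block_sizes, iters=2000, batch=32, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    bounds = np.cumsum([0] + list(block_sizes))
    blocks = [np.arange(bounds[i], bounds[i + 1]) for i in range(len(block_sizes))]
    x = np.zeros(n)
    for k in range(iters):
        idx = rng.choice(A.shape[0], size=batch, replace=False)  # mini-batch
        for j in rng.permutation(len(blocks)):                   # shuffled order
            resid = A[idx] @ x - b[idx]           # uses blocks updated so far
            g = A[idx][:, blocks[j]].T @ resid / batch
            x[blocks[j]] -= lr / np.sqrt(k + 1) * g              # diminishing step
    return x
```

Recomputing the residual inside the block loop is what makes the update Gauss–Seidel rather than Jacobi; it also makes each partial gradient depend on the current sample, which is related to why the usual unbiasedness assumption no longer holds in the analysis.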

Abstract

Nonconvex optimization arises in many areas of computational science and engineering.
However, most nonconvex optimization algorithms are only known to have local
convergence or subsequence convergence properties. In this paper, we propose an algorithm
for nonconvex optimization and establish its global convergence (of the whole sequence) to
a critical point. In addition, we give its asymptotic convergence rate and numerically demonstrate
its efficiency. In our algorithm, the variables of the underlying problem are either treated
as one block or multiple disjoint blocks. It is assumed that each non-differentiable component
of the objective function, or each constraint, applies only to one block of variables. The
differentiable components of the objective function, however, can involve multiple blocks of
variables together. Our algorithm updates one block of variables at a time by minimizing a
certain prox-linear surrogate, along with an extrapolation to accelerate its convergence. The
order of update can be either deterministically cyclic or randomly shuffled for each cycle. In
fact, our convergence analysis only needs that each block be updated at least once in every
fixed number of iterations. We show its global convergence (of the whole sequence) to a
critical point under fairly loose conditions including, in particular, the Kurdyka–Łojasiewicz
condition, which is satisfied by a broad class of nonconvex/nonsmooth applications. These
results, of course, remain valid when the underlying problem is convex. We apply our convergence
results to the coordinate descent iteration for nonconvex regularized linear regression,
as well as a modified rank-one residue iteration for nonnegative matrix factorization. We show that
both applications have global convergence. Numerically, we tested our algorithm on nonnegative
matrix and tensor factorization problems, where random shuffling clearly improves
the chance to avoid low-quality local solutions.
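
To make the prox-linear block update with extrapolation concrete, here is a minimal sketch applied, as an assumed illustration, to l1-regularized least squares with blocks visited in a randomly shuffled order each cycle. The fixed extrapolation weight omega is a simplification of the weights the convergence analysis actually permits.

```python
# One-block-at-a-time prox-linear updates with extrapolation (sketch).
# Problem: minimize 0.5*||A x - b||^2 + lam*||x||_1, with x split into blocks.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def block_prox_linear(A, b, lam, block_sizes, cycles=100, omega=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    bounds = np.cumsum([0] + list(block_sizes))
    blocks = [np.arange(bounds[i], bounds[i + 1]) for i in range(len(block_sizes))]
    x = np.zeros(n)
    prev = [x[blk].copy() for blk in blocks]       # previous value of each block
    for _ in range(cycles):
        for j in rng.permutation(len(blocks)):     # randomly shuffled cycle
            Aj = A[:, blocks[j]]
            L = max(np.linalg.norm(Aj.T @ Aj, 2), 1e-12)   # block Lipschitz constant
            xhat = x.copy()                        # extrapolate block j only
            xhat[blocks[j]] += omega * (x[blocks[j]] - prev[j])
            g = Aj.T @ (A @ xhat - b)              # partial gradient at the extrapolated point
            prev[j] = x[blocks[j]].copy()
            x[blocks[j]] = soft_threshold(xhat[blocks[j]] - g / L, lam / L)
    return x
```

The nonsmooth l1 term is separable across blocks, matching the assumption that each non-differentiable component of the objective applies to only one block of variables.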

Abstract

Motivated by big data applications, first-order methods have been extremely popular
in recent years. However, naive gradient methods generally converge slowly. Hence, much effort has
been made to accelerate various first-order methods. This paper proposes two accelerated methods
for solving structured linearly constrained convex programs, for which we assume a composite
convex objective that is the sum of a differentiable function and a possibly nondifferentiable one.
The first method is the accelerated linearized augmented Lagrangian method (LALM). At each
update of the primal variable, it allows linearization of the differentiable function and of the
augmented term, which yields easy subproblems. Assuming mere convexity, we show that
LALM enjoys O(1/t) convergence if the parameters are kept fixed throughout the iterations and can be
accelerated to O(1/t^2) if the parameters are adapted, where t is the total number of iterations.
The second method is the accelerated linearized alternating direction method of multipliers
(LADMM). In addition to the composite convexity, it further assumes two-block structure on the
objective. Different from classic ADMM, our method allows linearization of the objective and of the
augmented term to keep the updates simple. Assuming strong convexity of one block of variables, we
show that LADMM also enjoys O(1/t^2) convergence with adaptive parameters. This result is a
significant improvement over that in [Goldstein et al., SIIMS'14], which requires strong convexity
of both blocks of variables and allows no linearization of the objective or augmented term.
Numerical experiments are performed on quadratic programming, image denoising, and support
vector machine. The proposed accelerated methods are compared to nonaccelerated ones and also
existing accelerated methods. The results demonstrate the effectiveness of the acceleration and the superior
performance of the proposed methods over existing ones.
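
The following is a minimal sketch of the nonaccelerated, fixed-parameter linearized ALM update (the O(1/t) setting), with f(x) = 0.5*||x - c||^2 and an l1 term as assumed stand-ins for the differentiable and nondifferentiable parts; the accelerated variant additionally adapts the penalty and step parameters across iterations, and a LADMM sketch would follow the same pattern with two blocks of variables.

```python
# Linearized augmented Lagrangian sketch for
#   minimize 0.5*||x - c||^2 + lam*||x||_1   subject to   A x = b.
# Both the smooth term and the augmented quadratic are linearized, so each
# primal update reduces to a single soft-thresholding (proximal) step.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def linearized_alm(A, b, c, lam, beta=1.0, iters=500):
    m, n = A.shape
    x = np.zeros(n)
    y = np.zeros(m)                                   # Lagrange multiplier
    eta = 1.0 + beta * np.linalg.norm(A, 2) ** 2      # proximal (Lipschitz) parameter
    for _ in range(iters):
        grad = (x - c) + A.T @ (y + beta * (A @ x - b))   # linearized smooth + augmented part
        x = soft_threshold(x - grad / eta, lam / eta)     # easy subproblem
        y = y + beta * (A @ x - b)                        # multiplier ascent step
    return x
```

Because both the smooth term and the augmented term are linearized, the x-update never requires solving a linear system involving A, which is the sense in which the subproblems are easy.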

Abstract

Finding a fixed point of a nonexpansive operator, i.e., x = Tx, abstracts many
problems in numerical linear algebra, optimization, and other areas of data science. To solve fixed-
point problems, we propose ARock, an algorithmic framework in which multiple agents (machines,
processors, or cores) update x in an asynchronous parallel fashion. Asynchrony is crucial to parallel
computing since it reduces synchronization waits, relaxes the communication bottleneck, and thus speeds
up computing significantly. At each step of ARock, an agent updates a randomly selected coordinate
xi based on possibly out-of-date information on x. The agents share x through either global memory
or communication. If writing xi is atomic, the agents can read and write x without memory locks.
We prove that if the nonexpansive operator T has a fixed point, then with probability one, ARock
generates a sequence that converges to a fixed point of T. Our conditions on T and step sizes are
weaker than those in comparable work. Linear convergence is obtained under suitable assumptions.
We propose special cases of ARock for linear systems, convex optimization, machine learning, as
well as distributed and decentralized consensus problems. Numerical experiments on solving sparse
logistic regression problems are presented.
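
As an assumed illustration of the coordinate update x_i <- x_i - eta*(x - Tx)_i (not the released ARock package), the sketch below lets several threads update a shared iterate without locks for the operator Tx = x - gamma*(Ax - b), which is nonexpansive when A is symmetric positive definite and gamma <= 1/||A||. Note that CPython's GIL serializes the threads, so this only mimics the lock-free, possibly stale-read update pattern rather than delivering real parallel speedup.

```python
# ARock-style asynchronous coordinate updates for a linear system A x = b (sketch).
import numpy as np
import threading

def arock_linear_system(A, b, eta=0.5, n_threads=4, steps_per_thread=5000, seed=0):
    n = A.shape[0]
    gamma = 1.0 / np.linalg.norm(A, 2)     # makes T x = x - gamma*(A x - b) nonexpansive
    x = np.zeros(n)                        # shared iterate, written without locks

    def worker(tid):
        rng = np.random.default_rng(seed + tid)
        for _ in range(steps_per_thread):
            i = rng.integers(n)            # randomly selected coordinate
            snapshot = x.copy()            # possibly out-of-date view of x
            res_i = gamma * (A[i] @ snapshot - b[i])   # (x - T x)_i at the snapshot
            x[i] -= eta * res_i            # single-entry write, assumed atomic

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```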