We present several contributions at the interface of first-order methods for convex optimization and problems in statistical machine learning. In the first part of this thesis, we present new results for the Frank-Wolfe method, with a particular focus on: (i) novel computational guarantees that apply for any step-size sequence, (ii) a novel adjustment to the basic algorithm to better account for warm-start information, and (iii) extensions of the computational guarantees that hold in the presence of approximate subproblem and/or gradient computations. In the second part of the thesis, we present a unifying framework for interpreting "greedy" first-order methods -- namely Frank-Wolfe and greedy coordinate descent -- as instantiations of the dual averaging method of Nesterov, and we discuss the implications thereof. In the third part of the thesis, we present an extension of the Frank-Wolfe method that is designed to induce near-optimal low-rank solutions for nuclear norm regularized matrix completion and, for more general problems, induces near-optimal "well-structured" solutions. We establish computational guarantees that trade off efficiency in computing near-optimal solutions with upper bounds on the rank of iterates. We then present extensive computational results that show significant computational advantages over existing related approaches, in terms of delivering low rank and low run-time to compute a target optimality gap. In the fourth part of the thesis, we analyze boosting algorithms in linear regression from the perspective modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression can be viewed as subgradient descent to minimize the maximum absolute correlation between features and residuals. We also propose a slightly modified boosting algorithm that yields an algorithm for the Lasso, and that computes the Lasso path. Our perspective leads to first-ever comprehensive computational guarantees for all of these boosting algorithms, which provide a precise theoretical description of the amount of data-fidelity and regularization imparted by running a boosting algorithm, for any dataset. In the fifth and final part of the thesis, we present several related results in the contexts of boosting algorithms for logistic regression and the AdaBoost algorithm.

Description:

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2016.; This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.; Cataloged from student-submitted PDF version of thesis.; Includes bibliographical references (pages 219-225).