Convex optimization is a fundamental theoretical core of many well-known machine learning
models used in day-to-day problems. Studying and gaining a better understanding of
these methods is important in order to build good models and draw proper conclusions.
Furthermore, we also focus on the study of linear models such as the Lasso and the Group
Lasso, which offer several advantages, the most important being their low
computational cost and the interpretability of the resulting models.
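The Lasso and Group Lasso penalties mentioned above are both handled through their proximal operators: elementwise soft thresholding for the Lasso, and blockwise (Euclidean-norm) soft thresholding for the Group Lasso. A minimal sketch in Python with NumPy (function names and the group encoding are our illustrative choices, not the thesis code):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each coordinate toward
    zero by t, setting it to zero when |v_i| <= t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def group_soft_threshold(v, t, groups):
    """Proximal operator of the (non-overlapping) Group Lasso penalty:
    each group of coordinates is shrunk jointly by its Euclidean norm,
    so a whole group is selected or discarded at once."""
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * v[g]
    return out
```

For example, `soft_threshold(np.array([3.0, -0.5]), 1.0)` zeroes the second coordinate while merely shrinking the first, which is the mechanism behind the feature selection discussed later.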
In this Master's Thesis we first review the field of convex optimization from a purely
theoretical point of view, studying concepts such as the subdifferential and proximal descent
minimization, in order to minimize composite functions where one of the components is
non-differentiable. This theoretical background is the basis for studying ISTA, a first iterative
approach to minimizing such functions. To improve the convergence of ISTA, Nesterov
introduced an acceleration that attains an optimal convergence rate of O(1/k²), which is
a significant improvement over the O(1/k) rate of ISTA. This accelerated method
is known as FISTA. Nonetheless, even though this acceleration yields a faster
theoretical convergence rate, in practice the method may not be monotone, which usually
hurts convergence, making it slower in real applications. Several proposals address these issues.
We study here some restarting schemes proposed by
O'Donoghue and Candès, and further optimizations introduced by Ito and colleagues, which
actually make FISTA faster in practice.
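The interplay between Nesterov momentum and restarting can be sketched concretely for the Lasso objective 0.5·||Ax − b||² + λ·||x||₁. The following Python/NumPy snippet implements FISTA together with a gradient-based adaptive restart in the spirit of O'Donoghue and Candès (the restart test and all names are our illustrative choices; the step size is assumed to be at most 1/L, with L the largest eigenvalue of AᵀA):

```python
import numpy as np

def fista_lasso(A, b, lam, step, n_iter=500, restart=True):
    """FISTA for min 0.5*||Ax - b||^2 + lam*||x||_1, with an optional
    gradient-based adaptive restart: the momentum is reset whenever the
    update direction opposes the direction of progress."""
    n = A.shape[1]
    x = np.zeros(n)
    y = x.copy()       # extrapolated point
    t = 1.0            # momentum coefficient
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        v = y - step * grad
        # proximal step: soft thresholding
        x_new = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
        if restart and (y - x_new) @ (x_new - x) > 0:
            # restart: drop the accumulated momentum
            t = 1.0
            y = x_new
        else:
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
            t = t_new
        x = x_new
    return x
```

Setting `restart=False` recovers plain FISTA, which makes it easy to compare the two variants on the same problem, as done in the experiments described below.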
To show the effects of these optimizations, we close this work by presenting some
experiments. The first experiment uses synthetic data generated from a Gaussian
distribution, and focuses on exploring the effects of the different proposed
optimizations and their advantage over standard FISTA. In this experiment we clearly see
that the optimized methods need fewer iterations to converge than standard
FISTA. Nonetheless, a synthetic experiment is not a conclusive demonstration. Therefore, we
have also applied these methods to a real problem of wind energy prediction at Sotavento,
a wind farm located in Galicia, northwest Spain. In this experiment we pursue two goals.
First, to test the usefulness of the optimized methods in a complex cross-validation setup
where we try many hyperparameters, and to assess the effects of such strategies. Second, to test
the Group Lasso as a competitive model against state-of-the-art models such as the Lasso and Ridge regression.
In summary, we have observed that the optimizations are useful not only in a single-run
setup, where in the worst case the performance is similar to standard FISTA, but also in a
cross-validation setup, where the benefit is actually greater. This has to do with the shape
of the problem and the penalization term. A small penalization implies a very wide region
in which to find the solution, so a greater improvement can be expected. On the other hand,
a large penalization makes the problem fairly "straightforward", making all methods
behave similarly. In terms of competitiveness, we see that the Group Lasso performs similarly
to other methods such as the Lasso, and better than Ridge, with the added advantage of a
grouped structure and hence an interesting interpretation of the final model in terms of
feature selection.