Advances in Structured Prediction

Structured prediction is the problem
of making a joint set of decisions to optimize a joint loss. There are
two families of algorithms for such problems: graphical model approaches
and learning-to-search approaches. Graphical models include
Conditional Random Fields and Structured SVMs and are effective when
writing down a graphical model and solving it is easy. Learning-to-search
approaches explicitly predict the joint set of decisions
incrementally, conditioning on past and future decisions. Such models
may be particularly useful when the dependencies between the predictions
are complex, the loss is complex, or the construction of an explicit
graphical model is impossible.
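
As a rough illustration of incremental prediction, the sketch below labels a sentence left to right, with each decision conditioned on the decisions already made, and scores the result with a joint (Hamming) loss. It is only a toy: the label set, the hand-written `toy_score` function, and the example sentence are hypothetical, and it conditions on past decisions only. In a learning-to-search system the scorer would be learned from data rather than written by hand.

```python
from typing import Callable, List, Sequence

LABELS = ["DET", "NOUN", "VERB"]  # hypothetical label set for the toy example


def greedy_decode(tokens: Sequence[str], labels: Sequence[str],
                  score: Callable[[Sequence[str], int, List[str], str], float]) -> List[str]:
    """Predict one label per token, left to right; each step sees earlier decisions."""
    history: List[str] = []
    for i in range(len(tokens)):
        best = max(labels, key=lambda y: score(tokens, i, history, y))
        history.append(best)
    return history


def hamming_loss(pred: Sequence[str], gold: Sequence[str]) -> int:
    """Joint loss: number of positions where the prediction differs from the reference."""
    return sum(p != g for p, g in zip(pred, gold))


def toy_score(tokens: Sequence[str], i: int, history: List[str], label: str) -> float:
    """Hand-written stand-in for a learned policy (an assumption, not a trained model)."""
    prev = history[-1] if history else None
    score = 0.0
    if label == "DET" and tokens[i] in {"the", "a"}:
        score += 2.0
    if label == "NOUN" and prev == "DET":  # depends on the previous decision
        score += 1.5
    if label == "VERB" and prev == "NOUN":  # depends on the previous decision
        score += 1.0
    return score


tokens = ["the", "dog", "barks"]
predicted = greedy_decode(tokens, LABELS, toy_score)
print(predicted, hamming_loss(predicted, ["DET", "NOUN", "VERB"]))
```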

We will describe both approaches,
with a deeper focus on the latter, the learning-to-search paradigm, which
has less tutorial support. This paradigm has been gaining
traction over the past five years, making advances in natural language
processing (dependency parsing, semantic parsing), robotics (grasping
and path planning), social network analysis and computer vision (object
segmentation).

Bayesian Time Series Modeling: Structured Representations for Scalability

Time series of increasing complexity
are being collected in a variety of fields ranging from neuroscience,
genomics, and environmental monitoring to e-commerce, enabled by
technologies and infrastructure previously unavailable. These datasets
can be viewed either as providing a single, high-dimensional time series
or as a massive collection of time series with intricate and possibly
evolving relationships between them. For scalability, it is crucial to
discover and exploit sparse dependencies between the data streams or
dimensions. Such representational structures for independent data
sources have been extensively explored in the machine learning
community. However, in the conversation on big data, despite the
importance and prevalence of time series, the question of how to analyze
such data at scale has received limited attention and remains an open
area for research.

For these time series of interest,
there are two key modeling components: the dynamic and relational
models, and their interplay. In this tutorial, we will review some
foundational time series models, including the hidden Markov model (HMM)
and vector autoregressive (VAR) process. Such dynamical models and
their extensions have proven useful in capturing complex dynamics of
individual data streams such as human motion, speech, EEG recordings,
and genome sequences. However, a focus of this tutorial will be on how
to deploy scalable representational structures for capturing sparse
dependencies between data streams. In particular, we consider
clustering, directed and undirected graphical models, and
low-dimensional embeddings in the context of time series. An emphasis
is on learning such structure from the data. We will also provide some
insights into new computational methods for performing efficient
inference in large-scale time series.
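
To make the idea of sparse dependencies between data streams concrete, here is a minimal sketch of a first-order VAR process, x_t = A x_{t-1} + noise, in which the zero pattern of the transition matrix A defines a directed graph over the streams. The matrix values, the noise level, and the function names are assumptions chosen for illustration; the sketch simulates a known model and reads off its graph, it does not learn the structure from data.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def var1_step(A: Matrix, x: List[float], noise_std: float, rng: random.Random) -> List[float]:
    """One VAR(1) step: each stream is a linear function of all streams plus Gaussian noise."""
    return [sum(a * xj for a, xj in zip(row, x)) + rng.gauss(0.0, noise_std) for row in A]


def simulate_var1(A: Matrix, x0: List[float], steps: int,
                  noise_std: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Simulate `steps` transitions starting from x0; returns steps + 1 states."""
    rng = random.Random(seed)
    xs = [list(x0)]
    for _ in range(steps):
        xs.append(var1_step(A, xs[-1], noise_std, rng))
    return xs


def dependency_edges(A: Matrix, tol: float = 1e-12) -> List[Tuple[int, int]]:
    """Directed edges (j, i): stream j at time t-1 influences stream i at time t."""
    return [(j, i) for i, row in enumerate(A) for j, a in enumerate(row)
            if i != j and abs(a) > tol]


# Hypothetical sparse transition matrix: only stream 0 feeds into stream 1.
A = [[0.5, 0.0, 0.0],
     [0.3, 0.5, 0.0],
     [0.0, 0.0, 0.9]]

series = simulate_var1(A, [1.0, 0.0, 2.0], steps=50)
print(dependency_edges(A), len(series))
```

With a sparse A, the cost of each step and the number of parameters to estimate grow with the number of nonzero entries rather than with the square of the number of streams, which is the scalability argument made above.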

Throughout the tutorial we will
highlight Bayesian and Bayesian nonparametric approaches for learning
and inference. Bayesian methods provide an attractive framework for
examining complex data streams by naturally incorporating and
propagating notions of uncertainty and enabling integration of
heterogeneous data sources; the Bayesian nonparametric aspect allows the
complexity of the dynamics and relational structure to adapt to the
observed data.

Natural Language Understanding: Foundations and State-of-the-Art

Building systems that can understand
human language—being able to answer questions, follow instructions,
carry on dialogues—has been a long-standing challenge since the early
days of AI. Due to recent advances in machine learning, there is
renewed interest in taking on this formidable task. A major question is
how one represents and learns the semantics (meaning) of natural
language, to which there are only partial answers. The goal of this
tutorial is (i) to describe the linguistic and statistical challenges
that any system must address; and (ii) to describe the types of
cutting-edge approaches and the remaining open problems. Topics include
distributional semantics (e.g., word vectors), frame semantics (e.g.,
semantic role labeling), model-theoretic semantics (e.g., semantic
parsing), the role of context, grounding, neural networks, latent
variables, and inference. The hope is that this unified presentation
will clarify the landscape, and show that this is an exciting time for
the machine learning community to engage in the problems in natural
language understanding.

Policy Search: Methods and Applications

Policy search is a subfield of
reinforcement learning that focuses on finding good parameters for a
given policy parametrization. It is well suited for robotics as it can
cope with high-dimensional state and action spaces, one of the main
challenges in robot learning. We review recent successes of both
model-free and model-based policy search in robot learning.

Model-free policy search is a general
approach to learning policies based on sampled trajectories. We classify
model-free methods based on their policy evaluation strategy, policy
update strategy, and exploration strategy and present a unified view on
existing algorithms. Learning a policy is often easier than learning an
accurate forward model, and, hence, model-free methods are more
frequently used in practice. However, for each sampled trajectory, it
is necessary to interact with the robot, which can be time consuming and
challenging in practice. Model-based policy search addresses this
problem by first learning a simulator of the robot’s dynamics from data.
Subsequently, the simulator generates trajectories that are used for
policy learning. For both model-free and model-based policy search
methods, we review their respective properties and their applicability
to robotic systems.
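
The three ingredients of a model-free method (exploration, policy evaluation, policy update) can be sketched with a very simple random-search hill climber. Everything here is an assumption made for illustration: the one-dimensional system x' = x + u stands in for the robot, the policy is a single gain u = -k x, and the cost weights are arbitrary. It is not one of the algorithms reviewed in the tutorial, only a minimal instance of searching directly in policy parameter space from sampled trajectories.

```python
import random
from typing import Tuple


def rollout(k: float, x0: float = 1.0, horizon: int = 20) -> float:
    """Sample one trajectory of the toy system under the policy u = -k * x; return total reward."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        total -= x * x + 0.1 * u * u  # penalize distance from the origin and control effort
        x = x + u                     # toy dynamics standing in for the robot
    return total


def policy_search(k_init: float = 0.0, iterations: int = 200,
                  sigma: float = 0.2, seed: int = 0) -> Tuple[float, float]:
    """Random-search hill climbing over the single policy parameter k."""
    rng = random.Random(seed)
    k, best = k_init, rollout(k_init)
    for _ in range(iterations):
        candidate = k + rng.gauss(0.0, sigma)  # exploration: perturb the parameter
        ret = rollout(candidate)               # evaluation: one sampled trajectory
        if ret > best:                         # update: keep the candidate if it is better
            k, best = candidate, ret
    return k, best


k, ret = policy_search()
print(k, ret, rollout(0.0))
```

Each call to `rollout` corresponds to one interaction with the system, which is the cost that model-based methods try to reduce by replacing those interactions with trajectories from a learned simulator.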

Peter Richtárik (University of Edinburgh) and Mark Schmidt (University of British Columbia).

This tutorial reviews recent advances
in convex optimization for training (linear) predictors via
(regularized) empirical risk minimization. We exclusively focus on
practically efficient methods which are also equipped with complexity
bounds confirming the suitability of the algorithms for solving
huge-dimensional problems (a very large number of examples or a very
large number of features).

The first part of the tutorial is
dedicated to modern primal methods (belonging to the stochastic gradient
descent variety), while the second part focuses on modern dual methods
(belonging to the randomized coordinate ascent variety). While we make
this distinction, there are very close links between the primal and dual
methods, some of which will be highlighted. We shall also comment on
mini-batch, parallel and distributed variants of the methods as this is
an important consideration for applications involving big data.
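
The link between the primal and dual views can be illustrated on a toy ridge regression problem with one feature, minimizing (1/n) Σ ½(w x_i − y_i)² + (λ/2) w². The sketch below runs a stochastic gradient method on the primal and a randomized coordinate ascent method on the dual, and compares both with the closed-form minimizer. The data, the step-size schedule 1/(t + 10), and the function names are assumptions for illustration; the dual update is the exact single-coordinate maximizer for the squared loss, with the primal weight recovered as w = (1/(λn)) Σ α_i x_i.

```python
import random
from typing import List


def sgd(xs: List[float], ys: List[float], lam: float, steps: int, seed: int = 0) -> float:
    """Primal method: stochastic gradient descent with a decaying step size."""
    rng = random.Random(seed)
    w = 0.0
    for t in range(steps):
        i = rng.randrange(len(xs))
        grad = (w * xs[i] - ys[i]) * xs[i] + lam * w  # gradient of one regularized term
        w -= grad / (t + 10.0)                        # assumed step-size schedule
    return w


def dual_coordinate_ascent(xs: List[float], ys: List[float], lam: float,
                           steps: int, seed: int = 0) -> float:
    """Dual method: maximize the dual exactly over one randomly chosen coordinate at a time."""
    rng = random.Random(seed)
    n = len(xs)
    alpha = [0.0] * n
    w = 0.0  # maintained as (1 / (lam * n)) * sum(alpha_i * x_i)
    for _ in range(steps):
        i = rng.randrange(n)
        delta = (ys[i] - w * xs[i] - alpha[i]) / (1.0 + xs[i] ** 2 / (lam * n))
        alpha[i] += delta
        w += delta * xs[i] / (lam * n)
    return w


def closed_form(xs: List[float], ys: List[float], lam: float) -> float:
    """Exact minimizer of the one-dimensional ridge objective."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam * n)


xs, ys, lam = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 1.0
print(closed_form(xs, ys, lam), sgd(xs, ys, lam, 5000), dual_coordinate_ascent(xs, ys, lam, 500))
```

On this problem the dual method needs no step size, since each coordinate update is solved in closed form, whereas the primal method depends on the chosen schedule.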