Many
modern data analysis problems lead to inference based on
high-dimensional structured models, typically with large amounts of
observed data. In these situations, often referred to as "Big Data"
problems, computation, inference and optimization need to be studied
together.

The
workshop will focus on various facets of this challenging problem. The
emerging links among the computational and inferential disciplines have
led to the emergence of a new generation of researchers whose approach
is multi-disciplinary: this workshop aims to highlight their recent work
and bring together international participants who wish to contribute to
further developments.

Sylvain Arlot (CNRS - Ecole Normale
Supérieure) Optimal model selection with
V-fold cross-validation: how should V be chosen?V-fold cross-validation is a simple and efficient method for
estimator selection when the goal is to minimize the final prediction
error. Nevertheless, few theoretical results exist about the influence
of V on the risk of the final estimator, in particular results taking
into account the variability of the cross-validation criterion as a
function of V. We will focus on the case-example of model selection
for least-squares density estimation. First, a non-asymptotic oracle
inequality holds for V-fold cross-validation (and its bias corrected
version, V-fold penalization), with a leading constant 1+o(1) for
V-fold penalization, and the second order term in the constant
decreases when V increases. Second, making an exact variance
computation allows to quantify the improvement we can expect when V
increases. In particular, this computation enlightens why the
improvement is larger when V goes from 2 to 10 than when V goes from
10 to 100, for instance. Simulation experiments confirm these
theoretical results for realistic values of the sample size.
This talk is based upon a collaboration with Matthieu Lerasle (CNRS -
Universite de Nice, France). Preprint: http://arxiv.org/abs/1210.5830
Francois Caron (INRIA Bordeaux - Sud-Ouest) Bayesian nonparametric models
for bipartite graphs
In this talk I will present a novel Bayesian nonparametric model for
bipartite graphs, based on the theory of completely random measures.
The model is able to handle a potentially infinite number of nodes and
has appealing properties; in particular, it may exhibit a power-law
behavior for some values of the parameters. I derive a posterior
characterization, a generative process for network growth, and a
simple Gibbs sampler for posterior simulation. Finally, the model is
shown to provide a good fit to several large real-world bipartite
social networks

Nicolas Chopin (CREST - ENSAE) EP-ABC:
Expectation-Propagation for Likelihood-Free InferenceMany models of interest in the natural and social sciences have
no closed-form likelihood function, which means that they cannot be
treated using the usual techniques of statistical inference. In the
case where such models can be efficiently simulated, Bayesian
inference is still possible thanks to the Approximate Bayesian
Computation (ABC) algorithm. Although many refinements have been
suggested, ABC inference is still far from routine. ABC is often
excruciatingly slow due to very low acceptance rates. In addition, ABC
requires introducing a vector of "summary statistics", the choice of
which is relatively arbitrary, and often require some trial and error,
making the whole process quite laborious for the user. We introduce in
this work the EP-ABC algorithm, which is an adaptation to the
likelihood-free context of the variational approximation algorithm
known as Expectation Propagation (Minka, 2001). The main advantage of
EP-ABC is that it is faster by a few orders of magnitude than standard
algorithms, while producing an overall approximation error which is
typically negligible. A second advantage of EP-ABC is that it replaces
the usual global ABC constraint on the vector of summary statistics
computed on the whole dataset, by n local constraints of the form that
apply separately to each data-point. As a consequence, it is often
possible to do away with summary statistics entirely. In that case,
EP-ABC approximates directly the evidence (marginal likelihood) of the
model. Comparisons are performed in three real-world applications
which are typical of likelihood-free inference, including one
application inneuroscience which is novel, and possibly too
challenging for standard ABC techniques. I will also mention briefly
new applications of EP-ABC we are currently working on, in population
genetics and in spatial modelling. (joint work withSimon Barthelmé,
plus some on-going work with Jean-Michel Marin, Pierre Pudlo, Magal
Beffi, and Erlis Ruli)

Aurélien Garivier (Institut
Mathématique de Toulouse, Université Paul Sabatier) Dynamic resource allocation as an
estimation problemAn agent sequentially chooses actions in a set of possible
options. Each action leads to a stochastic reward, whose distribution
is unknown. What dynamic allocation rule should he choose so as to
maximize his cumulated reward?
The study of "bandit problems" (the word refers to the paradigmatic
situation of a gambler facing a row of slot-machines and deciding
which one to choose in order to maximize his rewards), dating back to
the 1930s, was originally motivated by medical trials. In has recently
raised a renewed interest because of computer-driven applications,
from computer experiments to recommender systems and Big Data.
One possible solution, called UCB, relies on upper confidence bounds
of the expected rewards associated to all actions. In this
presentation, I shall explain why fine confidence bounds are required
in order to obtain efficient allocation rules, and I shall present the
statistical challenges involved in their construction.

Zaid Harchaoui (INRIA Grenoble -
Rhône-Alpes)
Large-scale learning with conditional gradient algorithms
We consider convex optimization problems arising in machine learning
in large-scale settings. For several important learning problems, such
as e.g. noisy matrix completion or multi-class classification,
state-of-the-art optimization approaches such as composite
minimization (a.k.a. proximal-gradient) algorithms are difficult to
apply and do not scale up to large datasets. We propose three
extensions of the conditional gradient algorithm (a.k.a. Frank-Wolfe's
algorithm), suitable for large-scale problems, and establish their
finite-time convergence guarantees. Promising experimental results are
presented on large-scale real-world datasets.

Guillaume Obozinski (Ecole des Ponts
- Paristech)­ Convex relaxation for Combinatorial
PenaltiesThe last years have seen the emergence of the field of
structured sparsity, which aims at identifying a model of small
complexity given some a priori knowledge on its possible structure.
Specifically, models with structured sparsity are models in which the
set of non-zero parameters --- often corresponding to a set of
selected variables --- is not only assumed to be small, but also to
display structured patterns. Two important examples are group
sparsity, where groups of parameters are simultaneously zero or
non-zero ,and hierarchical sparsity, were variables can only be
selected following a prescribed partial order encoded by a directed
acyclic graph.
A common approach to the problem is to add to the empirical risk an
implicit or explicit penalization of the structure of the non-zero
patterns.
In this talk, I will consider a generic formulation in which allowed
structures are encoded by a combinatorial penalty, and show that when
combined with continuous regularizer such as an Lp norm, a tightest
convex relaxation can be constructed and used a regularizer. The
formulation considered allows to treat in a unified framework several
a priori disconnected approaches such as norms based on overlapping
groups, norms based on latent representations such as block-coding and
submodular functions, and to obtain generic consistency and support
recovery results for the corresponding estimators obtained as
minimizers of the regularized empirical risk.

Peter Richtarik (University of
Edinburgh) Mini-batch primal and dual methods
for support vector machinesIn this work we address the issue of using mini-batches in
stochastic optimization of support vector machines (SVMs). We show
that the same quantity, the spectral norm of the data, controls the
parallelization speedup obtained for both primal stochastic
subgradient descent (SGD) and stochastic dual coordinate ascent (SCDA)
methods and use it to derive novel variants of mini-batched SDCA. Our
guarantees for both methods are expressed in terms of the original
nonsmooth primal problem based on the hinge-loss.

Igor Pruenster (University of Torino
& Collegio Carlo Alberto)On some distributional
properties of Gibbs-type priorsGibbs-type priors represent a natural (maybe the most natural?)
generalization of the Dirichlet process and can be intuitively
characterized in terms of their predictive structure. This talk will
deal with some of their distributional properties and highlight the
corresponding implications in terms of Bayesian nonparametric
modeling.

Aarti Singh (Carnegie Mellon
University) Optimal convex optimization
under Tsybakov noise through reduction to active learningWe demonstrate that the complexity of convex minimization is
only determined by the rate of growth of the function around its
minimizer, as quantified by a Tsybakov-like noise condition (TNC) with
exponent k. Specifically, we demonstrate optimal first-order
stochastic optimization rates that depend precisely on the TNC
exponent k which include as special cases the classical rates for
convex (k tending to infinity), strongly convex (k=2) and uniformly
convex optimization (k > 2). Even faster rates (nearly exponential)
can be attained if the exponent 1 < k < 2. Our analysis is based
on a reduction of convex optimization to active learning as both
problems involve minimizing the number of queries needed to identify
the minimizer or the decision boundary, respectively, by using
feedback gained from prior queries. Our results demonstrate that the
complexity of convex optimization in d-dimensions is precisely the
same as the complexity of active learning in 1 dimension. First, we
port a lower bound proof from active learning to demonstrate the
complexity of optimizing convex function satisfying TNC, under both
stochastic first-order oracle as well as derivative-free (zeroth
order) setting. We then demonstrate that the optimal rate with TNC
exponent under first-order oracle can be achieved by an epoch-gradient
descent algorithm, as well as a coordinate descent algorithm that uses
a 1-dimensional active learning algorithm as subroutine. We also show
that it is possible to adapt to the unknown TNC exponent and hence the
degree of convexity.

Yee Whye Teh (Oxford University) Fast MCMC sampling for Markov jump
processes and extensionsMarkov jump processes (or continuous-time Markov chains) are a
simple and important class of continuous-time dynamical systems. In
this talk, we tackle the problem of simulating from the posterior
distribution over the unobserved paths in these models given some
observations. Our approach is an auxiliary variable Gibbs sampler, and
is based on the idea of uniformization. This sets up a Markov chain
over paths by alternately sampling a finite set of virtual jump times
given the current path and then sampling a new path given the set of
extant and virtual jump times using a standard hidden Markov model
forward filtering-backward sampling algorithm. Our method is exact and
does not involve approximations like time-discretization. We
demonstrate how our sampler extends naturally to MJP-based models like
Markov-modulated Poisson processes and continuous-time Bayesian
networks and show significant computational benefits over
state-of-the-art MCMC samplers for these models (Joint work with
Vinayak Rao)