AISTATS 2016 Invited Speakers

Abstract
Machine learning algorithms increasingly work with sensitive information on
individuals, and hence the problem of privacy-preserving data analysis -- how to
design data analysis algorithms that operate on the sensitive data of
individuals while still guaranteeing the privacy of those individuals -- has
acquired great practical importance. In this talk, we address two problems
in privacy-preserving data analysis.

First, we address the problem of privacy-preserving classification, and present
an efficient classifier which is private in the differential privacy model of
Dwork et al. Our classifier works in the ERM (empirical risk minimization)
framework, and includes privacy-preserving logistic regression and
privacy-preserving support vector machines.
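
As a rough illustration of what a differentially private ERM classifier can
look like, the sketch below uses output perturbation: fit an L2-regularized
logistic regression, then add noise scaled to how much one record can move the
minimizer. This is one known technique, not necessarily the classifier
presented in the talk. It assumes every feature vector has Euclidean norm at
most 1 and labels in {-1, +1}; the function names and the plain gradient
descent solver are illustrative choices, and the formal guarantee is stated for
the exact minimizer, which gradient descent only approximates.

```python
import math
import random


def train_logreg(X, y, lam, steps=500, lr=0.5):
    """Gradient descent on (1/n) * sum log(1 + exp(-y * w.x)) + (lam/2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [lam * wj for wj in w]  # gradient of the regularizer
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            coeff = -yi / (1.0 + math.exp(margin)) / n  # logistic loss gradient
            for j in range(d):
                grad[j] += coeff * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def sample_noise(d, scale, rng):
    """Draw b in R^d with density proportional to exp(-||b|| / scale)."""
    norm = rng.gammavariate(d, scale)  # radius: Gamma(shape=d, scale=scale)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    g_norm = math.sqrt(sum(v * v for v in g))
    return [norm * v / g_norm for v in g]  # uniformly random direction


def private_logreg(X, y, lam, epsilon, rng):
    """Output perturbation: non-private minimizer plus calibrated noise.

    Assumes ||x|| <= 1 for every row of X, so the exact minimizer has
    L2 sensitivity at most 2 / (n * lam).
    """
    n, d = len(X), len(X[0])
    w = train_logreg(X, y, lam)
    b = sample_noise(d, 2.0 / (n * lam * epsilon), rng)
    return [wj + bj for wj, bj in zip(w, b)]
```

Smaller values of epsilon or lam, or fewer training points, increase the noise
scale, which is the usual privacy versus accuracy trade-off.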

We next address the question of learning from sensitive correlated data, such
as private information on users connected together in a social network, and
measurements of physical activity of a single user across time. Unfortunately,
differential privacy cannot adequately address privacy challenges in this kind
of data, and as such, these challenges have been largely ignored by the existing
literature. We consider a recent generalization of differential privacy, called
Pufferfish, that can be used to address privacy in correlated data, and present
new privacy mechanisms in this framework.

Based on joint work with Claire Monteleoni (George Washington University),
Anand Sarwate (Rutgers), Yizhen Wang (UCSD) and Shuang Song (UCSD).

Bio
Kamalika Chaudhuri is an Assistant Professor in the Computer Science and
Engineering Department at UC San Diego. Prior to joining the department, she
received a PhD in Computer Science from UC Berkeley in 2007, and was a
postdoctoral researcher at UC San Diego from 2007 to 2010. She is the recipient
of a Hellman Faculty Fellowship and received an NSF CAREER award in 2012.
Kamalika's research is on learning theory, which deals with the theoretical
foundations of machine learning. She is particularly interested in
privacy-preserving machine learning -- how to learn a good predictor from
sensitive training data, while ensuring the privacy of individuals in the data
set.

Abstract
People understand many domains more deeply than today's machine learning
systems. Having a good representation for a problem is crucial to the success
of intelligent systems. In this talk, we discuss recent work and future
opportunities for how humans can aid machine learning algorithms. Beyond simply
labeling data, the crowd can help uncover the latent representation behind a
problem. We discuss recent work on eliciting features using active learning as
well as other aspects of crowdsourcing and machine learning, such as how
crowdsourcing can help generate data, raise questions, and assist in more
complex AI tasks.

Bio
Adam Tauman Kalai received his BA (1996) from Harvard, and his MA (1998) and PhD
(2001) from CMU under the supervision of Avrim Blum. After an NSF postdoctoral
fellowship at M.I.T. with Santosh Vempala, he served as an assistant professor
at the Toyota Technological Institute at Chicago and then at Georgia Tech. He
is now a Principal Researcher at Microsoft Research New England. His honors
include an NSF CAREER award and an Alfred P. Sloan fellowship. His research
focuses on human computation, machine learning, and algorithms.

Abstract
We introduce a very general method for high-dimensional classification, based
on careful combination of the results of applying an arbitrary base classifier
to random projections of the feature vectors into a lower-dimensional space. In
one special case that we study in detail, the random projections are divided
into non-overlapping blocks, and within each block we select the projection
yielding the smallest estimate of the test error. Our random projection
ensemble classifier then aggregates the results of applying the base classifier
on the selected projections, with a data-driven voting threshold to determine
the final assignment. Our theoretical results elucidate the effect on
performance of increasing the number of projections. Moreover, under a boundary
condition implied by the sufficient dimension reduction assumption, we show
that the test excess risk of the random projection ensemble classifier can be
controlled by terms that do not depend on the original data dimension. The
classifier is also compared empirically with several other popular
high-dimensional classifiers via an extensive simulation study, which reveals
its excellent finite-sample performance.
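
The block-selection and voting steps described above can be sketched as
follows. This is a simplified illustration under several assumptions that are
not taken from the abstract: projections are Gaussian random matrices, the base
classifier is a nearest-centroid rule, the test error of each projection is
estimated by its training error, and the voting threshold is the value that
minimizes the training error of the vote. Labels are assumed to be 0 or 1, and
all function names are hypothetical.

```python
import random


def project(A, x):
    """Apply the d x p projection matrix A to the feature vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def fit_centroids(Z, y):
    """Base classifier (an arbitrary choice here): nearest class centroid."""
    cents = {}
    for label in (0, 1):
        pts = [z for z, yi in zip(Z, y) if yi == label]
        cents[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return cents


def predict_centroid(cents, z):
    dist = {c: sum((a - b) ** 2 for a, b in zip(m, z)) for c, m in cents.items()}
    return 0 if dist[0] <= dist[1] else 1


def vote_fraction(chosen, x):
    """Fraction of the selected projections whose base classifier votes 1."""
    votes = [predict_centroid(cents, project(A, x)) for A, cents in chosen]
    return sum(votes) / len(votes)


def fit_rp_ensemble(X, y, d, B1, B2, rng):
    """B1 non-overlapping blocks of B2 projections; keep the best per block."""
    p = len(X[0])
    chosen = []
    for _ in range(B1):
        best = None
        for _ in range(B2):
            A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(d)]
            Z = [project(A, x) for x in X]
            cents = fit_centroids(Z, y)
            # Training error stands in for the test error estimate.
            err = sum(predict_centroid(cents, z) != yi for z, yi in zip(Z, y)) / len(y)
            if best is None or err < best[0]:
                best = (err, A, cents)
        chosen.append((best[1], best[2]))
    # Data-driven threshold: the alpha with the smallest training error.
    fracs = [vote_fraction(chosen, x) for x in X]
    alpha = min(
        (k / B1 for k in range(B1 + 1)),
        key=lambda a: sum((f >= a) != bool(yi) for f, yi in zip(fracs, y)),
    )
    return chosen, alpha


def predict_rp_ensemble(model, x):
    chosen, alpha = model
    return 1 if vote_fraction(chosen, x) >= alpha else 0
```

Any other base classifier or error estimate could be swapped in without
changing the block structure, which is the sense in which the method is
described as applying to an arbitrary base classifier.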

Bio
Richard Samworth is Professor of Statistics in the Statistical Laboratory at
the University of Cambridge, and currently holds a GBP 1.2M Engineering and
Physical Sciences Research Council Early Career Fellowship. He received his
PhD in Statistics, also from the University of Cambridge, in 2004. Richard's
main research interests are in nonparametric and high-dimensional statistical
inference. Particular research topics include shape-constrained density and
other nonparametric function estimation problems, nonparametric classification,
clustering and regression, Independent Component Analysis, bagging and
high-dimensional variable selection problems. Richard was awarded the Royal
Statistical Society (RSS) Research Prize (2008), the RSS Guy Medal in Bronze
(2012) and a Philip Leverhulme Prize (2014). He has been elected a Fellow of
the Institute of Mathematical Statistics (2014) and the American Statistical
Association (2015).