Google-UW Machine Learning Seminar Series

The Machine Learning Series provides a forum for presentation and discussion
of interesting and current machine learning issues.
The talks scheduled for 2010 are listed below.

Unless otherwise noted, all talks will be held in room DC 1302. Coffee is to be confirmed.

Presentation slides will be posted whenever possible. Please
click on a presentation title to access its slides (usually in PDF format).

Machine Learning Seminar Series is supported by

2010 Seminars - Distinguished Speakers

Wednesday, March 17, 2010, 4:00, DC 1302

Title:

On Noise-Tolerant Learning using Linear Classifiers

Speaker:

Phil Long

Abstract:

This talk is about learning using linear hypotheses in the
presence of noise, including the following topics:

* New algorithms that tolerate a large amount of "malicious noise", given
constraints on the probability distribution generating the examples.

* The ability of linear classifiers to approximate the optimal error rate
for some tree-structured two-layer sources with the class designation at
the root, the observed variables at the leaves, and some hidden variables
in between.

* Limitations on the noise tolerance of some boosting algorithms
based on convex optimization.

(This is joint work with Nader Bshouty, Adam Klivans and Rocco Servedio.)
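The sketch below illustrates learning a linear classifier from noisy examples using a very simple rule: averaging, w = (1/m) * sum of y * x. The examples are drawn uniformly from the unit circle. This is not the algorithm from the talk. It uses random label flips instead of malicious noise, and the data, the 10% noise rate and the function names are assumptions made for illustration.

```python
import math
import random


def average_halfspace(examples):
    """Averaging rule: w = (1/m) * sum of y * x over labeled examples (x, y), y in {-1, +1}."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    for x, y in examples:
        for j in range(dim):
            w[j] += y * x[j]
    return [wj / len(examples) for wj in w]


def predict(w, x):
    """Label x by the sign of the inner product <w, x> (ties go to +1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def sample(rng, m, target, flip_prob):
    """Draw m points uniformly from the unit circle, label them with the target
    halfspace, then flip each label independently with probability flip_prob."""
    data = []
    for _ in range(m):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = (math.cos(theta), math.sin(theta))
        y = predict(target, x)
        if rng.random() < flip_prob:
            y = -y
        data.append((x, y))
    return data


rng = random.Random(0)
target = (1.0, 0.0)                      # hypothetical target halfspace
train = sample(rng, 2000, target, 0.10)  # assumed 10% random label noise
test = sample(rng, 1000, target, 0.0)    # clean test set
w = average_halfspace(train)
accuracy = sum(predict(w, x) == y for x, y in test) / len(test)
print(f"w = {w}, clean test accuracy = {accuracy:.3f}")
```

Random label flips only shrink the expected average (1 - 2 * noise rate) without rotating it, which is why this toy case is easy. Malicious noise can also move the points themselves, and that is the harder setting the talk addresses.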

Bio:

Dr. Phil Long is an internationally known researcher in theoretical machine
learning. His leadership roles include co-chairing the program committee of
COLT'99, serving as an editor for the Machine Learning Journal, and currently
serving as an area chair for both ICML 2010 and NIPS 2010.
Dr. Long did his Ph.D. at UC Santa Cruz and postdocs at
Technische Universitaet Graz and Duke. He then joined the faculty
at the National University of Singapore, followed by a stint at the
Genome Institute of Singapore. Next, he went to the Center for
Computational Learning Systems at Columbia. Since 2005 he has been
a member of Google's research unit.

Title:

Frequentists vs. Bayesians, the PAC-Bayesian synthesis, and support
vector machines

Speaker:

David McAllester

Abstract:

We will start with a description of the frequentist (objective probability)
and Bayesian (subjective probability) positions. We will then describe the
PAC-Bayesian theorem which allows for a kind of formal synthesis of the two
positions. The talk will then focus on support vector machines as a case
study in PAC-Bayesian analysis. We will discuss the "SVM scandal": no
meaningful formal justification for the hinge loss of soft-margin SVMs has
ever been given. We will also apply PAC-Bayesian analysis to recent trends in
structural SVMs. Structural SVMs are a way of training the parameters of
graphical models and are becoming increasingly popular in areas such as computer
vision and natural language processing.
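One commonly quoted form of the PAC-Bayesian theorem is Maurer's refinement of McAllester's bound, relaxed with Pinsker's inequality. Assume losses lie in [0, 1], the prior P is chosen before seeing the m >= 8 i.i.d. training examples, and δ is in (0, 1). Then with probability at least 1 - δ, every posterior Q satisfies L(Q) <= L̂(Q) + sqrt((KL(Q || P) + ln(2 * sqrt(m) / δ)) / (2m)). Here L(Q) and L̂(Q) are the expected and empirical losses of the Gibbs classifier that draws a hypothesis from Q. The sketch below only evaluates this right-hand side. The input numbers are made up, and other versions of the theorem use slightly different constants.

```python
import math


def pac_bayes_bound(empirical_loss, kl, m, delta):
    """Upper bound on the expected loss of the Gibbs classifier for posterior Q.

    empirical_loss: average training loss of Q, in [0, 1]
    kl:             KL(Q || P) between posterior Q and prior P, in nats
    m:              number of i.i.d. training examples (this form assumes m >= 8)
    delta:          the bound holds with probability at least 1 - delta
    """
    complexity = (kl + math.log(2.0 * math.sqrt(m) / delta)) / (2.0 * m)
    return empirical_loss + math.sqrt(complexity)


# Hypothetical numbers: 5% training error, KL(Q || P) = 10 nats, delta = 0.05.
for m in (1_000, 10_000, 100_000):
    print(m, round(pac_bayes_bound(0.05, 10.0, m, 0.05), 4))
```

The bound trades off fit against the KL "distance" from the prior. A posterior that stays close to P pays a small complexity term even if it is a rich distribution over hypotheses.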

Bio:

Professor McAllester received his B.S., M.S., and Ph.D. degrees from the
Massachusetts Institute of Technology in 1978, 1979, and 1987 respectively.
He served on the faculty of Cornell University for the academic year of 1987-1988
and served on the faculty of MIT from 1988 to 1995. He was a member of technical
staff at AT&T Labs-Research from 1995 to 2002. He has been a fellow of
the American Association for Artificial Intelligence (AAAI) since 1997. Since
2002 he has been Chief Academic Officer at the Toyota Technological Institute
at Chicago. He has authored over 90 refereed publications. Professor McAllester's
research areas include machine learning, the theory of programming languages,
automated reasoning, AI planning, computer game playing (computer chess),
computational linguistics and computer vision. A 1991 paper on AI planning
proved to be one of the most influential papers of the decade in that area.
A 1993 paper on computer game algorithms influenced the design of the algorithms
used in the Deep Blue system that defeated Garry Kasparov. A 1998 paper on
machine learning theory introduced PAC-Bayesian theorems, which combine Bayesian
and non-Bayesian methods. He is currently part of a team that placed in
the top two in the PASCAL object detection challenge (computer vision)
in 2007, 2008 and 2009.

Speaker:

Shai Shalev-Shwartz

Abstract:

Machine learning is playing a central role in the digital revolution,
in which massive, never-ending streams of data are collected from
sources such as online commerce, social networking, and online
collaboration. These data are often noisy or partial.
In this talk I will present learning algorithms suited to this
new era: algorithms that not only handle massive amounts of data
but can also leverage large data sets to reduce the required runtime,
and algorithms that can use a multitude of examples to compensate
for the lack of full information about each individual example.
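One published line of work behind the claim that more data can reduce runtime analyses stochastic sub-gradient methods for SVMs. An example is the Pegasos algorithm of Shalev-Shwartz, Singer and Srebro, whose runtime does not depend directly on the number of training examples. The abstract does not say whether the talk used this example. The sketch below is a minimal version of the Pegasos update for a linear SVM without a bias term, with the optional projection step omitted. The toy data, λ and iteration count are assumptions.

```python
import random


def pegasos(data, lam, num_iters, seed=0):
    """Stochastic sub-gradient descent on the soft-margin SVM objective
        (lam / 2) * ||w||^2 + (1 / m) * sum_i max(0, 1 - y_i * <w, x_i>)
    with step size 1 / (lam * t); the optional projection step is omitted."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for t in range(1, num_iters + 1):
        x, y = data[rng.randrange(len(data))]               # one uniformly chosen example
        eta = 1.0 / (lam * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))   # margin before the update
        w = [(1.0 - eta * lam) * wj for wj in w]            # step on the regularizer
        if margin < 1.0:                                    # hinge active: step toward y * x
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def svm_objective(w, data, lam):
    """Regularized average hinge loss, the quantity Pegasos tries to minimize."""
    reg = 0.5 * lam * sum(wj * wj for wj in w)
    hinge = sum(max(0.0, 1.0 - y * sum(wj * xj for wj, xj in zip(w, x)))
                for x, y in data) / len(data)
    return reg + hinge


# Hypothetical toy data: 500 points in [-1, 1]^2 labeled by the sign of x0 + x1.
rng = random.Random(1)
data = []
for _ in range(500):
    x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    data.append((x, 1 if x[0] + x[1] >= 0 else -1))

lam = 0.01
w = pegasos(data, lam=lam, num_iters=20000)
errors = sum((1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1) != y for x, y in data)
print("w =", w)
print("objective:", svm_objective(w, data, lam), "vs", svm_objective([0.0, 0.0], data, lam), "at w = 0")
print("training errors:", errors, "of", len(data))
```

Each step touches a single example, so the cost per step is independent of the data set size. The number of steps needed depends on λ and the target accuracy rather than on m.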

Bio:

Shai Shalev-Shwartz is on the faculty of the Department of Computer
Science and Engineering at the Hebrew University of Jerusalem, Israel.
Dr. Shalev-Shwartz received his PhD in computer science from
the Hebrew University in 2007. From 2007 to 2009 he was a research
assistant professor at the Toyota Technological Institute at Chicago.
Shai has written more than 40 research papers focusing on learning theory,
online prediction, optimization techniques, and practical algorithms.
He served as a program committee member for COLT in 2008-2010
and for ALT in 2009, and he is on the editorial boards of the
Journal of Machine Learning Research (JMLR) and the Machine Learning
Journal (MLJ).

Tuesday, September 14, 2010, 2:00 p.m., MC 5158

Title:

Hierarchical Bayesian Models of Language and Text

Speaker:

Yee Whye Teh

Abstract:

In this talk I will present a new approach to modelling sequence data
called the sequence memoizer. Unlike most other sequence models,
our model makes no Markovian assumptions. Instead, we use a
hierarchical Bayesian approach that enforces sharing of statistical
strength across the different parts of the model. To make computations
with the model efficient, and to better model the power-law statistics
often observed in sequence data, we use a Bayesian nonparametric prior
called the Pitman-Yor process as the building block of the hierarchical
model. We show state-of-the-art results on language modelling and text
compression.

This is joint work with Frank Wood, Jan Gasthaus, Cedric Archambeau and
Lancelot James.
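The Pitman-Yor process can be simulated through its "Chinese restaurant" representation. It has a discount d in [0, 1) and a concentration θ > -d. Suppose n customers are seated at K tables. The next customer joins existing table k with probability (n_k - d) / (θ + n), and opens a new table with probability (θ + d * K) / (θ + n). With d > 0 the number of tables grows roughly like n^d, which is one source of the power-law behaviour mentioned above. The sketch below draws one seating arrangement. It shows only this building block, not the sequence memoizer itself, and the parameter values are arbitrary.

```python
import random


def pitman_yor_crp(n, discount, concentration, seed=0):
    """Seat n customers by the Pitman-Yor Chinese restaurant process.

    Returns the list of table sizes. Requires 0 <= discount < 1 and
    concentration > -discount.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for _ in range(n):
        if not tables:  # the first customer always opens a table
            tables.append(1)
            continue
        # Unnormalized weights; the shared denominator (concentration + customers so far) cancels.
        weights = [size - discount for size in tables]
        weights.append(concentration + discount * len(tables))  # weight of a new table
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables


for d in (0.0, 0.5):
    tables = pitman_yor_crp(2000, discount=d, concentration=1.0)
    print(f"discount={d}: {len(tables)} tables, largest has {max(tables)} customers")
```

With d = 0 this reduces to the Dirichlet process, whose number of tables grows only logarithmically in n. The discount is what produces the heavier, power-law tail.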

Bio:

Yee Whye Teh is a Lecturer (equivalent to an assistant professor in the US
system) at the Gatsby Computational Neuroscience Unit, UCL. He is
interested in machine learning and Bayesian statistics. His current
focus is on developing Bayesian nonparametric methodologies for
unsupervised learning, computational linguistics, and genetics. Prior
to his appointment he was Lee Kuan Yew Postdoctoral Fellow at the
National University of Singapore and a postdoctoral fellow at
University of California at Berkeley. He obtained his Ph.D. in
Computer Science at the University of Toronto in 2003. He is
programme co-chair of AISTATS 2010.

Title:

Hypothesis Testing and Bayesian Inference: New Applications of Kernel Methods

Speaker:

Arthur Gretton

Abstract:

In the early days of kernel machines research, the "kernel trick" was
considered a useful way of constructing nonlinear learning algorithms
from linear ones, by applying the linear algorithms to feature-space
mappings of the original data. More recently, it has become clear that
a potentially more far-reaching use of kernels is as a linear way of
dealing with higher-order statistics, by mapping probabilities to a
suitable reproducing kernel Hilbert space (i.e., the feature space is
an RKHS).
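The idea of representing a distribution by its mean in an RKHS can be made concrete with a (biased) empirical estimate of the squared maximum mean discrepancy between samples X and Y: MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y). This is the squared RKHS distance between the two empirical mean embeddings. The sketch below uses a Gaussian kernel with a hand-picked bandwidth on one-dimensional samples. The kernel choice, bandwidth and data are assumptions. A real two-sample test would also need a null distribution, for example by permutation, to set a threshold.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average of k(x, y) over all pairs: the inner product of the empirical mean embeddings."""
    return sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Squared RKHS distance between the empirical mean embeddings of xs and ys."""
    return (mean_kernel(xs, xs, bandwidth) + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


rng = random.Random(0)
p = [rng.gauss(0.0, 1.0) for _ in range(200)]
q_same = [rng.gauss(0.0, 1.0) for _ in range(200)]   # same distribution as p
q_shift = [rng.gauss(1.0, 1.0) for _ in range(200)]  # mean shifted by 1
print("same:", mmd2_biased(p, q_same), "shifted:", mmd2_biased(p, q_shift))
```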

I will describe how probabilities can be mapped to kernel feature
spaces, and how to compute distances between these mappings. A
measure of the strength of dependence between two random variables
follows naturally from this distance. Applications that make use of
kernel probability embeddings include:

* Nonparametric two-sample testing and independence testing in complex
(high-dimensional) domains. In the latter case, we test whether English
text is a translation of paired French text, as opposed to a random
extract on the same topic.

* Inference on graphical models, in cases where the variable
interactions are modeled nonparametrically (i.e., when parametric
models are impractical or unknown). In experiments, this approach
outperforms state-of-the-art nonparametric techniques in 3-D depth
reconstruction from 2-D images, and on a protein structure prediction
task.
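A kernel dependence measure of the kind mentioned above is the Hilbert-Schmidt Independence Criterion (HSIC) of Gretton and colleagues. With product kernels, it can be read as the squared distance between the embedding of the joint distribution and that of the product of the marginals. A biased empirical estimate for paired samples is proportional to tr(K H L H). Here K and L are kernel matrices on the two variables, and H = I - (1/n) 1 1^T centres them. Normalising constants vary between papers, and this sketch divides by n^2. The Gaussian kernels, bandwidths and data are assumptions. As in the two-sample case, a real independence test would calibrate a threshold, for example by permuting one variable.

```python
import math
import random


def kernel_matrix(values, bandwidth):
    """Gaussian kernel matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 * bandwidth^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def double_center(K):
    """Return H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def hsic_biased(xs, ys, bandwidth_x=1.0, bandwidth_y=1.0):
    """Biased HSIC estimate tr(K H L H) / n^2 for paired samples (xs[i], ys[i])."""
    n = len(xs)
    Kc = double_center(kernel_matrix(xs, bandwidth_x))
    L = kernel_matrix(ys, bandwidth_y)
    # tr(K H L H) = tr((H K H) L) = sum_ij (HKH)_ij * L_ij, since L is symmetric.
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / n ** 2


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y_dep = [xi ** 2 + rng.gauss(0.0, 0.1) for xi in x]                   # depends on x, yet uncorrelated with it
y_ind = [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 0.1) for _ in x]  # same marginal, independent of x
print("dependent:", hsic_biased(x, y_dep), "independent:", hsic_biased(x, y_ind))
```

The dependent pair has (population) correlation zero, so a linear test would miss it. The kernel statistic is designed to pick up such nonlinear dependence.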

Bio:

Arthur Gretton has been a lecturer with the Gatsby Computational
Neuroscience Unit since August 2010, and is affiliated as a research
scientist with the Max Planck Institute for Biological Cybernetics, where
he has worked since September 2002. He received degrees in physics and systems
engineering from the Australian National University in 1996 and 1998, respectively,
and studied machine learning from 1999 to 2003 with Microsoft Research and
the Signal Processing and Communications Laboratory at the University of
Cambridge, where he completed his PhD. From 2009 to 2010 he worked as a
project scientist with the Machine Learning Department at Carnegie
Mellon University.

Arthur's research interests include machine learning, kernel
methods, nonparametric inference in graphical models, statistical
learning theory, nonparametric hypothesis testing, and blind source separation.
He has been an associate editor of IEEE Transactions on Pattern
Analysis and Machine Intelligence since March 2009, was a member
of the NIPS Program Committee in 2008 and 2009, and is an Area
Chair for ICML 2011.