Algorithms for data analysis draw heavily on both discrete and continuous
techniques, often in combination. Combinatorial optimization
plays a natural role in methods for clustering and classification:
there are a variety of ways to describe 'good' clusterings
of a dataset as optimizing a discrete objective function,
and this leads to heuristics involving local search and linear
programming relaxations. In a different vein, methods based
on eigenvectors and the singular value decomposition have
been employed for both clustering and approximating high-dimensional
data; such approaches form the basis of the Latent Semantic
Indexing technique in information retrieval and the current
generation of link-based ranking algorithms for Web search.
Our understanding of the power of all these methodologies
has benefited from the study of probabilistic generative models
for large datasets and networks; such models provide a setting
in which to analyze data analysis algorithms rigorously, and
they can also be used to posit 'simple' explanations for phenomena
that are observed across a diverse range of datasets.
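
To make the eigenvector-based ranking idea concrete, here is a minimal sketch
in Python. The text above does not name a specific algorithm, so this assumes
a hub/authority-style formulation: pages are scored by the dominant
eigenvector of A^T A, where A is the link adjacency matrix, and that
eigenvector is approximated by power iteration. The 4-page graph and the
function names (power_iteration, authority_matrix) are hypothetical and used
only for illustration.

```python
import math


def power_iteration(matrix, iterations=100):
    """Approximate the dominant eigenvector of a square matrix
    by repeated multiplication and renormalization."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n  # uniform unit-length starting vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # the matrix maps the vector to zero; nothing to rank
        vec = [x / norm for x in nxt]
    return vec


def authority_matrix(adjacency):
    """Return A^T A; entry (i, j) counts the pages that link to both i and j."""
    n = len(adjacency)
    return [
        [sum(adjacency[k][i] * adjacency[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical link graph: adjacency[i][j] == 1 means page i links to page j.
adjacency = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

scores = power_iteration(authority_matrix(adjacency))
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking[0])  # index of the highest-scoring page
```

In this toy graph, page 2 receives the most in-links and ends up with the
largest score. Power iteration is only one way to obtain the eigenvector; it
is used here because it needs nothing beyond the standard library.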

TUTORIAL SCHEDULE

MONDAY, MAY 5

All talks are in Lecture Hall EE/CS 3-180 unless otherwise noted.
The schedule is divided into 4 units