This course is an introduction to machine learning and the
analysis of large data sets using distributed computation and storage
infrastructure. Basic machine learning methodology and relevant
statistical theory will be presented in lectures. Homework exercises
will give students hands-on experience with the methods on different
types of data. Methods include algorithms for clustering, binary
classification, and hierarchical Bayesian modeling. Data types include
images, archives of scientific articles, online ad click-through logs,
and public records of the City of Chicago. Programming will be based on
Python and R, but previous exposure to these languages is not assumed.