Course Description: Often the success of a machine learning project depends on the choice of features used. Machine learning has made great progress in training classification, regression, and recognition systems when "good" representations, or features, of input data are available. However, much human effort is spent on designing good features, which are usually knowledge-based and engineered by domain experts over years of trial and error. A natural question to ask, then, is "Can we automate the learning of useful features from raw data?" Representation learning algorithms such as principal component analysis aim to discover better representations of inputs by learning transformations of the data that disentangle the underlying factors of variation while retaining most of the information. The success of such data-driven approaches to feature learning depends not only on how much data we can process but also on how well the learned features correlate with the underlying unknown labels (the semantic content of the data). This course will focus on scalable machine learning approaches for learning representations from large amounts of unlabeled, multi-modal, and heterogeneous data.
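To make the principal component analysis example concrete, here is a minimal sketch (using synthetic data; all variable names are illustrative): center the data, take the top-k singular directions, and project onto them, keeping most of the variance in far fewer dimensions.

```python
import numpy as np

# Synthetic data with correlated features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# PCA via SVD: center the data, then project onto the top-k
# right singular vectors (the principal directions).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
Z = Xc @ Vt[:k].T  # k-dimensional learned representation

# Fraction of total variance retained by the k components.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
```

The singular values measure how much variance each direction captures, so `explained` quantifies the "retaining most of the information" trade-off mentioned above.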

Prerequisites: The class is accessible to both undergraduates and graduates and assumes only a background in basic machine learning, or in basic probability and linear algebra. Key mathematical concepts will be reviewed before they are used, but a certain level of mathematical maturity is expected.

Grading: Grades will be based on homework assignments (30%), class participation (10%), a course project (35%), and an in-class final exam (25%).

Discussions: This term we will be using Piazza for class discussion. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. Find our class page here.