Syllabus of CSC 665: Online Learning and Multi-armed Bandits

Description of Course

This course is a continuing line of topics from CSC 665 Section 2 Machine Learning Theory that delve into online learning and multi-armed bandits (but the knowledge from ML theory is not required). The students will learn, via the lens of mathematical foundations, how and when machines can learn in an online manner. Specifically, the course offers mathematical formulation of learning environments (e.g., stochastic and adversarial worlds with possibly limited feedback), fundamental limits of learning in these environments, various algorithms concerning sample efficiency, computational efficiency, and generality. Throughout, students will not only learn fundamental mathematical tools upholding the current understanding of sequential decision making in the research community but also develop skills of adapting these techniques to their own research needs such as developing new algorithms.

Course Prerequisites or Co-requisites

Linear Algebra or equivalent:

You will need to understand the relationship between linear operators, linear transformations, and matrices

We will make repeated use of matrix decompositions such as the SVD

Multivariate Calculus or equivalent:

You will need to understand the relationship between the total derivative, the gradient, and how to take advantage of the fact that the derivative is a linear operator

Basic probability theory and statistics:

You will need to understand (conditional) expectation and (conditional) independence of random variables.

Course Format and Teaching Methods

Course Objectives

A successful student will be able to explain the key concepts of online learning, convex optimization, and the bandit feedback, and analyze lower and upper bounds of the achievable performance of the various online learning tasks and algorithms. Main topics include:

The online learning framework

Classes of loss functions: convexity, strong convexity, exp-concavity

Online-to-batch conversion

Online linear optimization

Prediction with expert advice

Online mirror descent

Follow the regularized leader

Adversarial/stochastic multi-armed bandits

Contextual bandits

Linear bandits

Pure exploration

Off-policy evaluation

This course is far from a complete survey of the concurrent topics on online learning and multi-armed bandits due to the sheer volume of recent developments and the fact that those topics require a deep understanding of concepts and fluency in mathematical techniques. That said, this course covers key topics that not only serve as building blocks of online learning but is also relevant in modern research so that the students can directly adapt learned topics to their own research. Students are expected to dedicate a significant amount of time on understanding mathematical concepts and skills outside the classroom.

Expected Learning Outcomes

The expected learning outcomes of the course is: * To be able to understand the concept of the key performance measure called ‘regret’, and how online learning is able to deal with adversarial data (i.e., non-stochastic). * To be able to list the common loss classes and their achievable regret bounds and explain why accelerated rates are available for some classes. * To be able to explain how online mirror descent and follow the regularized leader is different, and when one is better than the other. * To be able to identify the key problem parameters that govern the regret rates for each problem. * To be able to explain the key challenge in the bandit feedback. * To be able to articulate how various bandit problems pose different challenges and what the achievable regret rates are. * To be able to explain how the pure exploration is different from the regret minimization in multi-armed bandits. * To be able to describe the landscape of online learning and list a few open problems researchers are currently working on.

Makeup Policy for Students Who Register Late

If you register late for this class, contact me as soon as you do. You will be expected to submit all missed assignments within a week of your registration. It is your responsibility to catch up to the class content.

Course Communications

We will use D2L for communications and discussion. Make sure your D2L account is up to date - class announcements are sent through the website.

Required Texts or Readings

There is no designated textbook for this course. Much of the course materials will be based on the following materials (in the order of appearance in class schedule):

Assignments and Examinations: Schedule/Due Dates

This course will have a weekly review quiz (1-2 questions) that is relatively easy but requires reviewing the learned material.

There will be two written homework assignments:

A1: Online learning: assigned 2/19, due on 3/2

A2: Multi-armed bandits: assigned 3/30, due on 4/8

Final Examination or Project

This course will not have a final examination.

Each student will select a paper among the provided list of papers and present it for an hour. The presentation must include a clear exposition of the problem being addressed, the solution the paper proposes, a comparison of the solution with similar studies, key technical details and proofs, and possible extensions and open problems. To receive full credit for class participation, students must attend and participate in the discussion of all classes. Students should contact the instructor regarding absences for make-up.

Grading Scale and Policies

The instructing staff will grade your assignments, project, and the final exam on a scale from 0 to 100, with the following weights:

Attendance/participation: 10%

Quiz: 15%

Paper critique: 15%

Assignments: 30%

Paper presentation: 30%

The final grade in the course of the best of a per-class grading curve and overall performance:

90% or better: A;

80% or better: B;

70% or better: C;

60% or better: D;

below 60%: F.

Graded homework will be returned before the next homework is due. The homework is due in 10 days, and will be returned to students before the next homework is due. Exams will be returned within two weeks. Grading delays beyond promised return-by dates will be announced as soon as possible with an explanation for the delay.

As a rule, homework will not be accepted late except in case of documented emergency or illness. You may petition the professor in writing for an exception if you feel you have a compelling reason for turning work in late.

Department of Computer Science Code of Conduct

The Department of Computer Science is committed to providing and maintaining a supportive educational environment for all. We strive to be welcoming and inclusive, respect privacy and confidentiality, behave respectfully and courteously, and practice intellectual honesty. Disruptive behaviors (such as physical or emotional harassment, dismissive attitudes, and abuse of department resources) will not be tolerated. The complete Code of Conduct is available on our department web site. We expect that you will adhere to this code, as well as the UA Student Code of Conduct, while you are a member of this class.

Classroom Behavior Policy

To foster a positive learning environment, students and instructors have a shared responsibility. We want a safe, welcoming, and inclusive environment where all of us feel comfortable with each other and where we can challenge ourselves to succeed. To that end, our focus is on the tasks at hand and not on extraneous activities (e.g., texting, chatting, reading a newspaper, making phone calls, web surfing, etc.). Students are asked to refrain from disruptive conversations with people sitting around them during lecture. Students observed engaging in disruptive activity will be asked to cease this behavior. Those who continue to disrupt the class will be asked to leave lecture or discussion and may be reported to the Dean of Students. Some learning styles are best served by using personal electronics, such as laptops and iPads. These devices can be distracting to other learners. Therefore, students who prefer to use electronic devices for note-taking during lecture should use one side of the classroom.

Notification of Objectionable Materials

This course will contain material of a mature nature, which may include explicit language, depictions of nudity, sexual situations, and/or violence. The instructor will provide advance notice when such materials will be used. Students are not automatically excused from interacting with such materials, but they are encouraged to speak with the instructor to voice concerns and to provide feedback.

Accessibility and Accommodations

At the University of Arizona we strive to make learning experiences as accessible as possible. If you anticipate or experience physical or academic barriers based on disability or pregnancy, you are welcome to let me know so that we can discuss options. You are also encouraged to contact Disability Resources (520-621-3268) to explore reasonable accommodation. If our class meets at a campus location: Please be aware that the accessible table and chairs in this room should remain available for students who find that standard classroom seating is not usable.

UA Nondiscrimination and Anti-harassment Policy

The University is committed to creating and maintaining an environment free of discrimination; see http://policy.arizona.edu/human-resources/nondiscrimination-and-anti-harassment-policy. Our classroom is a place where everyone is encouraged to express well-formed opinions and their reasons for those opinions. We also want to create a tolerant and open environment where such opinions can be expressed without resorting to bullying or discrimination of others.

Subject to Change Statement

Information contained in the course syllabus, other than the grade and absence policy, may be subject to change with advance notice, as deemed appropriate by the instructor.

Crisis Contacts

Where to go, who to call if you’re in crisis:

Located in Tucson? Call the Community-Wide Crisis Line 24 hours a day, 7 days a week at 520-622-6000.

Are you a University of Arizona student? If it is not an emergency and you are a UA student, call or walk-in to Counseling and Psych Services at 520-621-3334 Monday - Friday. Walk-in triage is available between 9 am and 4 pm Monday - Friday.

Are you a concerned friend? Concerned friends can find out more about helping a friend who might be experiencing problems through our Friend 2 Friend website.

The National Suicide Prevention Lifeline is a 24-hour, toll-free, confidential suicide prevention hotline available to anyone in suicidal crisis or emotional distress. By dialing 1-800-273-TALK (8255), the call is routed to the nearest crisis center in our national network of more than 150 crisis centers. The Lifeline’s national network of local crisis centers provides crisis counseling and mental health referrals day and night.

Crisis Text Line: Text HOME to 741741 from anywhere in the United States, anytime, about any type of crisis. A live, trained Crisis Counselor receives the text and responds, all from a secure online platform. Find out more about how it works at crisistextline.org.

Suicide Prevention for LGBTQ Youth through the Trevor Project:

The Trevor Lifeline is a 24/7 suicide hotline: 866-4-U-TREVOR (1-866-488-7386)