IE617: Online Machine Learning and Bandit Algorithms, Jan-April 2022

The aim of this course is to study bandit algorithms in various settings and to analyze their performance. We will use many tools from probability and statistics throughout the course, so a good background in these topics is a prerequisite.

Lecture Hours

Monday, 5:30-7:00 pm
Thursday, 5:30-7:00 pm

Location

Hybrid mode. Lectures will be conducted in person and live-streamed via MS Teams.

Teaching Assistants

Debamita Ghosh
Hitesh Gudwani

TA Hours

Timings: Every Monday, 2-4 pm
Venue: Room 201, IEOR Building

Syllabus

Stochastic Multi-armed Bandits: Algorithms for simple and cumulative regret (expected and high-probability), and their analysis. We will cover the UCB, KL-UCB, and Thompson Sampling algorithms and their variants.
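To give a flavor of the index-based approach in this setting, here is a minimal UCB1 sketch (assuming K arms with rewards bounded in [0, 1]; the function pull(arm) is an illustrative placeholder for the environment, not part of the course material).

```python
import numpy as np

# Minimal UCB1 sketch for K-armed stochastic bandits with rewards in [0, 1].
# `pull(arm)` is a placeholder for the environment's reward draw.
def ucb1(pull, n_arms, horizon):
    counts = np.zeros(n_arms)      # number of times each arm was played
    means = np.zeros(n_arms)       # empirical mean reward of each arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # play each arm once to initialize
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(means + bonus))   # optimism: mean + confidence width
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
        total_reward += r
    return total_reward
```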

Adversarial Multi-armed Bandits: Algorithms for cumulative regret (expected and high-probability) and their analysis. We will cover the Weighted Majority, Exp3, Exp3.P, and Follow-the-Leader algorithms and their variants.
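A minimal sketch of Exp3-style exponential weighting is shown below (assuming rewards in [0, 1]; pull(arm) and the fixed learning rate eta are illustrative placeholders, not the exact variant covered in class).

```python
import numpy as np

# Minimal Exp3-style sketch for adversarial K-armed bandits with rewards in [0, 1].
# `pull(arm)` stands in for the (possibly adversarial) reward at the current round.
def exp3(pull, n_arms, horizon, eta=0.1):
    weights = np.ones(n_arms)
    rng = np.random.default_rng(0)
    total_reward = 0.0
    for t in range(horizon):
        probs = weights / weights.sum()
        arm = rng.choice(n_arms, p=probs)     # sample an arm from the exponential weights
        r = pull(arm)
        total_reward += r
        r_hat = r / probs[arm]                # importance-weighted, unbiased reward estimate
        weights[arm] *= np.exp(eta * r_hat)
        weights /= weights.max()              # rescale to avoid numerical overflow
    return total_reward
```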

Contextual Bandits: Stochastic linear bandits, generalized linear bandits, kernelized bandits, Neural UCB, and their analysis.
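For the stochastic linear case, a minimal LinUCB-style sketch is shown below (the action-set oracle features(t), the reward oracle pull(x), and the fixed confidence scaling beta are illustrative placeholders).

```python
import numpy as np

# Minimal LinUCB-style sketch for stochastic linear bandits in dimension d.
# `features(t)` returns the round-t action set as a (K x d) array and
# `pull(x)` returns the reward for playing feature vector x; both are placeholders.
def lin_ucb(features, pull, d, horizon, reg=1.0, beta=1.0):
    V = reg * np.eye(d)                 # regularized design matrix
    b = np.zeros(d)                     # sum of reward-weighted features
    for t in range(horizon):
        X = features(t)                 # candidate feature vectors this round
        theta_hat = np.linalg.solve(V, b)   # ridge-regression estimate of the parameter
        V_inv = np.linalg.inv(V)
        # optimistic index: estimated reward plus confidence-width term
        ucb = X @ theta_hat + beta * np.sqrt(np.einsum('ij,jk,ik->i', X, V_inv, X))
        x = X[int(np.argmax(ucb))]
        r = pull(x)
        V += np.outer(x, x)
        b += r * x
```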

Multi-armed bandits with multiple players, side observations, and special structures (e.g., unimodality, smoothness, monotonicity).

Reinforcement Learning: Introduction to MDPs and learning algorithms for unknown environments.
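A minimal tabular Q-learning sketch for learning in an unknown MDP is shown below (the environment interface, with reset() and step(a) returning (next_state, reward, done), is assumed here for illustration).

```python
import numpy as np

# Minimal tabular Q-learning sketch for an MDP with unknown dynamics.
# `env.reset()` returns an initial state; `env.step(a)` returns (next_state, reward, done).
def q_learning(env, n_states, n_actions, episodes, alpha=0.1, gamma=0.99, eps=0.1):
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration over the current value estimates
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # temporal-difference update toward the one-step bootstrap target
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
            s = s_next
    return Q
```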


Course Grades

25 points: Midterm
30 points: 3 Assignments
20 points: Pre-project report
25 points: Final project

Reference texts

  • S. Shalev-Shwartz and S. Ben-David, "Understanding Machine Learning: From Theory to Algorithms," Cambridge University Press, 2014.
  • T. Hastie, R. Tibshirani, and J. Friedman, "The Elements of Statistical Learning," Springer Series in Statistics, 2009.
  • S. Shalev-Shwartz, "Online Learning and Online Convex Optimization," NOW Publishers, 2012.
  • S. Bubeck and N. Cesa-Bianchi, "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems," NOW Publishers, 2012.
  • N. Cesa-Bianchi and G. Lugosi, "Prediction, Learning, and Games," Cambridge University Press, 2006.
  • T. Lattimore and C. Szepesvári, "Bandit Algorithms," Cambridge University Press, 2020.

Assignments

  • Assignment 1.
  • Assignment 2.
  • Assignment 3.
  • Project List 1.
  • Project List 2.