IE617: Online Machine Learning and Bandit Algorithms, Jan-April 2022

The aim of this course is to study bandit algorithms in various settings and analyze their performance. We will use many tools from probability and statistics; a good background in these topics is a prerequisite.

Lecture Hours

Monday 5:30-7:00 pm
Thursday 5:30-7:00 pm


Hybrid mode: lectures will be conducted in person and live-streamed via MS Teams.

Teaching Assistants

Debamita Ghosh
Hitesh Gudwani

TA Hours

Timings: Every Monday, 2-4 pm
Venue: Room 201, IEOR Building


Course Content

Stochastic Multi-armed Bandits: Algorithms for simple and cumulative regret (expected and high probability), and their analysis. We will cover the UCB, KL-UCB, and Thompson Sampling algorithms and their variants.
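
As a taste of the stochastic setting, here is a minimal UCB1 sketch (the arms-as-callables interface is a hypothetical choice for illustration, not the course's notation):

```python
import math
import random

def ucb1(arms, horizon):
    """Minimal UCB1 sketch. `arms` is a list of callables, each returning a
    stochastic reward in [0, 1] when pulled (hypothetical interface)."""
    n_arms = len(arms)
    counts = [0] * n_arms    # number of pulls per arm
    means = [0.0] * n_arms   # empirical mean reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # play each arm once to initialize
        else:
            # pick the arm maximizing empirical mean + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = arms[arm]()
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
        total_reward += r
    return counts, total_reward
```

With Bernoulli arms, the suboptimal arm is pulled only O(log T) times, which is the source of the logarithmic regret bound analyzed in the lectures.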

Adversarial Multi-armed Bandits: Algorithms for cumulative regret (expected and high probability) and their analysis. We will cover the Weighted Majority, Exp3, Exp3.P, and Follow-the-Leader algorithms and their variants.
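
For contrast with the stochastic case, a minimal Exp3 sketch for rewards in [0, 1] might look as follows (the `reward_fn(t, arm)` oracle is a hypothetical stand-in for an adversarial reward sequence):

```python
import math
import random

def exp3(reward_fn, n_arms, horizon, gamma):
    """Minimal Exp3 sketch (exponential weights with uniform exploration).
    `reward_fn(t, arm)` returns the reward in [0, 1] of the played arm."""
    weights = [1.0] * n_arms
    total_reward = 0.0
    for t in range(horizon):
        wsum = sum(weights)
        # mix the exponential-weights distribution with uniform exploration
        probs = [(1 - gamma) * w / wsum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        r = reward_fn(t, arm)
        total_reward += r
        # importance-weighted reward estimate: only the played arm is updated
        weights[arm] *= math.exp(gamma * (r / probs[arm]) / n_arms)
    return probs, total_reward
```

The importance-weighted estimate keeps the reward estimates unbiased even though only the played arm's reward is observed, which is the key device in the Exp3 analysis.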

Contextual Bandits: Stochastic linear bandits, generalized linear bandits, kernelized bandits, Neural UCB, and their analysis.
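
A minimal LinUCB-style sketch for the stochastic linear bandit conveys the flavor of this part (the `contexts_fn` and `reward_fn` oracles are hypothetical interfaces for illustration):

```python
import numpy as np

def linucb(contexts_fn, reward_fn, d, horizon, alpha=1.0):
    """Minimal LinUCB sketch. `contexts_fn(t)` returns an (n_arms, d) matrix of
    arm features; `reward_fn(x)` returns the reward of the chosen feature x."""
    A = np.eye(d)        # ridge-regularized design matrix
    b = np.zeros(d)
    total_reward = 0.0
    theta = np.zeros(d)
    for t in range(horizon):
        X = contexts_fn(t)
        theta = np.linalg.solve(A, b)   # ridge estimate of the unknown parameter
        A_inv = np.linalg.inv(A)
        # optimistic index: predicted reward + confidence-width bonus
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
        x = X[int(np.argmax(X @ theta + alpha * bonus))]
        r = reward_fn(x)
        A += np.outer(x, x)
        b += r * x
        total_reward += r
    return theta, total_reward
```

The confidence widths shrink as the design matrix accumulates information, which drives the sqrt(T)-type regret bounds covered in this module.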

Multi-armed bandits with multiple players, side observations, and special structures (e.g., unimodality, smoothness, monotonicity).

Reinforcement Learning: Introduction to MDPs and learning algorithms for unknown environments.
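
For the learning-in-unknown-environments theme, a tabular Q-learning sketch is representative (the `step_fn(s, a)` interface is a hypothetical stand-in for an MDP simulator):

```python
import random

def q_learning(step_fn, n_states, n_actions, episodes,
               alpha=0.1, gamma=0.9, eps=0.1):
    """Minimal tabular Q-learning sketch. `step_fn(s, a)` returns a tuple
    (next_state, reward, done) for the unknown environment."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step_fn(s, a)
            # bootstrap the target from the next state's best action value
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

Q-learning updates estimates of the optimal action values directly from sampled transitions, without knowing the MDP's transition kernel or reward function.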

Course Grades

25 points: Midterm
30 points: 3 Assignments
20 points: Pre-project report
25 points: Final project

Reference texts

  • S. Shalev-Shwartz and S. Ben-David, "Understanding Machine Learning: From Theory to Algorithms," Cambridge University Press, 2014. Download.
  • T. Hastie, R. Tibshirani, and J. Friedman, "The Elements of Statistical Learning," Springer Series in Statistics, 2009. Download.
  • S. Shalev-Shwartz, "Online Learning and Online Convex Optimization," NOW Publishers, 2012. Download.
  • S. Bubeck and N. Cesa-Bianchi, "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems," NOW Publishers, 2012. Download.
  • N. Cesa-Bianchi and G. Lugosi, "Prediction, Learning, and Games," Cambridge University Press, 2006. Download.
  • T. Lattimore and C. Szepesvari, "Bandit Algorithms," Cambridge University Press, 2020. Download.

Assignments

  • Assignment 1.
  • Assignment 2.
  • Assignment 3.
  • Project List 1.
  • Project List 2.