Skip to main content

IEOR e-Seminar by Dr. Siva Theja Maguluri

Title: A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous, Value-Based Reinforcement Learning Algorithms

Speaker: Dr. Siva Theja Maguluri, Fouts Family Early Career Professor and Assistant Professor, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Tech, USA.

Day, Date and Time: Friday, 5th March 2021, 7.30 PM to 8.30 PM (Indian Standard Time)
 

Abstract: In this work, we develop a unified framework to obtain finite-sample and/or finite-time convergence bounds for various model-free, value based Reinforcement Learning (RL) algorithms. These algorithms are naturally asynchronous, and we view them as Markovian Stochastic Approximation (SA) algorithms to solve Bellman fixed-point equations. We develop a Lyapunov framework and obtain mean square error bounds on the convergence of a general class of Markovian SA algorithms. Based on this central result, we establish finite-sample bounds for Q-learning, n-step TD and TD(\lambda). As a by-product, we demonstrate a bias-variance trade-off, and theoretically explain the well-known empirical observation on efficiency of bootstrapping in TD(\lambda) and n-step TD. We also study off-policy TD algorithms including V-trace, which is used in Deepmind's street learn project. The key proof techniques that we exploit are using a generalized Moreau envelope as a smooth potential/ Lyapunov function, and exploiting fast mixing of underlying Markov chain. The talk is based on the following two papers: 

https://arxiv.org/abs/2102.01567

https://papers.nips.cc/paper/2020/hash/5d44ee6f2c3f71b73125876103c8f6c4…

 
About the Speaker: Siva Theja Maguluri is Fouts Family Early Career Professor and Assistant Professor in the H. Milton Stewart School of Industrial and Systems Engineering at Georgia Tech. Before that, he was a Research Staff Member in the Mathematical Sciences Department at IBM T. J. Watson Research Center. He obtained his Ph.D. and MS in ECE as well as MS in Applied Math from UIUC, and B.Tech in Electrical Engineering from IIT Madras. His research interests span the areas of Control, Optimization, Algorithms and Applied Probability. In particular, he works on Reinforcement Learning theory, scheduling, resource allocation and revenue optimization problems that arise in a variety of systems including Data Centers, Cloud Computing, Wireless Networks, Block Chains, Ride hailing systems, etc. He is a recipient of the biennial "Best Publication in Applied Probability" award in 2017, second place award at INFORMS JFIG best paper competition in 2020,  "CTL/BP Junior Faculty Teaching Excellence Award" in 2020 and "Student Recognition of Excellence in Teaching: Class of 1934 CIOS Award" in 2020.

Webex Meeting Details:

Join from the meeting link

Meeting number (access code): 184 857 5288

Meeting password: nvJJHU23eb3