Prerequisite: IE 708 (Markov Decision Processes) or instructor's consent.
Course Content:
- Reinforcement Learning (RL) as a data-driven framework for sequential decision-making problems (Markov decision models).
- Model-free and model-based algorithms; temporal differences; TD(lambda) algorithms; Q-learning and its variants; discounted and average cost models.
- Value-based and policy-based approaches for discounted and average cost models; actor-critic algorithms with linear function approximation; their convergence via the two-timescale stochastic approximation approach; natural-gradient-based algorithms.
- Multi-agent RL for discounted and average cost models; non-cooperative models; minimax criteria; decentralised models; consensus matrix; actor-critic algorithms with linear function approximation.
- Learning stochastic shortest path problems and their multi-agent versions; regret analysis of algorithms. 
- Deep Reinforcement Learning algorithms for single and multi-agent models. 
References:
1) Stefano V. Albrecht, Filippos Christianos and Lukas Schäfer, Multi-Agent Reinforcement Learning: Foundations and Modern Approaches, 2023, MIT Press, Cambridge, USA 
2) Dimitri Bertsekas, A Course in Reinforcement Learning, 2023, Athena Scientific, USA 
3) Sean Meyn, Control Systems and Reinforcement Learning, 2021, Cambridge University Press, UK 
4) Vivek Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Second Edition, 2023, TRIM Series, Springer
5) Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, Second Edition, 2018, MIT Press, Cambridge, USA
6) Open Literature