IE 708: Markov Decision Processes

Prerequisite: IE 611 or equivalent or Instructor's consent

Contents

Overview of decision making in the context of stochastic systems evolving over time--examples. Framework and some types of cost criteria: Expected total cost, Discounted cost and Average cost. Finite horizon models; Some classes of policies; Optimality of Markov policies; Dynamic programming principle and algorithm.

Infinite horizon models: Stationary models, Adequacy of Markov policies. Discounted cost models: Optimality of Markov (pure) policies. Policy iteration, value iteration and modified policy iteration algorithms. Linear and convex programming formulations. Average cost models: Unichain and multichain models. Iterative algorithms. Linear and convex programming formulations. Expected total cost models: Positive and Negative models. Math programming formulations of optimal policies.

Learning algorithms: Q-learning algorithms, reinforcement learning algorithms, actor-critic algorithms.

References

M. Puterman, Markov Decision Processes, Wiley, 1994
D. Bertsekas, Dynamic Programming and Optimal Control Volumes 1 and 2, Athena Scientific, Belmont, 1995
E. Fienberg and A. Shwartz, Handbook of Markov Decision Processes, Kluwer, 2002
E. Altman, Constrained Markov Decision Processes, Chapman Hall/CRC, 1999
J. Filar and O. Vrize, Competitive Markov Decision Processes, Springer, 1997
D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming, Athena Scientific, Belmont, 1996
Open Literature

औद्योगिक अभियांत्रिकी एवं प्रचालन अनुसंधान विभाग

DEPARTMENT OF
INDUSTRIAL ENGINEERING & OPERATIONS RESEARCH
IIT BOMBAY

IE 708: Markov Decision Processes

औद्योगिक अभियांत्रिकी एवं प्रचालन अनुसंधान विभाग

DEPARTMENT OF INDUSTRIAL ENGINEERING & OPERATIONS RESEARCH IIT BOMBAY

IE 708: Markov Decision Processes

DEPARTMENT OF
INDUSTRIAL ENGINEERING & OPERATIONS RESEARCH
IIT BOMBAY