IE 708: Markov Decision Processes

Prerequisite: IE 611 or equivalent or Instructor's consent


Overview of decision making in the context of stochastic systems evolving over time--examples. Framework and some types of cost criteria: Expected total cost, Discounted cost and Average cost. Finite horizon models; Some classes of policies; Optimality of Markov policies; Dynamic programming principle and algorithm.
Infinite horizon models: Stationary models, Adequacy of Markov policies. Discounted cost models: Optimality of Markov (pure) policies. Policy iteration, value iteration and modified policy iteration algorithms. Linear and convex programming formulations. Average cost models: Unichain and multichain models. Iterative algorithms. Linear and convex programming formulations. Expected total cost models: Positive and Negative models. Math programming formulations of optimal policies.
Learning algorithms: Q-learning algorithms, reinforcement learning algorithms, actor-critic algorithms.


  • M. Puterman, Markov Decision Processes, Wiley, 1994
  • D. Bertsekas, Dynamic Programming and Optimal Control Volumes 1 and 2, Athena Scientific, Belmont, 1995
  • E. Fienberg and A. Shwartz, Handbook of Markov Decision Processes, Kluwer, 2002
  • E. Altman, Constrained Markov Decision Processes, Chapman Hall/CRC, 1999
  • J. Filar and O. Vrize, Competitive Markov Decision Processes, Springer, 1997
  • D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming, Athena Scientific, Belmont, 1996
  • Open Literature