**Prerequisite:** IE 611 or equivalent or Instructor's consent

**Contents**

Overview of decision making in the context of stochastic systems evolving over time--examples. Framework and some types of cost criteria: Expected total cost, Discounted cost and Average cost. Finite horizon models; Some classes of policies; Optimality of Markov policies; Dynamic programming principle and algorithm.

Infinite horizon models: Stationary models, Adequacy of Markov policies. Discounted cost models: Optimality of Markov (pure) policies. Policy iteration, value iteration and modified policy iteration algorithms. Linear and convex programming formulations. Average cost models: Unichain and multichain models. Iterative algorithms. Linear and convex programming formulations. Expected total cost models: Positive and Negative models. Math programming formulations of optimal policies.

Learning algorithms: Q-learning algorithms, reinforcement learning algorithms, actor-critic algorithms.

**References**

- M. Puterman, Markov Decision Processes, Wiley, 1994
- D. Bertsekas, Dynamic Programming and Optimal Control Volumes 1 and 2, Athena Scientific, Belmont, 1995
- E. Fienberg and A. Shwartz, Handbook of Markov Decision Processes, Kluwer, 2002
- E. Altman, Constrained Markov Decision Processes, Chapman Hall/CRC, 1999
- J. Filar and O. Vrize, Competitive Markov Decision Processes, Springer, 1997
- D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming, Athena Scientific, Belmont, 1996
- Open Literature