
Dynamic Programming

Exact solution methods for MDPs with a known model: policy evaluation, policy iteration, value iteration, and their convergence properties. These methods form the theoretical foundation on which most RL algorithms build.
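As a quick preview of the iterative backups developed below, here is a minimal sketch of iterative policy evaluation (Section 2.1) on a hypothetical two-state MDP. The transition table `P`, the uniform-random policy, and the discount/threshold values are illustrative assumptions, not the chapter's grid world.

```python
import numpy as np

# P[s][a] is a list of (prob, next_state, reward) transitions for a toy 2-state MDP.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
n_states, n_actions = 2, 2
policy = np.full((n_states, n_actions), 0.5)  # uniform-random policy pi(a|s)
gamma, theta = 0.9, 1e-8                      # discount factor, stopping threshold

V = np.zeros(n_states)
while True:
    delta = 0.0
    for s in range(n_states):
        # Bellman expectation backup:
        # V(s) <- sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [r + gamma * V(s')]
        v_new = sum(
            policy[s, a] * prob * (r + gamma * V[s2])
            for a in range(n_actions)
            for prob, s2, r in P[s][a]
        )
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < theta:   # stop once the largest per-state change is tiny
        break

print(V)  # approximates v_pi for this toy MDP
```

Sweeping states in place like this converges to the policy's value function; the chapter formalizes why, and Section 7 builds the full grid-world version.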

Learning Objectives

1. From MDP to Dynamic Programming

1.1 When Can We Use DP?

1.2 The Curse of Dimensionality

2. Policy Evaluation (Prediction)

2.1 Iterative Policy Evaluation

2.2 Convergence

3. Policy Iteration

3.1 Policy Improvement Theorem

3.2 Full Algorithm

4. Value Iteration

4.1 Bellman Optimality Backup

4.2 Full Algorithm

5. Asynchronous DP

6. Generalized Policy Iteration (GPI)

7. Python Implementation: Grid World

Exercises

References