Temporal Difference Learning

TD learning methods: TD(0), SARSA, Q-learning, expected SARSA, n-step TD, TD(λ) with eligibility traces, and their convergence properties.
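
To fix ideas before the sections below, here is a minimal sketch of the TD(0) update on the classic five-state random walk (a standard prediction benchmark). The function name, parameters, and environment setup are illustrative, not taken from this document; under a uniform random policy the true values of the interior states are 1/6 through 5/6.

```python
import random

def td0_random_walk(n=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """TD(0) prediction on a random-walk chain (illustrative sketch).

    States 1..n are non-terminal; 0 and n+1 are terminal. The agent
    moves left or right uniformly at random; reward is +1 on entering
    state n+1 and 0 otherwise.
    """
    rng = random.Random(seed)
    V = [0.0] * (n + 2)            # V[0] and V[n+1] stay 0 (terminal states)
    for _ in range(episodes):
        s = (n + 1) // 2           # start each episode in the middle state
        while 0 < s < n + 1:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n + 1 else 0.0
            # TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V[1:n + 1]              # value estimates for states 1..n
```

With enough episodes the estimates approach [1/6, 2/6, 3/6, 4/6, 5/6]; the same update skeleton reappears in SARSA and Q-learning with action values in place of V.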

Learning Objectives

1. From DP to Model-Free Learning

2. TD(0) Prediction

2.1 The TD Update Rule

2.2 TD vs. Monte Carlo

3. SARSA (On-Policy Control)

3.1 Algorithm

3.2 Python Implementation: Cliff Walking

4. Q-Learning (Off-Policy Control)

4.1 Algorithm

4.2 Python Implementation

4.3 SARSA vs. Q-Learning Comparison

5. Expected SARSA

6. Multi-Step TD Methods

6.1 n-Step Returns

6.2 n-Step SARSA

7. TD(λ) and Eligibility Traces

7.1 Forward View

7.2 Backward View with Eligibility Traces

8. Convergence Properties

Exercises

References