Chapter 6

TD(0)

state-value function

on-policy
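
As a minimal sketch, the tabular TD(0) update V(S) <- V(S) + alpha [R + gamma V(S') - V(S)] could be written as below. The function name `td0_update`, the dictionary-based value table, and the `terminal` flag are illustrative assumptions, not part of the original notes.

```python
from collections import defaultdict


def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) update to the state-value table V (hypothetical helper)."""
    # The value of a terminal state is taken to be 0, so no bootstrapping there.
    target = reward if terminal else reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])
    return V[state]


# Illustrative transition: A --(reward 1.0)--> B, with all values starting at 0.
V_td0 = defaultdict(float)
td0_update(V_td0, "A", 1.0, "B", alpha=0.5, gamma=0.9)
```

The target bootstraps from the current estimate of the next state, and the transition is generated by the policy being evaluated, which is why the method is on-policy.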

Sarsa

action-value function

on-policy
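
A minimal sketch of the Sarsa update Q(S, A) <- Q(S, A) + alpha [R + gamma Q(S', A') - Q(S, A)], assuming a table keyed by (state, action) pairs. The name `sarsa_update` and the example states and actions are hypothetical.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0, terminal=False):
    """One Sarsa update; a_next is the action actually chosen in s_next."""
    # On-policy: the target uses the action the behaviour policy really took.
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Illustrative step: (A, left) -> reward 1.0 -> (B, right).
Q_sarsa = defaultdict(float)
Q_sarsa[("B", "right")] = 2.0
sarsa_update(Q_sarsa, "A", "left", 1.0, "B", "right", alpha=0.5, gamma=0.5)
```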

Q-learning

off-policy
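
The Q-learning update Q(S, A) <- Q(S, A) + alpha [R + gamma max_a Q(S', a) - Q(S, A)] can be sketched as follows. The function name, the explicit `actions` list, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=1.0, terminal=False):
    """One Q-learning update over a (state, action)-keyed table."""
    # Off-policy: the target uses the greedy value in s_next,
    # regardless of which action the behaviour policy takes next.
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


Q_ql = defaultdict(float)
Q_ql[("B", "left")] = 1.0
Q_ql[("B", "right")] = 3.0
q_learning_update(Q_ql, "A", "left", 0.0, "B", ["left", "right"], alpha=0.5, gamma=1.0)
```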

Expected Sarsa
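
Expected Sarsa replaces the sampled next action value with its expectation under the policy: Q(S, A) <- Q(S, A) + alpha [R + gamma sum_a pi(a|S') Q(S', a) - Q(S, A)]. The sketch below assumes the policy is supplied as a dictionary of action probabilities for the next state; `expected_sarsa_update` and `next_probs` are hypothetical names.

```python
from collections import defaultdict


def expected_sarsa_update(Q, s, a, r, s_next, next_probs, alpha=0.1, gamma=1.0, terminal=False):
    """One Expected Sarsa update.

    next_probs maps each action to pi(action | s_next).
    """
    if terminal:
        expected_next = 0.0
    else:
        expected_next = sum(p * Q[(s_next, b)] for b, p in next_probs.items())
    Q[(s, a)] += alpha * (r + gamma * expected_next - Q[(s, a)])
    return Q[(s, a)]


Q_es = defaultdict(float)
Q_es[("B", "left")] = 1.0
Q_es[("B", "right")] = 3.0
expected_sarsa_update(
    Q_es, "A", "left", 0.0, "B", {"left": 0.25, "right": 0.75}, alpha=0.5, gamma=1.0
)
```

With a greedy target policy the expectation collapses to the maximum, and the update coincides with Q-learning.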

Double Q-learning

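With probability 0.5:

$$Q_1(S, A) \leftarrow Q_1(S, A) + \alpha \left[ R + \gamma\, Q_2\big(S', \arg\max_a Q_1(S', a)\big) - Q_1(S, A) \right]$$

or, otherwise:

$$Q_2(S, A) \leftarrow Q_2(S, A) + \alpha \left[ R + \gamma\, Q_1\big(S', \arg\max_a Q_2(S', a)\big) - Q_2(S, A) \right]$$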
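
A minimal sketch of one Double Q-learning step, assuming two tables keyed by (state, action). On each step a coin flip decides which table is updated; that table selects the maximizing action and the other table evaluates it. The names `double_q_update`, `update`, and `evaluate` are illustrative assumptions.

```python
import random
from collections import defaultdict


def double_q_update(Q1, Q2, s, a, r, s_next, actions,
                    alpha=0.1, gamma=1.0, terminal=False, rng=random):
    """One Double Q-learning update; returns the table that was modified."""
    # With probability 0.5 update Q1 (evaluating with Q2), otherwise the reverse.
    if rng.random() < 0.5:
        update, evaluate = Q1, Q2
    else:
        update, evaluate = Q2, Q1
    if terminal:
        target = r
    else:
        # Action selection and action evaluation use different tables.
        best = max(actions, key=lambda b: update[(s_next, b)])
        target = r + gamma * evaluate[(s_next, best)]
    update[(s, a)] += alpha * (target - update[(s, a)])
    return update


Q1_demo, Q2_demo = defaultdict(float), defaultdict(float)
Q1_demo[("B", "left")], Q1_demo[("B", "right")] = 1.0, 3.0
Q2_demo[("B", "left")], Q2_demo[("B", "right")] = 5.0, 2.0
double_q_update(Q1_demo, Q2_demo, "A", "left", 0.0, "B", ["left", "right"],
                alpha=0.5, gamma=1.0)
```

Decoupling selection from evaluation is what reduces the maximization bias of plain Q-learning.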

Chapter 7

n-step TD
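
For n-step TD the target is the n-step return G = R_{t+1} + gamma R_{t+2} + ... + gamma^{n-1} R_{t+n} + gamma^n V(S_{t+n}). The sketch below assumes the n rewards have already been collected in a list; `n_step_return` and `n_step_td_update` are hypothetical helper names.

```python
from collections import defaultdict


def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of the rewards plus the discounted bootstrap value."""
    G = bootstrap_value
    # Fold from the last reward backwards: G <- r + gamma * G.
    for r in reversed(rewards):
        G = r + gamma * G
    return G


def n_step_td_update(V, state, rewards, state_n, alpha=0.1, gamma=1.0, terminal=False):
    """Update V(state) toward the n-step return, where n = len(rewards)."""
    bootstrap = 0.0 if terminal else V[state_n]
    G = n_step_return(rewards, bootstrap, gamma)
    V[state] += alpha * (G - V[state])
    return G


# Two-step example: rewards 1 and 2, then bootstrap from V(C) = 4.
V_nstep = defaultdict(float)
V_nstep["C"] = 4.0
G_nstep = n_step_td_update(V_nstep, "A", [1.0, 2.0], "C", alpha=0.5, gamma=0.5)
```

With a single reward in the list this reduces to the TD(0) update.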

n-step Sarsa
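
n-step Sarsa uses the same kind of n-step return but bootstraps from an action value: G = R_{t+1} + ... + gamma^{n-1} R_{t+n} + gamma^n Q(S_{t+n}, A_{t+n}). A minimal sketch, with `n_step_sarsa_update` and the sample trajectory as illustrative assumptions:

```python
from collections import defaultdict


def n_step_sarsa_update(Q, s, a, rewards, s_n, a_n, alpha=0.1, gamma=1.0, terminal=False):
    """Update Q(s, a) toward the n-step Sarsa return, where n = len(rewards)."""
    # Bootstrap from the action actually taken n steps later (on-policy).
    G = 0.0 if terminal else Q[(s_n, a_n)]
    for r in reversed(rewards):
        G = r + gamma * G
    Q[(s, a)] += alpha * (G - Q[(s, a)])
    return G


# Three-step example: rewards 1, 0, 2, then bootstrap from Q(D, up) = 8.
Q_ns = defaultdict(float)
Q_ns[("D", "up")] = 8.0
G_ns = n_step_sarsa_update(Q_ns, "A", "left", [1.0, 0.0, 2.0], "D", "up",
                           alpha=0.2, gamma=0.5)
```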

n-step Off-policy Learning by Importance Sampling
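
In the off-policy case the n-step error is weighted by the importance sampling ratio rho = prod_k pi(A_k|S_k) / b(A_k|S_k), where pi is the target policy and b the behaviour policy. The sketch below shows the state-value form V(S_t) <- V(S_t) + alpha rho [G - V(S_t)], assuming the per-step action probabilities under both policies were recorded. All function and argument names are hypothetical.

```python
from collections import defaultdict


def importance_ratio(pi_probs, b_probs):
    """Product of pi(A_k|S_k) / b(A_k|S_k) over the recorded steps."""
    rho = 1.0
    for p, q in zip(pi_probs, b_probs):
        rho *= p / q  # b must give nonzero probability to every action it took
    return rho


def n_step_off_policy_td_update(V, state, rewards, state_n, pi_probs, b_probs,
                                alpha=0.1, gamma=1.0, terminal=False):
    """Importance-weighted n-step TD update of V(state)."""
    G = 0.0 if terminal else V[state_n]
    for r in reversed(rewards):
        G = r + gamma * G
    rho = importance_ratio(pi_probs, b_probs)
    V[state] += alpha * rho * (G - V[state])
    return rho, G


V_is = defaultdict(float)
V_is["C"] = 4.0
rho_is, G_is = n_step_off_policy_td_update(
    V_is, "A", [1.0, 2.0], "C",
    pi_probs=[1.0, 0.5], b_probs=[0.5, 0.5], alpha=0.1, gamma=0.5,
)
```

If the target policy would never have taken one of the recorded actions, rho is 0 and the update is skipped entirely. When the two policies coincide, rho is 1 and the on-policy n-step update is recovered.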

n-step Tree Backup Algorithm
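
Tree backup learns off-policy without importance sampling. At each intermediate step the return mixes the target-policy expectation over the actions not taken with the continued return along the action that was taken: G_{t:t+n} = R_{t+1} + gamma sum_{a != A_{t+1}} pi(a|S_{t+1}) Q(S_{t+1}, a) + gamma pi(A_{t+1}|S_{t+1}) G_{t+1:t+n}. The sketch below computes this recursion backwards. The `steps` layout and the `pi(state, action)` callable are assumptions for illustration.

```python
from collections import defaultdict


def tree_backup_return(steps, Q, pi, actions, gamma=1.0, terminal=False):
    """n-step tree-backup return.

    steps is a list of (reward, state, action_taken) for times t+1 .. t+n;
    the action in the final tuple is ignored because the last state is
    backed up with a full expectation over all actions.
    """
    r, s, _ = steps[-1]
    if terminal:
        G = r
    else:
        G = r + gamma * sum(pi(s, a) * Q[(s, a)] for a in actions)
    for r, s, a_taken in reversed(steps[:-1]):
        # Actions not taken contribute their current estimates ...
        off_path = sum(pi(s, a) * Q[(s, a)] for a in actions if a != a_taken)
        # ... and the taken action contributes the deeper return.
        G = r + gamma * (off_path + pi(s, a_taken) * G)
    return G


def uniform_pi(state, action):
    """Hypothetical target policy: two equally likely actions everywhere."""
    return 0.5


Q_tb = defaultdict(float)
Q_tb[("B", "l")], Q_tb[("B", "r")] = 2.0, 4.0
Q_tb[("C", "l")], Q_tb[("C", "r")] = 6.0, 2.0
G_tb = tree_backup_return([(1.0, "B", "l"), (0.0, "C", None)],
                          Q_tb, uniform_pi, ["l", "r"], gamma=1.0)
Q_tb[("A", "l")] += 0.5 * (G_tb - Q_tb[("A", "l")])  # step size alpha = 0.5
```

With a single step the return is the Expected Sarsa target.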