concept reinforcement ★★ seed

Q-Learning

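Watkins' 1989 model-free, off-policy RL algorithm that learns an action-value function Q(s, a) from sampled transitions. In the tabular case it converges to the optimal action-value function, and hence to an optimal greedy policy, without requiring a model of the environment, provided every state-action pair keeps being visited and the learning rate decays appropriately.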
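
A minimal sketch of the tabular update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], on a made-up four-state chain. The environment (`step`, `GOAL`, `ACTIONS`), the helper names (`q_update`, `train`, `greedy_policy`), and the values of `alpha` and `gamma` are all assumptions for illustration, not part of the original note. The behaviour policy here is uniformly random, which is one simple way to show the off-policy property: the agent acts randomly yet the learned greedy policy is the optimal one. A constant `alpha` is used, which is sufficient only because the toy dynamics are deterministic.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
GOAL = 3  # hypothetical toy chain with states 0, 1, 2, 3; state 3 is terminal


def step(state, action):
    """Toy deterministic dynamics. The learner only sees the sampled
    (next_state, reward, done) tuple, never the rules themselves."""
    next_state = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


def train(episodes=500, seed=0, max_steps=1000):
    """Learn Q from episodes generated by a uniformly random behaviour policy."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen state-action pairs start at 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.choice(ACTIONS)  # exploration only; never consults Q
            s_next, r, done = step(s, a)
            q_update(Q, s, a, r, s_next, done)
            s = s_next
            if done:
                break
    return Q


def greedy_policy(Q):
    """Read off the greedy action for each non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
```

Calling `greedy_policy(train())` should pick `"right"` in every non-terminal state, with Q(2, right), Q(1, right), Q(0, right) approaching 1, γ, and γ² respectively. In a stochastic environment a decaying learning rate would be needed for the convergence guarantee to apply.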

#q-learning #model-free #value-function #watkins