concept reinforcement ★★ seed
Q-Learning
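Watkins' 1989 model-free, off-policy RL algorithm that learns an action-value function Q(s, a) from sampled transitions, without requiring a model of the environment. In the tabular setting, Q converges to the optimal action-value function (so the greedy policy is optimal) provided every state-action pair keeps being visited and the learning rate decays appropriately.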
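A minimal sketch of the tabular update, to make the summary concrete. The environment (a 1-D chain with a reward of 1 at the right end), the function names `q_learning` and `greedy_policy`, and all hyperparameter values are assumptions chosen for illustration; they are not part of the original note. A fixed learning rate is used for simplicity, which is enough for this deterministic toy problem but is not the decaying schedule the convergence result assumes.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain: start at state 0, goal at the right end."""
    rng = random.Random(seed)
    moves = (-1, +1)            # action 0 = step left, action 1 = step right
    goal = n_states - 1         # terminal state; its Q-values are never updated
    Q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy behaviour policy, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])

            # The environment is only sampled, never modelled by the learner.
            s_next = min(max(s + moves[a], 0), goal)
            r = 1.0 if s_next == goal else 0.0

            # Q-learning update: bootstrap from the best action in the next state,
            # regardless of which action the behaviour policy takes next (off-policy).
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


def greedy_policy(Q):
    """Index of the highest-valued action in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]


if __name__ == "__main__":
    Q = q_learning()
    print(greedy_policy(Q)[:-1])  # greedy action for each non-terminal state
```

Under these assumptions the greedy action in every non-terminal state should come out as "step right", and the value of stepping right from the state next to the goal should approach 1. The `max` over next-state values in the target is what distinguishes Q-learning from on-policy methods such as SARSA.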
#q-learning
#model-free
#value-function
#watkins