concept reinforcement ★★ seed
Policy Gradient Methods
Reinforcement learning algorithms that directly optimize a parameterized policy by gradient ascent on expected return (cumulative reward). REINFORCE (Williams, 1992) and actor-critic methods are foundational variants.
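
As a rough illustration of gradient ascent on expected reward, here is a minimal REINFORCE-style sketch on a toy multi-armed bandit, where each episode is a single step and the return is just the reward. The softmax parameterization, the running-average baseline, the Gaussian reward noise, and names such as `reinforce_bandit` and `arm_means` are assumptions made for this example, not part of the original note.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style updates on a toy bandit (illustrative setup).

    theta holds one preference per arm; the policy is softmax(theta).
    Each step samples an action, observes a noisy reward, and moves
    theta along (reward - baseline) * grad log pi(action).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    baseline = 0.0  # running average reward (assumed variance-reduction choice)
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)  # assumed unit-variance noise
        advantage = reward - baseline
        # For a softmax policy: d/d theta_k log pi(a) = 1[k == a] - pi(k).
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
        # Incremental mean of all rewards seen so far.
        baseline += (reward - baseline) / t
    return softmax(theta)


if __name__ == "__main__":
    # Arm 1 pays more on average, so its probability should grow.
    print(reinforce_bandit([0.0, 2.0]))
```

The update never estimates action values explicitly; it only reweights the sampled action's log-probability by how much better or worse than average the reward was. Actor-critic methods replace the running-average baseline with a learned value estimate.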
#policy-gradient
#reinforce
#actor-critic