Kavli Affiliate: Yi Zhou | First 5 Authors: Yudan Wang, Yue Wang, Yi Zhou, Shaofeng Zou | Summary: Actor-critic (AC) is a powerful method for learning an optimal policy in reinforcement learning, where the critic uses algorithms, e.g., temporal difference (TD) learning with function approximation, to evaluate the current policy and the actor updates the […]
Title: Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation
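
The critic/actor split described in the summary can be illustrated with a minimal, generic sketch. It is a textbook-style single-loop actor-critic and is not taken from the paper: the two-state toy MDP in `step`, the tabular (one-hot) critic, the softmax policy, and all step sizes are assumptions chosen only for illustration. "Single-loop" here means one TD(0) critic update and one actor update per sample.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action, rng):
    """Hypothetical toy MDP: action 1 pays reward 1, action 0 pays 0.

    The next state is drawn uniformly from {0, 1}, independent of the action.
    """
    reward = 1.0 if action == 1 else 0.0
    next_state = rng.randrange(2)
    return reward, next_state


def single_loop_actor_critic(num_steps=5000, alpha=0.05, beta=0.1,
                             gamma=0.9, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: softmax preferences per state
    values = [0.0, 0.0]               # critic: value estimate per state
    state = 0
    for _ in range(num_steps):
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        reward, next_state = step(state, action, rng)

        # Critic: one TD(0) update evaluating the current policy.
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += beta * td_error

        # Actor: one policy-gradient update, using the TD error as the
        # advantage estimate and the softmax score function as direction.
        for b in range(2):
            grad_log = (1.0 if b == action else 0.0) - probs[b]
            theta[state][b] += alpha * td_error * grad_log

        state = next_state
    return theta, values


theta, values = single_loop_actor_critic()
policy = [softmax(row) for row in theta]
print(policy)  # learned action probabilities for each state
```

In this toy problem action 1 is always better, so the learned policy should put most of its probability on action 1 in both states. The natural actor-critic variant and the compatible function approximation named in the title are not shown here.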