Kavli Affiliate: Yi Zhou | First 5 Authors: Yue Wang, Yi Zhou, Shaofeng Zou | Summary: Greedy-GQ with linear function approximation, originally proposed in \cite{maei2010toward}, is a value-based off-policy algorithm for optimal control in reinforcement learning. It has a nonlinear two-timescale structure with a non-convex objective function. This paper develops its finite-time […]
Continue reading: Finite-Time Error Bounds for Greedy-GQ