Finite-Time Global Optimality Convergence in Deep Neural Actor-Critic Methods for Decentralized Multi-Agent Reinforcement Learning

Kavli Affiliate: Jia Liu

| First 5 Authors: Zhiyao Zhang, Myeung Suk Oh, FNU Hairi, Ziyue Luo, Alvaro Velasquez

| Summary:

Actor-critic methods for decentralized multi-agent reinforcement learning
(MARL) facilitate collaborative optimal decision making without centralized
coordination, thus enabling a wide range of applications in practice. To date,
however, most theoretical convergence analyses of existing actor-critic
decentralized MARL methods only guarantee convergence to a stationary
solution, and only under linear function approximation. This leaves a significant gap
between the highly successful use of deep neural actor-critic for decentralized
MARL in practice and the current theoretical understanding. To bridge this gap,
in this paper, we make the first attempt to develop a deep neural actor-critic
method for decentralized MARL, where both the actor and critic components are
inherently non-linear. We show that our proposed method enjoys a global
optimality guarantee with a finite-time convergence rate of O(1/T), where T is
the total number of iterations. This marks the first global convergence result for
deep neural actor-critic methods in the MARL literature. We also conduct
extensive numerical experiments, which verify our theoretical results.
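The abstract does not state the precise theorem, so the following is only a representative, assumed shape of a finite-time global-optimality bound of this kind: the gap between the optimal objective and the averaged iterates shrinks at rate O(1/T), up to a function-approximation error.

```latex
% Representative (assumed) form of a finite-time global-optimality bound;
% the exact theorem, constants, and averaging scheme are not given in the
% abstract, so everything here is illustrative.
J(\pi^{*}) - \frac{1}{T}\sum_{t=1}^{T} J(\pi_t)
  \;\le\; \frac{C}{T} + \varepsilon_{\mathrm{approx}}
```

Here J is the expected return, \pi^{*} a globally optimal policy, \pi_t the iterate at step t, C a problem-dependent constant, and \varepsilon_{\mathrm{approx}} the neural approximation error.

Likewise, the abstract does not spell out the algorithm; below is a minimal sketch of one decentralized actor-critic iteration with a non-linear (MLP) actor and critic per agent. The consensus mixing of critic parameters over a communication graph, the TD(0) critic step, the policy-gradient actor step, and all network sizes and dummy data are illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small non-linear network used for both actor and critic."""
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def consensus_step(critics, W):
    """Mix each agent's critic parameters with its neighbors' using the
    doubly-stochastic weight matrix W (W[i][j] = agent i's weight on j)."""
    with torch.no_grad():
        params = [list(c.parameters()) for c in critics]
        mixed = [[sum(W[i][j] * params[j][k] for j in range(len(critics)))
                  for k in range(len(params[i]))]
                 for i in range(len(critics))]
        for i, critic in enumerate(critics):
            for p, m in zip(critic.parameters(), mixed[i]):
                p.copy_(m)

def local_update(actor, critic, opt_a, opt_c, s, a, r, s_next, gamma=0.99):
    """One TD(0) critic step and one policy-gradient actor step on a batch
    of local transitions (s, a, r, s_next)."""
    # Critic: minimize squared TD error against a bootstrapped target.
    with torch.no_grad():
        target = r + gamma * critic(s_next)
    td_err = target - critic(s)
    opt_c.zero_grad()
    (td_err ** 2).mean().backward()
    opt_c.step()
    # Actor: policy gradient with the (detached) TD error as advantage.
    logp = torch.log_softmax(actor(s), dim=-1).gather(-1, a)
    opt_a.zero_grad()
    (-(td_err.detach() * logp)).mean().backward()
    opt_a.step()

if __name__ == "__main__":
    n_agents, s_dim, n_actions = 3, 4, 2
    actors = [MLP(s_dim, n_actions) for _ in range(n_agents)]
    critics = [MLP(s_dim, 1) for _ in range(n_agents)]
    opts_a = [torch.optim.SGD(m.parameters(), lr=1e-2) for m in actors]
    opts_c = [torch.optim.SGD(m.parameters(), lr=1e-2) for m in critics]
    W = [[1.0 / n_agents] * n_agents for _ in range(n_agents)]  # fully connected
    for _ in range(10):
        consensus_step(critics, W)            # communicate with neighbors
        for i in range(n_agents):             # then update locally
            s, s_next = torch.randn(8, s_dim), torch.randn(8, s_dim)
            a = torch.randint(n_actions, (8, 1))
            r = torch.randn(8, 1)             # agent i's local reward (dummy)
            local_update(actors[i], critics[i], opts_a[i], opts_c[i],
                         s, a, r, s_next)
```

Consensus on critic parameters is a standard decentralization device in this literature: agents exchange only critic weights with graph neighbors, while rewards and policies stay local.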

