Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Multi-Objective Weighted-Chebyshev Actor-Critic Approach

Kavli Affiliate: Jia Liu

| First 5 Authors: Fnu Hairi

| Summary:

In many multi-objective reinforcement learning (MORL) applications, the
ability to systematically explore Pareto-stationary solutions under multiple
non-convex reward objectives, with a theoretical finite-time sample complexity
guarantee, is an important yet under-explored problem. This motivates us to
take the first step toward filling this gap in MORL. Specifically, in this
paper we propose a Multi-Objective weighted-CHebyshev Actor-critic (MOCHA)
algorithm for MORL, which judiciously integrates weighted-Chebyshev (WC)
scalarization with the actor-critic framework to enable systematic
Pareto-stationarity exploration with a finite-time sample complexity
guarantee. The sample complexity result for the MOCHA algorithm reveals an
interesting dependency on $p_{\min}$ in finding an $\epsilon$-Pareto-stationary
solution, where $p_{\min}$ denotes the minimum entry of a given weight vector
$\mathbf{p}$ in the WC-scalarization. By carefully choosing learning rates, the
sample complexity for each exploration is $\tilde{\mathcal{O}}(\epsilon^{-2})$.
Furthermore, simulation studies on a large KuaiRand offline dataset show that
MOCHA significantly outperforms other baseline MORL approaches.
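
For intuition, weighted-Chebyshev scalarization collapses the $m$ objective
returns $J_1(\theta), \dots, J_m(\theta)$ into the single value
$\max_i p_i (z_i^* - J_i(\theta))$ with respect to a utopia point $z^*$, and a
policy is commonly called $\epsilon$-Pareto-stationary when no convex
combination of the objective gradients has squared norm above $\epsilon$;
sweeping $\mathbf{p}$ then recovers different Pareto-stationary solutions,
including ones on non-convex parts of the front that linear scalarization can
miss. The abstract does not give the paper's exact formulation, so the sketch
below assumes this standard WC form; the function name and the utopia point
are illustrative, not taken from MOCHA.

    import numpy as np

    def weighted_chebyshev(returns, weights, utopia):
        # Weighted-Chebyshev scalarization: the worst weighted gap
        # max_i p_i * (z*_i - J_i) between each objective's return J_i
        # and its utopia value z*_i. Minimizing this scalar drives the
        # largest weighted shortfall down, which is what lets varying
        # weight vectors p reach different Pareto-stationary points.
        returns = np.asarray(returns, dtype=float)
        weights = np.asarray(weights, dtype=float)
        utopia = np.asarray(utopia, dtype=float)
        return np.max(weights * (utopia - returns))

    # Sweeping the weight vector p over the simplex explores different
    # Pareto-stationary candidates; p_min in the abstract's complexity
    # bound is the smallest entry of such a p.
    utopia = np.array([1.0, 1.0])
    for p1 in np.linspace(0.1, 0.9, 5):
        p = np.array([p1, 1.0 - p1])
        print(p, weighted_chebyshev([0.6, 0.8], p, utopia))

Each fixed $\mathbf{p}$ corresponds to one "exploration" in the abstract's
per-exploration $\tilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity.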

| Search Query: ArXiv Query: search_query=au:"Jia Liu"&id_list=&start=0&max_results=3
