Kavli Affiliate: Li Xin Li
| First 5 Authors: Jingxin Liu, Xiang Gao, Yisha Li, Xin Li, Haiyang Lu
| Summary:
In a mixed short video and live stream recommendation scenario,
the live stream recommendation system (RS) decides whether to allocate at most
one live stream into the video feed for each user request. To maximize
long-term user engagement, it is crucial to determine an optimal live stream
policy for accurate live stream allocation. An inappropriate allocation policy
that ignores the long-term negative impact of live stream allocation can
significantly affect app usage duration and user retention. Recently,
reinforcement learning (RL) has been widely applied in
recommendation systems to capture long-term user engagement. However,
traditional RL algorithms often face divergence and instability problems, which
restrict their application and deployment in large-scale industrial
recommendation systems, especially in the challenging scenario described above.
To address these challenges, we propose a novel Supervised Learning-enhanced
Multi-Group Actor Critic algorithm (SL-MGAC). Specifically, we introduce a
supervised learning-enhanced actor-critic framework that incorporates variance
reduction techniques, where multi-task reward learning helps restrict
bootstrapping error accumulation during critic learning. Additionally, we
design a multi-group state decomposition module for both actor and critic
networks to reduce prediction variance and improve model stability. We also
propose a novel reward function to prevent overly greedy live stream
allocation. Empirically, we evaluate the SL-MGAC algorithm using offline policy
evaluation (OPE) and online A/B testing. Experimental results demonstrate that
the proposed method not only outperforms baseline methods under the
platform-level constraints but also exhibits enhanced stability in online
recommendation scenarios.
| Search Query: ArXiv Query: search_query=au:"Li Xin Li"&id_list=&start=0&max_results=3