Supervised Learning-enhanced Multi-Group Actor Critic for Live Stream Allocation in Feed

Kavli Affiliate: Li Xin Li

| First 5 Authors: Jingxin Liu, Xiang Gao, Yisha Li, Xin Li, Haiyang Lu

| Summary:

Reinforcement Learning (RL) has been widely applied in recommendation systems
to capture long-term user engagement, thereby increasing dwell time and
improving user retention. In a mixed short-video and live-stream
recommendation scenario, the live stream recommendation system (RS) decides
whether to inject at most one live stream into the video feed for each user
request. To maximize long-term user engagement, it is crucial to determine an
optimal live stream injection policy for accurate live stream allocation.
However, traditional RL algorithms often suffer from divergence and
instability, which can lead to excessive live stream allocations that
interrupt users' interest in short videos and reduce their app usage
duration. To address these challenges, we propose a novel Supervised
Learning-enhanced Multi-Group Actor Critic algorithm (SL-MGAC). Specifically,
we introduce a supervised learning-enhanced actor-critic framework that
incorporates variance reduction techniques, where multi-task reward learning
helps restrict bootstrapping error accumulation during critic learning.
Additionally, we design a multi-group state decomposition module for both actor
and critic networks to reduce prediction variance and improve model stability.
We also propose a novel reward function to prevent overly greedy live stream
allocation. Empirically, we evaluate the SL-MGAC algorithm using offline policy
evaluation (OPE) and online A/B testing. Experimental results demonstrate that
the proposed method not only outperforms baseline methods under the
platform-level constraints but also exhibits enhanced stability in online
recommendation scenarios.
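
The summary does not give implementation details, so the following is only a minimal sketch of the general idea it describes: an actor-critic model with group-specific heads standing in for the multi-group state decomposition, trained with a temporal-difference critic loss plus a supervised (behavior-cloning style) cross-entropy term on logged actions to tame bootstrapping error. All class names, tensor shapes, and the exact loss weighting are assumptions, not the paper's SL-MGAC specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedActorCritic(nn.Module):
    """Actor-critic with one actor head and one critic head per user group,
    a rough stand-in for a multi-group state decomposition (details assumed)."""

    def __init__(self, state_dim: int, num_groups: int, num_actions: int = 2, hidden: int = 64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.actor_heads = nn.ModuleList(
            [nn.Linear(hidden, num_actions) for _ in range(num_groups)]
        )
        self.critic_heads = nn.ModuleList(
            [nn.Linear(hidden, num_actions) for _ in range(num_groups)]
        )

    def forward(self, state: torch.Tensor, group_id: torch.Tensor):
        h = self.shared(state)
        logits = torch.stack([head(h) for head in self.actor_heads], dim=1)    # [B, G, A]
        q_values = torch.stack([head(h) for head in self.critic_heads], dim=1)  # [B, G, A]
        idx = group_id.view(-1, 1, 1).expand(-1, 1, logits.size(-1))
        # Select the head that matches each sample's group.
        return logits.gather(1, idx).squeeze(1), q_values.gather(1, idx).squeeze(1)


def sl_actor_critic_loss(model, target_model, batch, gamma=0.99, sl_weight=1.0):
    """Critic TD loss + advantage-weighted actor loss + a supervised term on
    logged actions; the supervised term is a guess at how the 'supervised
    learning enhancement' could keep the policy near the behavior policy."""
    s, a, r, s_next, g, done = (
        batch[k] for k in ("state", "action", "reward", "next_state", "group", "done")
    )

    logits, q = model(s, g)
    with torch.no_grad():
        next_logits, next_q = target_model(s_next, g)
        next_a = next_logits.argmax(dim=-1, keepdim=True)
        td_target = r + gamma * (1.0 - done) * next_q.gather(1, next_a).squeeze(1)

    q_a = q.gather(1, a.view(-1, 1)).squeeze(1)
    critic_loss = F.smooth_l1_loss(q_a, td_target)

    log_probs = F.log_softmax(logits, dim=-1)
    advantage = (td_target - q_a).detach()
    actor_loss = -(log_probs.gather(1, a.view(-1, 1)).squeeze(1) * advantage).mean()

    # Supervised (imitation) term on logged exposure actions to limit
    # bootstrapping error accumulation during critic learning.
    sl_loss = F.cross_entropy(logits, a)

    return critic_loss + actor_loss + sl_weight * sl_loss
```

The sketch omits the paper's variance reduction techniques, multi-task reward learning, and the constrained reward design against overly greedy allocation; it only illustrates how a supervised auxiliary loss and per-group heads could be wired into a standard actor-critic update.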

| Search Query: ArXiv Query: search_query=au:"Li Xin Li"&id_list=&start=0&max_results=3
