SDPO: Segment-Level Direct Preference Optimization for Social Agents

Kavli Affiliate: Ke Wang

| First 5 Authors: Aobo Kong, Wentao Ma, Shiwan Zhao, Yongbin Li, Yuchuan Wu

| Summary:

Social agents powered by large language models (LLMs) can simulate human
social behaviors but fall short in handling complex goal-oriented social
dialogues. Direct Preference Optimization (DPO) has proven effective in
aligning LLM behavior with human preferences across a variety of agent tasks.
Existing DPO-based approaches for multi-turn interactions are divided into
turn-level and session-level methods. Turn-level methods are overly
fine-grained, focusing exclusively on individual turns, while session-level
methods are too coarse-grained, often introducing training noise. To address
these limitations, we propose Segment-Level Direct Preference Optimization
(SDPO), which focuses on specific key segments within interactions to optimize
multi-turn agent behavior while minimizing training noise. Evaluations on the
SOTOPIA benchmark demonstrate that SDPO-tuned agents consistently outperform
both existing DPO-based methods and proprietary LLMs like GPT-4o, underscoring
SDPO’s potential to advance the social intelligence of LLM-based agents. We
release our code and data at
https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/SDPO.
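
The abstract describes the idea only at a high level. As a rough illustration, the snippet below sketches what a segment-level variant of the DPO objective could look like: the standard DPO log-sigmoid loss, but with the chosen/rejected log-probabilities summed only over tokens inside a selected key segment rather than over the whole session. This is a minimal sketch under that assumption; the function and tensor names are illustrative and are not taken from the released SDPO code.

```python
import torch
import torch.nn.functional as F

def segment_logprob(logits, labels, segment_mask):
    # Sum per-token log-probs, counting only tokens flagged by segment_mask.
    # Assumes logits and labels are already aligned (next-token shift done by caller).
    logps = torch.log_softmax(logits, dim=-1)                       # (B, T, V)
    token_logps = torch.gather(logps, -1, labels.unsqueeze(-1)).squeeze(-1)  # (B, T)
    return (token_logps * segment_mask).sum(-1)                     # (B,)

def segment_level_dpo_loss(policy_chosen_logits, policy_rejected_logits,
                           ref_chosen_logits, ref_rejected_logits,
                           chosen_labels, rejected_labels,
                           chosen_seg_mask, rejected_seg_mask,
                           beta=0.1):
    # DPO-style objective where log-probabilities are accumulated only over
    # the key segment (a span of turns), not over the entire session.
    pi_w  = segment_logprob(policy_chosen_logits,   chosen_labels,   chosen_seg_mask)
    pi_l  = segment_logprob(policy_rejected_logits, rejected_labels, rejected_seg_mask)
    ref_w = segment_logprob(ref_chosen_logits,      chosen_labels,   chosen_seg_mask)
    ref_l = segment_logprob(ref_rejected_logits,    rejected_labels, rejected_seg_mask)
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return -F.logsigmoid(margin).mean()
```

Under this framing, a turn-level method corresponds to a mask covering a single turn and a session-level method to a mask covering the whole dialogue; the segment-level view sits between the two, which is the trade-off the abstract highlights.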

| Search Query: ArXiv Query: search_query=au:"Ke Wang"&id_list=&start=0&max_results=3