Kavli Affiliate: Ke Wang | First 5 Authors: Aobo Kong, Wentao Ma, Shiwan Zhao, Yongbin Li, Yuchuan Wu | Summary: Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex goal-oriented social dialogues. Direct Preference Optimization (DPO) has proven effective in aligning LLM behavior with human […]
Continue reading: SDPO: Segment-Level Direct Preference Optimization for Social Agents