Kavli Affiliate: Ke Wang | First 5 Authors: Xiaoqian Liu, Ke Wang, Yongbin Li, Yuchuan Wu, Wentao Ma | Summary: Large Language Models (LLMs) have shown impressive reasoning capabilities in well-defined problems with clear solutions, such as mathematics and coding. However, they still struggle with complex real-world scenarios like business negotiations, which require strategic reasoning-an […]
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning