Kavli Affiliate: Ke Wang | First 5 Authors: Jiawei Wang | Summary: In long-horizon tasks, recent agents based on Large Language Models (LLMs) face a significant challenge: sparse, outcome-based rewards make it difficult to assign credit to intermediate steps. Previous methods mainly focus on creating dense reward signals to guide […]
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents