Kavli Affiliate: Wei Gao | First 5 Authors: Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao | Summary: The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The […]
Continue reading: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs