Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation

Kavli Affiliate: Zheng Zhu

| First 5 Authors: Runqi Ouyang, Haoyun Li, Zhenyuan Zhang, Xiaofeng Wang, Zheng Zhu

| Summary:

Recent advances in large language models, especially in natural language
understanding and reasoning, have opened new possibilities for text-to-motion
generation. Although existing approaches have made notable progress in semantic
alignment and motion synthesis, they often rely on end-to-end mapping
strategies that fail to capture deep linguistic structures and logical
reasoning. Consequently, generated motions tend to lack controllability,
consistency, and diversity. To address these limitations, we propose Motion-R1,
a unified motion-language modeling framework that integrates a Chain-of-Thought
mechanism. By explicitly decomposing complex textual instructions into
logically structured action paths, Motion-R1 provides high-level semantic
guidance for motion generation, significantly enhancing the model’s ability to
interpret and execute multi-step, long-horizon, and compositionally rich
commands. To train our model, we adopt Group Relative Policy Optimization, a
reinforcement learning algorithm designed for large models, which leverages
motion quality feedback to optimize reasoning chains and motion synthesis
jointly. Extensive experiments across multiple benchmark datasets demonstrate
that Motion-R1 achieves competitive or superior performance compared to
state-of-the-art methods, particularly in scenarios requiring nuanced semantic
understanding and long-term temporal coherence. The code, model, and data will
be publicly available.
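
The summary describes decomposing complex instructions into logically structured action paths before motion synthesis. The snippet below is a minimal sketch of what such a Chain-of-Thought decomposition step could look like; the prompt wording and the `call_llm` helper are hypothetical illustrations, not Motion-R1's actual interface.

```python
# Hedged sketch: ask an LLM to break a compositional motion instruction into an
# ordered action path, which would then condition the motion generator.
# `call_llm` is a hypothetical helper, not part of the paper's released code.
def build_cot_prompt(instruction: str) -> str:
    return (
        "Decompose the following motion instruction into a numbered list of "
        "atomic actions, in execution order, one per line.\n"
        f"Instruction: {instruction}\n"
        "Action path:"
    )

prompt = build_cot_prompt("A person picks up a box, turns around, and walks to the door.")
# action_path = call_llm(prompt)  # hypothetical LLM call returning the structured path
print(prompt)
```

The summary also states that training uses Group Relative Policy Optimization with motion-quality feedback. The sketch below shows only the group-relative advantage computation that characterizes GRPO, under the assumption of one scalar reward per sampled (reasoning chain, motion) pair; the reward values and function names are illustrative placeholders, not the paper's implementation.

```python
# Minimal GRPO-style sketch: rewards for a group of samples drawn from the same
# prompt are normalized within the group, so no learned value critic is needed.
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage: standardize rewards within one prompt's samples."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical motion-quality scores for G = 4 sampled reasoning chains + motions.
group_rewards = np.array([0.62, 0.48, 0.71, 0.30])
advantages = grpo_advantages(group_rewards)
# Each advantage weights the corresponding sample's log-probability term in the
# policy-gradient update, optimizing reasoning chains and motion synthesis jointly.
print(advantages)
```
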
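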

| Search Query: ArXiv Query: search_query=au:"Zheng Zhu"&id_list=&start=0&max_results=3
