Kavli Affiliate: Cheng Peng | First 5 Authors: Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu | Summary: Recent advances in model distillation demonstrate that data from advanced reasoning models (e.g., DeepSeek-R1, OpenAI’s o1) can effectively transfer complex reasoning abilities to smaller, efficient student models. However, standard practices employ rejection sampling, discarding incorrect […]
Continue reading: Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
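The rejection-sampling practice the summary mentions can be sketched as follows. This is a minimal illustration of the standard filtering step, not the paper's method; `teacher_generate` and `is_correct` are hypothetical placeholders for a teacher model's sampler and an answer checker.

```python
def rejection_sample(problems, teacher_generate, is_correct, k=4):
    """Split teacher reasoning traces into accepted and rejected sets.

    Standard practice trains the student only on `kept` and discards
    `rejected`; the paper's premise is that the rejected (incorrect)
    traces still carry useful training signal.
    """
    kept, rejected = [], []
    for problem in problems:
        # Draw k candidate reasoning traces per problem from the teacher.
        for trace in teacher_generate(problem, num_samples=k):
            if is_correct(problem, trace):
                kept.append((problem, trace))
            else:
                rejected.append((problem, trace))
    return kept, rejected
```

With toy stand-ins (a "teacher" emitting the integers 0..k-1 and a checker accepting even traces), the function cleanly partitions the samples into the two pools.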