Kavli Affiliate: Max Tegmark | First 5 Authors: David D. Baek, Max Tegmark | Summary: In this paper, we investigate how model distillation impacts the development of reasoning features in large language models (LLMs). To explore this, we train a crosscoder on Qwen-series models and their fine-tuned variants. Our results suggest that the […]
Towards Understanding Distilled Reasoning Models: A Representational Approach