Kavli Affiliate: Jia Liu | First 5 Authors: Changxin Tian | Summary: Recent advances in learning rate (LR) scheduling have demonstrated the effectiveness of decay-free approaches that eliminate the traditional decay phase while maintaining competitive performance. Model merging techniques have emerged as particularly promising solutions in this domain. We present Warmup-Stable […]
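To make the two ideas in the summary concrete, here is a minimal sketch, not the paper's implementation. It assumes a linear warmup followed by a constant learning rate (no decay phase) and a plain uniform average of saved checkpoints; the summary is truncated and does not specify the actual merging rule. The function names `warmup_stable_lr` and `merge_checkpoints`, the default values, and the representation of a checkpoint as a dict of float lists are all hypothetical, chosen for illustration.

```python
def warmup_stable_lr(step, peak_lr=3e-4, warmup_steps=100):
    """Linear warmup to peak_lr, then hold it constant (no decay phase).

    peak_lr and warmup_steps are illustrative defaults, not values from the paper.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


def merge_checkpoints(checkpoints):
    """Uniformly average checkpoints given as {param_name: [floats]} dicts.

    Assumption: every checkpoint has the same keys and parameter lengths.
    A uniform mean is only one possible merging rule.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(ckpt[name] for ckpt in checkpoints))]
        for name in checkpoints[0]
    }


# Toy usage: two "checkpoints" saved during the constant-LR phase.
ckpt_a = {"w": [1.0, 2.0], "b": [0.0]}
ckpt_b = {"w": [3.0, 4.0], "b": [1.0]}
merged = merge_checkpoints([ckpt_a, ckpt_b])
```

In this sketch the averaging of checkpoints taken at a constant learning rate stands in for the decay phase; how closely that matches the method in the paper cannot be confirmed from the truncated summary.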
Continue reading: WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training