Kavli Affiliate: Max Tegmark | First 5 Authors: Vedang Lad, Jin Hwa Lee, Wes Gurnee, Max Tegmark | Summary: We investigate the robustness of Large Language Models (LLMs) to structural interventions by deleting and swapping adjacent layers during inference. Surprisingly, models retain 72-95% of their original top-1 prediction accuracy without any fine-tuning. We find that […]
The Remarkable Robustness of LLMs: Stages of Inference?
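To make the two interventions named in the summary concrete, here is a minimal sketch of deleting one layer and swapping two adjacent layers in a stack of residual blocks, then checking how often the top-1 output index matches the unmodified stack. It is not the authors' code and does not use a real LLM: the toy block `make_layer`, the helper names `delete_layer`, `swap_adjacent`, and `top1_agreement`, the dimensions, and the random inputs are all assumptions chosen for illustration, and "agreement with the unmodified stack" is only a stand-in for the accuracy measure the summary refers to.

```python
import math
import random


def make_layer(dim, rng, scale=0.1):
    """Hypothetical toy residual block: x -> x + scale * tanh(W x)."""
    w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def layer(x):
        update = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
        return [xi + scale * ui for xi, ui in zip(x, update)]

    return layer


def delete_layer(layers, i):
    """Return a new stack with layer i removed (the input list is not modified)."""
    return layers[:i] + layers[i + 1:]


def swap_adjacent(layers, i):
    """Return a new stack with layers i and i + 1 exchanged."""
    out = list(layers)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def top1(layers, x):
    """Run x through the stack and return the index of the largest output entry."""
    for layer in layers:
        x = layer(x)
    return max(range(len(x)), key=lambda k: x[k])


def top1_agreement(baseline, modified, inputs):
    """Fraction of inputs where the modified stack keeps the baseline top-1 index."""
    same = sum(top1(baseline, x) == top1(modified, x) for x in inputs)
    return same / len(inputs)


# Assumed toy setup: 6 blocks of width 8, 50 random input vectors.
rng = random.Random(0)
dim, depth = 8, 6
layers = [make_layer(dim, rng) for _ in range(depth)]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]

for i in range(depth - 1):
    d = top1_agreement(layers, delete_layer(layers, i), inputs)
    s = top1_agreement(layers, swap_adjacent(layers, i), inputs)
    print(f"layer {i}: delete agreement={d:.2f}, swap agreement={s:.2f}")
```

Both helpers return a new list rather than editing the stack in place, so every intervention is compared against the same untouched baseline. The numbers this toy prints say nothing about real models; they only show the shape of the experiment.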