Kavli Affiliate: Max Tegmark | First 5 Authors: Vedang Lad, Wes Gurnee, Max Tegmark | Summary: We demonstrate and investigate the remarkable robustness of Large Language Models by deleting and swapping adjacent layers. We find that deleting and swapping interventions retain 72-95% of the original model’s prediction accuracy without fine-tuning, whereas models with more […]
Continue reading: The Remarkable Robustness of LLMs: Stages of Inference?
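
The two interventions named in the summary, deleting a layer and swapping adjacent layers, can be sketched on a toy stack of residual-style functions. This is a minimal illustration under stated assumptions, not the authors' code: `make_residual_layer`, `run`, `delete_layer`, and `swap_adjacent` are hypothetical names, and the toy layers only stand in for real transformer blocks.

```python
import math
from typing import Callable, List, Sequence

# A "layer" here is any function from a vector to a vector of the same size.
Layer = Callable[[List[float]], List[float]]


def make_residual_layer(scale: float) -> Layer:
    """Hypothetical toy layer: x + scale * tanh(x), a stand-in for a residual block."""
    def layer(x: List[float]) -> List[float]:
        return [v + scale * math.tanh(v) for v in x]
    return layer


def run(layers: Sequence[Layer], x: List[float]) -> List[float]:
    """Apply the layers in order, as a forward pass through the stack."""
    for layer in layers:
        x = layer(x)
    return x


def delete_layer(layers: Sequence[Layer], i: int) -> List[Layer]:
    """Return a new stack with layer i removed; the original stack is untouched."""
    return list(layers[:i]) + list(layers[i + 1:])


def swap_adjacent(layers: Sequence[Layer], i: int) -> List[Layer]:
    """Return a new stack with layers i and i+1 exchanged."""
    out = list(layers)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


if __name__ == "__main__":
    layers = [make_residual_layer(s) for s in (0.1, 0.2, 0.3, 0.4)]
    x = [0.5, -1.0, 2.0]
    print("original:", run(layers, x))
    print("deleted :", run(delete_layer(layers, 1), x))
    print("swapped :", run(swap_adjacent(layers, 1), x))
```

Comparing the three outputs gives a rough feel for how much each intervention perturbs the final result; in a real model the comparison would be made on prediction accuracy rather than on raw vectors.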