Kavli Affiliate: Max Tegmark | First 5 Authors: Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu | Summary: Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks’ capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over AI system behavior […]
Continue.. Open Problems in Mechanistic Interpretability