Kavli Affiliate: Max Tegmark | First 5 Authors: Anish Mudide, Joshua Engels, Eric J. Michaud, Max Tegmark, Christian Schroeder de Witt | Summary: Sparse autoencoders (SAEs) are a recent technique for decomposing neural network activations into human-interpretable features. However, in order for SAEs to identify all features represented in frontier models, it will be necessary […]
Efficient Dictionary Learning with Switch Sparse Autoencoders
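
To make the summary's description concrete, here is a minimal sketch of a generic sparse autoencoder: an activation vector is encoded into a wider, non-negative feature vector and then decoded back. It is not the Switch variant named in the title, and it is not taken from the paper. The class name `TinySAE`, the ReLU encoder, the L1 penalty, the dimensions, and the random initialization are all assumptions chosen for illustration.

```python
import random


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * a for w, a in zip(row, vec)) for row in matrix]


class TinySAE:
    """Hypothetical minimal sparse autoencoder (all names are illustrative)."""

    def __init__(self, d_model, d_dict, seed=0):
        rng = random.Random(seed)
        # Encoder maps d_model -> d_dict; decoder maps d_dict -> d_model.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_dict)]
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_dict)]
                      for _ in range(d_model)]
        self.b_enc = [0.0] * d_dict
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # Subtract the decoder bias, apply the encoder, then ReLU so that
        # feature activations are non-negative.
        centered = [a - b for a, b in zip(x, self.b_dec)]
        pre = matvec(self.w_enc, centered)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f):
        # Reconstruct the activation as a weighted sum of decoder columns.
        return [r + b for r, b in zip(matvec(self.w_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        # Mean squared reconstruction error plus an L1 sparsity penalty
        # (an assumed objective; other sparsity mechanisms exist).
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(f)  # f is non-negative, so sum(f) is its L1 norm


sae = TinySAE(d_model=4, d_dict=16)
x = [0.5, -1.0, 0.25, 2.0]          # stand-in for a model activation vector
features = sae.encode(x)            # 16 non-negative feature activations
reconstruction = sae.decode(features)
```

The dictionary is wider than the input (16 features for a 4-dimensional activation), which is the usual setup when the goal is to separate many features packed into a smaller activation space.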