Kavli Affiliate: Max Tegmark | First 5 Authors: Subhash Kantamneni, Joshua Engels, Senthooran Rajamanoharan, Max Tegmark, Neel Nanda | Summary: Sparse autoencoders (SAEs) are a popular method for interpreting concepts represented in large language model (LLM) activations. However, there is a lack of evidence regarding the validity of their interpretations due to the lack of […]
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing