Kavli Affiliate: Max Tegmark | First 5 Authors: Joshua Engels, Eric J. Michaud, Isaac Liao, Wes Gurnee, Max Tegmark | Summary: Recent work has proposed that language models perform computation by manipulating one-dimensional representations of concepts ("features") in activation space. In contrast, we explore whether some language model representations may be inherently multi-dimensional. We begin […]
Not All Language Model Features Are Linear