Kavli Affiliate: Max Tegmark | First 5 Authors: Joshua Engels, Isaac Liao, Eric J. Michaud, Wes Gurnee, Max Tegmark | Summary: Recent work has proposed the linear representation hypothesis: that language models perform computation by manipulating one-dimensional representations of concepts ("features") in activation space. In contrast, we explore whether some language model representations may be […]
Continue.. Not All Language Model Features Are Linear