Kavli Affiliate: Max Tegmark | First 5 Authors: Samuel Marks, Max Tegmark | Summary: Large Language Models (LLMs) have impressive capabilities, but are also prone to outputting falsehoods. Recent work has developed techniques for inferring whether an LLM is telling the truth by training probes on the LLM’s internal activations. However, this line […]
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets