Towards Understanding Grokking: An Effective Theory of Representation Learning

Kavli Affiliate: Max Tegmark

| First 5 Authors: Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark

| Summary:

We aim to understand grokking, a phenomenon where models generalize long
after overfitting their training set. We present both a microscopic analysis
anchored by an effective theory and a macroscopic analysis of phase diagrams
describing learning performance across hyperparameters. We find that
generalization originates from structured representations whose training
dynamics and dependence on training set size can be predicted by our effective
theory in a toy setting. We observe empirically the presence of four learning
phases: comprehension, grokking, memorization, and confusion. We find
representation learning to occur only in a "Goldilocks zone" (including
comprehension and grokking) between memorization and confusion. Compared to the
comprehension phase, the grokking phase stays closer to the memorization phase,
leading to delayed generalization. The Goldilocks phase is reminiscent of
"intelligence from starvation" in Darwinian evolution, where resource
limitations drive discovery of more efficient solutions. This study not only
provides intuitive explanations of the origin of grokking, but also highlights
the usefulness of physics-inspired tools, e.g., effective theories and phase
diagrams, for understanding deep learning.
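
The four phases named above can be told apart by when training and validation accuracy first cross a threshold. The following is a minimal sketch of that idea, not the authors' exact criteria: the helper names `first_step_above` and `classify_phase`, the 0.9 accuracy threshold, and the `max_delay` cutoff of 1,000 steps are all illustrative assumptions.

```python
from typing import Optional, Sequence


def first_step_above(curve: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Return the first step index at which accuracy reaches `threshold`.

    Returns None if the curve never reaches it within the recorded steps.
    The 0.9 default is an assumed value, not taken from the summary.
    """
    for step, acc in enumerate(curve):
        if acc >= threshold:
            return step
    return None


def classify_phase(
    train_step: Optional[int],
    val_step: Optional[int],
    max_delay: int = 1_000,  # assumed cutoff separating "prompt" from "delayed"
) -> str:
    """Label a run with one of the four learning phases.

    train_step / val_step are the first steps at which training / validation
    accuracy crossed the threshold, or None if they never did.
    """
    if train_step is None:
        # The model does not even fit its training set.
        return "confusion"
    if val_step is None:
        # The training set is fit, but generalization never arrives.
        return "memorization"
    if val_step - train_step <= max_delay:
        # Generalization arrives together with (or soon after) fitting.
        return "comprehension"
    # Generalization arrives long after the training set is fit.
    return "grokking"
```

Applying `classify_phase` to each run in a sweep over hyperparameters (for example weight decay and learning rate) would give the labels needed to draw a phase diagram of the kind the summary describes.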

| Search Query: ArXiv Query: search_query=au:"Max Tegmark"&id_list=&start=0&max_results=10