Harmonic Loss Trains Interpretable AI Models

Kavli Affiliate: Max Tegmark

| First 5 Authors: David D. Baek, Ziming Liu, Riya Tyagi, Max Tegmark

| Summary:

In this paper, we introduce **harmonic loss** as an alternative to the
standard cross-entropy loss for training neural networks and large language
models (LLMs). Harmonic loss enables improved interpretability and faster
convergence, owing to two properties it has by design: scale invariance and a
finite convergence point, which can be interpreted as a class center. We first
validate the
performance of harmonic models across algorithmic, vision, and language
datasets. Through extensive experiments, we demonstrate that models trained
with harmonic loss outperform standard models by: (a) enhancing
interpretability, (b) requiring less data for generalization, and (c) reducing
grokking. Moreover, we compare a GPT-2 model trained with harmonic loss to the
standard GPT-2, illustrating that the harmonic model develops more
interpretable representations. Looking forward, we believe harmonic loss has
the potential to become a valuable tool in domains with limited data
availability or in high-stakes applications where interpretability and
reliability are paramount, paving the way for more robust and efficient neural
network models.
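
For concreteness, below is a minimal sketch of a harmonic loss layer in
PyTorch, based on the formulation summarized above: class logits are replaced
by Euclidean distances to learnable class centers, and class probabilities come
from a scale-invariant "harmonic max" over those distances instead of a
softmax over dot products. The module name `HarmonicLoss`, the exponent `n`,
and the `eps` stabilizer are illustrative assumptions, not the paper's
reference implementation.

```python
import torch
import torch.nn as nn

class HarmonicLoss(nn.Module):
    """Illustrative harmonic loss (assumed formulation, not the paper's code).

    Each class k has a learnable center w_k. The "logit" is the distance
    d_k = ||x - w_k||, the probability is p_k = d_k^{-n} / sum_j d_j^{-n}
    (the harmonic max), and the loss is -log p_y for the true class y.
    """

    def __init__(self, dim: int, num_classes: int, n: float = 2.0, eps: float = 1e-8):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, dim))  # class centers w_k
        self.n = n      # harmonic exponent (hyperparameter, assumed value)
        self.eps = eps  # avoids log(0) when x coincides with a center

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # d[b, k] = ||x_b - w_k||, shape (batch, num_classes)
        d = torch.cdist(x, self.centers) + self.eps
        # Harmonic max: p_k proportional to d_k^{-n}, computed in log space.
        log_p = -self.n * torch.log(d)
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)
        # Negative log-likelihood of the true class.
        return -log_p.gather(1, y.unsqueeze(1)).mean()

# Usage: swap in for a final linear layer followed by cross-entropy.
loss_fn = HarmonicLoss(dim=64, num_classes=10)
x = torch.randn(32, 64)            # penultimate-layer features
y = torch.randint(0, 10, (32,))    # class labels
loss = loss_fn(x, y)
loss.backward()
```

Two properties from the summary are visible here: rescaling all distances by a
constant leaves the probabilities unchanged (scale invariance), and the loss is
minimized when a feature vector lands exactly on its class center, a finite
point, whereas cross-entropy drives logits toward infinity.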

| Search Query: ArXiv Query: search_query=au:"Max Tegmark"&id_list=&start=0&max_results=3
