Meta-learning is expressed through altered prefrontal cortical dynamics

Kavli Affiliate: Joshua Berke, Loren Frank, and Razi Haque

| Authors: Xulu Sun, Alison E. Comrie, Ari E. Kahn, Emily J. Monroe, Abhilasha Joshi, Jennifer A. Guidera, Eric L. Denovellis, Timothy A. Krausz, Jenny Zhou, Paige Thompson, Jose Hernandez, Allison Yorita, Razi Haque, Joshua D. Berke, Nathaniel D. Daw and Loren M. Frank

| Summary:

Learning where and when rewards like food and water are available is essential for survival [1,2]. In the simplest cases where resource availability is stable, animals can learn reward contingencies by integrating outcomes across repeated samples of each option. In more natural settings, however, reward availability is governed by complex contingencies such as depletion and repletion over time. To flexibly adapt to such changing environments, optimal choices require meta-learning wherein animals learn how to learn from external feedback, eventually enabling them to infer reward contingencies from more abstract and generalizable rules rather than relying solely on recent outcomes [3,4]. The existence of meta-learning in animal behavior is well established [3,5–8], yet the neural circuits and computations that implement it remain poorly understood [9–11]. Here we investigated meta-learning using a spatial foraging task in which rats acquired a novel depletion-repletion rule, and carried out longitudinal, high-density recordings from the medial prefrontal cortex (mPFC). We show that meta-learning engages specific, systematic changes in mPFC neural dynamics that embed the learned rule and thereby alter the way the network learns option values from reward outcomes. These dynamics are based on mixed coding of task structure and value in individual mPFC neurons which, at the population level, organize into low-dimensional dynamical motifs that generalize across task conditions. As meta-learning progresses, these motifs are reshaped to instantiate both rule-guided inference of future states before outcome delivery and rule-based value updating during the outcome period. These results indicate that meta-learning sculpts pre-existing prefrontal dynamics to support the acquisition of new reward-learning strategies.