Kavli Affiliate: David W. Miller | Summary: Scaling laws are typically fit using a family of models with a narrow range of frozen hyperparameter choices. In this work we study scaling laws using multiple architectural shapes and hyperparameter choices, highlighting their impact on resulting prescriptions. As a primary artifact of our research, we release the Gemstones: […]
Continue reading: Gemstones: A Model Suite for Multi-Faceted Scaling Laws