Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting

Kavli Affiliate: Feng Yuan | First 5 Authors: Wanjin Feng | Summary: In the era of increasingly complex AI models for time series forecasting, progress is often measured by marginal improvements on benchmark leaderboards. However, this approach suffers from a fundamental flaw: standard evaluation metrics conflate a model’s performance with the […]

A benchmark for vericoding: formally verified program synthesis

Kavli Affiliate: Max Tegmark | First 5 Authors: Sergiu Bursuc | Summary: We present and test the largest benchmark for vericoding, the LLM generation of formally verified code from formal specifications, as opposed to vibe coding, which generates potentially buggy code from a natural-language description. Our benchmark contains 12,504 formal specifications, […]

GRB 250702B: Discovery of a Gamma-Ray Burst from a Black Hole Falling into a Star

Kavli Affiliate: Erin Kara | First 5 Authors: Eliza Neights | Summary: Gamma-ray bursts are the most luminous electromagnetic events in the universe. Their prompt gamma-ray emission has typical durations between a fraction of a second and several minutes. A rare subset of these events has durations in excess of a […]

Comprehensive X-ray Observations of the Exceptional Ultra-long X-ray and Gamma-ray Transient GRB 250702B with Swift, NuSTAR, and Chandra: Insights from the X-ray Afterglow Properties

Kavli Affiliate: Dheeraj Pasham | First 5 Authors: Brendan O’Connor | Summary: GRB 250702B is an exceptional transient that produced multiple episodes of luminous gamma-ray radiation lasting for more than 25 ks, placing it among the class of ultra-long gamma-ray bursts (GRBs). However, unlike any known GRB, the Einstein Probe detected soft X-ray […]

Optical/infrared observations of the extraordinary GRB 250702B: a highly obscured afterglow in a massive galaxy consistent with multiple possible progenitors

Kavli Affiliate: Dheeraj Pasham | First 5 Authors: Jonathan Carney | Summary: GRB 250702B was the longest gamma-ray burst ever observed, with a duration that challenges standard collapsar models and suggests an exotic progenitor. We collected a rich set of optical and infrared follow-up observations of its rapidly fading afterglow using […]

A Splashback-like Feature of Central Galaxies in Galaxy Clusters

Kavli Affiliate: Eli S. Rykoff | First 5 Authors: Yuanyuan Zhang | Summary: We investigate a splashback-like feature in the outer region of central galaxies (CGs) in clusters. This feature is detected as a "dip" in the radial slope of the CG surface brightness, derived through the stacking of Dark Energy […]

VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

Kavli Affiliate: Ke Wang | First 5 Authors: Ke Wang | Summary: The growing capabilities of large language models and multimodal systems have spurred interest in voice-first AI assistants, yet existing benchmarks are inadequate for evaluating the full range of these systems’ capabilities. We introduce VoiceAssistant-Eval, a comprehensive benchmark designed to […]

WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning

Kavli Affiliate: Ke Wang | First 5 Authors: Zimu Lu | Summary: Agent systems powered by large language models (LLMs) have demonstrated impressive performance on repository-level code-generation tasks. However, for tasks such as website codebase generation, which depend heavily on visual effects and user-interaction feedback, current code agents rely only on […]

Towards Understanding Feature Learning in Parameter Transfer

Kavli Affiliate: Jing Wang | First 5 Authors: Hua Yuan | Summary: Parameter transfer is a central paradigm in transfer learning, enabling knowledge reuse across tasks and domains by sharing model parameters between upstream and downstream models. However, when only a subset of parameters from the upstream model is transferred to […]

A nearly pristine star from the Large Magellanic Cloud

Kavli Affiliate: Alexander P. Ji | First 5 Authors: Alexander P. Ji | Summary: The first stars formed out of pristine gas, causing them to be so massive that none are expected to have survived until today. If their direct descendants were sufficiently low-mass stars, they could exist today and […]