A benchmark for vericoding: formally verified program synthesis

Kavli Affiliate: Max Tegmark | First Author: Sergiu Bursuc | Summary: We present and test the largest benchmark for vericoding, LLM-generation of formally verified code from formal specifications – in contrast to vibe coding, which generates potentially buggy code from a natural language description. Our benchmark contains 12,504 formal specifications, […]



GRB 250702B: Discovery of a Gamma-Ray Burst from a Black Hole Falling into a Star

Kavli Affiliate: Erin Kara | First Author: Eliza Neights | Summary: Gamma-ray bursts are the most luminous electromagnetic events in the universe. Their prompt gamma-ray emission has typical durations between a fraction of a second and several minutes. A rare subset of these events have durations in excess of a […]



Comprehensive X-ray Observations of the Exceptional Ultra-long X-ray and Gamma-ray Transient GRB 250702B with Swift, NuSTAR, and Chandra: Insights from the X-ray Afterglow Properties

Kavli Affiliate: Dheeraj Pasham | First Author: Brendan O’Connor | Summary: GRB 250702B is an exceptional transient that produced multiple episodes of luminous gamma-ray radiation lasting for $>25$ ks, placing it among the class of ultra-long gamma-ray bursts (GRBs). However, unlike any known GRB, the Einstein Probe detected soft X-ray […]



Optical/infrared observations of the extraordinary GRB 250702B: a highly obscured afterglow in a massive galaxy consistent with multiple possible progenitors

Kavli Affiliate: Dheeraj Pasham | First Author: Jonathan Carney | Summary: GRB 250702B was the longest gamma-ray burst ever observed, with a duration that challenges standard collapsar models and suggests an exotic progenitor. We collected a rich set of optical and infrared follow-up observations of its rapidly fading afterglow using […]



A Splashback-like Feature of Central Galaxies in Galaxy Clusters

Kavli Affiliate: Eli S. Rykoff | First Author: Yuanyuan Zhang | Summary: We investigate a splashback-like feature in the outer region of central galaxies (CGs) in clusters. This feature is detected as a "dip" in the radial slope of the CG surface brightness, derived through the stacking of Dark Energy […]



Universality of Shallow Global Quenches in Critical Spin Chains

Kavli Affiliate: Joel E. Moore | First Author: Julia Wei | Summary: Measuring universal data in the strongly correlated regime of quantum critical points remains a fundamental objective for quantum simulators. In foundational work, Calabrese and Cardy demonstrated how this data governs the dynamics of certain global quenches to 1+1-dimensional […]



VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

Kavli Affiliate: Ke Wang | First Author: Ke Wang | Summary: The growing capabilities of large language models and multimodal systems have spurred interest in voice-first AI assistants, yet existing benchmarks are inadequate for evaluating the full range of these systems’ capabilities. We introduce VoiceAssistant-Eval, a comprehensive benchmark designed to […]



WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning

Kavli Affiliate: Ke Wang | First Author: Zimu Lu | Summary: Agent systems powered by large language models (LLMs) have demonstrated impressive performance on repository-level code-generation tasks. However, for tasks such as website codebase generation, which depend heavily on visual effects and user-interaction feedback, current code agents rely only on […]



EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation

Kavli Affiliate: Zheng Zhu | First Author: Yuan Xu | Summary: Imitation learning based policies perform well in robotic manipulation, but they often degrade under *egocentric viewpoint shifts* when trained from a single egocentric viewpoint. To address this issue, we present **EgoDemoGen**, a framework that generates *paired* novel egocentric demonstrations […]



EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer

Kavli Affiliate: Zheng Zhu | First Author: Zhehao Dong | Summary: Vision-language-action (VLA) models increasingly rely on diverse training data to achieve robust generalization. However, collecting large-scale real-world robot manipulation data across varied object appearances and environmental conditions remains prohibitively time-consuming and expensive. To overcome this bottleneck, we propose Embodied […]

