VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

Kavli Affiliate: Ke Wang | First 5 Authors: Ke Wang | Summary: The growing capabilities of large language models and multimodal systems have spurred interest in voice-first AI assistants, yet existing benchmarks are inadequate for evaluating the full range of these systems’ capabilities. We introduce VoiceAssistant-Eval, a comprehensive benchmark designed to […]



WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning

Kavli Affiliate: Ke Wang | First 5 Authors: Zimu Lu | Summary: Agent systems powered by large language models (LLMs) have demonstrated impressive performance on repository-level code-generation tasks. However, for tasks such as website codebase generation, which depend heavily on visual effects and user-interaction feedback, current code agents rely only on […]



EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation

Kavli Affiliate: Zheng Zhu | First 5 Authors: Yuan Xu | Summary: Imitation learning based policies perform well in robotic manipulation, but they often degrade under *egocentric viewpoint shifts* when trained from a single egocentric viewpoint. To address this issue, we present **EgoDemoGen**, a framework that generates *paired* novel egocentric demonstrations […]



EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer

Kavli Affiliate: Zheng Zhu | First 5 Authors: Zhehao Dong | Summary: Vision-language-action (VLA) models increasingly rely on diverse training data to achieve robust generalization. However, collecting large-scale real-world robot manipulation data across varied object appearances and environmental conditions remains prohibitively time-consuming and expensive. To overcome this bottleneck, we propose Embodied […]



UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective

Kavli Affiliate: Xiang Zhang | First 5 Authors: Jun He | Summary: Urban development impacts over half of the global population, making human-centered understanding of its structural and perceptual changes essential for sustainable development. While Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various domains, existing benchmarks that explore […]



MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training

Kavli Affiliate: Zheng Zhu | First 5 Authors: Haoyun Li | Summary: Vision Language Action (VLA) models derive their generalization capability from diverse training data, yet collecting embodied robot interaction data remains prohibitively expensive. In contrast, human demonstration videos are far more scalable and cost-efficient to collect, and recent studies confirm […]



Towards Understanding Feature Learning in Parameter Transfer

Kavli Affiliate: Jing Wang | First 5 Authors: Hua Yuan | Summary: Parameter transfer is a central paradigm in transfer learning, enabling knowledge reuse across tasks and domains by sharing model parameters between upstream and downstream models. However, when only a subset of parameters from the upstream model is transferred to […]



A nearly pristine star from the Large Magellanic Cloud

Kavli Affiliate: Alexander P. Ji | First 5 Authors: Alexander P. Ji | Summary: The first stars formed out of pristine gas, causing them to be so massive that none are expected to have survived until today. If their direct descendants were sufficiently low-mass stars, they could exist today and […]



BlackTHUNDER: evidence for three massive black holes in a z~5 galaxy

Kavli Affiliate: Roberto Maiolino | First 5 Authors: Hannah Übler | Summary: We present observational evidence for three massive, accreting black holes in the $z=5.0167$ galaxy J0148-4214 from JWST/NIRSpec-IFU spectroscopy. The black holes are revealed through broad H$\alpha$ emission (FWHM = 430-2920 km/s) without a forbidden-line counterpart in the bright [O […]



MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Kavli Affiliate: Jing Wang | First 5 Authors: Sicong Leng | Summary: Large multimodal reasoning models have achieved rapid progress, but their advancement is constrained by two major limitations: the absence of open, large-scale, high-quality long chain-of-thought (CoT) data, and the instability of reinforcement learning (RL) algorithms in post-training. Group Relative […]

