EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer

Kavli Affiliate: Zheng Zhu | First 5 Authors: Zhehao Dong | Summary: Vision-language-action (VLA) models increasingly rely on diverse training data to achieve robust generalization. However, collecting large-scale real-world robot manipulation data across varied object appearances and environmental conditions remains prohibitively time-consuming and expensive. To overcome this bottleneck, we propose Embodied […]



UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective

Kavli Affiliate: Xiang Zhang | First 5 Authors: Jun He | Summary: Urban development impacts over half of the global population, making human-centered understanding of its structural and perceptual changes essential for sustainable development. While Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various domains, existing benchmarks that explore […]



MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training

Kavli Affiliate: Zheng Zhu | First 5 Authors: Haoyun Li | Summary: Vision Language Action (VLA) models derive their generalization capability from diverse training data, yet collecting embodied robot interaction data remains prohibitively expensive. In contrast, human demonstration videos are far more scalable and cost-efficient to collect, and recent studies confirm […]



Towards Understanding Feature Learning in Parameter Transfer

Kavli Affiliate: Jing Wang | First 5 Authors: Hua Yuan | Summary: Parameter transfer is a central paradigm in transfer learning, enabling knowledge reuse across tasks and domains by sharing model parameters between upstream and downstream models. However, when only a subset of parameters from the upstream model is transferred to […]



A nearly pristine star from the Large Magellanic Cloud

Kavli Affiliate: Alexander P. Ji | First 5 Authors: Alexander P. Ji | Summary: The first stars formed out of pristine gas, causing them to be so massive that none are expected to have survived until today. If their direct descendants were sufficiently low-mass stars, they could exist today and […]



BlackTHUNDER: evidence for three massive black holes in a z~5 galaxy

Kavli Affiliate: Debora Sijacki | First 5 Authors: Hannah Übler | Summary: We present observational evidence for three massive, accreting black holes in the $z=5.0167$ galaxy J0148-4214 from JWST/NIRSpec-IFU spectroscopy. The black holes are revealed through broad H$\alpha$ emission (FWHM = 430-2920 km/s) without a forbidden-line counterpart in the bright [O […]



MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Kavli Affiliate: Jing Wang | First 5 Authors: Sicong Leng | Summary: Large multimodal reasoning models have achieved rapid progress, but their advancement is constrained by two major limitations: the absence of open, large-scale, high-quality long chain-of-thought (CoT) data, and the instability of reinforcement learning (RL) algorithms in post-training. Group Relative […]



RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training

Kavli Affiliate: Wei Gao | First 5 Authors: Wei Gao | Summary: Reinforcement Learning (RL) is a pivotal post-training technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, synchronous RL post-training often suffers from significant GPU underutilization, referred to as bubbles, caused by imbalanced response lengths within rollout […]



Tracking spin qubit frequency variations over 912 days

Kavli Affiliate: Giordano Scappucci | First 5 Authors: Kenji Capannelli | Summary: Solid-state qubits are sensitive to their microscopic environment, causing the qubit properties to fluctuate on a wide range of timescales. The sub-Hz end of the spectrum is usually dealt with by repeated background calibrations, which bring considerable overhead. It […]

