Kavli Affiliate: Cheng Peng | First 5 Authors: Shuyao Xu, Cheng Peng, Jiangxuan Long, Weidi Xu, Wei Chu | Summary: Recent advances in model distillation demonstrate that data from advanced reasoning models (e.g., DeepSeek-R1, OpenAI’s o1) can effectively transfer complex reasoning abilities to smaller, efficient student models. However, standard practices employ rejection sampling, discarding incorrect […]
Continue reading: Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
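The rejection-sampling practice the summary mentions can be sketched as follows. This is a minimal illustration of the standard filtering step, not the paper's method; `teacher_generate` and `is_correct` are hypothetical placeholders for a teacher model's sampler and an answer checker.

```python
def rejection_sample(problems, teacher_generate, is_correct, k=4):
    """Split teacher reasoning traces into accepted and rejected sets.

    Standard practice trains the student only on `kept` and discards
    `rejected`; the paper's premise is that the rejected (incorrect)
    traces still carry useful training signal.
    """
    kept, rejected = [], []
    for problem in problems:
        # Draw k candidate reasoning traces per problem from the teacher.
        for trace in teacher_generate(problem, num_samples=k):
            if is_correct(problem, trace):
                kept.append((problem, trace))
            else:
                rejected.append((problem, trace))
    return kept, rejected
```

With toy stand-ins (a "teacher" emitting the integers 0..k-1 and a checker accepting even traces), the function cleanly partitions the samples into the two pools.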