Iterative Refinement Improves Compositional Image Generation

Kavli Affiliate: Li Xin Li | Summary: Text-to-image (T2I) models have achieved remarkable progress, yet they continue to struggle with complex prompts that require simultaneously handling multiple objects, relations, and attributes. Existing inference-time strategies, such as parallel sampling with verifiers or simply increasing denoising steps, can improve prompt alignment […]



Walk through Paintings: Egocentric World Models from Internet Priors

Kavli Affiliate: Lile Wang | Summary: What if a video generation model could not only imagine a plausible future, but the correct one, accurately reflecting how the world changes with each action? We address this question by presenting the Egocentric World Model (EgoWM), a simple, architecture-agnostic method that transforms […]



Rethinking Video Generation Model for the Embodied World

Kavli Affiliate: Li Xin Li | Summary: Video generation models have significantly advanced embodied intelligence, unlocking new possibilities for generating diverse robot data that capture perception, reasoning, and action in the physical world. However, synthesizing high-quality videos that accurately reflect real-world robotic interactions remains challenging, and the lack of […]



StableWorld: Towards Stable and Consistent Long Interactive Video Generation

Kavli Affiliate: Li Xin Li | Summary: In this paper, we explore the overlooked challenge of stability and temporal consistency in interactive video generation, which synthesizes dynamic and controllable video worlds through interactive behaviors such as camera movements and text prompts. Despite remarkable progress in world modeling, current methods […]



Evaluation of Large Language Models in Legal Applications: Challenges, Methods, and Future Directions

Kavli Affiliate: Lile Wang | Summary: Large language models (LLMs) are being increasingly integrated into legal applications, including judicial decision support, legal practice assistance, and public-facing legal services. While LLMs show strong potential in handling legal knowledge and tasks, their deployment in real-world legal settings raises critical concerns beyond […]



QDK/Chemistry: A Modular Toolkit for Quantum Chemistry Applications

Kavli Affiliate: Hsiaowen Chen | Summary: We present QDK/Chemistry, a software toolkit for quantum chemistry workflows targeting quantum computers. The toolkit addresses a key challenge in the field: while quantum algorithms for chemistry have matured considerably, the infrastructure connecting classical electronic structure calculations to quantum circuit execution remains fragmented. […]



FlowSSC: Universal Generative Monocular Semantic Scene Completion via One-Step Latent Diffusion

Kavli Affiliate: Hsiaowen Chen | Summary: Semantic Scene Completion (SSC) from monocular RGB images is a fundamental yet challenging task due to the inherent ambiguity of inferring occluded 3D geometry from a single view. While feed-forward methods have made progress, they often struggle to generate plausible details in occluded […]



Above Room Temperature Ferroelectricity in Epitaxially Strained KTaO3

Kavli Affiliate: Darrell Schlom | First Author: Tobias Schwaigert | Summary: Epitaxial strain is a powerful means to engineer emergent phenomena in thin films and heterostructures. Here, we demonstrate that KTaO3, a cubic perovskite in bulk form, can be epitaxially strained into a highly tunable ferroelectric. KTaO3 films grown commensurate to SrTiO3 […]



Above Room Temperature Ferroelectricity in Epitaxially Strained KTaO3

Kavli Affiliate: David Muller | First Author: Tobias Schwaigert | Summary: Epitaxial strain is a powerful means to engineer emergent phenomena in thin films and heterostructures. Here, we demonstrate that KTaO3, a cubic perovskite in bulk form, can be epitaxially strained into a highly tunable ferroelectric. KTaO3 films grown commensurate to SrTiO3 […]



Implicit Neural Representation Facilitates Unified Universal Vision Encoding

Kavli Affiliate: Lile Wang | Summary: Models for image representation learning are typically designed for either recognition or generation. Various forms of contrastive learning help models learn to convert images to embeddings that are useful for classification, detection, and segmentation. On the other hand, models can be trained to […]

