Project Aria: A New Tool for Egocentric Multi-Modal AI Research

Kavli Affiliate: Cheng Peng | First 5 Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino | Summary: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support […]


LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

Kavli Affiliate: Cheng Peng | First 5 Authors: Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao | Summary: The diversity in length constitutes a significant characteristic of text. Due to the long-tail distribution of text lengths, most existing methods for scene text recognition (STR) only work well on short or seen-length text, lacking […]


GRIP: Generating Interaction Poses Using Latent Consistency and Spatial Cues

Kavli Affiliate: Yi Zhou | First 5 Authors: Omid Taheri, Yi Zhou, Dimitrios Tzionas, Yang Zhou, Duygu Ceylan | Summary: Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment. Consequently, modeling realistic hand-object interactions, including the subtle motion of individual fingers, is critical for applications […]


COCA: Classifier-Oriented Calibration for Source-Free Universal Domain Adaptation via Textual Prototype

Kavli Affiliate: Yi Zhou | First 5 Authors: Xinghong Liu, Yi Zhou, Tao Zhou, Chun-Mei Feng, Ling Shao | Summary: Universal Domain Adaptation (UniDA) aims to distinguish common and private classes between the source and target domains where domain shift exists. Recently, due to more stringent data restrictions, researchers have introduced Source-Free UniDA (SF-UniDA) in […]


COCA: Classifier-Oriented Calibration via Textual Prototype for Source-Free Universal Domain Adaptation

Kavli Affiliate: Yi Zhou | First 5 Authors: Xinghong Liu, Yi Zhou, Tao Zhou, Chun-Mei Feng, Ling Shao | Summary: Universal domain adaptation (UniDA) aims to address domain and category shifts across data sources. Recently, due to more stringent data restrictions, researchers have introduced source-free UniDA (SF-UniDA). SF-UniDA methods eliminate the need for direct access […]


Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning

Kavli Affiliate: Yi Zhou | First 5 Authors: Pengbo Hu, Ji Qi, Xingyu Li, Hong Li, Xinqi Wang | Summary: There emerges a promising trend of using large language models (LLMs) to generate code-like plans for complex inference tasks such as visual reasoning. This paradigm, known as LLM-based planning, provides flexibility in problem solving and […]


Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

Kavli Affiliate: Yi Zhou | First 5 Authors: Hong Li, Xingyu Li, Pengbo Hu, Yinuo Lei, Chunxiao Li | Summary: While the field of multi-modal learning keeps growing fast, the deficiency of the standard joint training paradigm has become clear through recent studies. They attribute the sub-optimal performance of the jointly trained model to the […]


Unveiling Correlated Topological Insulators through Fermionic Tensor Network States — Classification, Edge Theories and Variational Wavefunctions

Kavli Affiliate: Shenghan Jiang | First 5 Authors: Chao Xu, Yixin Ma, Shenghan Jiang | Summary: The study of topological band insulators has revealed fascinating phases characterized by band topology indices, harboring extraordinary boundary modes protected by anomalous symmetry actions. In strongly correlated systems, where the traditional notion of electronic bands becomes obsolete, it […]


Unveiling Correlated Two-dimensional Topological Insulators through Fermionic Tensor Network States — Classification, Edge Theories and Variational Wavefunctions

Kavli Affiliate: Shenghan Jiang | First 5 Authors: Chao Xu, Yixin Ma, Shenghan Jiang | Summary: The study of topological band insulators has revealed fascinating phases characterized by band topology indices and anomalous boundary modes protected by global symmetries. In strongly correlated systems, where the traditional notion of electronic bands becomes obsolete, it has […]

