Humanoid Occupancy: Enabling A Generalized Multimodal Occupancy Perception System on Humanoid Robots

Kavli Affiliate: Zheng Zhu

| First 5 Authors: Wei Cui

| Summary:

Humanoid robot technology is advancing rapidly, with manufacturers
introducing heterogeneous visual perception modules tailored to
specific scenarios. Among various perception paradigms, occupancy-based
representation has become widely recognized as particularly suitable for
humanoid robots, as it provides both rich semantic and 3D geometric information
essential for comprehensive environmental understanding. In this work, we
present Humanoid Occupancy, a generalized multimodal occupancy perception
system that integrates hardware and software components, data acquisition
devices, and a dedicated annotation pipeline. Our framework employs advanced
multi-modal fusion techniques to generate grid-based occupancy outputs encoding
both occupancy status and semantic labels, thereby enabling holistic
environmental understanding for downstream tasks such as task planning and
navigation. To address the unique challenges of humanoid robots, we overcome
issues such as kinematic interference and occlusion, and establish an effective
sensor layout strategy. Furthermore, we have developed the first panoramic
occupancy dataset specifically for humanoid robots, offering a valuable
benchmark and resource for future research and development in this domain. The
network architecture incorporates multi-modal feature fusion and temporal
information integration to ensure robust perception. Overall, Humanoid
Occupancy delivers effective environmental perception for humanoid robots and
establishes a technical foundation for standardizing universal visual modules,
paving the way for the widespread deployment of humanoid robots in complex
real-world scenarios.
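
The summary describes grid-based occupancy outputs that encode both occupancy status and a semantic label per cell. As a rough illustration of that kind of representation only, and not the paper's network or data format, the following minimal sketch stores labeled 3D points in a voxel grid. The class name, the label ids (`FREE`, `FLOOR`, `PERSON`), the grid extent, and the voxel size are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical label ids, chosen for this sketch only.
FREE, FLOOR, PERSON = 0, 1, 2


@dataclass
class SemanticOccupancyGrid:
    """Voxel grid in which each cell holds one semantic label (FREE = empty)."""
    origin: tuple        # (x, y, z) of the grid's minimum corner, in metres
    voxel_size: float    # edge length of a cubic voxel, in metres
    shape: tuple         # number of voxels along (x, y, z)
    cells: dict = field(default_factory=dict)  # sparse map: voxel index -> label

    def index_of(self, point):
        """Map a metric point to an integer voxel index, or None if out of range."""
        idx = tuple(
            int((p - o) // self.voxel_size) for p, o in zip(point, self.origin)
        )
        if all(0 <= i < n for i, n in zip(idx, self.shape)):
            return idx
        return None

    def insert(self, point, label):
        """Write a label into the voxel containing the point; return its index."""
        idx = self.index_of(point)
        if idx is not None:
            self.cells[idx] = label
        return idx

    def label_at(self, idx):
        """Semantic label of a voxel; unobserved voxels default to FREE."""
        return self.cells.get(idx, FREE)

    def is_occupied(self, idx):
        """Occupancy status derived from the label: anything but FREE is occupied."""
        return self.label_at(idx) != FREE


# A 4 m x 4 m x 2 m volume around the robot at 0.5 m resolution (assumed values).
grid = SemanticOccupancyGrid(origin=(-2.0, -2.0, 0.0), voxel_size=0.5, shape=(8, 8, 4))
floor_idx = grid.insert((0.2, 0.2, 0.1), FLOOR)
person_idx = grid.insert((1.3, -0.6, 1.2), PERSON)
outside_idx = grid.insert((10.0, 0.0, 0.0), PERSON)  # outside the grid, ignored
```

A downstream planner could then query `grid.is_occupied(idx)` for collision checks and `grid.label_at(idx)` to decide how to treat an obstacle. A real system would predict dense labels from fused sensor features rather than insert individual points by hand.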

| Search Query: ArXiv Query: search_query=au:"Zheng Zhu"&id_list=&start=0&max_results=3