EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks

Kavli Affiliate: Xiang Zhang

| First 5 Authors: Joohwan Seo, Joohwan Seo, , ,

| Summary:

This paper presents a framework for learning vision-based robotic policies
for contact-rich manipulation tasks that generalize spatially across task
configurations. We focus on achieving robust spatial generalization of the
policy for the peg-in-hole (PiH) task trained from a small number of
demonstrations. We propose EquiContact, a hierarchical policy composed of a
high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF)
and a novel low-level compliant visuomotor policy (Geometric Compliant ACT,
G-CompACT). G-CompACT operates using only localized observations (geometrically
consistent error vectors (GCEV), force-torque readings, and wrist-mounted RGB
images) and produces actions defined in the end-effector frame. Through these
design choices, we show that the entire EquiContact pipeline is
SE(3)-equivariant, from perception to force control. We also outline three key
components for spatially generalizable contact-rich policies: compliance,
localized policies, and induced equivariance. Real-world experiments on PiH
tasks demonstrate a near-perfect success rate and robust generalization to
unseen spatial configurations, validating the proposed framework and
principles. The experimental videos can be found on the project website:
https://sites.google.com/berkeley.edu/equicontact

| Search Query: ArXiv Query: search_query=au:”Xiang Zhang”&id_list=&start=0&max_results=3