Kavli Affiliate: Jia Liu
| First 5 Authors: Jianhui Zhang
| Summary:
In this work, we present Patch-Adapter, an effective framework for
high-resolution text-guided image inpainting. Unlike existing methods limited
to lower resolutions, our approach achieves 4K+ resolution while maintaining
precise content consistency and prompt alignment, two critical challenges in
image inpainting that intensify with increasing resolution and texture
complexity. Patch-Adapter leverages a two-stage adapter architecture to scale
the diffusion model’s resolution from 1K to 4K+ without requiring structural
overhauls: (1) Dual Context Adapter learns coherence between masked and
unmasked regions at reduced resolutions to establish global structural
consistency; and (2) Reference Patch Adapter implements a patch-level attention
mechanism for full-resolution inpainting, preserving local detail fidelity
through adaptive feature fusion. This dual-stage architecture uniquely
addresses the scalability gap in high-resolution inpainting by decoupling
global semantics from localized refinement. Experiments demonstrate that
Patch-Adapter not only resolves artifacts common in large-scale inpainting but
also achieves state-of-the-art performance on the OpenImages and
Photo-Concept-Bucket datasets, outperforming existing methods in both
perceptual quality and text-prompt adherence.
| Search Query: ArXiv Query: search_query=au:"Jia Liu"&id_list=&start=0&max_results=3
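
The two-stage design described in the summary (a global pass at reduced resolution, then patch-level refinement at full resolution) implies some simple bookkeeping, which the following minimal Python sketch illustrates. It is not the authors' implementation and contains no diffusion model: it only computes the reduced-resolution size and the full-resolution patches that overlap the masked region. The function names, the 1024-pixel coarse limit, the patch size, the stride, and the representation of the mask as a bounding box are all assumptions made for illustration.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def coarse_size(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Size of the reduced-resolution image for the global pass (assumed 1K limit)."""
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def tile_starts(length: int, patch: int, stride: int) -> List[int]:
    """Start offsets of overlapping patches that cover [0, length) on one axis."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)  # last patch sits flush with the far edge
    return starts


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def patches_to_refine(width: int, height: int, mask_box: Box,
                      patch: int = 1024, stride: int = 768) -> List[Box]:
    """Full-resolution patches that overlap the mask (hypothetical refinement schedule)."""
    selected = []
    for top in tile_starts(height, patch, stride):
        for left in tile_starts(width, patch, stride):
            box = (left, top, min(left + patch, width), min(top + patch, height))
            if intersects(box, mask_box):
                selected.append(box)
    return selected


if __name__ == "__main__":
    w, h = 4096, 4096
    print("coarse pass size:", coarse_size(w, h))
    for box in patches_to_refine(w, h, mask_box=(1000, 1000, 1100, 1100)):
        print("refine patch:", box)
```

In this sketch only patches that touch the mask are scheduled, and neighbouring patches overlap by `patch - stride` pixels so that a refinement step could blend across seams. How the actual method selects, conditions, and fuses patches is not specified in the summary.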