OmniControlNet: Dual-stage Integration for Conditional Image Generation

Kavli Affiliate: Xiang Zhang

| First 5 Authors: Yilin Wang, Haiyang Xu, Xiang Zhang, Zeyuan Chen, Zhizhou Sha

| Summary:

We provide a two-way integration for the widely adopted ControlNet: we
consolidate the external condition generation algorithms it relies on into a
single dense prediction method, and we merge its individually trained image
generation processes into a single model. Despite its tremendous success,
ControlNet's two-stage pipeline has two limitations: it is not self-contained
(it calls external condition generation algorithms), and it carries large
model redundancy (separately trained models for different types of
conditioning inputs). Our proposed OmniControlNet consolidates 1) condition
generation (e.g., HED edges, depth maps, user scribbles, and animal poses)
into a single multi-task dense prediction algorithm guided by task embeddings,
and 2) the image generation processes for the different conditioning types
into a single model guided by textual embeddings. OmniControlNet significantly
reduces model complexity and redundancy while producing images of comparable
quality for conditioned text-to-image generation.
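
To make the two consolidation ideas concrete, here is a minimal PyTorch-style
sketch (not the authors' code; all module names, layer shapes, and the
FiLM-style modulation are illustrative assumptions): a single dense prediction
network switches behavior via a learned task embedding, and a single
generation backbone is modulated by a per-condition-type text embedding.

```python
import torch
import torch.nn as nn

# Condition types named in the abstract
TASKS = ["hed", "depth", "scribble", "animal_pose"]

class MultiTaskDensePredictor(nn.Module):
    """One network that produces a condition map for any task,
    guided by a learned task embedding (sketch, not the paper's model)."""
    def __init__(self, dim=64):
        super().__init__()
        self.task_embed = nn.Embedding(len(TASKS), dim)
        self.encoder = nn.Conv2d(3, dim, 3, padding=1)
        self.decoder = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, image, task_id):
        feat = self.encoder(image)
        # Task embedding modulates shared features
        # (FiLM-style channel scaling, an assumption)
        scale = self.task_embed(task_id).view(-1, feat.shape[1], 1, 1)
        return self.decoder(feat * scale)

class UnifiedConditionalGenerator(nn.Module):
    """One generation model for all condition types; a text embedding
    tells it which conditioning it is handling (assumed mechanism)."""
    def __init__(self, dim=64, text_dim=32):
        super().__init__()
        self.cond_encoder = nn.Conv2d(3, dim, 3, padding=1)
        self.text_proj = nn.Linear(text_dim, dim)
        self.backbone = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, cond_map, text_embedding):
        h = self.cond_encoder(cond_map)
        # Textual embedding guidance: per-type bias injection (assumption)
        h = h + self.text_proj(text_embedding).view(-1, h.shape[1], 1, 1)
        return self.backbone(h)

# Usage: one predictor/generator pair instead of N task-specific pipelines
predictor = MultiTaskDensePredictor()
generator = UnifiedConditionalGenerator()
image = torch.randn(1, 3, 64, 64)
task_id = torch.tensor([TASKS.index("depth")])
cond = predictor(image, task_id)
out = generator(cond, torch.randn(1, 32))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

In this setup, a single predictor and a single generator stand in for N
separately trained condition extractors and generation branches, which is the
redundancy reduction the abstract describes.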

| Search Query: ArXiv Query: search_query=au:"Xiang Zhang"&id_list=&start=0&max_results=3
