Learning to Decompose Visual Features with Latent Textual Prompts

Kavli Affiliate: Feng Wang | First 5 Authors: Feng Wang, Manling Li, Xudong Lin, Hairong Lv, Alexander G. Schwing | Summary: Recent advances in pre-training vision-language models like CLIP have shown great potential in learning transferable visual representations. Nonetheless, for downstream inference, CLIP-like models suffer from either 1) degraded accuracy and robustness in the case […]
