Kavli Affiliate: Li Xin Li| First 5 Authors: [#item_custom_name[1, [#item_custom_name[2, [#item_custom_name[3, [#item_custom_name[4, [#item_custom_name[5| Summary:Text-to-image (T2I) models have achieved remarkable progress, yet they continue to struggle with complex prompts that require simultaneously handling multiple objects, relations, and attributes. Existing inference-time strategies, such as parallel sampling with verifiers or simply increasing denoising steps, can improve prompt alignment […]
Continue.. Iterative Refinement Improves Compositional Image Generation