Kavli Affiliate: Matthew Fisher
| First 5 Authors: Titas Anciukevičius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen
| Summary:
Diffusion models currently achieve state-of-the-art performance for both
conditional and unconditional image generation. However, so far, image
diffusion models do not support tasks required for 3D understanding, such as
view-consistent 3D generation or single-view object reconstruction. In this
paper, we present RenderDiffusion, the first diffusion model for 3D generation
and inference, trained using only monocular 2D supervision. Central to our
method is a novel image denoising architecture that generates and renders an
intermediate three-dimensional representation of a scene in each denoising
step. This enforces a strong inductive structure within the diffusion process,
providing a 3D consistent representation while only requiring 2D supervision.
The resulting 3D representation can be rendered from any view. We evaluate
RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive
performance for generation of 3D scenes and inference of 3D scenes from 2D
images. Additionally, our diffusion-based approach allows us to use 2D
inpainting to edit 3D scenes.
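| Illustration: The mechanism the abstract describes, inferring an intermediate 3D representation from a noisy image and rendering it back as the denoised prediction at every diffusion step, can be outlined in a minimal sketch. This is an illustrative outline only, not the authors' implementation: the `encoder` (e.g. a network predicting a triplane or similar 3D representation) and the differentiable `renderer` are hypothetical stand-in modules.

```python
import torch.nn as nn

# Minimal sketch of one RenderDiffusion-style denoising step.
# `encoder` and `renderer` are assumed stand-in modules, not the paper's exact architecture.
class RenderDenoiser(nn.Module):
    def __init__(self, encoder: nn.Module, renderer: nn.Module):
        super().__init__()
        self.encoder = encoder    # maps a noisy image (+ timestep) to a 3D scene representation
        self.renderer = renderer  # differentiably renders that representation from a given camera

    def forward(self, noisy_image, timestep, camera):
        # 1. Infer an intermediate 3D representation of the scene from the noisy 2D input.
        scene_3d = self.encoder(noisy_image, timestep)
        # 2. Render it back to 2D from the input view; the rendered image serves as the
        #    denoised prediction, so training needs only 2D supervision while the 3D
        #    representation stays view-consistent and can be rendered from any other view.
        denoised_pred = self.renderer(scene_3d, camera)
        return denoised_pred, scene_3d
```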
| Search Query: ArXiv Query: search_query=au:"Matthew Fisher"&id_list=&start=0&max_results=3