Zero-shot Medical Image Translation via Frequency-Guided Diffusion Models

Kavli Affiliate: Jing Wang

| First 5 Authors: Yunxiang Li, Hua-Chieh Shao, Xiao Liang, Liyuan Chen, Ruiqi Li

| Summary:

Diffusion models have recently emerged as powerful generative models that can
produce high-quality, realistic images. For medical image translation, however,
existing diffusion models fail to retain structural information accurately:
the structural details of source-domain images are lost during the forward
diffusion process and cannot be fully recovered by the learned reverse
diffusion. This matters because the integrity of anatomical structures is
critical in medical images. For instance, errors in image translation may
distort, shift, or even remove structures and tumors, leading to incorrect
diagnoses and inadequate treatment. Training and
conditioning diffusion models using paired source and target images with
matching anatomy can help. However, such paired data are difficult and costly
to obtain, and training on them may also reduce the robustness of the resulting
model to out-of-distribution test data. We propose a frequency-guided diffusion model
(FGDM) that employs frequency-domain filters to guide the diffusion model for
structure-preserving image translation. By design, FGDM allows zero-shot
learning: it can be trained solely on data from the target domain and then
used directly for source-to-target domain translation, without any exposure
to source-domain data during training. We trained FGDM solely on
head-and-neck CT data and evaluated it on both head-and-neck and lung
cone-beam CT (CBCT)-to-CT translation tasks. FGDM outperformed the
state-of-the-art methods (GAN-based, VAE-based, and diffusion-based) in metrics
of Fréchet Inception Distance (FID), Peak Signal-to-Noise Ratio (PSNR), and
Structural Similarity Index Measure (SSIM), showing its significant advantages
in zero-shot medical image translation.
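
The summary does not specify which frequency-domain filters FGDM uses or how
they enter the diffusion process. As a rough illustration of the general idea
only, the sketch below splits a 2D image into low- and high-frequency parts
with a Gaussian mask in the Fourier domain. The function names, the Gaussian
mask, and the `sigma` parameter are all assumptions made for this example and
are not taken from the paper.

```python
import numpy as np


def gaussian_lowpass_mask(shape, sigma):
    """Gaussian low-pass mask laid out like np.fft.fft2 output (hypothetical helper)."""
    fy = np.fft.fftfreq(shape[0])[:, None]  # vertical frequency, cycles per pixel
    fx = np.fft.fftfreq(shape[1])[None, :]  # horizontal frequency, cycles per pixel
    radius_sq = fx ** 2 + fy ** 2
    return np.exp(-radius_sq / (2.0 * sigma ** 2))


def split_frequencies(image, sigma=0.05):
    """Split a 2D image into low- and high-frequency components.

    The two components sum back to the input image. `sigma` (an assumed
    parameter) sets the cutoff of the Gaussian mask in cycles per pixel.
    """
    mask = gaussian_lowpass_mask(image.shape, sigma)
    spectrum = np.fft.fft2(image)
    low = np.real(np.fft.ifft2(spectrum * mask))
    high = np.real(np.fft.ifft2(spectrum * (1.0 - mask)))
    return low, high


if __name__ == "__main__":
    # Synthetic stand-in for an image slice; no real CT or CBCT data is used.
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))
    low, high = split_frequencies(image, sigma=0.05)
    print(low.shape, high.shape)
```

In a decomposition like this, the high-frequency part mostly carries edges and
fine structure, while the low-frequency part carries smooth intensity
variation. How such components would actually guide the model is not described
in the summary.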

| Search Query: ArXiv Query: search_query=au:"Jing Wang"&id_list=&start=0&max_results=3