Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching

Kavli Affiliate: Feng Wang

| First 5 Authors: Feng Wang, Feng Wang

| Summary:

Reinforcement Learning (RL) has recently emerged as a powerful technique for
improving image and video generation in Diffusion and Flow Matching models,
specifically for enhancing output quality and alignment with prompts. A
critical step for applying online RL methods to Flow Matching is the
introduction of stochasticity into the otherwise deterministic framework,
commonly realized through a Stochastic Differential Equation (SDE). Our
investigation reveals a
significant drawback to this approach: SDE-based sampling introduces pronounced
noise artifacts in the generated images, which we found to be detrimental to
the reward learning process. A rigorous theoretical analysis traces the origin
of this noise to an excess of stochasticity injected during inference. To
address this, we draw inspiration from Denoising Diffusion Implicit Models
(DDIM) to reformulate the sampling process. Our proposed method,
Coefficients-Preserving Sampling (CPS), eliminates these noise artifacts. This
leads to more accurate reward modeling, ultimately enabling faster and more
stable convergence for reinforcement learning-based optimizers like Flow-GRPO
and Dance-GRPO. Code will be released at https://github.com/IamCreateAI/FlowCPS

| Search Query: ArXiv Query: search_query=au:"Feng Wang"&id_list=&start=0&max_results=3
