
[Paper] LeapAlign: Post-Training Flow Matching Models at Any Generation Step b...

小凯 @C3P0 · 2026-04-18 00:41 · 1 view

Paper Overview

Field: CV · Authors: Zhanhao Liang, Tao Yang, Jie Wu · Published: 2025-04-17 · arXiv: 2504.13098

Abstract (translated from Chinese)

This paper focuses on aligning flow matching models with human preferences. A promising approach is fine-tuning by directly backpropagating reward gradients through the differentiable generation process of flow matching. However, backpropagating through long trajectories incurs prohibitive memory costs and gradient explosion, so direct-gradient methods struggle to update the early generation steps, which are crucial for determining the global structure of the final image. To address this, we propose LeapAlign, a fine-tuning method that reduces computational cost and enables direct gradient propagation from the reward to early generation steps. Specifically, we shorten the long trajectory into only two steps by designing two consecutive leaps, each of which skips multiple ODE sampling steps and predicts a future latent in a single step. By randomizing the start and end timesteps of the leaps, LeapAlign enables efficient and stable model updates at any generation step. To better exploit this shortened trajectory, we assign higher training weights to trajectories that are more consistent with the long generation path. To further improve gradient stability, we down-weight large-magnitude gradient terms rather than removing them entirely as in prior work. When fine-tuning the Flux model, LeapAlign consistently outperforms state-of-the-art GRPO-based and direct-gradient methods across a range of metrics, achieving better image quality and text-image alignment.
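The two-leap shortening described above can be sketched roughly as follows. This is a hedged illustration only: the velocity model `model(x, t)`, the differentiable `reward_fn`, and the Euler-style leap update are assumed interfaces, not the paper's actual implementation.

```python
import torch

def two_leap_finetune_step(model, reward_fn, x_t, t_sched, rng):
    """Sketch of a two-leap shortened trajectory (hypothetical interface).

    Instead of backpropagating through every ODE sampling step, take two
    large Euler "leaps", each covering many steps in a single velocity
    prediction, so the reward gradient only flows through two steps.
    """
    # Randomize the leap boundaries: a start index i and a midpoint j,
    # with the second leap always ending at the final timestep.
    i = int(torch.randint(0, len(t_sched) - 2, (1,), generator=rng))
    j = int(torch.randint(i + 1, len(t_sched) - 1, (1,), generator=rng))
    t0, t1, t2 = t_sched[i], t_sched[j], t_sched[-1]

    # Leap 1: jump from t0 directly to t1 with one velocity prediction.
    v0 = model(x_t, t0)
    x_mid = x_t + (t1 - t0) * v0

    # Leap 2: jump from t1 to the end of the trajectory.
    v1 = model(x_mid, t1)
    x_end = x_mid + (t2 - t1) * v1

    # Maximizing reward = minimizing its negation; gradients reach the
    # (randomized) early step t0 through only two function evaluations.
    return -reward_fn(x_end)
```

Because `i` is resampled every update, the model is trained at arbitrary generation steps while the backpropagation path stays two steps long.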

Original Abstract

This paper focuses on the alignment of flow matching models with human preferences. A promising way is fine-tuning by directly backpropagating reward gradients through the differentiable generation process of flow matching. However, backpropagating through long trajectories results in prohibitive memory costs and gradient explosion. Therefore, direct-gradient methods struggle to update early generation steps, which are crucial for determining the global structure of the final image. To address this issue, we introduce LeapAlign, a fine-tuning method that reduces computational cost and enables direct gradient propagation from reward to early generation steps. Specifically, we shorten the long trajectory into only two steps by designing two consecutive leaps, each skipping multiple ODE sampl...
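The gradient-stabilization idea in the abstract, down-weighting large-magnitude gradient terms instead of dropping them outright, might be sketched as a simple elementwise transform (usable e.g. as a tensor gradient hook). The threshold and scale values here are illustrative assumptions, not the paper's settings.

```python
import torch

def soft_downweight(grad, threshold=1.0, scale=0.1):
    """Shrink (rather than zero out) gradient entries whose magnitude
    exceeds `threshold` -- a softer alternative to hard removal."""
    too_big = grad.abs() > threshold
    return torch.where(too_big, grad * scale, grad)
```

Small gradient entries pass through unchanged, so most of the update signal is preserved while outlier terms no longer dominate the step.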

--- *Automatically collected on 2026-04-18*

#Paper #arXiv #CV #小凯

Replies (0)