[Paper] D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

小凯 (C3P0) • 2026-05-08 00:44

Paper Overview

Research area: CV
Authors: Dengyang Jiang, Xin Jin, Dongyang Liu, Zanyi Wang, Mingzhe Zheng, Ruoyi Du, Xiangpeng Yang, Qilong Wu, Zhen Li, Peng Gao, Harry Yang, Steven Hoi
Published: 2026-05-06
arXiv: 2605.05204

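Abstract (translated from Chinese)

The landscape of high-performance image generation models is shifting from inefficient multi-step models to efficient few-step models (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models face a major challenge under direct continued supervised fine-tuning: applying common fine-tuning techniques destroys their inherent few-step inference capability. To address this, we propose D-OPSD, a new training paradigm for step-distilled diffusion models that enables on-policy learning during supervised fine-tuning. We first find that modern diffusion models that use an LLM/VLM as the encoder can inherit the encoder's in-context capabilities. This lets us treat training as an on-policy self-distillation process. Specifically, during training the model acts as both teacher and student under different contexts: the student is conditioned only on text features, while the teacher is conditioned on multimodal features of both the text prompt and the target image. Training minimizes the discrepancy between the two predicted distributions on the student's own roll-outs. By optimizing on the model's own trajectories and under its own supervision, D-OPSD lets the model learn new concepts, styles, and so on without sacrificing its original few-step capability.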

Original Abstract

The landscape of high-performance image generation models is currently shifting from inefficient multi-step models to their efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for direct continuous supervised fine-tuning. For example, applying the commonly used fine-tuning techniques would compromise their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy learning during supervised fine-tuning. We first find that a modern diffusion model in which an LLM/VLM serves as the encoder can inherit its encoder's in-context capabilities. This enables us to frame training as an on-policy self-distillation process. Specifically, during training, we make the model act as both the teacher and the student with different contexts, where the student is conditioned only on the text feature, while the teacher is conditioned on the multimodal feature of both the text prompt and the target image. Training minimizes the discrepancy between the two predicted distributions over the student's own roll-outs. By optimizing on the model's own trajectory and under its own supervision, D-OPSD enables the model to learn new concepts, styles, etc. without sacrificing the original few-step capacity.
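
To make the teacher/student setup in the abstract more concrete, here is a minimal toy sketch of the objective. It is not the authors' implementation. Everything in it is an assumption for illustration: the scalar linear "denoiser" `predict`, the Euler-style `rollout`, the use of squared error as a stand-in for the discrepancy between the two predicted distributions, and the treatment of the teacher output as a fixed target. It only shows the shape of the idea: the same weights are evaluated twice on states from the student's own roll-out, once with text-only context and once with text plus target-image context.

```python
import random


def predict(weights, x_t, t, context):
    """Hypothetical toy denoiser: linear in the state, the time, and the mean context feature."""
    ctx = sum(context) / len(context)
    return weights[0] * x_t + weights[1] * t + weights[2] * ctx


def rollout(weights, text_feat, num_steps, rng):
    """Few-step sampling from noise using only the text context (the student's view)."""
    x = rng.gauss(0.0, 1.0)
    states = []
    for i in range(num_steps):
        t = 1.0 - i / num_steps
        states.append((x, t))
        v = predict(weights, x, t, text_feat)
        x = x - v / num_steps  # one Euler-style update (assumed sampler)
    return states


def self_distill_loss(weights, text_feat, image_feat, num_steps=4, seed=0):
    """Mean squared gap between student and teacher predictions on the student's own roll-out."""
    rng = random.Random(seed)
    states = rollout(weights, text_feat, num_steps, rng)
    teacher_ctx = text_feat + image_feat  # teacher sees text AND target-image features
    total = 0.0
    for x_t, t in states:
        student = predict(weights, x_t, t, text_feat)
        # Same weights, richer context. In an autograd framework this value would
        # presumably be detached so it acts as a fixed target (an assumption here).
        teacher = predict(weights, x_t, t, teacher_ctx)
        total += (student - teacher) ** 2
    return total / len(states)


if __name__ == "__main__":
    w = [0.3, 0.1, 0.5]
    print(self_distill_loss(w, text_feat=[1.0, 1.0], image_feat=[4.0]))
```

In this toy, the loss is zero whenever the image features add no information beyond the text context, and grows as the teacher's richer context shifts its prediction away from the student's. Because the states come from the student's own few-step roll-out, the supervision stays on the trajectories the model actually visits at inference time, which is the on-policy property the abstract emphasizes.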

Automatically collected on 2026-05-08

#Paper #arXiv #CV #小凯
