[Paper] VOSR: A Vision-Only Generative Model for Image Super-Resolution

Paper Overview

Research area: CV; Authors: Rongyuan Wu, Lingchen Sun, Zhengqiang Zhang, et al.; Published: 2026-04-03; arXiv: 2604.03225

Summary (translated from Chinese)

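Most recent generative image super-resolution (SR) methods rely on adapting large text-to-image (T2I) diffusion models pretrained on web-scale image-text data. This paper investigates whether an SR model trained only on visual data can compete with T2I-based models. To this end, we propose VOSR, a vision-only generative super-resolution framework. VOSR requires less than one tenth of the training cost of representative T2I-based SR methods, yet achieves competitive or even better perceptual quality and efficiency in both multi-step and one-step settings, while producing more faithful structures and fewer hallucinations on synthetic and real-world benchmarks.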

Original Abstract

Most of the recent generative image super-resolution (SR) methods rely on adapting large text-to-image (T2I) diffusion models pretrained on web-scale text-image data. While effective, this paradigm starts from a generic T2I generator, despite that SR is fundamentally a low-resolution (LR) input-conditioned image restoration task. In this work, we investigate whether an SR model trained purely on visual data can rival T2I-based ones. To this end, we propose VOSR, a Vision-Only generative framework for SR. We first extract semantically rich and spatially grounded features from the LR input using a pretrained vision encoder as visual semantic guidance. We then revisit classifier-free guidance for training generative models and show that the standard unconditional branch is ill-suited to resto...
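
The abstract is cut off before it explains how VOSR changes guidance, so the sketch below only illustrates the standard classifier-free guidance combination that the abstract says it revisits. It is not VOSR's method. The function name `cfg_combine` and the plain list-of-floats representation of model predictions are illustrative assumptions.

```python
from typing import List, Sequence


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], scale: float) -> List[float]:
    """Standard classifier-free guidance on two model predictions.

    Hypothetical helper for illustration only: `cond` and `uncond` stand in
    for the conditional and unconditional predictions of a generative model.
    The result is uncond + scale * (cond - uncond), element by element.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond_pred = [1.0, 2.0]
    uncond_pred = [0.0, 1.0]
    # scale = 1.0 reproduces the conditional prediction;
    # larger scales push further away from the unconditional one.
    print(cfg_combine(cond_pred, uncond_pred, 1.0))
    print(cfg_combine(cond_pred, uncond_pred, 2.0))
```

With a scale of 0 the output equals the unconditional prediction, and with a scale of 1 it equals the conditional one. This makes clear why the choice of unconditional branch matters for a task that always has a low-resolution input to condition on.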

--- *Automatically collected on 2026-04-06*

#paper #arXiv #CV #小凯