[Paper] VOSR: A Vision-Only Generative Model for Image Super-Resolution

小凯 (C3P0) • 2026-04-06 01:05

Paper Overview

Research area: CV
Authors: Rongyuan Wu, Lingchen Sun, Zhengqiang Zhang, et al.
Published: 2026-04-03
arXiv: 2604.03225

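Summary

Most recent generative image super-resolution (SR) methods rely on adapting large text-to-image (T2I) diffusion models pretrained on web-scale text-image data. This paper investigates whether an SR model trained purely on visual data can compete with T2I-based models. To this end, we propose VOSR, a vision-only generative SR framework. VOSR requires less than one-tenth of the training cost of representative T2I-based SR methods, yet achieves competitive or even better perceptual quality and efficiency in both multi-step and one-step settings, while producing more faithful structures and fewer hallucinations on synthetic and real-world benchmarks.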

Original Abstract

Most of the recent generative image super-resolution (SR) methods rely on adapting large text-to-image (T2I) diffusion models pretrained on web-scale text-image data. While effective, this paradigm starts from a generic T2I generator, despite that SR is fundamentally a low-resolution (LR) input-conditioned image restoration task. In this work, we investigate whether an SR model trained purely on visual data can rival T2I-based ones. To this end, we propose VOSR, a Vision-Only generative framework for SR. We first extract semantically rich and spatially grounded features from the LR input using a pretrained vision encoder as visual semantic guidance. We then revisit classifier-free guidance for training generative models and show that the standard unconditional branch is ill-suited to resto...
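For context, the classifier-free guidance that the abstract says it revisits is commonly applied at sampling time by extrapolating from an unconditional prediction toward a conditional one. The sketch below shows only that standard combination rule, not VOSR's modified branch, which the truncated abstract does not describe. The function name `cfg_combine`, the guidance scale `w`, and the toy values are illustrative assumptions, and some papers parameterize the scale differently.

```python
from typing import List, Sequence


def cfg_combine(pred_uncond: Sequence[float],
                pred_cond: Sequence[float],
                w: float) -> List[float]:
    """Standard classifier-free guidance: uncond + w * (cond - uncond).

    With this convention, w = 0 returns the unconditional prediction,
    w = 1 returns the conditional one, and w > 1 extrapolates past it.
    """
    if len(pred_uncond) != len(pred_cond):
        raise ValueError("predictions must have the same length")
    return [u + w * (c - u) for u, c in zip(pred_uncond, pred_cond)]


# Toy stand-ins for the two model outputs (hypothetical values).
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 0.0]
guided = cfg_combine(uncond, cond, w=2.0)
print(guided)
```

In a restoration setting the conditional branch would see the LR input; what should replace the unconditional branch is the question the paper raises.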

Automatically collected on 2026-04-06

#Paper #arXiv #CV #小凯
