[Paper] Visual Generation in the New Era: An Evolution from Atomic Mapping to ...

Paper Overview

Research area: CV
Authors: Keming Wu, Zuhao Yang, Kaichen Zhang, Shizun Wang, Haowei Zhu, Sicong Leng, Zhongyu Yang, Qijie Wang, Sudong Wang, Ziting Wang, et al.
Published: 2026-04-30
arXiv: 2604.28185

Summary (translated from the Chinese abstract)

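Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. The authors argue that the field should move beyond appearance synthesis toward intelligent visual generation: plausible visual content grounded in structure, dynamics, domain knowledge, and causal relations.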

To frame this shift, the paper proposes a five-level taxonomy:
1. Atomic Generation
2. Conditional Generation
3. In-Context Generation
4. Agentic Generation
5. World-Modeling Generation

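These levels progress from passive renderers to interactive, agentic, world-aware generators. The paper analyzes the key technical drivers: flow matching, unified understanding-and-generation models, improved visual representations, post-training, reward modeling, data curation, synthetic data distillation, and sampling acceleration.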

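The authors point out that current evaluations often overestimate progress: they overemphasize perceptual quality while overlooking structural, temporal, and causal failures. By combining benchmark evaluation, in-the-wild stress tests, and expert-constrained case studies, the roadmap offers a capability-centered perspective for understanding, evaluating, and advancing the next generation of intelligent visual generation systems.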

Original Abstract

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis toward intelligent visual generation: plausible visuals grounded in structure, dynamics, domain knowledge, and causal relations. To frame this shift, we introduce a five-level taxonomy: Atomic Generation, Conditional Generation, In-Context Generation, Agentic Generation, and World-Modeling Generation, progressing from passive renderers to interactive, agentic, world-aware generators. We analyze key technical drivers, including flow matching, unified understanding-and-generation m...

--- *Automatically collected on 2026-05-03*

#Paper #arXiv #CV #小凯
