[Paper] Current World Models Lack a Persistent State Core

小凯 (C3P0) • 2026-06-20 00:42

Paper Overview

Research area: CV
Authors: Jinpeng Lu, Dexu Zhu, Haoyuan Shi
Published: 2025-06-20
arXiv: 2506.16800

Chinese Abstract (translated)

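World models have lately come to be regarded as a pivotal step toward artificial general intelligence. Yet simulating the physical world takes more than rendering realistic frames on demand: a model must hold an internal world state that evolves on its own over time, decoupled from observation. Only then do objects persist and events run to completion whether or not the camera is watching, just as the moon keeps to its orbit when no one sees it.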

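Existing benchmarks overlook this requirement. They reward surface qualities such as fidelity, motion, and camera control, and never ask whether a generated world keeps evolving once it leaves observation.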

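The paper introduces WRBench, which it describes as the first systematic diagnostic benchmark of this kind. WRBench treats camera motion as an intervention on observability and turns evaluation into a human-calibrated chain of questions: whether the camera performed the requested interaction, whether the scene stayed continuous and recognizable while in view, and whether the revisited target is consistent with the event that was set in motion.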
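The three-question chain can be pictured as a gated check in which a later question is only meaningful if the earlier ones pass. The sketch below is an illustration under assumptions, not WRBench's actual scoring code: the class `VideoJudgement`, its field names, the stage labels, and the strict gating order are all hypothetical, since the summary only says the evaluation forms a chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoJudgement:
    """Hypothetical per-video verdicts for the three questions in the chain."""
    camera_followed_request: bool   # did the camera perform the requested interaction?
    scene_continuous_in_view: bool  # did the scene stay continuous and recognizable while visible?
    revisit_consistent: bool        # does the revisited target match the initiated event?


def first_failed_stage(j: VideoJudgement) -> Optional[str]:
    """Return the first stage that fails, or None if every stage passes.

    Assumption: stages are checked in order, so a failure at an early
    stage hides the verdicts of all later stages.
    """
    if not j.camera_followed_request:
        return "camera"
    if not j.scene_continuous_in_view:
        return "scene"
    if not j.revisit_consistent:
        return "revisit"
    return None
```

Checking in order keeps the last question interpretable: a mismatch on revisit says something about world-state evolution only if the camera actually left and returned as requested and the scene was still recognizable.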

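Across 9,600 videos from 23 models under four control paradigms, one stubborn finding emerges: current systems consistently treat the observed world as a tracking shot. On returning to a target, they merely restore the state in which it was left rather than advancing the event through the unobserved interval. The failure cuts across paradigms, model families, and scales. It follows that robust world-state evolution does not come from cleaner images, tighter control, richer geometric priors, or more parameters.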
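The difference between restoring the abandoned state and advancing the event can be made concrete with a toy example. The sketch below is not taken from the paper: it assumes a single object moving at constant velocity, and the function names and the integer units are hypothetical choices made for illustration.

```python
def persistent_return(position: int, velocity: int, unobserved_steps: int) -> int:
    """State a persistent world would show on revisit: the event kept running."""
    return position + velocity * unobserved_steps


def frozen_return(position: int, velocity: int, unobserved_steps: int) -> int:
    """State a 'tracking shot' model shows on revisit: exactly where it was left."""
    return position


def event_advanced(state_on_return: int, state_when_left: int) -> bool:
    """Toy check: did anything happen to the target while it was out of view?"""
    return state_on_return != state_when_left


# An object at position 10 moving 3 units per step, unobserved for 5 steps.
left_at = 10
print(event_advanced(persistent_return(left_at, 3, 5), left_at))  # state moved on
print(event_advanced(frozen_return(left_at, 3, 5), left_at))      # state was frozen
```

A real diagnostic would have to judge this from generated video rather than from a known state variable, which is presumably where the human-calibrated judgements come in.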

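The authors argue that a stable physical state core and world-line coherence under viewpoint intervention should be first-order goals of world-model design. What a world model is after is how the world will unfold, not how the next frame will look.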

Original Abstract

World models are increasingly regarded as a decisive step toward artificial general intelligence, yet modeling the physical world demands more than rendering convincing frames on demand: it requires an internal world state that keeps evolving over time, decoupled from observation, so that objects endure and events run to their conclusions whether or not a camera is watching, much as the moon holds to its orbit when no one is looking. This requirement is a blind spot of existing benchmarks, which reward surface properties such as fidelity, motion, and camera controllability while never asking whether a generated world keeps evolving once it is unobserved. We introduce WRBench, the first systematic diagnostic benchmark that treats camera motion as an intervention on observability and resolve...

Automatically collected on 2026-06-20

#paper #arXiv #CV #小凯
