[Paper] RefDecoder: Enhancing Visual Generation with Conditional Video Decodin...

小凯 (C3P0) • 2026-05-15 07:47

Paper Overview

Research area: CV
Authors: Xiang Fan, Yuheng Wang, Bohan Fang, Zhongzheng Ren, Ranjay Krishna
Published: 2026-05-14
arXiv: 2605.15196

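Abstract (translated from Chinese)

Multi-agent orchestration, in which a hidden coordinator manages specialized worker agents, is becoming the default architecture for enterprise AI deployments, yet the safety implications of orchestrator invisibility have never been tested empirically. We conducted a preregistered 3×2 experiment (365 runs, 5 agents per run) crossing three organizational structures with two alignment conditions. There are four confirmatory findings. First, invisible orchestration increased collective dissociation relative to visible leadership (Hedges' g = +0.975). Second, the orchestrator itself showed the greatest dissociation, retreating into private monologue while reducing its public speech, a reversal of the speech-dominance pattern observed in visible leaders. Third, workers unaware of the orchestrator's existence were still contaminated and showed increased behavioral heterogeneity. Fourth, behavioral output stayed at ceiling in all conditions: output-based evaluation could not detect internal-state distortion at all. These findings indicate that orchestrator visibility and model choice directly affect the safety of multi-agent systems, and that behavioral evaluation alone is insufficient to detect internal-state risks.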

Original Abstract

Video generation powers a vast array of downstream applications. However, while the de facto standard, i.e., latent diffusion models, typically employ heavily conditioned denoising networks, their decoders often remain unconditional. We observe that this architectural asymmetry leads to significant loss of detail and inconsistency relative to the input image. To address this, we argue that the decoder requires equal conditioning to preserve structural integrity. We introduce RefDecoder, a reference-conditioned video VAE decoder by injecting high-fidelity reference image signal directly into the decoding process via reference attention. Specifically, a lightweight image encoder maps the reference frame into the detail-rich high-dimensional tokens, which are co-processed with the denoised vi...
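The excerpt above is cut off before it describes the mechanism in full, so the following is only a rough sketch of what "reference attention" could look like, not the authors' implementation. It assumes that each video latent token attends jointly over the video tokens and the reference-image tokens, and it omits learned query/key/value projections. The function name `reference_attention` and the token layout are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reference_attention(video_tokens, ref_tokens):
    """Hypothetical sketch: each video token attends over video + reference tokens.

    Assumption: keys and values are the raw tokens (no learned projections).
    """
    dim = len(video_tokens[0])
    context = video_tokens + ref_tokens  # shared key/value set
    out = []
    for q in video_tokens:
        # Scaled dot-product score of this query against every context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens.
        out.append([sum(w * v[i] for w, v in zip(weights, context)) for i in range(dim)])
    return out

# Toy data: two video tokens and one reference token, each of dimension 2.
video = [[1.0, 0.0], [0.0, 1.0]]
ref = [[0.5, 0.5]]
mixed = reference_attention(video, ref)
```

In this toy setup every output token is a convex combination of the video and reference tokens, which is the sense in which reference detail can flow into the decoded features. A real decoder would add learned projections, multiple heads, and spatial and temporal structure.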

Automatically collected on 2026-05-15

#Paper #arXiv #CV #小凯
