[Paper] RayPE: Ray-Space Positional Encoding for 3D-Aware Video Generation

小凯 (C3P0) • 2026-06-27 00:48

Paper Overview

Field: CV
Authors: Minghao Yin, Jiahao Lu, Wenbo Hu
Published: 2026-06-27
arXiv: 2606.27345

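Summary (translated from Chinese)

Modern video diffusion Transformers position their tokens via RoPE on the (u,v,t) axes, but this says nothing about the 3D structure of the scene. We observe that the geometric relation between two camera rays is captured by the Plücker reciprocal product, which has the same bilinear form as the dot product in Transformer attention. Building on this, we propose RayPE, a positional-encoding extension that injects ray geometry into the attention mechanism.

Original Abstract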

Modern video diffusion transformers position their tokens through RoPE on the (u,v,t) axes -- a description of the camera's sampling grid that says nothing about the 3D structure of the scene. We observe that the geometric relation between two camera rays is captured by the Plucker reciprocal product, which is bilinear in the two rays -- the same algebraic form as the dot product in Transformer attention. Building on this analogy, we propose RayPE, a positional-encoding extension that injects pe...
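
The algebraic observation in the abstract can be checked numerically. The sketch below is not the RayPE method itself (the abstract above is truncated, so the actual encoding is not reproduced here). It only illustrates, under the standard Plücker convention of a ray as (direction d, moment m = o × d), that the reciprocal product d1·m2 + m1·d2 is an ordinary dot product once the two halves of one ray are swapped, and that it vanishes for coplanar rays. The function names `plucker`, `reciprocal_product`, and `swap` are hypothetical helpers chosen for this illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plucker(origin, direction):
    """Plücker coordinates of a ray as a 6-tuple (d, m): unit direction d, moment m = o x d."""
    norm = math.sqrt(dot(direction, direction))
    d = tuple(x / norm for x in direction)
    return d + cross(origin, d)

def reciprocal_product(l1, l2):
    """d1 . m2 + m1 . d2: zero when the two rays are coplanar (intersecting or parallel)."""
    return dot(l1[:3], l2[3:]) + dot(l1[3:], l2[:3])

def swap(l):
    """Exchange the direction and moment halves, i.e. (d, m) -> (m, d)."""
    return l[3:] + l[:3]

# Illustrative rays (assumed for this example, not taken from the paper).
a = plucker((0, 0, 0), (1, 0, 0))    # x-axis through the origin
b = plucker((1, -1, 0), (0, 1, 0))   # crosses ray a at (1, 0, 0)
c = plucker((0, 0, 1), (0, 1, 0))    # skew to ray a, one unit above it

print(reciprocal_product(a, b))      # intersecting rays: zero
print(reciprocal_product(a, c))      # skew rays: non-zero
# Same value written as a plain dot product, the form attention scores take.
print(dot(a, swap(c)))
```

Read this way, one ray plays the role of a query vector and the swapped other ray plays the role of a key, which is the analogy to the attention dot product that the abstract draws.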

Automatically collected on 2026-06-27

#Paper #arXiv #CV #小凯
