[Paper] RayPE: Ray-Space Positional Encoding for 3D-Aware Video Generation

Paper Overview

Research area: CV
Authors: Minghao Yin, Jiahao Lu, Wenbo Hu
Published: 2026-06-27
arXiv: 2606.27345

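Abstract (translated from Chinese)

Modern video diffusion transformers position their tokens via RoPE on the (u,v,t) axes, which says nothing about the 3D structure of the scene. We observe that the geometric relation between two camera rays is captured by the Plucker reciprocal product, which has the same bilinear form as the dot product in Transformer attention. Building on this, we propose RayPE, a positional-encoding extension that injects ray geometry into the attention mechanism.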

Original Abstract

Modern video diffusion transformers position their tokens through RoPE on the (u,v,t) axes -- a description of the camera's sampling grid that says nothing about the 3D structure of the scene. We observe that the geometric relation between two camera rays is captured by the Plucker reciprocal product, which is bilinear in the two rays -- the same algebraic form as the dot product in Transformer attention. Building on this analogy, we propose RayPE, a positional-encoding extension that injects pe...
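
The bilinearity observation in the abstract can be checked on a toy case. The sketch below is not RayPE itself and is not taken from the paper; it only illustrates the standard identity that the Plucker reciprocal product of two rays equals an ordinary dot product once the direction and moment halves of one ray are swapped. All function names (`plucker`, `reciprocal_product`, `swap_halves`) are hypothetical, and rays are assumed to be stored as 6-tuples (direction, moment) with moment = origin x direction.

```python
# Illustrative only: helper names are hypothetical, not from the RayPE paper.

def cross(a, b):
    """3D cross product of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))

def plucker(origin, direction):
    """Plucker coordinates (d, m) of a ray, with moment m = origin x direction."""
    return tuple(direction) + cross(origin, direction)

def reciprocal_product(r1, r2):
    """Reciprocal product d1.m2 + d2.m1; it is zero iff the two lines are coplanar."""
    d1, m1 = r1[:3], r1[3:]
    d2, m2 = r2[:3], r2[3:]
    return dot(d1, m2) + dot(d2, m1)

def swap_halves(r):
    """Exchange the direction and moment halves: (d, m) -> (m, d)."""
    return r[3:] + r[:3]

# The x-axis, a ray that crosses it at (1, 0, 0), and a ray that is skew to it.
ray_a = plucker((0, 0, 0), (1, 0, 0))
ray_b = plucker((1, -1, 0), (0, 1, 0))
ray_c = plucker((0, 0, 1), (0, 1, 0))

print(reciprocal_product(ray_a, ray_b))   # intersecting rays
print(reciprocal_product(ray_a, ray_c))   # skew rays
print(dot(ray_a, swap_halves(ray_c)))     # same value, written as a plain dot product
```

Because the swap is a fixed linear map, it could in principle be applied to the key side of an attention layer so that a query-key dot product evaluates this ray relation; how RayPE actually does the injection is not recoverable from the truncated abstract above.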

--- *Automatically collected on 2026-06-27*

#Paper #arXiv #CV #小凯
