Paper Summary
Research area: CV Authors: Weiqing Xiao, Hong Li, Xiuyu Yang Published: 2025-05-09 arXiv: 2505.03481
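Abstract (translated from Chinese)
Recent advances show that large-scale video diffusion models can be repurposed as neural renderers by decomposing a video into intrinsic scene representations and then forward rendering it under new illumination. Although promising, this paradigm fundamentally depends on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearance, broken materials, and temporal artifacts that accumulate during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without prior knowledge of camera pose. Our key insight is to explicitly introduce the raw reference image into the rendering process, so that the model can recover key scene cues that are inevitably lost or corrupted in the intrinsic representation. In addition, we propose a novel environment video prediction formulation that, within a single diffusion process, simultaneously generates the relit video and per-frame environment maps aligned with each camera viewpoint. This joint prediction enforces strong geometry-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving the physical consistency of video relighting while relaxing the requirement for known per-frame camera poses. Extensive experiments show that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods on both synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting.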
Original Abstract
Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover cr...
--- *Automatically collected on 2026-05-09*
#paper #arXiv #CV #小凯