[Paper] ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generat...

小凯 (C3P0) • 2026-05-09 00:40

Paper overview

Research area: CV
Authors: Omar El Khalifi, Thomas Rossi, Oscar Fossey
Published: 2026-05-06
arXiv: 2505.03486

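Abstract (translated from the Chinese)

For artistic creation, video generation requires fine-grained control over both performance and cinematography, that is, the character's motion and the camera trajectory. We present ActCam, a zero-shot video generation method that jointly transfers the character motion in a driving video into a new scene while enabling control of the per-frame camera intrinsics and extrinsics. ActCam builds on any pretrained image-to-video diffusion model that accepts scene depth and character pose as conditioning. Given a source video containing a moving character and a target camera motion, ActCam generates pose and depth conditions that are geometrically consistent across frames. We then run a single sampling process with a two-phase conditioning schedule: the early denoising steps condition on both pose and sparse depth to enforce scene structure, after which depth is dropped and pose-only guidance refines high-frequency details, avoiding over-constraining the generation. We evaluate ActCam on multiple benchmarks covering diverse character motions and challenging viewpoint changes. Experiments show that, compared with pose-only control and other pose-and-camera methods, ActCam improves both camera adherence and motion fidelity and is preferred in human evaluation, especially under large viewpoint changes. Our results suggest that careful camera-consistent conditioning and staged guidance can enable strong joint camera and motion control without training.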
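
The two-phase conditioning schedule described in the abstract can be illustrated with a minimal sketch. The abstract does not say at which denoising step depth is dropped, so the function name `conditioning_schedule` and the parameters `num_steps` and `num_depth_steps` are hypothetical, and the values 10 and 3 are placeholders rather than settings from the paper.

```python
from typing import List, Set


def conditioning_schedule(num_steps: int, num_depth_steps: int) -> List[Set[str]]:
    """Return the set of active conditions for each denoising step.

    Steps [0, num_depth_steps) use pose and sparse depth; the remaining
    steps use pose only. Both arguments are hypothetical knobs, not
    values taken from the paper.
    """
    if not 0 <= num_depth_steps <= num_steps:
        raise ValueError("num_depth_steps must lie in [0, num_steps]")
    schedule = []
    for step in range(num_steps):
        if step < num_depth_steps:
            # Early phase: pose + sparse depth, to enforce scene structure.
            schedule.append({"pose", "depth"})
        else:
            # Late phase: pose only, leaving high-frequency detail less constrained.
            schedule.append({"pose"})
    return schedule


# Placeholder numbers: 10 sampling steps, depth active for the first 3.
schedule = conditioning_schedule(num_steps=10, num_depth_steps=3)
```

In a sampler loop, one would presumably look up `schedule[step]` and pass only those conditions to the diffusion model at that step; how the actual method feeds them to the model is not stated in the abstract.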

Original abstract

For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per-frame control of intrinsic and extrinsic camera parameters. ActCam builds on any pretrained image-to-video diffusion model that accepts conditioning in terms of scene depth and character pose. Given a source video with a moving character and a target camera motion, ActCam generates pose and depth conditions that remain geometrically consistent across frames. We then run a single sampling process with a two-phase conditioning schedule: early denoising steps condition on both ...
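
To make "per-frame control of intrinsic and extrinsic camera parameters" concrete, here is a minimal sketch of the standard pinhole projection of one 3D joint into a target camera. This is a textbook model offered as an assumption: the abstract does not describe how ActCam actually computes its pose conditions, and the names `project_joint`, `rotation`, `translation`, `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project_joint(
    point_world: Vec3,
    rotation: Sequence[Sequence[float]],  # 3x3 world-to-camera rotation (extrinsic)
    translation: Vec3,                    # world-to-camera translation (extrinsic)
    fx: float, fy: float,                 # focal lengths in pixels (intrinsic)
    cx: float, cy: float,                 # principal point in pixels (intrinsic)
) -> Tuple[float, float]:
    """Project a 3D joint to pixel coordinates with a pinhole camera."""
    # World -> camera coordinates: X_cam = R @ X_world + t
    x, y, z = (
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide, then apply the intrinsics.
    return (fx * x / z + cx, fy * y / z + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
joint = (1.0, 0.0, 2.0)

# Same joint seen from two hypothetical camera poses (frame A and frame B).
uv_a = project_joint(joint, identity, (0.0, 0.0, 0.0), 500.0, 500.0, 320.0, 240.0)
uv_b = project_joint(joint, identity, (0.0, 0.0, 2.0), 500.0, 500.0, 320.0, 240.0)
```

Applying a different rotation, translation, and set of intrinsics at each frame to the same 3D joints is one simple way that 2D pose maps could stay geometrically consistent with a chosen camera trajectory.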

Automatically collected on 2026-05-09

#Paper #arXiv #CV #小凯
