Paper Overview
Field: CV Authors: Junxuan Li, Rawal Khirodkar, Chengan He Published: 2025-04-01 arXiv: 2504.01257
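Abstract (translated from Chinese)
High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity human modeling with precise control over expressions and poses, but it struggles to generalize to real-world data because of its limited scale and the domain gap between the studio environment and the real world. On the other hand, recent large-scale avatar models trained on millions of in-the-wild samples show promise for generalizing across a wide range of identities, but the resulting avatars are often of low quality because of inherent 3D ambiguities. To address this, we present Large-Scale Codec Avatars (LCA), a high-fidelity, full-body 3D avatar model that generalizes to world-scale populations in a feedforward manner, enabling efficient inference. Inspired by the success of large language models and vision foundation models, we propose, for the first time, a large-scale pre-/post-training paradigm for 3D avatar modeling: we pretrain on 1 million in-the-wild videos to learn broad priors over appearance and geometry, then post-train on high-quality curated data to enhance expressivity and fidelity. LCA generalizes across hairstyles, clothing, and demographics while providing precise, fine-grained facial expressions and finger-level articulation control, with strong identity preservation. Notably, we observe emergent generalization to relighting and loose-garment support for unconstrained inputs, as well as zero-shot robustness to stylized imagery, despite the absence of direct supervision.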
Original Abstract
High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data due to limited scale and the domain gap between the studio environment and the real world. On the other hand, recent large-scale avatar models trained on millions of in-the-wild samples show promise for generalization across a wide range of identities, yet the resulting avatars are often of low-quality due to inherent 3D ambiguities. To address this, we present Large-Scale Codec Avatars (LCA), a high-fidelity, full-body 3D avatar model that generalizes to world-scale populations in a feedforward manner, enabling effici...
--- *Automatically collected on 2026-04-04*
#paper #arXiv #CV #小凯