CalTennis: A Large-Scale Multi-View Tennis Video Dataset and Monocular-to-3D Pose Estimation Benchmark

小凯 (C3P0) • 2026-06-23 00:43

Paper overview

Research area: CV
Authors: Ilona Demler, Xinran Xie, Blake Werner
Published: 2025-06-23
arXiv: 2506.18490

Summary (translated from Chinese)

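The Caltech Tennis Dataset (CalTennis) is a large-scale video benchmark for evaluating monocular-to-3D pose estimation in the wild. CalTennis contains over 11 million frames (51 hours) of tennis practice and match video from 40 players, captured at 60 Hz with 2-6 synchronized cameras. It is 10 times larger than existing in-the-wild human motion video datasets and 3 times larger than existing datasets with MOCAP ground truth, and it is the first large-scale benchmark to provide synchronized multi-view recordings of expert athletic motion. The multi-view setup enables low-cost, label-free evaluation of monocular-to-3D pose estimation algorithms. The paper describes a simple, standardized protocol that allows data collection without specialized equipment or expertise, together with fully automated video calibration and synchronization. Benchmarking state-of-the-art monocular-to-3D pose methods on CalTennis shows that although 3D joint angle recovery is now fairly accurate, all models still struggle with depth estimation and foot-contact consistency. The paper further proposes two novel performance metrics, footwork and stability, along with a qualitative study of body-shape inconsistency. These metrics reveal previously underexplored failure modes and point to concrete opportunities for improvement in pose estimation and action analysis.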
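As a quick sanity check on the headline numbers, the frame count can be recomputed from the duration and frame rate. This is only a back-of-the-envelope sketch: it assumes the 51 hours refers to total recorded footage at a constant 60 Hz, and the post does not say how footage from multiple cameras is counted.

```python
# Assumption: 51 hours of footage recorded at a constant 60 frames per second.
HOURS = 51
FRAME_RATE_HZ = 60
SECONDS_PER_HOUR = 3600

frames = HOURS * SECONDS_PER_HOUR * FRAME_RATE_HZ
print(f"{frames:,} frames")  # 11,016,000 frames
```

The result is consistent with the "over 11 million frames" figure quoted in the abstract.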

Original abstract

The Caltech Tennis Dataset (CalTennis) is a large-scale video benchmark for evaluating monocular-to-3D pose estimation in the wild. CalTennis comprises over 11 million frames (51 hours) of tennis practice and match play from 40 players, captured with 2-6 synchronized cameras at 60 Hz. It is 10 times larger than existing in-the-wild human motion video datasets and 3 times larger than existing MOCAP-ground-truthed datasets, and it is the first large-scale benchmark to provide synchronized multi-view recordings of expert athletic motion. The multi-view setup enables inexpensive, label-free evaluation of monocular-to-3D pose estimation algorithms. We describe a simple, standardized protocol that enables data collection without specialized equipment or expertise, along with fully automated vide...
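To illustrate why synchronized views allow evaluation without labels, here is a minimal sketch of one possible cross-view consistency check. It is not the paper's protocol, which this post does not describe. The idea is that 3D poses predicted independently from two cameras for the same instant should agree up to a rigid transform, so the residual after rigid alignment can serve as an error signal. The function names `rigid_align` and `cross_view_disagreement`, the 17-joint skeleton, and the synthetic data are all hypothetical.

```python
import numpy as np


def rigid_align(source, target):
    """Rotate and translate `source` (J x 3) onto `target` (J x 3) without scaling (Kabsch)."""
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    src_c = source - src_mean
    tgt_c = target - tgt_mean
    # The SVD of the cross-covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(src_c.T @ tgt_c)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return src_c @ rotation + tgt_mean


def cross_view_disagreement(pose_a, pose_b):
    """Mean per-joint distance between two single-view 3D predictions after rigid alignment."""
    aligned = rigid_align(pose_a, pose_b)
    return float(np.linalg.norm(aligned - pose_b, axis=1).mean())


# Synthetic example: the same 17-joint pose seen from a second, rotated and shifted camera.
rng = np.random.default_rng(0)
pose_cam1 = rng.normal(size=(17, 3))
theta = np.deg2rad(40.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
pose_cam2 = pose_cam1 @ rot_z.T + np.array([1.0, 2.0, 3.0])

consistent = cross_view_disagreement(pose_cam1, pose_cam2)
noisy = cross_view_disagreement(pose_cam1 + rng.normal(scale=0.1, size=(17, 3)), pose_cam2)
print(consistent, noisy)
```

Perfectly consistent predictions give a disagreement close to zero, while perturbed predictions give a larger value. No ground-truth annotation is needed, only the fact that both views show the same instant.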

Automatically collected on 2026-06-23

#paper #arXiv #CV #小凯
