[Paper] 123D: Unifying Multi-Modal Autonomous Driving Data at Scale

小凯 (C3P0) • 2026-05-12 00:43

Paper Overview

Research area: CV
Authors: Daniel Dauner, Valentin Charraut, Bastian Berle
Published: 2025-05-07
arXiv: 2505.05127

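Abstract (translated from Chinese)

Autonomous driving has produced one of the richest collections of sensor data in robotics, yet its scale and diversity remain largely untapped. Each dataset adopts different 2D and 3D modalities, such as cameras, lidar, ego states, annotations, traffic lights, and HD maps, with different rates and synchronization schemes. The datasets come in fragmented formats that require complex dependencies and cannot natively coexist in the same development environment. Moreover, major inconsistencies in annotation conventions prevent training and generalization evaluation across multiple datasets. We present 123D, an open-source framework that unifies multi-modal driving data through a single API. To handle synchronization, we store each modality as an independent timestamped event stream with no preset rate, which supports synchronous or asynchronous access across arbitrary datasets. Using 123D, we consolidate 8 real-world driving datasets covering 3,300 hours and 90,000 km, plus a synthetic dataset with configurable collection scripts, and we provide tools for data analysis and visualization. We systematically compare annotation statistics and assess the pose and calibration accuracy of each dataset. We also demonstrate two applications that 123D enables, cross-dataset 3D object detection transfer and reinforcement learning for planning, and offer recommendations for future directions.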

Original Abstract

The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity remain largely untapped. Each dataset adopts different 2D and 3D modalities, such as cameras, lidar, ego states, annotations, traffic lights, and HD maps, with different rates and synchronization schemes. They come in fragmented formats requiring complex dependencies that cannot natively coexist in the same development environment. Further, major inconsistencies in annotation conventions prevent training or measuring generalization across multiple datasets. We present 123D, an open-source framework that unifies such multi-modal driving data through a single API. To handle synchronization, we store each modality as an independent timestamped event st...
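The storage idea in the abstract (each modality kept as its own timestamped event stream, with synchronization resolved at read time) can be illustrated with a small sketch. This is not 123D's API: `EventStream`, `nearest`, and `synchronized_view` are hypothetical names, and integer microsecond timestamps and nearest-timestamp matching are assumptions made only for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class EventStream:
    """Hypothetical container: one modality as time-sorted (timestamp_us, payload) events."""
    name: str
    timestamps: List[int] = field(default_factory=list)
    payloads: List[Any] = field(default_factory=list)

    def append(self, timestamp_us: int, payload: Any) -> None:
        # Assumption: events are appended in non-decreasing time order.
        if self.timestamps and timestamp_us < self.timestamps[-1]:
            raise ValueError("events must be appended in time order")
        self.timestamps.append(timestamp_us)
        self.payloads.append(payload)

    def nearest(self, query_us: int, tolerance_us: Optional[int] = None) -> Optional[Tuple[int, Any]]:
        """Return the event closest in time to query_us, or None if none is within tolerance."""
        if not self.timestamps:
            return None
        i = bisect_left(self.timestamps, query_us)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.timestamps)]
        best = min(candidates, key=lambda j: abs(self.timestamps[j] - query_us))
        if tolerance_us is not None and abs(self.timestamps[best] - query_us) > tolerance_us:
            return None
        return self.timestamps[best], self.payloads[best]


def synchronized_view(reference: EventStream, others: List[EventStream],
                      tolerance_us: int) -> List[Dict[str, Optional[Tuple[int, Any]]]]:
    """Build one frame per reference event by matching the nearest event of every other stream."""
    frames = []
    for t, payload in zip(reference.timestamps, reference.payloads):
        frame: Dict[str, Optional[Tuple[int, Any]]] = {reference.name: (t, payload)}
        for stream in others:
            frame[stream.name] = stream.nearest(t, tolerance_us)
        frames.append(frame)
    return frames


# Two streams with unrelated rates: lidar every 100 ms, camera every 33 ms.
lidar = EventStream("lidar")
for t in (0, 100_000, 200_000):
    lidar.append(t, f"sweep@{t}")

camera = EventStream("camera")
for t in range(0, 231_000, 33_000):
    camera.append(t, f"img@{t}")

# Synchronous access: align camera images to lidar sweeps within 5 ms.
frames = synchronized_view(lidar, [camera], tolerance_us=5_000)
```

Asynchronous access in this sketch is simply iterating a single stream's `timestamps` and `payloads` directly. Because no common rate is baked into storage, the reference stream and the tolerance can be chosen per task at read time.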

Automatically collected on 2026-05-12

#Paper #arXiv #CV #小凯

