[Paper] Semantically-Aware Diver Activity Recognition Framework for Effec...
Paper Overview
Research area: CV Authors: Sadman Sakib Enan, Junaed Sattar Published: 2026-06-10 arXiv: 2606.12374
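Abstract (translated from the Chinese summary)
Effective multi-human-robot collaboration is essential for expanding human-led operations in challenging, high-risk underwater environments. For autonomous underwater vehicles (AUVs) to become true teammates, they must be able to understand their surroundings and recognize diver activities in order to offer assistance and ensure safety. To this end, we introduce DAR-Net, a novel transformer-based framework that analyzes complex underwater scenes to classify diver activities. Our contribution lies in a semantically guided learning formulation that couples transformer-based temporal reasoning with pixel-level scene supervision. This multi-loss training strategy explicitly aligns global activity recognition with local human-robot interaction semantics, which is particularly critical in low-visibility underwater conditions. To address the significant challenge of data scarcity in this domain, we present the first Underwater Diver Activity (UDA) dataset, a foundational resource containing over 2,600 images with pixel-level mask annotations. Through rigorous experimental evaluation in controlled environments, we show that DAR-Net achieves promising accuracy in recognizing six distinct diver activities, outperforming state-of-the-art models. While the dataset provides a key baseline, our work serves as a pioneering step that lays the groundwork for future research and fosters the development of more intelligent, collaborative underwater robotic systems.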
Original Abstract
Effective multi-human-robot collaboration is essential for expanding human-led operations in the challenging and high-risk underwater environment. For autonomous underwater vehicles (AUVs) to become true teammates, they must be able to comprehend their surroundings and recognize a diver's activities to offer assistance and ensure safety. Towards this goal, we introduce DAR-Net, a novel transformer-based framework that analyzes complex underwater scenes to classify diver activities. Our contribution lies in a semantically guided learning formulation that couples transformer-based temporal reasoning with pixel-level scene supervision. This multi-loss training strategy explicitly aligns global activity recognition with local human-robot interaction semantics, which is particularly critical in...
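The abstract describes a multi-loss training strategy that pairs a global activity classification objective with pixel-level scene supervision, but it does not give the loss functions or their weights. The following is a minimal sketch of that general idea, assuming softmax cross-entropy for both terms and a hypothetical weighting parameter `seg_weight`. None of these names or choices come from the paper, and this should not be read as DAR-Net's actual formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of class index `target` under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def pixel_cross_entropy(pixel_logits, mask):
    """Mean per-pixel cross-entropy.

    pixel_logits: H x W x K nested lists of class scores (assumed layout).
    mask:         H x W nested lists of integer class labels.
    """
    losses = [
        cross_entropy(pixel_logits[i][j], mask[i][j])
        for i in range(len(mask))
        for j in range(len(mask[0]))
    ]
    return sum(losses) / len(losses)


def multi_loss(activity_logits, activity_label, pixel_logits, mask, seg_weight=0.5):
    """Weighted sum of a global activity loss and a pixel-level mask loss.

    seg_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    cls_loss = cross_entropy(activity_logits, activity_label)
    seg_loss = pixel_cross_entropy(pixel_logits, mask)
    return cls_loss + seg_weight * seg_loss


# Toy inputs: six activity classes, a 1 x 2 image with two mask classes.
activity_logits = [0.0] * 6
pixel_logits = [[[0.0, 0.0], [0.0, 0.0]]]
mask = [[0, 1]]
loss = multi_loss(activity_logits, 2, pixel_logits, mask)
```

With uniform logits, the toy loss reduces to `log(6) + 0.5 * log(2)`, which makes the contribution of each term easy to check by hand. In a real training setup both terms would typically be computed on network outputs with an autodiff library, so that gradients from the mask term also shape the features used for activity classification.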
--- *Automatically collected on 2026-06-12*
#Paper #arXiv #CV #小凯