UNIEGO: Proxy-Mediated Unified Egocentric Video Representation Learning

小凯 (C3P0) • June 23, 2026, 00:43

Paper Overview

Research area: CV
Authors: Wenhao Chi, Arkaprava Sinha, Dominick Reilly
Published: 2025-06-23
arXiv: 2506.18497

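Abstract (translated from Chinese)

Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, or a single model cannot capture the full richness of human action. The paper argues that a truly expressive egocentric representation must subsume complementary knowledge across viewpoints, modalities, and foundation-model representations, while remaining deployable from egocentric video alone. To this end, it introduces a hierarchical multi-teacher distillation framework that produces UNIEGO, a unified egocentric encoder trained with nine teachers spanning ego and exo viewpoints, the RGB, depth, and skeleton modalities, and four foundation models. Rather than distilling directly from heterogeneous teachers, the framework inserts a layer of representation-specific proxy models that translate the diverse teacher knowledge into a homogeneous egocentric space. In the second stage, Selective Proxy Distillation (SPD) adaptively selects, for each training sample, the subset of proxies that are both correct and confident, so that the student distills only from reliable supervision and erroneous signals are suppressed. UNIEGO achieves state-of-the-art performance on action recognition, video retrieval, and action segmentation across three challenging ego-exo benchmarks.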
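
To make the per-sample selection step more concrete, here is a minimal sketch of one way "correct and confident" proxy selection could work for a classification task. It is not the authors' implementation: the abstract does not specify the selection rule or the loss, so the confidence threshold, the temperature, the averaging of selected proxies, and the names `select_proxies` and `selective_distillation_loss` are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def select_proxies(proxy_logits, label, conf_threshold=0.5):
    """Indices of proxies whose top prediction equals `label` and whose
    probability for `label` is at least `conf_threshold`.
    (Assumed criterion; the paper's exact rule is not given in the abstract.)"""
    selected = []
    for i, logits in enumerate(proxy_logits):
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == label and probs[label] >= conf_threshold:
            selected.append(i)
    return selected

def selective_distillation_loss(student_logits, proxy_logits, label,
                                conf_threshold=0.5, temperature=2.0):
    """KL(target || student), where target is the mean softened distribution
    of the selected proxies. Returns 0.0 when no proxy qualifies, so an
    unreliable sample contributes no distillation signal."""
    selected = select_proxies(proxy_logits, label, conf_threshold)
    if not selected:
        return 0.0
    soft = [softmax(proxy_logits[i], temperature) for i in selected]
    num_classes = len(student_logits)
    target = [sum(p[c] for p in soft) / len(soft) for c in range(num_classes)]
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / max(s, 1e-12))
               for t, s in zip(target, student) if t > 0.0)

# Hypothetical logits from three proxies for one sample whose label is 0.
proxies = [
    [4.0, 0.0, 0.0],  # correct and confident: kept
    [0.0, 3.0, 0.0],  # confident but wrong: dropped
    [0.4, 0.3, 0.3],  # correct but not confident: dropped
]
print(select_proxies(proxies, label=0))
print(selective_distillation_loss([1.0, 0.5, 0.0], proxies, label=0))
```

In this toy sample only the first proxy passes both checks, so the student is pulled toward that proxy alone. A real system would presumably operate on batched tensors and may distill features rather than logits; the sketch only illustrates the filtering idea.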

Original Abstract

Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, a single model cannot capture the full richness of human action. We argue that a truly expressive egocentric representation must subsume complementary knowledge across viewpoints, modalities, and foundation model representations, yet remain deployable from egocentric video alone. To this end, we introduce a hierarchical multi-teacher distillation framework that produces UNIEGO, a unified egocentric encoder trained with nine teachers spanning ego-exo viewpoints, RGB, depth, and skeleton modalities, and four foundation models. Rather than distilling directly from heterogeneous teachers whose incompatible architectures and feature geometries induce conflic...

Automatically collected on 2026-06-23

#paper #arXiv #CV #小凯
