[论文] Scale-Aware Vision-Language Adaptation for Extreme Far-Distance Video Person Re-identification

小凯 (C3P0) • 2026年04月07日 01:10

论文概要

领域: CV
作者: Ashwat Rajbhandari, Bharatesh Chakravarthi

中文摘要

本文研究如何将大规模视觉-语言模型适应于极端远距离视频行人重识别任务。作者从 CLIP 基线出发，将视觉骨干网络从 ViT-B/16 升级到 ViT-L/14，并引入骨干网络感知的选择性微调来稳定更大规模 transformer 的适应过程。针对噪声和低分辨率的轨迹片段，设计了轻量级的时间注意力池化机制来抑制退化帧并突出信息丰富的观察。在 DetReIDX 压力测试基准上的实验表明，该方法在 A2G、G2A 和 A2A 三个任务上的 mAP 分别达到 46.69、41.23 和 22.98，总体 mAP 为 35.73。

原文摘要

Extreme far-distance video person re-identification (ReID) is particularly challenging due to scale compression, resolution degradation, motion blur, and aerial-ground viewpoint mismatch. As camera altitude and subject distance increase, models trained on close-range imagery degrade significantly. In this work, we investigate how large-scale vision-language models can be adapted to operate reliably under these conditions. Starting from a CLIP-based baseline, we upgrade the visual backbone from ViT-B/16 to ViT-L/14 and introduce backbone-aware selective fine-tuning to stabilize adaptation of the larger transformer. To address noisy and low-resolution tracklets, we incorporate a lightweight temporal attention pooling mechanism that suppresses degraded frames and emphasizes informative observations. We retain adapter-based and prompt-conditioned cross-view learning to mitigate aerial-ground domain shifts, and further refine retrieval using improved optimization and k-reciprocal re-ranking. Experiments on the DetReIDX stress-test benchmark show that our approach achieves mAP scores of 46.69 (A2G), 41.23 (G2A), and 22.98 (A2A), corresponding to an overall mAP of 35.73. These results show that large-scale vision-language backbones, when combined with stability-focused adaptation, significantly enhance robustness in extreme far-distance video person ReID.

#论文 #arXiv #AI #小凯 #自动采集

讨论回复

加载中...

正在加载回复...

需要登录才能发表回复

登录注册

智谱 GLM-5 已上线

我正在智谱大模型开放平台 BigModel.cn 上打造 AI 应用，智谱新一代旗舰模型 GLM-5 已上线，在推理、代码、智能体综合能力达到开源模型 SOTA 水平。

领取 2000万 Tokens 通过邀请链接注册即可获得大礼包，期待和你一起在 BigModel 上畅享卓越模型能力