TimeProVe: Propose First, Then Verify: An Efficient Framework for Temporal Reasoning in Long Videos

Paper Overview

Research area: CV | Authors: Arkaprava Sinha, Dominick Reilly, Siddharth Krishnan | Published: 2025-06-23 | arXiv: 2506.18498

Abstract (translated from the Chinese summary)

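Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence in hours-long untrimmed videos. Existing methods either process the video densely with large vision-language models (VLMs), which is computationally prohibitive, or rely on sparse caption-based reasoning, which often misses temporally localized and motion-centric evidence. The paper proposes TimeProVe, a cost-efficient hybrid framework for temporally grounded reasoning in long videos. TimeProVe first uses lightweight modules to generate action-grounded answer-evidence hypotheses, and calls the expensive VLM only when targeted verification is needed. The core of the framework is the Action-based Candidate Evidence (ACE) module, which uses lightweight LLM reasoning to convert temporally localized actions into query-conditioned candidate answers and supporting evidence windows. The paper also introduces OpenTSUBench (OTB), an open-ended benchmark for evaluating temporally grounded reasoning in real-world Activities of Daily Living (ADL) scenarios. Experiments show that TimeProVe improves on the strongest baseline by 7.3% on OTB while reducing VLM calls by 75% and inference cost by 93%.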
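The propose-then-verify control flow described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word-overlap scorer is only a stand-in for the ACE module's lightweight LLM reasoning, and the names `Action`, `Hypothesis`, `propose`, `answer_query`, `accept_threshold`, and `max_verify` are all hypothetical. The confidence gate that skips verification is likewise an assumption about how "only when needed" might be realized.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """A temporally localized action (assumed output of a cheap detector)."""
    label: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Hypothesis:
    """A candidate answer paired with its supporting evidence window."""
    answer: str
    window: Tuple[float, float]
    confidence: float


def propose(query: str, actions: List[Action]) -> List[Hypothesis]:
    """Cheap stage: rank actions by word overlap with the query.

    Stand-in for ACE; the paper uses lightweight LLM reasoning here.
    """
    query_words = set(query.lower().split())
    hypotheses = []
    for action in actions:
        label_words = set(action.label.lower().split())
        if not label_words:
            continue
        overlap = len(query_words & label_words) / len(label_words)
        if overlap > 0:
            hypotheses.append(
                Hypothesis(action.label, (action.start, action.end), overlap)
            )
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


def answer_query(
    query: str,
    actions: List[Action],
    verify: Callable[[str, Hypothesis], bool],
    accept_threshold: float = 0.9,  # hypothetical gate, not from the paper
    max_verify: int = 2,            # hypothetical VLM call budget
) -> Tuple[Optional[Hypothesis], int]:
    """Return (accepted hypothesis or None, number of expensive VLM calls)."""
    hypotheses = propose(query, actions)
    if not hypotheses:
        return None, 0
    # Confident proposals are accepted without touching the VLM.
    if hypotheses[0].confidence >= accept_threshold:
        return hypotheses[0], 0
    # Otherwise the VLM checks only the top few evidence windows.
    vlm_calls = 0
    for hypothesis in hypotheses[:max_verify]:
        vlm_calls += 1
        if verify(query, hypothesis):
            return hypothesis, vlm_calls
    return None, vlm_calls
```

In this sketch `verify` would wrap a real VLM call restricted to the frames inside `hypothesis.window`, so the expensive model sees a short clip rather than the whole video. Counting `vlm_calls` makes it easy to measure how often the cheap stage alone suffices.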

Original Abstract

Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse caption-based reasoning, which often misses temporally localized and motion-centric evidence. We introduce TimeProVe, a cost-efficient hybrid framework for temporally grounded reasoning in long videos. TimeProVe first employs lightweight modules to generate action-grounded answer--evidence hypotheses and subsequently invokes an expensive VLM only for targeted verification. The core of our framework lies in the Action-based Candidate Evidence (ACE) module, which converts temporally localized actions into query-conditi...

--- *Automatically collected on 2026-06-23*

#paper #arXiv #CV #小凯
