[Paper] Reroute, Don't Remove: Recoverable Visual Token Routing for Visio...

Paper Overview

Field: CV | Authors: Cheng-Yu Yang, Shao-Yuan Lo, Yu-Lun Liu | Published: 2026-06-10 | arXiv: 2606.12412

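Abstract (translated from Chinese)

Vision-language models (VLMs) project an image into hundreds to thousands of visual tokens, which makes decoder inference expensive in both attention computation and KV-cache memory. Existing visual-token reduction methods mostly follow a rank-and-remove paradigm: score the visual tokens, keep a compact subset, and permanently discard the rest. The paper shows that this irreversible step is fragile, because the importance of a visual token changes with decoder depth; a token ranked low at one stage may become relevant in later layers, especially for grounding-sensitive queries. The authors propose Reroute, a training-free plug-in that replaces removal with recoverable routing. At each routing stage, the selected visual tokens pass through the decoder blocks, while deferred tokens bypass the stage and re-enter the candidate pool at the next routing decision. Reroute reuses the existing attention-score ranking rule and stage schedule, so it stays in the same theoretical TFLOPs and KV-cache budget class as the pruning method it augments. Across FastV, PDrop, and Nüwa variants on LLaVA-1.5 and Qwen backbones, Reroute improves grounding performance under aggressive token reduction while preserving general VQA performance. These results suggest that VLM token reduction should be treated as recoverable routing rather than only as irreversible pruning.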
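The difference between rank-and-remove and recoverable routing can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`top_k`, `rank_and_remove`, `reroute`), the hand-picked per-stage scores, and the `depth` counter standing in for "passed through a decoder block" are all assumptions made for illustration. Both variants keep the same number of active tokens per stage; they differ only in which candidate pool the ranking is applied to.

```python
from typing import Iterable, List, Sequence, Tuple

Scores = Sequence[float]


def top_k(candidates: Iterable[int], scores: Scores, k: int) -> List[int]:
    """Return the k highest-scoring candidate indices, sorted by index."""
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def rank_and_remove(stage_scores: List[Scores], budgets: List[int]) -> List[List[int]]:
    """Irreversible pruning: a token dropped at one stage never returns."""
    pool = list(range(len(stage_scores[0])))
    routes = []
    for scores, k in zip(stage_scores, budgets):
        pool = top_k(pool, scores, k)  # survivors are the only future candidates
        routes.append(pool)
    return routes


def reroute(stage_scores: List[Scores], budgets: List[int]) -> Tuple[List[List[int]], List[int]]:
    """Recoverable routing: deferred tokens skip a stage but are rescored later."""
    n = len(stage_scores[0])
    depth = [0] * n  # hypothetical stand-in: number of stages each token passed through
    routes = []
    for scores, k in zip(stage_scores, budgets):
        selected = top_k(range(n), scores, k)  # full pool, deferred tokens included
        for i in selected:
            depth[i] += 1  # selected tokens go through this stage's blocks
        # Deferred tokens bypass the stage: their state (here, depth) is untouched.
        routes.append(selected)
    return routes, depth


# Hand-picked scores: token 2 looks unimportant at stage 0 but matters at stage 1.
stage_scores = [
    [0.9, 0.8, 0.1, 0.2],
    [0.2, 0.3, 0.95, 0.1],
]
budgets = [2, 1]  # same per-stage token budget for both variants

print(rank_and_remove(stage_scores, budgets))  # [[0, 1], [1]]
print(reroute(stage_scores, budgets))          # ([[0, 1], [2]], [1, 1, 1, 0])
```

In this example the pruning variant can never pick token 2 at stage 1, because it was discarded at stage 0, while the routing variant recovers it under the same budget. In a real model the scores would come from the host method's attention-based ranking rule rather than a fixed table.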

Original Abstract

Vision-language models (VLMs) project images into hundreds to thousands of visual tokens, making decoder inference expensive in both attention computation and KV-cache memory. Existing visual-token reduction methods largely follow a rank-and-remove paradigm: they score visual tokens, keep a compact subset, and permanently discard the rest. We show that this irreversible action is fragile because visual-token importance changes across decoder depth; tokens ranked low at one stage may become relevant in later layers, especially for grounding-sensitive queries. We propose Reroute, a training-free plug-in that replaces removal with recoverable routing. At each routing stage, selected vision tokens pass through decoder blocks, while deferred tokens bypass the stage and re-enter the candidate po...

--- *Automatically collected on 2026-06-12*

#Paper #arXiv #CV #小凯
