Paper Summary
Research area: CV
Authors: Shuang Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai, Quanxin Shou, Yunlong Lin, Xiangyu Yue, Shenghua Gao, Tianyu Pang
Published: 2026-05-06
arXiv: 2605.05185

Chinese Abstract (English translation)
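Deep search has become a key capability of frontier multimodal agents, enabling models to solve complex problems through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain hard to reproduce, largely because open high-quality training data, transparent trajectory synthesis pipelines, and detailed training recipes are lacking. To address this, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we curate a dedicated pipeline that builds high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchored visual grounding, which together reduce shortcuts and one-step retrieval collapse. Based on this pipeline, we curate two training datasets: SearchVL-SFT-36k for SFT and SearchVL-RL-8k for RL. In addition, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, allowing the agent to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning. Built on this recipe, OpenSearch-VL achieves an average improvement of more than 10 points across seven benchmarks and reaches results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.

Original Abstract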
Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, or detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we curated a dedicated pipeline to construct high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. Based on this pipeline, we curate two training datasets, SearchVL-SFT-36k for SFT and SearchVL-RL-8k for RL. Besides, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL delivers substantial performance gains, with over 10-point average improvements across seven benchmarks, and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.

---
*Auto-collected on 2026-05-08*
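The fatal-aware GRPO step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract only states that post-failure tokens are masked and that useful pre-failure reasoning is preserved through one-sided advantage clamping. The sketch makes several assumptions. It uses one scalar reward per rollout and the standard GRPO group-normalized advantage. A hypothetical `fatal_index` marks the first token produced after a fatal tool failure. "One-sided" is read as clamping negative advantages to zero on the pre-failure tokens of failed rollouts. `Rollout`, `group_advantages`, and `fatal_aware_token_weights` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class Rollout:
    """One sampled multi-turn trajectory (hypothetical structure for illustration)."""
    reward: float
    num_tokens: int
    # Hypothetical field: index of the first token generated after a fatal
    # tool failure, or None if the rollout had no fatal failure.
    fatal_index: Optional[int] = None


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantage: z-score each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def fatal_aware_token_weights(group: List[Rollout]) -> List[List[float]]:
    """Per-token advantage weights for one group of rollouts.

    Assumptions (not confirmed by the abstract):
      * tokens at or after `fatal_index` get weight 0, i.e. they are masked
        out of the policy-gradient loss;
      * pre-failure tokens of a failed rollout keep the advantage only when it
        is positive; a negative advantage is clamped to 0 (one-sided clamp).
    Rollouts without a fatal failure use the plain group advantage.
    """
    advs = group_advantages([r.reward for r in group])
    weights: List[List[float]] = []
    for rollout, adv in zip(group, advs):
        if rollout.fatal_index is None:
            weights.append([adv] * rollout.num_tokens)
            continue
        cut = min(rollout.fatal_index, rollout.num_tokens)
        pre_failure = max(adv, 0.0)  # one-sided clamp: never push pre-failure tokens down
        weights.append([pre_failure] * cut + [0.0] * (rollout.num_tokens - cut))
    return weights


if __name__ == "__main__":
    demo = [
        Rollout(reward=1.0, num_tokens=4),
        Rollout(reward=0.0, num_tokens=4),
        Rollout(reward=0.0, num_tokens=5, fatal_index=2),
    ]
    for w in fatal_aware_token_weights(demo):
        print([round(x, 3) for x in w])
```

Under this reading, a failed rollout can still reinforce its pre-failure steps when its group-relative advantage is positive. It is never penalized for reasoning that came before a tool crash, and tokens after the crash contribute no gradient.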
#Paper #arXiv #CV #小凯