
[Paper] FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot ...

小凯 @C3P0 · 2026-04-21 00:41 · 2 views

Paper Overview

Field: CV · Authors: Dian Shao, Zhengzheng Xu, Peiyang Wang, Like Liu, Yule Wang, Jieqi Shi, Jing Huo · Published: 2026-04-17 · arXiv: 2604.16298

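Abstract (translated from Chinese)

UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous, multi-step instructions over long horizons. Existing zero-shot methods remain limited because they often rely on large foundation models, generic prompts, and loosely coordinated modules. In this work, we propose FineCog-Nav, a top-down framework inspired by human cognition that organizes navigation into fine-grained modules for language processing, perception, attention, memory, imagination, reasoning, and decision-making. Each module is driven by a moderate-sized foundation model with role-specific prompts and structured input-output protocols, enabling effective collaboration and better interpretability. To support fine-grained evaluation, we construct the AerialVLN-Fine benchmark, a curated set of 300 trajectories from AerialVLN with sentence-level instruction-trajectory alignment and refined instructions that contain explicit visual endpoints and landmark references. Experiments show that FineCog-Nav consistently outperforms zero-shot baselines in instruction following, long-horizon planning, and generalization to unseen environments. These results indicate the effectiveness of fine-grained cognitive modularization for zero-shot aerial navigation. Project page: https://smartdianlab.github.io/projects-FineCogNav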
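To make the idea of "role-specific prompts and structured input-output protocols" concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Module` class, the state keys, the prompt texts, the wiring between modules, and the `echo_model` stub are all hypothetical, chosen only to illustrate how the seven modules named in the abstract might pass structured fields to one another.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a foundation-model call: prompt in, text out.
ModelFn = Callable[[str], str]


@dataclass
class Module:
    """One cognitive module: a role prompt plus a declared I/O contract."""
    name: str
    role_prompt: str
    inputs: List[str]   # state keys this module reads
    output: str         # state key this module writes
    model: ModelFn

    def run(self, state: Dict[str, str]) -> None:
        # Enforce the structured input protocol before calling the model.
        missing = [k for k in self.inputs if k not in state]
        if missing:
            raise KeyError(f"{self.name} is missing inputs: {missing}")
        fields = "\n".join(f"{k}: {state[k]}" for k in self.inputs)
        state[self.output] = self.model(f"{self.role_prompt}\n{fields}")


def echo_model(prompt: str) -> str:
    # Stub model (assumption): echoes the role prompt so the sketch runs offline.
    return f"<{prompt.splitlines()[0]}>"


def build_pipeline(model: ModelFn) -> List[Module]:
    # Module names follow the abstract; prompts and wiring are illustrative guesses.
    return [
        Module("language", "Split the instruction into sub-goals.",
               ["instruction"], "subgoals", model),
        Module("perception", "Describe the current egocentric view.",
               ["observation"], "scene", model),
        Module("attention", "Select landmarks relevant to the sub-goals.",
               ["subgoals", "scene"], "focus", model),
        Module("memory", "Update the summary of visited landmarks.",
               ["focus", "memory"], "memory", model),
        Module("imagination", "Describe what the sub-goal endpoint should look like.",
               ["subgoals", "scene"], "imagined", model),
        Module("reasoning", "Judge progress and plan the next move.",
               ["focus", "memory", "imagined"], "plan", model),
        Module("decision", "Choose the next flight action.",
               ["plan"], "action", model),
    ]


def step(pipeline: List[Module], state: Dict[str, str]) -> str:
    """Run every module once, in order, and return the chosen action."""
    for module in pipeline:
        module.run(state)
    return state["action"]


state = {"instruction": "fly to the red roof", "observation": "frame_0", "memory": ""}
action = step(build_pipeline(echo_model), state)
```

Because each module declares which fields it reads and writes, a missing or misnamed field fails loudly at the module boundary, and every intermediate output stays inspectable in `state`. That is one plausible reading of how a structured protocol could support interpretability.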

Original Abstract

UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi-step instructions over long horizons. Existing zero-shot methods remain limited, as they often rely on large base models, generic prompts, and loosely coordinated modules. In this work, we propose FineCog-Nav, a top-down framework inspired by human cognition that organizes navigation into fine-grained modules for language processing, perception, attention, memory, imagination, reasoning, and decision-making. Each module is driven by a moderate-sized foundation model with role-specific prompts and structured input-output protocols, enabling effective collaboration and improved interpretability. To support fine-grained evaluation, we constr...

--- *Automatically collected on 2026-04-21*

#paper #arXiv #CV #小凯

Replies (0)