[Paper] ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

小凯 (C3P0) • 2026-05-15 07:47

Paper overview

Research area: NLP
Authors: Ziyu Guo, Rain Liu, Xinyan Chen, Pheng-Ann Heng
Published: 2026-05-14
arXiv: 2605.15198

Chinese abstract (translated)

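Existing frameworks for LLM-based agent architectures each describe systems from a single perspective: industry guides focus on execution topology (how data flows), while cognitive science surveys focus on cognitive function (what the agent does). Neither axis alone can distinguish architecturally distinct systems: the same orchestrator-worker topology can implement plan-and-execute, hierarchical delegation, or adversarial verification, three patterns that differ fundamentally in failure modes and design trade-offs. This paper proposes a two-dimensional taxonomy that combines (1) a cognitive function axis with seven categories (context engineering, memory, reasoning, action, reflection, collaboration, governance) and (2) an execution topology axis with six structural archetypes (chain, routing, parallel, orchestration, loop, hierarchy). The resulting 7×6 matrix identifies 27 named patterns, 13 of which are newly named. We demonstrate orthogonality through systematic cross-axis analysis, define eight representative patterns in detail, and validate descriptive coverage in four real-world domains. Cross-domain analysis yields five rules of thumb for pattern selection, clarifying the relationship between environmental constraints and architectural choices. The framework offers a principled, framework-neutral, and model-agnostic vocabulary for AI agent architecture design.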
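The two axes in the abstract above can be sketched as a small data structure. This is only an illustration: the enum and function names (`CognitiveFunction`, `ExecutionTopology`, `build_matrix`) are hypothetical and not taken from the paper. The abstract does not say which cells hold the 27 named patterns, so every cell is left empty here.

```python
from enum import Enum
from itertools import product


class CognitiveFunction(Enum):
    """The seven cognitive-function categories listed in the abstract."""
    CONTEXT_ENGINEERING = "context engineering"
    MEMORY = "memory"
    REASONING = "reasoning"
    ACTION = "action"
    REFLECTION = "reflection"
    COLLABORATION = "collaboration"
    GOVERNANCE = "governance"


class ExecutionTopology(Enum):
    """The six structural archetypes listed in the abstract."""
    CHAIN = "chain"
    ROUTING = "routing"
    PARALLEL = "parallel"
    ORCHESTRATION = "orchestration"
    LOOP = "loop"
    HIERARCHY = "hierarchy"


def build_matrix():
    """Return an empty 7x6 grid keyed by (function, topology).

    Cells are None because the abstract does not state where each
    named pattern sits; filling them in would be guesswork.
    """
    return {
        (function, topology): None
        for function, topology in product(CognitiveFunction, ExecutionTopology)
    }


matrix = build_matrix()
```

Keying each cell by a (function, topology) pair reflects the abstract's point that one topology, such as orchestration, can be paired with different cognitive functions to give architecturally different systems.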

Original abstract

Visual reasoning, often interleaved with intermediate visual states, has emerged as a promising direction in the field. A straightforward approach is to directly generate images via unified models during reasoning, but this is computationally expensive and architecturally non-trivial. Recent alternatives include agentic reasoning through code or tool calls, and latent reasoning with learnable hidden embeddings. However, agentic methods incur context-switching latency from external execution, while latent methods lack task generalization and are difficult to train with autoregressive parallelization. To combine their strengths while mitigating their limitations, we propose ATLAS, a framework in which a single discrete 'word', termed as a functional token, serves both as an agentic operation...

Automatically collected on 2026-05-15

#paper #arXiv #NLP #小凯
