[Paper] ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

Paper Overview

Field: NLP  Authors: Ziyu Guo, Rain Liu, Xinyan Chen, Pheng-Ann Heng  Published: 2026-05-14  arXiv: 2605.15198

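Summary

Visual reasoning interleaved with intermediate visual states is a promising direction, but each existing route has a cost. Generating images with unified models during reasoning is computationally expensive and architecturally non-trivial. Agentic reasoning through code or tool calls adds context-switching latency from external execution. Latent reasoning with learnable hidden embeddings generalizes poorly across tasks and is hard to train with autoregressive parallelization. ATLAS aims to combine the strengths of the agentic and latent approaches while mitigating their limitations, using a single discrete "word", called a functional token, that serves both as an agentic operation...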

Original Abstract

Visual reasoning, often interleaved with intermediate visual states, has emerged as a promising direction in the field. A straightforward approach is to directly generate images via unified models during reasoning, but this is computationally expensive and architecturally non-trivial. Recent alternatives include agentic reasoning through code or tool calls, and latent reasoning with learnable hidden embeddings. However, agentic methods incur context-switching latency from external execution, while latent methods lack task generalization and are difficult to train with autoregressive parallelization. To combine their strengths while mitigating their limitations, we propose ATLAS, a framework in which a single discrete 'word', termed as a functional token, serves both as an agentic operation...
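To make the idea of a functional token more concrete, here is a toy Python sketch of the agentic reading only. It shows a single vocabulary entry that, when it appears in a decoded token sequence, is executed in-process as an operation on the current visual state. Everything in it is a hypothetical assumption and not ATLAS's actual method: the token strings `<crop>` and `<flip>`, the operations `crop_center` and `flip_horizontal`, the helper `run_trace`, and the toy image stored as a 2D list. The abstract above is truncated before it describes the token's latent role, so the sketch does not model that side.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]  # toy grayscale image stored as a 2D grid (assumption)


def crop_center(img: Image) -> Image:
    """Hypothetical operation: keep the central half of the image in each dimension."""
    h, w = len(img), len(img[0])
    top, left = h // 4, w // 4
    return [row[left:left + w // 2] for row in img[top:top + h // 2]]


def flip_horizontal(img: Image) -> Image:
    """Hypothetical operation: mirror the image left to right."""
    return [list(reversed(row)) for row in img]


# Hypothetical functional tokens: each is ONE vocabulary entry bound to an operation.
FUNCTIONAL_TOKENS: Dict[str, Callable[[Image], Image]] = {
    "<crop>": crop_center,
    "<flip>": flip_horizontal,
}


def run_trace(tokens: List[str], img: Image) -> Tuple[List[str], List[Image]]:
    """Walk a decoded token sequence.

    Functional tokens produce a new intermediate visual state from the latest one.
    All other tokens are collected as ordinary text.
    """
    text: List[str] = []
    states: List[Image] = [img]
    for tok in tokens:
        op = FUNCTIONAL_TOKENS.get(tok)
        if op is not None:
            states.append(op(states[-1]))  # apply the operation to the current visual state
        else:
            text.append(tok)
    return text, states


# Usage: a 4x4 image whose pixel values are 0..15 in row-major order.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
text, states = run_trace(["look", "<crop>", "then", "<flip>", "answer"], image)
print(text)       # ['look', 'then', 'answer']
print(states[1])  # [[5, 6], [9, 10]]
print(states[2])  # [[6, 5], [10, 9]]
```

Each operation is a single token, so in this sketch it can be emitted during ordinary autoregressive decoding. The sketch applies the operation locally instead of sending a request to an external executor, and that in-process dispatch is the design choice it is meant to illustrate.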

--- *Automatically collected on 2026-05-15*

#Paper #arXiv #NLP #小凯
