Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

Paper Overview

Research area: ML | Authors: Ya-Ting Yang, Quanyan Zhu | Published: 2026-05-26 | arXiv: 2505.21640

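Abstract (translated from the Chinese)

Modern AI systems increasingly rely on workflows made up of multiple interacting agents, some driven by large language models (LLMs) and others by conventional computational modules. This paper analyzes the fundamental tradeoffs among latency, reliability, and cost in LLM-driven agentic workflows. We introduce performance models for LLM and non-LLM agents that capture the relationship between computational effort and output quality, using a parametric exponential reliability function to account for the effect of reasoning and output tokens on LLM agents. We then study sequential workflow design under latency and cost constraints. The main results include a water-filling token allocation policy and a characterization of optimal workflow reliability in terms of shadow prices.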

Original Abstract

Modern AI systems increasingly rely on workflows composed of multiple interacting agents, some powered by large language models (LLMs) and others by conventional computational modules. This paper analyzes the fundamental tradeoffs between latency, reliability, and cost in LLM-enabled agentic workflows. We introduce performance models for both LLM and non-LLM agents that capture the relationship between computational effort and output quality, incorporating the impact of reasoning and output tokens for LLM agents using a parametric exponential reliability function. Then, we study the design of sequential workflows under latency and cost constraints. Main results include a water-filling token allocation policy and characterizations of optimal workflow reliability in terms of shadow prices.
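To make the idea of a water-filling token allocation more concrete, here is a minimal Python sketch. It is not the paper's model: it assumes a toy objective in which agent i has reliability 1 - exp(-a_i * t_i) for t_i tokens, the workflow score is a weighted sum of those reliabilities, and the only constraint is a total token budget. The function name `water_filling_tokens` and the parameters `rates`, `weights`, and `budget` are hypothetical, introduced only for illustration. Under these assumptions the optimal allocation is t_i = max(0, ln(w_i * a_i / price) / a_i), where `price` plays the role of a shadow price on the budget and is found by bisection.

```python
import math


def water_filling_tokens(rates, weights, budget, iters=200):
    """Toy water-filling under an ASSUMED model (not taken from the paper).

    Maximizes sum_i weights[i] * (1 - exp(-rates[i] * t_i))
    subject to sum_i t_i <= budget and t_i >= 0.
    Requires rates[i] > 0, weights[i] > 0, budget >= 0.
    Returns (tokens, price), where price is the multiplier on the budget.
    """
    def allocate(price):
        # Stationarity: w * a * exp(-a * t) = price, clipped at zero tokens.
        return [max(0.0, math.log(w * a / price) / a)
                for a, w in zip(rates, weights)]

    # At this price the best marginal gain just equals the price: nobody gets tokens.
    hi = max(w * a for a, w in zip(rates, weights))

    # Lower the price until the implied allocation meets or exceeds the budget.
    lo = hi
    while sum(allocate(lo)) < budget:
        lo /= 2.0

    # Total allocation decreases as the price rises, so bisect on the price.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(allocate(mid)) > budget:
            lo = mid
        else:
            hi = mid

    price = 0.5 * (lo + hi)
    return allocate(price), price


# Hypothetical usage: three agents with different reliability rates.
tokens, price = water_filling_tokens(
    rates=[0.5, 1.0, 2.0], weights=[1.0, 1.0, 1.0], budget=6.0
)
print("tokens per agent:", tokens)
print("shadow price on the token budget:", price)
```

In this sketch an agent whose best marginal gain `w * a` is below the price receives no tokens at all, which is the thresholding behavior usually associated with water-filling. The paper's actual formulation, including its latency and cost constraints, may differ.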

--- *Automatically collected on 2026-05-27*

#paper #arXiv #ML #workflow #reliability #小凯
