[Paper] CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation

小凯 (C3P0) • 2026-05-19 00:43

Paper Overview

Research area: ML
Authors: Chenying Lin, Yichen Hai, Yi He
Published: 2025-05-15
arXiv: 2505.10887

Abstract (translated from the Chinese summary)

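Large language models deployed for MAPDL finite-element simulation face practical reliability challenges: without structured execution control, tool encapsulation, and fault recovery, outputs can be inconsistent and task failures are common. The Agent Harness paradigm addresses this by inserting domain-specific orchestration middleware between the LLM and the solver; this middleware manages tool lifecycles, workflow state, and recovery escalation. This paper presents the architecture of CAX-Agent, a lightweight agent harness purpose-built for MAPDL automation, and empirically evaluates one of its core components, the recovery ladder. CAX-Agent organizes execution into three layers (LLM service, agent harness, and solver backend), and its recovery ladder escalates from deterministic rule patching to model-driven regeneration, then to context augmentation and finally human intervention. Three recovery strategies (no_recovery, rule_only, and model_only) are evaluated on 50 standard structural benchmarks, with each strategy repeated 3 times (50 benchmarks × 3 strategies × 3 repetitions = 450 case runs). Two independent human raters scored task completion under blind conditions; inter-rater agreement was strong (quadratic weighted Cohen's kappa = 0.84, with 96% of score pairs within 1 point of each other). model_only achieved the best completion rate (0.9267), task score (3.59/4), total score (9.16/10), and zero-intervention rate (0.84), outperforming rule_only (0.7733, 3.17/4, 7.03/10, 0.00) and no_recovery (0.6933, 2.74/4, 5.60/10, 0.00), with large effect sizes (Cliff's delta = 0.81 to 0.87). The benchmarks deliberately use simplified geometries to isolate the effect of the recovery strategy; the authors discuss the scope of these findings and directions for broader validation.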
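The abstract reports inter-rater agreement as quadratic weighted Cohen's kappa (plus the share of score pairs within 1 point) and effect size as Cliff's delta. The sketch below shows how these statistics are commonly computed, using only the standard library. It is an illustration, not the authors' analysis code: the function names are hypothetical, the paper's actual tooling is unknown, and the example inputs in the checks are made up rather than taken from the paper's data.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta = P(X > Y) - P(X < Y), counted over all cross-group pairs."""
    pairs = list(product(xs, ys))
    greater = sum(1 for x, y in pairs if x > y)
    less = sum(1 for x, y in pairs if x < y)
    return (greater - less) / len(pairs)


def quadratic_weighted_kappa(a, b, categories):
    """Cohen's kappa with quadratic disagreement weights w_ij = (i - j)^2.

    The usual normalisation by (K - 1)^2 cancels in the ratio, so it is omitted.
    """
    index = {c: k for k, c in enumerate(categories)}
    k = len(categories)
    observed = [[0] * k for _ in range(k)]
    for r1, r2 in zip(a, b):
        observed[index[r1]][index[r2]] += 1
    total = len(a)
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2
            weighted_obs += w * observed[i][j]
            # Expected count if the two raters' marginals were independent.
            weighted_exp += w * row[i] * col[j] / total
    if weighted_exp == 0:
        raise ValueError("kappa is undefined when both raters use the same single category")
    return 1.0 - weighted_obs / weighted_exp


def fraction_within_one(a, b):
    """Share of score pairs whose absolute difference is at most 1 point."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


# 50 benchmarks x 3 strategies x 3 repetitions, as stated in the abstract.
TOTAL_RUNS = 50 * 3 * 3
```

Read this way, a Cliff's delta of about 0.8 between model_only and another strategy means that, across all cross-strategy pairs of runs, model_only scored higher far more often than it scored lower.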

Original Abstract

Large language models deployed for MAPDL finite-element simulation face practical reliability challenges: without structured execution control, tool encapsulation, and fault recovery, outputs may be inconsistent and task failures are common. The Agent Harness paradigm addresses this by inserting domain-specific orchestration middleware that manages tool lifecycles, workflow state, and recovery escalation. This paper presents the architecture of CAX-Agent, a lightweight agent harness purpose-built for MAPDL automation, and empirically evaluates one of its core components -- the recovery ladder. CAX-Agent organizes execution into three layers -- LLM service, agent harness, and solver backend -- with a recovery ladder that escalates from deterministic rule patching through model-driven regene...
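To make the escalation order concrete, here is a minimal Python sketch of a recovery ladder. It assumes that each rung is a function taking the failing script and the solver's error message and returning either a repaired script or None to escalate to the next rung. Everything here is hypothetical and is not CAX-Agent's implementation: `run_with_recovery`, `rule_patch`, `model_regenerate_stub`, `toy_solver` and `attempts_per_rung` are invented names. The toy solver only checks two APDL command-ordering rules instead of running MAPDL, the "model" rung is a stub in place of a real LLM call, context augmentation is left out, and the final human-intervention rung is represented only by a `needs_human` flag.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class SolverResult:
    ok: bool
    error: str = ""


# Hypothetical interfaces: a solver runs a script; a recovery rung maps
# (failing script, error message) to a repaired script, or None to escalate.
Solver = Callable[[str], SolverResult]
Rung = Callable[[str, str], Optional[str]]


@dataclass
class LadderOutcome:
    ok: bool
    script: str
    rungs_used: List[str] = field(default_factory=list)
    needs_human: bool = False  # True when automated rungs could not fix the run


def run_with_recovery(script: str, solver: Solver,
                      rungs: List[Tuple[str, Rung]],
                      attempts_per_rung: int = 2) -> LadderOutcome:
    """Escalate through the rungs in order until the solver run succeeds."""
    result = solver(script)
    used: List[str] = []
    for name, rung in rungs:
        attempts = 0
        while not result.ok and attempts < attempts_per_rung:
            attempts += 1
            repaired = rung(script, result.error)
            if repaired is None:
                break  # this rung cannot handle the error; escalate
            used.append(name)
            script = repaired
            result = solver(script)
        if result.ok:
            return LadderOutcome(True, script, used)
    # Reached if no rungs were given or all of them failed; a remaining
    # failure is handed to the last rung, human intervention.
    return LadderOutcome(result.ok, script, used, needs_human=not result.ok)


def toy_solver(script: str) -> SolverResult:
    """Stand-in for MAPDL: only checks two command-ordering rules."""
    lines = script.split("\n")
    if "/PREP7" not in lines:
        return SolverResult(False, "missing /PREP7")
    if "SOLVE" in lines and "/SOLU" not in lines:
        return SolverResult(False, "SOLVE issued outside /SOLU")
    return SolverResult(True)


def rule_patch(script: str, error: str) -> Optional[str]:
    """Deterministic rung: a fixed rewrite for one known error pattern."""
    if error != "SOLVE issued outside /SOLU":
        return None
    lines = script.split("\n")
    i = lines.index("SOLVE")
    return "\n".join(lines[:i] + ["/SOLU"] + lines[i:])


def model_regenerate_stub(script: str, error: str) -> Optional[str]:
    """Placeholder for an LLM regeneration call (no model is invoked here)."""
    if error == "missing /PREP7":
        return "/PREP7\n" + script
    return None
```

Under these assumptions, the three evaluated strategies would differ only in the rung list passed in: `[]` for no_recovery, `[("rule_patch", rule_patch)]` for rule_only, and `[("model_regenerate", model_regenerate_stub)]` for model_only. Keeping the rungs as an ordered list makes it straightforward to ablate or reorder them without touching the loop.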

Automatically collected on 2026-05-19

#Paper #arXiv #ML #小凯
