## Paper Summary
**Research Area**: ML
**Authors**: Hanqi Li, Lu Chen, Kai Yu
**Published**: 2026-04-22
**arXiv**: [2604.20811](https://arxiv.org/abs/2604.20811)
## Abstract (Translated)
As LLMs are increasingly integrated into agentic systems, they must adhere to dynamically defined, machine-interpretable interfaces. We evaluate LLMs as in-context interpreters: given a novel context-free grammar, can an LLM generate outputs that are syntactically valid, behaviorally functional, and semantically faithful? We introduce RoboGrid, a framework that disentangles syntax, behavior, and semantics through controlled stress tests of recursion depth, expression complexity, and surface style. Our experiments reveal a consistent hierarchical degradation: LLMs usually maintain surface syntax but fail to preserve structural semantics. Although chain-of-thought (CoT) reasoning provides partial mitigation, performance collapses under structural density, particularly deep recursion and high branching, with semantic alignment vanishing at extreme depths. Moreover, "alien" vocabularies reveal that LLMs rely on semantic bootstrapping from keywords rather than pure symbolic induction. These findings pinpoint a critical gap in the hierarchical state tracking required for reliable, grammar-agnostic agents.
## Original Abstract
As LLMs are increasingly integrated into agentic systems, they must adhere to dynamically defined, machine-interpretable interfaces. We evaluate LLMs as in-context interpreters: given a novel context-free grammar, can LLMs generate syntactically valid, behaviorally functional, and semantically faithful outputs? We introduce RoboGrid, a framework that disentangles syntax, behavior, and semantics through controlled stress-tests of recursion depth, expression complexity, and surface styles. Our experiments reveal a consistent hierarchical degradation: LLMs often maintain surface syntax but fail to preserve structural semantics. Despite the partial mitigation provided by CoT reasoning, performance collapses under structural density, specifically deep recursion and high branching, with semantic alignment vanishing at extreme depths. Moreover, "alien" vocabularies reveal that LLMs rely on semantic bootstrapping from keywords rather than pure symbolic induction. These findings pinpoint a critical gap in the hierarchical state tracking required for reliable, grammar-agnostic agents.
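The core check the abstract describes, deciding whether a generated program is syntactically valid under a given context-free grammar and how deeply it recurses, can be sketched with a toy grammar. This is an illustrative sketch only, not RoboGrid's actual grammar or code; the grammar, token names, and depth metric here are hypothetical:

```python
# Illustrative sketch (hypothetical grammar, not the paper's RoboGrid setup):
# validate a candidate LLM output against a toy context-free grammar and
# report its maximum nesting depth -- the structural axis the paper stresses.
#
# Toy grammar:
#   Prog -> Cmd | Cmd ";" Prog
#   Cmd  -> "move" | "turn" | "repeat" "{" Prog "}"
import re

TOKEN = re.compile(r"move|turn|repeat|[{};]")

def tokenize(text):
    tokens = TOKEN.findall(text)
    # Reject inputs containing anything outside the grammar's vocabulary.
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise ValueError("unknown characters in input")
    return tokens

def parse(tokens):
    """Recursive-descent parser; returns max nesting depth, raises on invalid input."""
    pos = 0

    def prog(depth):
        nonlocal pos
        d = cmd(depth)
        while pos < len(tokens) and tokens[pos] == ";":
            pos += 1
            d = max(d, cmd(depth))
        return d

    def cmd(depth):
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok in ("move", "turn"):
            return depth
        if tok == "repeat":
            if pos >= len(tokens) or tokens[pos] != "{":
                raise ValueError("expected '{' after repeat")
            pos += 1
            d = prog(depth + 1)  # one level deeper inside the repeat block
            if pos >= len(tokens) or tokens[pos] != "}":
                raise ValueError("expected '}'")
            pos += 1
            return d
        raise ValueError(f"unexpected token {tok!r}")

    depth = prog(0)
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return depth

print(parse(tokenize("move; repeat { turn; repeat { move } }")))  # → 2
```

A stress test in the spirit of the paper would sweep the nesting depth of prompts like these and measure where validity (and, with a reference interpreter, semantic fidelity) breaks down.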
---
*Automatically collected on 2026-04-24*
#paper #arXiv #ML #小凯