[Paper] LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Paper Overview

Research area: NLP Authors: Tong Zheng, Haolin Liu, Chengsong Huang Published: 2025-05-07 arXiv: 2505.05128

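Abstract (translated from Chinese)

Test-time scaling (TTS) has become an effective way to improve large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose AutoTTS, an environment-driven framework that shifts the researcher's job from designing individual TTS heuristics to building environments in which TTS strategies can be discovered automatically. The key to AutoTTS is environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width-depth TTS as controller synthesis over pre-collected reasoning trajectories and probe signals, where the controller decides when to branch, continue, probe, prune, or stop, and can be evaluated cheaply without repeated LLM calls. We further introduce a beta parameterization to make the search tractable, and we use fine-grained execution-trace feedback to help the agent diagnose why a TTS program fails, which improves discovery efficiency. Experiments on mathematical reasoning benchmarks show that the discovered strategies achieve a better accuracy-cost trade-off than strong hand-designed baselines. The discovered strategies generalize to held-out benchmarks and model scales, while the entire discovery process costs only $39.9 and 160 minutes.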
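To make the idea of evaluating a controller on pre-collected trajectories concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Trace` format, the `replay` function, and the meaning given to `beta` (a simple agreement threshold for stopping) are all assumptions made for illustration, since the abstract does not define the actual beta parameterization or the controller's program space.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trace:
    """One pre-collected reasoning trajectory (hypothetical format)."""
    probe_answers: list  # answer a probe would report after each step
    step_cost: int = 1   # cost units (e.g. tokens) charged per step


def replay(traces, beta, max_width=4):
    """Replay a toy width-depth controller over cached traces (no LLM calls).

    beta is an assumed knob: the fraction of active branches that must
    agree on an answer before the controller stops.
    Returns (answer, total_cost).
    """
    # Branch: open up to max_width cached trajectories.
    active = list(range(min(max_width, len(traces))))
    cost, depth, answer = 0, 0, None
    while active:
        votes = Counter()
        for i in active:
            cost += traces[i].step_cost                 # continue one step
            votes[traces[i].probe_answers[depth]] += 1  # probe current answer
        answer, count = votes.most_common(1)[0]
        if count / len(active) >= beta:                 # stop on agreement
            return answer, cost
        depth += 1
        # Prune: drop branches whose cached trace is exhausted.
        active = [i for i in active if depth < len(traces[i].probe_answers)]
    return answer, cost


traces = [
    Trace(["7", "42", "42"]),
    Trace(["42", "42", "42"]),
    Trace(["13", "13", "42"]),
]
print(replay(traces, beta=1.0, max_width=3))  # ('42', 9)
print(replay(traces, beta=0.6, max_width=3))  # ('42', 6)
```

Because every decision reads from cached traces, sweeping `beta` or `max_width` costs only local compute, which is the kind of cheap, frequent feedback the abstract describes. In this toy data, lowering the threshold from 1.0 to 0.6 returns the same answer at lower cost.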

Original Abstract

Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments where TTS strategies can be discovered automatically. The key to AutoTTS lies in environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width-depth TTS as controller synth...

--- *Automatically collected on 2026-05-12*

#Paper #arXiv #NLP #小凯
