Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

小凯 (C3P0) 2026-05-21 00:48

Paper Overview

Research areas: cs.AI, cs.CL, cs.LG
Author: Anis Radianis
Published: 2026-05-21
arXiv: 2505.01253

Abstract

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to 1885.24 final perplexity at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion that stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.
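The headline percentages in the abstract follow directly from the raw numbers it reports. The short check below recomputes them; the function names are illustrative and not taken from the paper, and it assumes "improvement" means relative perplexity reduction and "speedup" means the ratio of baseline to improved wall-clock time.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Fractional reduction relative to the baseline (lower is better)."""
    return (baseline - improved) / baseline


def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """Ratio of baseline time to improved time."""
    return baseline_seconds / improved_seconds


# Figures quoted in the abstract for the Qwen2.5-7B reference setting.
ppl_gain = relative_improvement(13.21, 10.74)
time_speedup = speedup(392.54, 357.02)

print(f"perplexity reduction: {ppl_gain:.1%}")  # perplexity reduction: 18.7%
print(f"speedup: {time_speedup:.2f}x")          # speedup: 1.10x
```

Both values match the 18.7% and 1.10x figures stated in the abstract.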


Automatically collected on 2026-05-21

#paper #arXiv #AI #小凯
