Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Paper Overview

Subject areas: cs.AI, cs.CL, cs.LG | Author: Anis Radianis | Published: 2026-05-21 | arXiv: 2505.01253

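Abstract (translated from Chinese)

Modern language-model training increasingly suffers from instability, degraded runs, and wasted compute, particularly under aggressive learning rates, large scale, and runtime stress. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that runs on top of AdamW. Instead of replacing the optimizer's update rule, it observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while keeping the training objective fixed. We evaluate LBW-Guard on a Qwen2.5-centered stress-and-robustness suite on WikiText-103, using Qwen2.5-7B as the empirical anchor, with model-scale comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA, full-parameter TinyLlama-1B sanity check. In the 7B reference setting, LBW-Guard lowers final perplexity from 13.21 to 10.74 (an 18.7% improvement) and shortens end-to-end time from 392.54 s to 357.02 s (a 1.10x speedup). Under stronger learning-rate stress, AdamW's final perplexity degrades to 1885.24 at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard stays trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion: instability-sensitive LLM training can benefit from a governance layer above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.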
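As a quick consistency check, the 18.7% improvement and 1.10x speedup quoted for the 7B reference setting follow directly from the reported perplexity and timing figures. The short Python sketch below recomputes them; the function and variable names are illustrative and do not come from the paper.

```python
# Figures as reported in the abstract for the Qwen2.5-7B reference setting.
ppl_adamw, ppl_guard = 13.21, 10.74          # final perplexity
time_adamw_s, time_guard_s = 392.54, 357.02  # end-to-end time in seconds


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


def speedup(baseline: float, new: float) -> float:
    """How many times faster `new` is than `baseline`."""
    return baseline / new


ppl_gain = relative_reduction(ppl_adamw, ppl_guard)
time_speedup = speedup(time_adamw_s, time_guard_s)

print(f"perplexity reduction: {ppl_gain:.1%}")
print(f"speedup: {time_speedup:.2f}x")
```

Both values round to the numbers stated in the abstract, so the headline claims are internally consistent.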

Original Abstract

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to 1885.24 final perplexity at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion that stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.
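The abstract does not spell out LBW-Guard's control law, so the following is only a minimal sketch of what "bounded control above the optimizer" could look like in principle. It assumes a hypothetical guard that watches the loss as telemetry and returns a learning-rate multiplier clamped to a fixed range; the class name `BoundedGuard`, its thresholds, and the backoff and recovery factors are all invented for illustration and are not the paper's method.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class BoundedGuard:
    """Hypothetical guard: keeps an LR multiplier inside [min_scale, max_scale]."""

    min_scale: float = 0.1    # lower bound on the multiplier (assumed value)
    max_scale: float = 1.0    # upper bound; never exceeds the base LR
    spike_ratio: float = 1.5  # loss above spike_ratio * recent mean is a spike
    backoff: float = 0.5      # multiplier shrink factor on a spike
    recovery: float = 1.1     # multiplier growth factor on a calm step
    window: int = 8           # number of recent losses kept as telemetry
    scale: float = 1.0
    history: deque = field(default_factory=deque)

    def observe(self, loss: float) -> float:
        """Record one loss value and return the bounded LR multiplier."""
        unstable = not math.isfinite(loss) or (
            bool(self.history) and loss > self.spike_ratio * mean(self.history)
        )
        proposed = self.scale * (self.backoff if unstable else self.recovery)
        # The control is bounded: whatever is proposed gets clamped.
        self.scale = min(self.max_scale, max(self.min_scale, proposed))
        if math.isfinite(loss):
            self.history.append(loss)
            if len(self.history) > self.window:
                self.history.popleft()
        return self.scale


guard = BoundedGuard()
losses = [2.0, 1.9, 1.8, 9.0, 1.8, 1.7]  # synthetic telemetry with one spike
scales = [guard.observe(loss) for loss in losses]
print(scales)
```

In this sketch the optimizer's update rule is untouched: a training loop would only multiply the base learning rate by the returned scale before each step (in PyTorch, for instance, by writing to each `param_group["lr"]` of the optimizer). That is different from gradient clipping, which rescales the gradients themselves.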

--- *Automatically collected on 2026-05-21*

#Papers #arXiv #AI #小凯