[Paper] Teacher Forcing as Generalized Bayes: Optimization Geometry Mismatch i...

小凯 (C3P0) • 2026-04-30 00:41

Paper Overview

Field: ML
Authors: Andre Herz, Daniel Durstewitz, Georgia Koppe, et al.
Published: 2026-04-29
arXiv: 2504.21060

Abstract (translated from the Chinese summary)

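Identity teacher forcing (ITF) enables stable training of deterministic recurrent surrogate models for chaotic dynamical systems and has been highly effective for dynamical systems reconstruction (DSR) with recurrent neural networks (RNNs), including interpretable almost-linear RNNs (AL-RNNs). However, as an intervention-based prediction loss (that is, a generalized Bayes update), teacher forcing need not match the marginal-likelihood geometry of the free-running model. We compare the curvature induced by the ITF objective and by the marginal likelihood in a probabilistic switching augmentation of AL-RNNs, estimating ambiguity-aware observed information via Louis' identity. In the switching setting studied, conditioning on a single forced regime path (as ITF does) inflates curvature, whereas the marginal-likelihood curvature is reduced by a missing-information correction when several switching explanations remain plausible. In Lorenz-63 experiments, windowed-evidence fine-tuning improves held-out evidence but can degrade dynamical quantities of interest (QoIs) relative to the ITF-pretrained model.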
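As a rough illustration only, the sketch below shows what an identity teacher forcing loss can look like for an AL-RNN. It assumes the commonly used AL-RNN form z_t = A z_{t-1} + W Φ*(z_{t-1}) + h (diagonal A, ReLU applied only to the last P latent units), an identity readout of the first N latent units, and forcing every tau steps. The names al_rnn_step and itf_loss and the parameters P and tau are hypothetical; this is not the authors' code, and gradients (which training would obtain from an autodiff framework) are omitted.

```python
import numpy as np

def al_rnn_step(z, A, W, h, P):
    """One AL-RNN step (assumed form): z' = A*z + W @ phi(z) + h.
    A is a diagonal stored as a vector; phi applies ReLU to the last P units only."""
    phi = z.copy()
    M = len(z)
    phi[M - P:] = np.maximum(phi[M - P:], 0.0)  # P = 0 leaves phi fully linear
    return A * z + W @ phi + h

def itf_loss(X, A, W, h, P, tau):
    """Identity teacher forcing loss (hypothetical sketch).
    X: (T, N) observations; latent dimension M >= N; readout = first N latent units.
    Every tau steps the first N latent units are overwritten with the observed data."""
    T, N = X.shape
    M = A.shape[0]
    z = np.zeros(M)
    z[:N] = X[0]  # initialize the observed part of the state from data
    loss = 0.0
    for t in range(1, T):
        z = al_rnn_step(z, A, W, h, P)
        loss += np.sum((z[:N] - X[t]) ** 2)  # squared prediction error of the readout
        if t % tau == 0:
            z[:N] = X[t]  # the forcing intervention
    return loss / (T - 1)
```

In this sketch the forced trajectory also fixes which ReLU units are active at each step, so the loss is evaluated along one regime path rather than averaged over alternative paths.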

Original Abstract

Identity teacher forcing (ITF) enables stable training of deterministic recurrent surrogates for chaotic dynamical systems and has been highly effective for dynamical systems reconstruction (DSR) with recurrent neural networks (RNNs), including interpretable almost-linear RNNs (AL-RNNs). However, as an intervention-based prediction loss (and thus a generalized Bayes update), teacher forcing need not match the free-running model's marginal likelihood geometry. We compare the objective-induced curvatures of ITF and marginal likelihood in a probabilistic switching augmentation of AL-RNNs, estimating ambiguity-aware observed information via Louis' identity. In the switching setting studied here, conditioning on a single forced regime path (as ITF does) inflates curvature, while marginal likeli...
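Louis' identity writes the observed information of a latent-variable model as complete-data information minus missing information: I_obs(θ) = E[-∂²ℓ_c | y] - Var[∂ℓ_c | y]. The sketch below checks this on a deliberately simple toy, a two-regime Gaussian mixture y_i ~ N(±θ, 1) with equal regime probabilities, which stands in for (and is far simpler than) the paper's switching AL-RNN. Conditioning on one regime per point yields the complete-data curvature, which is never smaller than the marginal curvature; the gap is the missing information, largest when θ·y_i is near zero and both regimes explain y_i about equally well. All function names are hypothetical.

```python
import numpy as np

C = np.array([-1.0, 1.0])  # toy assumption: regime 0 has mean -theta, regime 1 has mean +theta

def marginal_loglik(theta, y):
    """Sum of log p(y_i | theta), marginalizing each regime s_i (prior 1/2 each)."""
    logp = -0.5 * (y[:, None] - C[None, :] * theta) ** 2 - 0.5 * np.log(2 * np.pi)
    return np.sum(np.logaddexp(logp[:, 0], logp[:, 1]) + np.log(0.5))

def louis_observed_info(theta, y):
    """Louis' identity: I_obs = E[complete info | y] - Var[complete score | y]."""
    logp = -0.5 * (y[:, None] - C[None, :] * theta) ** 2
    w = np.exp(logp - logp.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                     # posterior p(s_i | y_i, theta)
    score = C[None, :] * (y[:, None] - C[None, :] * theta)  # d/dtheta of complete log-lik
    complete_info = np.sum(w * C[None, :] ** 2)             # E[-d2 l_c | y]; equals n here
    mean_score = np.sum(w * score, axis=1)
    missing_info = np.sum(np.sum(w * score ** 2, axis=1) - mean_score ** 2)
    return complete_info, missing_info, complete_info - missing_info

def single_path_info(theta, y):
    """Curvature when conditioning on one regime path (here: the MAP regime per point)."""
    s_hat = np.where(theta * y >= 0, 1, 0)
    return np.sum(C[s_hat] ** 2)
```

For this toy the missing information has the closed form Σ_i y_i² sech²(θ y_i), so the marginal observed information is Σ_i (1 - y_i² sech²(θ y_i)); a finite-difference second derivative of marginal_loglik reproduces it, while single_path_info always returns n.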

Automatically collected on 2026-04-30

#Paper #arXiv #ML #小凯
