Paper Overview
Research area: NLP Authors: Dipto Sumit, Ankan Kumar Roy, Sadia Khair Rodela, et al. Published: 2026-04-03 arXiv: 2604.03192
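Summary (translated from Chinese)
We study multi-teacher knowledge distillation for low-resource abstractive summarization from a reliability-aware perspective. We introduce EWAD and CPDP. The former is a token-level mechanism that routes supervision between teacher distillation and gold supervision based on inter-teacher agreement; the latter is a geometric constraint on the student's position relative to heterogeneous teachers. Experiments on two Bangla datasets show that logit-level KD provides the most reliable gains.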
Original Abstract
We study multiteacher knowledge distillation for low resource abstractive summarization from a reliability aware perspective. We introduce EWAD (Entropy Weighted Agreement Aware Distillation), a token level mechanism that routes supervision between teacher distillation and gold supervision based on inter teacher agreement, and CPDP (Capacity Proportional Divergence Preservation), a geometric constraint on the student position relative to heterogeneous teachers. Across two Bangla datasets, 13 BanglaT5 ablations, and eight Qwen2.5 experiments, we find that logit level KD provides the most reliable gains, while more complex distillation improves semantic similarity for short summaries but degrades longer outputs. Cross lingual pseudo label KD across ten languages retains 71-122 percent of tea...
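The abstract describes EWAD only at a high level: per token, supervision is routed either to the teachers or to the gold label depending on how much the teachers agree. The sketch below illustrates that general idea for two teachers and a single token position. It is not the paper's formulation. The agreement measure (one minus normalized Jensen-Shannon divergence), the inverse-entropy teacher weights, the hard threshold, and the names `token_loss` and `threshold` are all assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def kl_divergence(p, q):
    """KL(p || q) in nats; q must be positive wherever p is positive."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in nats; ranges from 0 to ln 2."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def token_loss(student, teacher_a, teacher_b, gold_id, threshold=0.5):
    """Hypothetical routing rule for one token position.

    All arguments except gold_id are probability vectors over the
    vocabulary; the student vector is assumed strictly positive
    (for example, a softmax output). Returns (route, loss).
    """
    # Assumed agreement score in [0, 1]: 1 means identical teachers.
    agreement = 1.0 - js_divergence(teacher_a, teacher_b) / math.log(2.0)

    if agreement < threshold:
        # Teachers disagree: fall back to cross-entropy on the gold token.
        return "gold", -math.log(student[gold_id])

    # Teachers agree: mix them, giving the lower-entropy (more confident)
    # teacher a larger weight. This weighting is an assumption.
    eps = 1e-8
    w_a = 1.0 / (entropy(teacher_a) + eps)
    w_b = 1.0 / (entropy(teacher_b) + eps)
    mix = [(w_a * a + w_b * b) / (w_a + w_b)
           for a, b in zip(teacher_a, teacher_b)]

    # Logit-level KD term: KL from the mixed teacher to the student.
    return "distill", kl_divergence(mix, student)
```

A hard threshold keeps the sketch short. A soft variant that interpolates between the two losses by the agreement score would be an equally plausible reading of "routes supervision", and the abstract does not say which one the authors use.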
Automatically collected on 2026-04-06
#Paper #arXiv #NLP #小凯