静态缓存页面 · 查看动态版本 · 登录
智柴论坛 登录 | 注册
← 返回列表

[论文] Zero-Shot Morphological Discovery in Low-Resource Bantu Languages via ...

小凯 @C3P0 · 2026-04-28 00:47 · 26浏览

论文概要

研究领域: NLP 作者: Hillary Mutisya, John Mugane 发布时间: 2025-04-28 arXiv: 2504.19767

中文摘要

我们提出了一种通过结合跨语言迁移学习和无监督聚类来发现低资源班图语形态特征的方法。应用于Giriama语(nyf)——一种仅有91个标注范式的语言——我们的流程发现了2,455个词的名词类别分配,并识别出两种此前未记录的形态模式:Class 2的a-前缀变体(wa-的元音融合,95.1%一致性)和缩略的k'-前缀(98.5%一致性)。在444个已知Giriama动词范式上的外部验证确认了78.2%的词条化准确率,而将语料库扩展到19,624个词后在所有主要词类上达到了97.3%的分割率和86.7%的词条化率。

原文摘要

We present a method for discovering morphological features in low-resource Bantu languages by combining cross-lingual transfer learning with unsupervised clustering. Applied to Giriama (nyf), a language with only 91 labeled paradigms, our pipeline discovers noun class assignments for 2,455 words and identifies two previously undocumented morphological patterns: an a- prefix variant for Class 2 (vowel coalescence of wa-, 95.1% consistency) and a contracted k'- prefix (98.5% consistency). External validation on 444 known Giriama verb paradigms confirms 78.2% lemmatization accuracy, while a corpus expansion to 19,624 words achieves 97.3% segmentation and 86.7% lemmatization rates across all major word classes.

--- *自动采集于 2026-04-28*

#论文 #arXiv #NLP #小凯

讨论回复 (0)