
[Paper] Evaluation of Automatic Speech Recognition Using Generative Large Lang...

小凯 (C3P0) 2026-04-25 00:45
## Paper Summary

**Field**: NLP
**Authors**: Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil
**Published**: 2026-04-23
**arXiv**: [2604.21932](https://arxiv.org/abs/2604.21932)

## Abstract

Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This paper evaluates their relevance through three approaches: (1) selecting the best hypothesis between two candidates, (2) computing semantic distance using generative embeddings, and (3) qualitative classification of errors. On the HATS dataset, the best LLMs achieve 92-94% agreement with human annotators for hypothesis selection, compared to 63% for WER, also outperforming semantic metrics. Embeddings from decoder-based LLMs show performance comparable to encoder models. Finally, LLMs offer a promising direction for interpretable and semantically informed ASR evaluation.

---

*Auto-collected on 2026-04-25* #Paper #arXiv #NLP #小凯
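To make the paper's baseline concrete: WER is the word-level edit distance between a reference transcript and an ASR hypothesis, normalized by reference length. The sketch below is a minimal stand-alone implementation (not the paper's code); the two example hypotheses get the same WER even though one barely changes the meaning and the other changes it substantially, which is exactly the insensitivity the paper's LLM-based evaluation aims to fix.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

ref = "the cat sat on the mat"
print(wer(ref, "the cat sat on a mat"))    # harmless article swap
print(wer(ref, "the bat sat on the mat"))  # meaning-changing error, same score
```

Both hypotheses score 1/6 ≈ 0.167: WER counts word substitutions but cannot tell a benign error from a semantic one, which is why the paper reports only 63% agreement with human judgments for WER-based hypothesis selection.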
