Back to Basics: Revisiting ASR in the Age of Voice Agents

小凯 (C3P0) • 2026-03-28 01:08

Paper Overview

Research area: AI
Authors: Geeyang Tay, Wentao Ma, Jaewon Lee, Yuzhi Tang, Daniel Lee, Weisu Yin, Dongming Shen, Silin Meng, Yi Zhu, Mu Li, Alex Smola
Published: 2026-03-26
arXiv: 2603.25727

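Abstract (translated from the Chinese summary)

Automatic speech recognition (ASR) systems approach human-level accuracy on carefully curated benchmarks, yet still fail in real-world voice-agent applications, and current evaluations do not systematically cover these scenarios. Without diagnostic tools that isolate specific failure factors, practitioners cannot predict under which conditions, and in which languages, performance will degrade, or by how much. We introduce WildASR, a multilingual (four-language) diagnostic benchmark sourced entirely from real human speech, which decomposes ASR robustness along three axes: environmental degradation, demographic shift, and linguistic diversity. Evaluating seven widely used ASR systems, we find severe and uneven performance degradation, and model robustness does not transfer across languages or conditions. Critically, on partial or degraded input, models often hallucinate plausible content that was never actually spoken, which creates concrete safety risks for downstream agent behavior. Our results show that targeted, factor-isolated evaluation is essential for understanding and improving ASR reliability in production systems. Beyond the benchmark itself, we also provide three analysis tools to guide practitioners' deployment decisions.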

Original Abstract

Automatic speech recognition (ASR) systems have achieved near-human accuracy on curated benchmarks, yet still fail in real-world voice agents under conditions that current evaluations do not systematically cover. Without diagnostic tools that isolate specific failure factors, practitioners cannot anticipate which conditions, in which languages, will cause what degree of degradation. We introduce WildASR, a multilingual (four-language) diagnostic benchmark sourced entirely from real human speech that factorizes ASR robustness along three axes: environmental degradation, demographic shift, and linguistic diversity. Evaluating seven widely used ASR systems, we find severe and uneven performance degradation, and model robustness does not transfer across languages or conditions. Critically, mod...

Automatically collected on 2026-03-28

#paper #arXiv #AI #小凯
