## Paper Summary
**Research area**: NLP
**Authors**: Ruohan Liu, Shukang Yin, Tao Wang
**Published**: 2026-04-22
**arXiv**: [2604.20842](https://arxiv.org/abs/2604.20842)
## Abstract (Translated from Chinese)
Paralinguistic cues are essential for natural human-computer interaction, yet existing Large Audio-Language Models (LALMs) suffer from coarse-grained feature coverage and highly subjective assessment when it comes to evaluating paralinguistic capabilities. To address this, we propose SpeechParaling-Bench, a comprehensive benchmark for paralinguistic-aware speech generation. The benchmark expands feature coverage from fewer than 50 to over 100 fine-grained features, accompanied by more than 1,000 English-Chinese parallel speech queries, organized into three progressively challenging tasks: fine-grained control, intra-utterance variation, and context-aware adaptation. To enable reliable evaluation, we develop a pairwise comparison pipeline in which an LALM judges each candidate response against a fixed baseline. By shifting the evaluation framework from absolute scoring to relative preference, this approach effectively mitigates subjectivity and enables more stable, scalable evaluation without costly human annotation. Extensive experiments reveal significant limitations of current LALMs: even leading proprietary models struggle to fully control static paralinguistic features and dynamic modulation, and in situated dialogues up to 43.3% of errors stem from misinterpreting paralinguistic cues. These findings underscore the urgent need for more robust paralinguistic modeling to advance speech assistants toward human alignment.
## Original Abstract
Paralinguistic cues are essential for natural human-computer interaction, yet their evaluation in Large Audio-Language Models (LALMs) remains limited by coarse feature coverage and the inherent subjectivity of assessment. To address these challenges, we introduce SpeechParaling-Bench, a comprehensive benchmark for paralinguistic-aware speech generation. It expands existing coverage from fewer than 50 to over 100 fine-grained features, supported by more than 1,000 English-Chinese parallel speech queries, and is organized into three progressively challenging tasks: fine-grained control, intra-utterance variation, and context-aware adaptation. To enable reliable evaluation, we further develop a pairwise comparison pipeline, in which candidate responses are evaluated against a fixed baseline b...
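The pairwise-comparison idea described in the abstract can be sketched in a few lines: instead of asking a judge model for an absolute score, each candidate is compared against a fixed baseline and the relative preferences are aggregated into a win rate. The sketch below uses hypothetical helper names and a trivial stand-in judge; in the actual benchmark the judge is an LALM comparing two audio responses.

```python
# Minimal sketch of pairwise-preference evaluation against a fixed baseline.
# `judge` is a placeholder: the real pipeline queries an LALM judge.
from collections import Counter

def judge(candidate: str, baseline: str) -> str:
    # Stand-in heuristic purely for illustration; an LALM would compare
    # the paralinguistic quality of the two responses here.
    return "win" if len(candidate) > len(baseline) else "lose"

def win_rate(candidates: list[str], baseline: str) -> float:
    """Aggregate per-item relative preferences into a single win rate."""
    tally = Counter(judge(c, baseline) for c in candidates)
    total = sum(tally.values())
    return tally["win"] / total if total else 0.0

print(win_rate(["a longer reply", "hi"], "base"))  # → 0.5
```

Because every candidate is measured against the same anchor, scores from different judge runs remain comparable, which is what makes the relative-preference framing more stable than absolute scoring.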
---
*Automatically collected on 2026-04-24*
#Paper #arXiv #NLP #小凯