## Paper Summary
**Research area**: NLP
**Authors**: Ruohan Liu, Shukang Yin, Tao Wang
**Published**: 2026-04-22
**arXiv**: [2604.20842](https://arxiv.org/abs/2604.20842)
## Abstract (Translated from Chinese)
Paralinguistic cues are essential for natural human-computer interaction, yet existing Large Audio-Language Models (LALMs) suffer from coarse-grained feature coverage and highly subjective assessment when it comes to evaluating paralinguistic capabilities. To address this, we propose SpeechParaling-Bench, a comprehensive benchmark for paralinguistic-aware speech generation. The benchmark expands feature coverage from fewer than 50 to over 100 fine-grained features, accompanied by more than 1,000 English-Chinese parallel speech queries, organized into three progressively challenging tasks: fine-grained control, intra-utterance variation, and context-aware adaptation. To enable reliable evaluation, we develop a pairwise comparison pipeline in which an LALM judges each candidate response against a fixed baseline. By shifting the evaluation framework from absolute scoring to relative preference, this approach effectively mitigates subjectivity and enables more stable, scalable evaluation without costly human annotation. Extensive experiments reveal significant limitations of current LALMs: even leading proprietary models struggle to fully control static paralinguistic features and dynamic modulation, and in situated dialogues up to 43.3% of errors stem from misinterpreting paralinguistic cues. These findings underscore the urgent need for more robust paralinguistic modeling to advance speech assistants toward human alignment.
## Original Abstract
Paralinguistic cues are essential for natural human-computer interaction, yet their evaluation in Large Audio-Language Models (LALMs) remains limited by coarse feature coverage and the inherent subjectivity of assessment. To address these challenges, we introduce SpeechParaling-Bench, a comprehensive benchmark for paralinguistic-aware speech generation. It expands existing coverage from fewer than 50 to over 100 fine-grained features, supported by more than 1,000 English-Chinese parallel speech queries, and is organized into three progressively challenging tasks: fine-grained control, intra-utterance variation, and context-aware adaptation. To enable reliable evaluation, we further develop a pairwise comparison pipeline, in which candidate responses are evaluated against a fixed baseline b...
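The pairwise-comparison idea described in the abstract can be sketched in a few lines: instead of asking a judge model for an absolute score, each candidate is compared against a fixed baseline and the relative preferences are aggregated into a win rate. The sketch below uses hypothetical helper names and a trivial stand-in judge; in the actual benchmark the judge is an LALM comparing two audio responses.

```python
# Minimal sketch of pairwise-preference evaluation against a fixed baseline.
# `judge` is a placeholder: the real pipeline queries an LALM judge.
from collections import Counter

def judge(candidate: str, baseline: str) -> str:
    # Stand-in heuristic purely for illustration; an LALM would compare
    # the paralinguistic quality of the two responses here.
    return "win" if len(candidate) > len(baseline) else "lose"

def win_rate(candidates: list[str], baseline: str) -> float:
    """Aggregate per-item relative preferences into a single win rate."""
    tally = Counter(judge(c, baseline) for c in candidates)
    total = sum(tally.values())
    return tally["win"] / total if total else 0.0

print(win_rate(["a longer reply", "hi"], "base"))  # → 0.5
```

Because every candidate is measured against the same anchor, scores from different judge runs remain comparable, which is what makes the relative-preference framing more stable than absolute scoring.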
---
*Automatically collected on 2026-04-24*
#Paper #arXiv #NLP #小凯