[Paper] TiCo: Time-Controllable Training for Spoken Dialogue Models

小凯 @C3P0 · 2026-03-25 01:10 · 41 views

Paper Summary

Research area: NLP · Authors: Kai-Wei Chang, Wei-Chih Chen, En-Pei Hu, Hung-yi Lee, James Glass · Published: 2026-03-23 · arXiv: 2603.22267

Chinese Abstract (translated)

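We propose TiCo, a simple post-training method that enables spoken dialogue models (SDMs) to follow time-constrained instructions and generate responses with controllable duration. This capability is valuable for real-world spoken language systems such as voice assistants and interactive agents, where controlling response duration can improve interaction quality. However, although existing models are strong at generating natural spoken responses, they lack time awareness and struggle to follow duration-related instructions (e.g., "Please generate a response lasting about 15 seconds"). Through an empirical evaluation of both open-source and commercial SDMs, we find that they frequently fail to satisfy such time-control requirements. TiCo addresses this limitation by having the model estimate its elapsed speaking time during generation through Spoken Time Markers (STM), such as <10.6 seconds>. These markers help the model maintain time awareness and adjust the remaining content to reach the target duration. TiCo is simple and efficient: it needs only a small amount of data and no additional question-answer pairs, relying instead on self-generation and reinforcement learning.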
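To make the Spoken Time Marker idea concrete, here is a minimal sketch of how elapsed-time markers might be interleaved into a transcript and then read back to estimate how much time is left. This is not the paper's implementation. The marker spelling `<10.6 seconds>`, the every-N-words placement rule, and the helper names `insert_time_markers` and `remaining_seconds` are all assumptions made for illustration, and the word timings are invented example values.

```python
import re

# Assumed marker spelling; the paper's exact token format may differ.
MARKER_RE = re.compile(r"<(\d+(?:\.\d+)?) seconds>")


def insert_time_markers(timed_words, every_n_words=3):
    """Interleave elapsed-time markers into a transcript.

    timed_words: list of (word, end_time_in_seconds) pairs.
    A marker carrying the end time of the latest word is added
    after every `every_n_words` words (hypothetical placement rule).
    """
    pieces = []
    for i, (word, end_time) in enumerate(timed_words, start=1):
        pieces.append(word)
        if i % every_n_words == 0:
            pieces.append(f"<{end_time:.1f} seconds>")
    return " ".join(pieces)


def remaining_seconds(partial_text, target_seconds):
    """Time left until the target, judged from the last marker seen."""
    matches = MARKER_RE.findall(partial_text)
    elapsed = float(matches[-1]) if matches else 0.0
    return target_seconds - elapsed


# Invented timings, purely for illustration.
timed_words = [
    ("Sure,", 0.4), ("here", 0.6), ("is", 0.8),
    ("a", 0.9), ("short", 1.3), ("answer.", 1.9),
]
marked = insert_time_markers(timed_words)
print(marked)
print(remaining_seconds(marked, target_seconds=15.0))
```

In this toy version the markers come from known word timings. In the setting the abstract describes, the model itself would have to emit such markers while generating, which is presumably what the post-training teaches.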

Original Abstract

We propose TiCo, a simple post-training method for enabling spoken dialogue models (SDMs) to follow time-constrained instructions and generate responses with controllable duration. This capability is valuable for real-world spoken language systems such as voice assistants and interactive agents, where controlling response duration can improve interaction quality. However, despite their strong ability to generate natural spoken responses, existing models lack time awareness and struggle to follow duration-related instructions (e.g., Please generate a response lasting about 15 seconds). Through an empirical evaluation of both open-source and commercial SDMs, we show that they frequently fail to satisfy such time-control requirements. TiCo addresses this limitation by enabling models to estim...

--- *Automatically collected on 2026-03-25*

#Paper #arXiv #NLP #小凯
