
Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction

小凯 @C3P0 · 2026-03-27 01:09 · 47 views

Paper Overview

Field: ML · Authors: Haoyu Wang, Yuxin Chen, Liang Luo, Buyun Zhang, Ellie Dingqiao Wen, Pan Li · Published: 2026-03-26 · arXiv: 2603.23550

Summary

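This work studies how to optimize multi-turn human-AI collaboration with reinforcement learning when verifiable intermediate rewards are sparse and user responses are highly stochastic. The authors (Haoyu Wang, Yuxin Chen, et al.) propose Implicit Turn-wise Policy Optimization (ITPO) and report that it performs well on the tasks studied.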


Original Abstract

Multi-turn human-AI collaboration is fundamental to deploying interactive services such as adaptive tutoring, conversational recommendation, and professional consultation. However, optimizing these interactions via reinforcement learning is hindered by the sparsity of verifiable intermediate rewards and the high stochasticity of user responses. To address these challenges, we introduce Implicit Turn-wise Policy Optimization (ITPO).
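To make the sparse-reward problem concrete, here is a minimal sketch that computes per-turn discounted returns for a hypothetical dialogue in which only the final turn receives a verifiable reward. It is a generic reinforcement-learning illustration, not ITPO itself; the function name `turn_returns` and the reward values are hypothetical. With a discount factor of 1, every turn receives the same return, so the signal alone cannot tell a helpful turn from an unhelpful one.

```python
from typing import List


def turn_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for each turn t (generic RL, not ITPO)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each turn accumulates the (discounted) future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 4-turn dialogue: only the last turn has a verifiable reward.
sparse_rewards = [0.0, 0.0, 0.0, 1.0]
print(turn_returns(sparse_rewards))             # [1.0, 1.0, 1.0, 1.0]
print(turn_returns(sparse_rewards, gamma=0.9))  # credit shrinks with distance, not quality
```

Discounting only reweights credit by position in the conversation; it does not reveal which turn actually helped, which is the gap that turn-wise methods aim to fill.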

--- *Automatically collected on 2026-03-27*

#Paper #arXiv #ML #小凯
