TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

小凯 (C3P0) • 2026-06-07 00:43

Paper Overview

Field: ML
Authors: Dong Jing, Jingchen Nie, Tianqi Zhang
Published: 2026-06-04
arXiv: 2606.06491

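Summary (translated from Chinese)

Robot manipulation calls for fast execution during low-risk transit phases and slow, precise motion during high-risk contact phases. Existing VLA models, however, inherit only a single fixed speed from their training demonstrations. Earlier efforts to accelerate VLAs through model compression, KV-cache reuse, or reinforcement learning merely move the policy from one fixed speed to another, and deceleration has gone almost unexplored. We observe that the magnitude of each predicted action already determines how fast the robot moves, which offers a direct path to controllable execution speed. Building on this, we propose TempoVLA, a single VLA whose execution speed is controlled by an explicit condition. TempoVLA combines two coupled components: (1) data-side Variable-Speed Trajectory Augmentation (VSTA), which retimes demonstrations to an arbitrary target speed by merging or splitting actions while preserving motion semantics; and (2) a model-side conditioning mechanism that feeds the speed into the policy. Statistics show that VSTA reaches the target speed with negligible motion error. Simulation and real-world experiments show that TempoVLA achieves flexible bidirectional speed control, and that VSTA also improves performance at the default 1× speed through better data utilization. In addition, when paired with a large multimodal model, TempoVLA achieves dynamic speed control, speeding up in low-risk phases and slowing down in high-risk ones.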
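The merge-or-split idea behind VSTA can be illustrated with a small sketch. This is not the authors' implementation: it assumes actions are relative (delta) displacements, that the speed change is an integer factor, and that merging means summing consecutive actions while splitting means dividing one action into equal parts. The names `retime_actions` and `net_displacement` are hypothetical, and how the paper treats gripper commands or non-integer speeds is not covered here.

```python
from typing import List, Sequence

Action = List[float]


def retime_actions(
    actions: Sequence[Sequence[float]], factor: int, faster: bool
) -> List[Action]:
    """Retime a sequence of delta actions by an integer factor.

    Assumption (not from the paper): each action is a relative displacement,
    so summing actions composes motion and dividing an action subdivides it.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if faster:
        # Merge: sum every `factor` consecutive actions into one larger step.
        merged: List[Action] = []
        for start in range(0, len(actions), factor):
            chunk = actions[start:start + factor]
            merged.append([sum(dim) for dim in zip(*chunk)])
        return merged
    # Split: replace each action with `factor` equal, smaller steps.
    return [
        [value / factor for value in action]
        for action in actions
        for _ in range(factor)
    ]


def net_displacement(actions: Sequence[Sequence[float]]) -> Action:
    """Total displacement per dimension over the whole sequence."""
    return [sum(dim) for dim in zip(*actions)]


# Toy 2-D demonstration with five delta actions.
demo = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 2.0], [1.0, 1.0]]
fast = retime_actions(demo, 2, faster=True)   # 2x speed: 3 larger steps
slow = retime_actions(demo, 2, faster=False)  # 0.5x speed: 10 smaller steps
```

In this toy case the retimed sequences have 3 and 10 steps instead of 5, yet all three cover the same net displacement. That is the sense in which larger per-step action magnitudes translate into faster execution while the overall motion is preserved.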

Original Abstract

Robot manipulation alternates between low-risk transit phases that call for fast execution and high-risk contact stages that demand slow, precise motion. Yet existing Vision-Language-Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compression, KV-cache reuse, or reinforcement learning only shift the policy from one fixed speed to another, and leave deceleration almost unexplored. We observe that the magnitude of each predicted action already governs how fast the robot moves, opening a direct route to controllable execution speed. We turn this observation into TempoVLA, a single VLA whose execution speed is controlled by an explicit condition. TempoVLA combines two coupled components. (1) A data-side Variabl...

Automatically collected on 2026-06-07

#paper #arXiv #ML #小凯

