## Paper Summary
**Research Area**: NLP
**Authors**: Xiangbo Gao, Sicong Jiang, Bangya Liu, Xinghao Chen, Minglai Yang, Siyuan Yang, Mingyang Wu, Jiongze Yu, Qi Zheng, Haozhi Wang, Jiayi Zhang, Jared Yang, Jie Yang, Zihan Wang, Qing Yin, Zhengzhong Tu
**Published**: 2026-04-17
**arXiv**: [2604.16272](https://arxiv.org/abs/2604.16272)
## Abstract
As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured footage to meet professional requirements. Yet the field still lacks both a large-scale human-annotated dataset with complete editing examples and a standardized evaluator for comparing editing systems. Existing resources are limited by small scale, missing edited outputs, or the absence of human quality labels, while current evaluation often relies on expensive manual inspection or generic vision-language model judges that are not specialized for editing quality. We introduce VEFX-Dataset, a human-annotated dataset containing 5,049 video editing examples across 9 major editing categories and 32 subcategories, each labeled along three decoupled dimensions: instruction following, rendering quality, and editing exclusivity. Building on VEFX-Dataset, we propose VEFX-Reward, a reward model designed specifically for video-editing quality assessment. VEFX-Reward jointly processes the source video, the editing instruction, and the edited video, and predicts a quality score for each dimension via ordinal regression. We further release VEFX-Bench, a benchmark of 300 curated video-prompt pairs for standardized comparison of editing systems. Experiments show that VEFX-Reward aligns more closely with human judgment than generic VLM judges and prior reward models, both on standard IQA/VQA metrics and in grouped preference evaluation. Using VEFX-Reward as the evaluator, we benchmark representative commercial and open-source video editing systems, revealing persistent gaps in current models across visual plausibility, instruction following, and editing locality.
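The abstract states that VEFX-Reward predicts a per-dimension quality score via ordinal regression. One common way to realize this is a cumulative-link (thresholded-logit) readout; the sketch below is purely illustrative — the 5-level scale, the threshold values, and the function names are assumptions, not details from the paper.

```python
import numpy as np

def ordinal_score(logit, thresholds):
    """Cumulative-link ordinal regression readout: a scalar logit is
    compared against K-1 ordered thresholds, and the predicted grade
    is the number of thresholds it exceeds (a rank in 0..K-1)."""
    return int(np.sum(logit > np.asarray(thresholds)))

def grade_probs(logit, thresholds):
    """Distribution over the K grades implied by the cumulative model:
    P(grade > k) = sigmoid(logit - t_k), so differences of adjacent
    cumulative terms give the mass of each individual grade."""
    s = 1.0 / (1.0 + np.exp(-(logit - np.asarray(thresholds))))  # P(grade > k)
    cum = np.concatenate(([1.0], s, [0.0]))
    return cum[:-1] - cum[1:]  # P(grade = k); non-negative, sums to 1

# Hypothetical 5-level quality scale (grades 0..4), evenly spaced thresholds.
thresholds = [-1.5, -0.5, 0.5, 1.5]
print(ordinal_score(0.7, thresholds))  # logit 0.7 exceeds three thresholds -> 3
```

Unlike plain classification, this readout respects the ordering of the quality levels: a logit shift moves the prediction only to adjacent grades.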
---
*Auto-collected on 2026-04-21*
#Paper #arXiv #NLP #小凯