[Paper] LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)

Paper Overview

Field: CV
Authors: Wei Luo, Yiting Lu, Xin Li, Haoran Li, Fengbin Guan, Chen Gao, Xin Jin, Yong Li, Zhibo Chen, Sijing Wu, Kang Fu, Yunhao Li, Ziang Xiao, Huiyu Duan, Jing Liu, Qiang Hu, Xiongkuo Min, Guangtao Zhai, Manxi Sun, Zixuan Guo, Yun Li, Ziyang Chen, Manabu Tsukada, Zhengyang Li, Zhenglin Du, Yi Wen, Licheng Jiao, Fang Liu, Lingling Li, Yiwen Ren, Zhilong Song, Dubing Chen, Yucheng Zhou, Tianyi Yan, Huan Zheng
Published: 2026-05-06
arXiv: 2605.05187

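Chinese Abstract (translated)

This paper reports on the LoViF 2026 PhyScore challenge, a competition on holistic quality assessment of videos generated by world models, covering both 2D and 4D generation settings. The challenge is motivated by a central gap in current evaluation practice: perceptual quality alone is insufficient to judge whether generated dynamics are physically plausible, temporally coherent, and consistent with the input conditions. Participants must build a metric that jointly predicts four dimensions: video quality, physical realism, condition-video alignment, and temporal consistency. Beyond that, participants must also localize the timestamps of physical anomalies for fine-grained diagnosis. The benchmark dataset contains 1,554 videos generated by seven representative world generative models, organized into three tracks (text-to-2D, image-to-4D, and video-to-4D) and spanning 26 categories. These categories explicitly cover physics-related scenarios, including dynamics, optics, and thermodynamics, as well as diverse real-world and creative content. To ensure label reliability, scores and anomaly timestamps are produced by trained human annotators and supplemented by an additional automated quality-control pass. Evaluation covers both score prediction and anomaly localization, using a composite protocol that combines TimeStamp_IOU and SRCC/PLCC. This report summarizes the challenge design and provides method-level insights from the submitted solutions.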

Original Abstract

This paper reports on the LoViF 2026 PhyScore challenge, a competition on holistic quality assessment of world-model-generated videos across both 2D and 4D generation settings. The challenge is motivated by a central gap in current evaluation practice: perceptual quality alone is insufficient to judge whether generated dynamics are physically plausible, temporally coherent, and consistent with input conditions. Participants are required to build a metric that jointly predicts four dimensions, i.e., Video Quality, Physical Realism, Condition-Video Alignment, and Temporal Consistency. Beyond that, participants also need to localize physical anomaly timestamps for fine-grained diagnosis. The benchmark dataset contains 1,554 videos generated by seven representative world generative models, organized into three tracks (text-to-2D, image-to-4D, and video-to-4D) and spanning 26 categories. These categories explicitly cover physics-relevant scenarios, including dynamics, optics, and thermodynamics, together with diverse real-world and creative content. To ensure label reliability, scores and anomaly timestamps are produced through trained human annotation with an additional automated quality-control pass. Evaluation is based on both score prediction and anomaly localization, with a composite protocol that combines TimeStamp_IOU and SRCC/PLCC. This report summarizes the challenge design and provides method-level insights from the submitted solutions.
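Anomaly localization can be scored with a temporal intersection-over-union between predicted and annotated anomaly spans. The sketch below assumes each video's anomalies are given as a list of (start, end) intervals in seconds (the abstract does not specify the annotation format), merges overlapping intervals, and returns |P ∩ G| / |P ∪ G|. Treating "no anomaly predicted, none annotated" as a perfect score is also an assumption. The `composite` helper and its weight `w` are hypothetical: the abstract only says that TimeStamp_IOU and SRCC/PLCC are combined, not how.

```python
def _merge(intervals):
    """Sort (start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def _total_length(intervals):
    return sum(end - start for start, end in intervals)


def _intersection_length(p, g):
    """Two-pointer sweep over two merged, sorted interval lists."""
    i = j = 0
    total = 0.0
    while i < len(p) and j < len(g):
        lo = max(p[i][0], g[j][0])
        hi = min(p[i][1], g[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if p[i][1] < g[j][1]:
            i += 1
        else:
            j += 1
    return total


def timestamp_iou(pred, gt):
    """Temporal IoU between predicted and ground-truth anomaly spans."""
    p, g = _merge(pred), _merge(gt)
    if not p and not g:
        return 1.0  # assumption: correctly predicting "no anomaly" counts as perfect
    inter = _intersection_length(p, g)
    union = _total_length(p) + _total_length(g) - inter
    return inter / union if union > 0 else 0.0


def composite(score_term, mean_iou, w=0.5):
    """Hypothetical weighted blend; w=0.5 is illustrative, not the official weight."""
    return w * score_term + (1 - w) * mean_iou
```

For example, a prediction of (1.0, 3.0) against a ground-truth span of (2.0, 4.0) overlaps for 1 second out of a 3-second union, giving an IoU of 1/3. Merging intervals first keeps overlapping predictions from being double-counted.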

--- *Automatically collected on 2026-05-08*

#Paper #arXiv #CV #小凯
