[Paper] MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generatio...

Paper Overview

Research area: NLP | Authors: Yan Li, Zezi Zeng, Yifan Yang | Published: 2025-04-17 | arXiv: 2504.13095

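Abstract (translated from the Chinese)

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, because elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic evaluation. Experiments show that MM-WebAgent outperforms code-generation and agent-based baselines, especially in multimodal element generation and integration. Code and data: https://aka.ms/mm-webagent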

Original Abstract

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for mult...

--- *Automatically collected on 2026-04-18*

#Paper #arXiv #NLP #小凯
