Paper Summary
Field: CV Authors: Anirudh Sundara Rajan, Krishna Kumar Singh, Yong Jae Lee Published: 2026-05-14 arXiv: 2605.15181
Abstract
Modern image editing models produce realistic results but struggle with abstract, multi-step instructions (e.g., "make this advertisement more vegetarian-friendly"). Prior agent-based methods decompose such tasks but rely on handcrafted pipelines or teacher imitation, which limits flexibility and decouples learning from actual editing outcomes. We propose an experiential framework for long-horizon image editing, where a planner generates structured atomic decompositions and an orchestrator selects tools and regions to execute each step. A vision-language judge provides outcome-based rewards for instruction adherence and visual quality. The orchestrator is trained to maximize these rewards, and successful trajectories are used to refine the planner. By tightly coupling planning with reward d...
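The planner, orchestrator, and judge roles described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: every name here (`run_episode`, `select_successful`, the `toy_*` stubs, the string "images") is hypothetical, the equal-weight average of the two judge scores and the 0.5 threshold are assumptions, and the reward-maximizing training of the orchestrator is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    instruction: str  # one atomic edit produced by the planner
    tool: str         # tool chosen by the orchestrator
    region: str       # region chosen by the orchestrator


@dataclass
class Trajectory:
    steps: List[Step]
    reward: float     # outcome-based reward from the judge


def run_episode(
    instruction: str,
    image: str,
    planner: Callable[[str], List[str]],
    orchestrator: Callable[[str, str], Tuple[str, str]],
    tools: Dict[str, Callable[[str, str, str], str]],
    judge: Callable[[str, str], Tuple[float, float]],
) -> Tuple[str, Trajectory]:
    """Plan, execute each atomic step, then score the final result."""
    steps: List[Step] = []
    for atomic in planner(instruction):
        tool, region = orchestrator(atomic, image)
        image = tools[tool](image, region, atomic)
        steps.append(Step(atomic, tool, region))
    adherence, quality = judge(instruction, image)
    # Assumption: the two scores are averaged with equal weight.
    return image, Trajectory(steps, reward=(adherence + quality) / 2)


def select_successful(
    trajectories: List[Trajectory], threshold: float = 0.5
) -> List[Trajectory]:
    """Keep trajectories whose reward clears an assumed threshold,
    e.g. as candidate data for refining the planner."""
    return [t for t in trajectories if t.reward >= threshold]


# Toy stand-ins so the sketch runs end to end; images are plain strings.
def toy_planner(instruction: str) -> List[str]:
    return ["remove meat", "add vegetables"]


def toy_orchestrator(atomic: str, image: str) -> Tuple[str, str]:
    return ("inpaint", "center")


TOY_TOOLS = {"inpaint": lambda image, region, atomic: f"{image}+{atomic}@{region}"}


def toy_judge(instruction: str, image: str) -> Tuple[float, float]:
    return (1.0, 0.5)  # (instruction adherence, visual quality)
```

Calling `run_episode("make this advertisement more vegetarian-friendly", "ad", toy_planner, toy_orchestrator, TOY_TOOLS, toy_judge)` returns the edited "image" together with a scored trajectory, which `select_successful` can then filter.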
--- *Automatically collected on 2026-05-15*
#Paper #arXiv #CV #小凯