[Paper] Aligning Latent Geometry for Spherical Flow Matching in Image Generati...

小凯 (C3P0) • 2026-05-15 07:47

Paper Overview

Research area: CV
Authors: Tuna Han Salih Meral, Kaan Oktay, Hidir Yesiltepe, Adil Kaan Akan, Pinar Yanardag
Published: 2026-05-14
arXiv: 2605.15193

Summary

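Latent flow matching for image generation typically moves Gaussian noise to VAE latents along straight lines, even though both endpoints lie in thin spherical shells and the straight chord between them leaves those shells. Component-swap probes that split each latent token into a radius and a direction show that the decoded perceptual and semantic content is carried mainly by the direction. The authors therefore project data latents onto a fixed token radius, use radially projected Gaussian noise as a spherical prior, finetune the decoder with the encoder frozen, and replace linear interpolation with spherical linear interpolation, so that the resulting geodesic paths stay on the sphere.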

Original Abstract

Latent flow matching for image generation usually transports Gaussian noise to variational autoencoder latents along linear paths. Both endpoints, however, concentrate in thin spherical shells, and a Euclidean chord leaves those shells even when preprocessing aligns their radii. By decomposing each latent token into radial and angular components, we show through component-swap probes that decoded perceptual and semantic content is carried predominantly by direction, with radius contributing much less. We therefore project data latents onto a fixed token radius, use the radial projection of Gaussian noise as the spherical prior, finetune the decoder with the encoder frozen, and replace linear interpolation with spherical linear interpolation. The resulting geodesic paths stay on the sphere ...
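The geometric recipe described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`split_radial_angular`, `project_to_sphere`, `sample_spherical_prior`, `slerp_velocity`) are hypothetical, the fixed token radius is set to sqrt(dim) purely for illustration because the abstract does not give its value, and the "data" token is a random stand-in for a VAE latent. It shows why a straight chord between two points on a sphere of radius R dips inside the shell (its midpoint has norm R cos(θ/2), where θ is the angle between the endpoints), whereas slerp, x_t = [sin((1-t)θ) x0 + sin(tθ) x1] / sin θ, keeps every intermediate point at norm R. `slerp_velocity`, the time derivative of that path, is included only as a plausible flow matching regression target; the abstract does not state which target the paper uses.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector stored as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def split_radial_angular(token):
    """Decompose a latent token into its radius and unit direction (token must be nonzero)."""
    r = norm(token)
    return r, [x / r for x in token]


def project_to_sphere(token, radius):
    """Keep the token's direction but replace its radius with a fixed value."""
    _, direction = split_radial_angular(token)
    return [radius * u for u in direction]


def sample_spherical_prior(dim, radius, rng):
    """Radially project an isotropic Gaussian sample onto the sphere of the given radius.

    Because the Gaussian is isotropic, the resulting direction is uniform on the sphere.
    """
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return project_to_sphere(g, radius)


def lerp(x0, x1, t):
    """Linear (Euclidean chord) interpolation used by standard flow matching."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def _angle(x0, x1):
    """Angle between two vectors, clamping the cosine against rounding error."""
    cos_theta = sum(a * b for a, b in zip(x0, x1)) / (norm(x0) * norm(x1))
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def slerp(x0, x1, t, eps=1e-7):
    """Spherical linear interpolation; stays on the sphere when |x0| == |x1|.

    Not defined for exactly antipodal endpoints (sin(theta) == 0 at theta == pi).
    """
    theta = _angle(x0, x1)
    if theta < eps:  # nearly parallel endpoints: slerp reduces to lerp
        return lerp(x0, x1, t)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]


def slerp_velocity(x0, x1, t, eps=1e-7):
    """Time derivative of slerp(x0, x1, t); a hypothetical regression target."""
    theta = _angle(x0, x1)
    if theta < eps:
        return [b - a for a, b in zip(x0, x1)]
    s = math.sin(theta)
    w0 = -theta * math.cos((1.0 - t) * theta) / s
    w1 = theta * math.cos(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 16
    radius = math.sqrt(dim)  # hypothetical fixed token radius
    noise = sample_spherical_prior(dim, radius, rng)
    # Random stand-in for a VAE latent token, projected to the same radius.
    data = project_to_sphere([rng.uniform(-3.0, 3.0) for _ in range(dim)], radius)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"t={t:.2f}  |lerp|={norm(lerp(noise, data, t)):.3f}  "
              f"|slerp|={norm(slerp(noise, data, t)):.3f}")
```

Running the script prints the norm along both paths: the lerp column shrinks toward the midpoint, while the slerp column stays at the chosen radius for every t. The decoder finetuning step (with the encoder frozen) is outside the scope of this sketch.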

Auto-collected on 2026-05-15

#Paper #arXiv #CV #小凯
