[Paper] Zero-Shot Depth from Defocus

Paper Overview

Research area: CV; Authors: Yiming Zuo, Hongyu Wen, Venkat Subramanian; Published: 2025-03-30; arXiv: 2503.23737

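Abstract (translated from the Chinese summary)

Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous work that overfits to a particular dataset, this paper focuses on the challenging and practical setting of zero-shot generalization. The authors first propose ZEDD, a new real-world DfD benchmark that contains 8.3x more scenes than previous benchmarks, with significantly higher-quality images and ground-truth depth maps. They also design a new network architecture named FOSSA, a Transformer-based architecture with novel designs tailored to the DfD task. Its key contribution is a stack attention layer with a focus distance embedding, which allows efficient information exchange across the focus stack. Finally, they develop a new training data pipeline that uses existing large-scale RGBD datasets to generate synthetic focus stacks. Experiments on ZEDD and other benchmarks show significant improvements over baselines, with errors reduced by up to 55.7%. The ZEDD benchmark is released at https://zedd.cs.princeton.edu. Code and checkpoints are released at https://github.com/princeton-vl/FOSSA.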
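To make the idea of generating a synthetic focus stack from RGBD data more concrete, here is a minimal sketch under stated assumptions. It is not the FOSSA pipeline: it assumes a standard thin-lens camera model, and the function names `coc_diameter` and `coc_stack` and the default lens parameters are hypothetical choices made for illustration. For each focus distance, it computes the per-pixel circle-of-confusion (blur) diameter that a depth map would produce; a renderer would then blur the RGB image by that amount to obtain one slice of the stack.

```python
# Illustrative sketch only (assumed thin-lens model, not the paper's code).

def coc_diameter(depth_m, focus_dist_m, focal_len_m=0.05, f_number=2.0):
    """Circle-of-confusion diameter on the sensor, in meters (thin-lens model).

    depth_m:      distance from the lens to the scene point
    focus_dist_m: distance at which the lens is focused (must exceed focal_len_m)
    """
    aperture = focal_len_m / f_number  # aperture diameter
    return (aperture * focal_len_m * abs(depth_m - focus_dist_m)
            / (depth_m * (focus_dist_m - focal_len_m)))


def coc_stack(depth_map, focus_dists, **lens):
    """One blur-diameter map per focus distance: result[k][row][col]."""
    return [
        [[coc_diameter(d, fd, **lens) for d in row] for row in depth_map]
        for fd in focus_dists
    ]


# Hypothetical 2x2 depth map (meters) and three focus distances.
depth_map = [[1.0, 2.0], [4.0, 8.0]]
focus_dists = [1.0, 2.0, 4.0]
stack = coc_stack(depth_map, focus_dists)

# For each pixel, the slice with the smallest blur is the sharpest one.
sharpest = [
    [min(range(len(focus_dists)), key=lambda k: stack[k][r][c])
     for c in range(2)]
    for r in range(2)
]
print(sharpest)
```

A point that lies exactly at the focus distance gets zero blur, and blur grows as the point moves away from the focal plane. This variation of blur across the stack is the cue a DfD method relies on to recover depth.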

Original Abstract

Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous works overfitting to a certain dataset, this paper focuses on the challenging and practical setting of zero-shot generalization. We first propose a new real-world DfD benchmark ZEDD, which contains 8.3x more scenes and significantly higher quality images and ground-truth depth maps compared to previous benchmarks. We also design a novel network architecture named FOSSA. FOSSA is a Transformer-based architecture with novel designs tailored to the DfD task. The key contribution is a stack attention layer with a focus distance embedding, allowing efficient information exchange across the focus stack. Finally, we develop a new training data pipeline allowing us to utilize existing la...

--- *Automatically collected on 2026-03-31*

#Paper #arXiv #CV #小凯
