[Paper] ScoringBench: A Benchmark for Evaluating Tabular Foundation Models wit...

Paper Overview

Field: AI; Authors: Jonas Landsgesell, Pascal Knoll; Published: 2026-03-31; arXiv: 2603.11115

Abstract (translated from the Chinese summary)

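Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet mainstream regression benchmarks evaluate them almost exclusively with point-estimate metrics (RMSE, R²). These aggregate measures often mask how a model performs in the tails of the distribution, which is a serious drawback in high-stakes decision domains such as finance and clinical research, where risk profiles are asymmetric. We introduce ScoringBench, an open benchmark that computes a comprehensive suite of proper scoring rules (CRPS, CRLS, interval score, energy score, weighted CRPS, and Brier score) alongside standard point metrics, giving a richer picture of probabilistic forecast quality. We evaluate realTabPFNv2.5 fine-tuned with different scoring-rule objectives, as well as TabICL, relative to an untuned realTabPFNv2.5 on a suite of regression benchmarks. Our results confirm that model rankings depend on the chosen scoring rule and that no single pretraining objective is universally optimal.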
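For reference, the standard textbook forms of several scoring rules named above are listed below, all negatively oriented (lower is better). These are the common definitions from the forecasting literature, assumed here rather than taken from the paper; ScoringBench's exact variants (for example the weight function w in weighted CRPS or the interval level α) may differ, and CRLS is not shown. F is the predictive CDF, y the observation, X and X′ independent draws from F, [l, u] a central (1 − α) prediction interval, p a predicted event probability and o the binary outcome.

```latex
\begin{aligned}
\mathrm{CRPS}(F, y) &= \int_{-\infty}^{\infty} \bigl(F(z) - \mathbf{1}\{y \le z\}\bigr)^2 \, dz
  = \mathbb{E}\lvert X - y\rvert - \tfrac{1}{2}\,\mathbb{E}\lvert X - X'\rvert \\
\mathrm{twCRPS}(F, y) &= \int_{-\infty}^{\infty} w(z)\,\bigl(F(z) - \mathbf{1}\{y \le z\}\bigr)^2 \, dz, \qquad w(z) \ge 0 \\
\mathrm{IS}_\alpha(l, u; y) &= (u - l) + \tfrac{2}{\alpha}(l - y)\,\mathbf{1}\{y < l\} + \tfrac{2}{\alpha}(y - u)\,\mathbf{1}\{y > u\} \\
\mathrm{ES}(F, \mathbf{y}) &= \mathbb{E}\lVert \mathbf{X} - \mathbf{y}\rVert - \tfrac{1}{2}\,\mathbb{E}\lVert \mathbf{X} - \mathbf{X}'\rVert \\
\mathrm{BS}(p, o) &= (p - o)^2, \qquad o \in \{0, 1\}
\end{aligned}
```

Choosing w(z) = 1{z ≥ t} concentrates the weighted CRPS on the upper tail beyond a threshold t, which is one way to target the asymmetric risks the abstract mentions.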

Original Abstract

Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet prevailing regression benchmarks evaluate them almost exclusively via point-estimate metrics (RMSE, R²). These aggregate measures often obscure model performance in the tails of the distribution, a critical deficit for high-stakes decision-making in domains like finance and clinical research, where asymmetric risk profiles are the norm. We introduce ScoringBench, an open benchmark that computes a comprehensive suite of proper scoring rules, like CRPS, CRLS, Interval Score, Energy Score, weighted CRPS and Brier Score, alongside standard point metrics, providing a richer picture of probabilistic forecast quality. We evaluate realTabPFNv2.5 fine-tuned with different scoring rule objectives and TabICL relativ...
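To make the point-metric versus proper-scoring-rule contrast concrete, here is a minimal Python/NumPy sketch. It is an illustration under stated assumptions, not ScoringBench's code: the function names are hypothetical, the CRPS uses a simple sample-based estimator, the weighted CRPS uses an upper-tail indicator weight with an arbitrary threshold, and the two forecasts are synthetic. Both forecasts share the same point prediction, so their squared error (and hence RMSE) is identical for every outcome.

```python
import numpy as np

# Hypothetical helper names for illustration; not ScoringBench's API.


def crps_from_samples(samples, y):
    """Sample estimate of CRPS(F, y) = E|X - y| - 0.5 * E|X - X'| (lower is better).

    Averages over all sample pairs (a V-statistic), which slightly overstates
    CRPS for small sample sizes and needs O(n^2) memory; fine for a sketch.
    """
    x = np.asarray(samples, dtype=float)
    mean_abs_error = np.mean(np.abs(x - y))
    mean_pair_spread = np.mean(np.abs(x[:, None] - x[None, :]))
    return float(mean_abs_error - 0.5 * mean_pair_spread)


def tw_crps_from_samples(samples, y, threshold):
    """Threshold-weighted CRPS with weight 1{z >= threshold} (upper-tail focus).

    Computed by applying the chaining function v(z) = max(z, threshold)
    to both the samples and the observation, then taking the ordinary CRPS.
    """
    x = np.maximum(np.asarray(samples, dtype=float), threshold)
    return crps_from_samples(x, max(y, threshold))


def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval (lower is better)."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    if y > upper:
        score += (2.0 / alpha) * (y - upper)
    return float(score)


def brier_score(prob_event, event_happened):
    """Brier score for one binary event, e.g. 'y exceeds a risk threshold'."""
    return float((prob_event - float(event_happened)) ** 2)


# Two synthetic forecasts with the same point prediction (mean 0),
# hence identical squared error, but very different spreads.
rng = np.random.default_rng(0)
narrow = rng.normal(0.0, 0.1, size=1000)
wide = rng.normal(0.0, 1.0, size=1000)
narrow -= narrow.mean()  # centre both so the point predictions coincide
wide -= wide.mean()

for y in (0.05, 2.5):  # a typical outcome and a tail outcome
    print(
        f"y={y}: squared error={y ** 2:.4f}  "
        f"CRPS narrow={crps_from_samples(narrow, y):.3f}  "
        f"CRPS wide={crps_from_samples(wide, y):.3f}  "
        f"twCRPS(t=1.5) narrow={tw_crps_from_samples(narrow, y, 1.5):.3f}  "
        f"twCRPS(t=1.5) wide={tw_crps_from_samples(wide, y, 1.5):.3f}"
    )
```

With these settings the confident (narrow) forecast gets the better CRPS near the centre, while the wide forecast scores better on the tail outcome y = 2.5, and the threshold-weighted CRPS isolates that tail difference. RMSE cannot separate the two forecasts at all, which is the kind of tail behaviour the abstract says point metrics hide.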

--- *Auto-collected on 2026-04-02*

#Paper #arXiv #AI #小凯