## Paper Summary
**Field**: ML
**Authors**: Natalie Collina, Jiuyao Lu, Georgy Noarov
**Published**: 2026-04-23
**arXiv**: [2604.21935](https://arxiv.org/abs/2604.21935)
## Abstract (Translated)
We study the minimax sample complexity of multicalibration in the batch setting. A learner observes n i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most ε with respect to a given family of groups G. For every fixed κ > 0, in the regime |G| ≤ ε^{-κ}, we prove that Θ̃(ε^{-3}) samples are necessary and sufficient, up to polylogarithmic factors. The lower bound holds even for randomized predictors, and the upper bound is achieved by a randomized predictor obtained via an online-to-batch reduction. This separates the sample complexity of multicalibration from that of marginal calibration, which scales as Θ̃(ε^{-2}), and shows that mean-ECE multicalibration is as hard in the batch setting as in the online setting, whereas marginal calibration is strictly harder online. In contrast, we observe that for κ = 0 the sample complexity of multicalibration remains Θ̃(ε^{-2}), exhibiting a sharp threshold phenomenon. More generally, we establish matching upper and lower bounds (up to polylogarithmic factors) for weighted L_p multicalibration metrics for all 1 ≤ p ≤ 2, with optimal exponent 3/p. We also extend the lower-bound template to a regular class of elicitable properties and combine it with the online upper bounds of Hu et al. (2025) to obtain matching bounds for property calibration, including expectiles and bounded-density quantiles.
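The central quantity in the abstract, multicalibration error measured by ECE over a family of groups, can be sketched numerically as follows. This is a minimal illustration only: the uniform binning scheme, the worst-group aggregation, and the function name `multicalibration_ece` are assumptions for exposition, not the paper's exact definition.

```python
import numpy as np

def multicalibration_ece(preds, labels, groups, n_bins=10):
    """Worst-group binned ECE (illustrative sketch, not the paper's definition).

    preds  : array of predicted probabilities in [0, 1]
    labels : array of binary outcomes {0, 1}
    groups : list of boolean masks, one per group in the family G
    """
    # Assign each prediction to a uniform bin over [0, 1].
    bins = np.minimum((preds * n_bins).astype(int), n_bins - 1)
    worst = 0.0
    for mask in groups:
        if mask.sum() == 0:
            continue
        err = 0.0
        for b in range(n_bins):
            sel = mask & (bins == b)
            if sel.sum() == 0:
                continue
            # |mean prediction - mean outcome| in this bin,
            # weighted by the bin's mass within the group.
            gap = abs(preds[sel].mean() - labels[sel].mean())
            err += (sel.sum() / mask.sum()) * gap
        worst = max(worst, err)
    return worst
```

A perfectly calibrated predictor scores 0 on every group, while a predictor that outputs 0.5 on all-positive data incurs error 0.5; the paper's results bound how many samples are needed to drive this population quantity below ε simultaneously for all groups in G.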
## Original Abstract
We study the minimax sample complexity of multicalibration in the batch setting. A learner observes n i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most ε with respect to a given family of groups. For every fixed κ > 0, in the regime |G| ≤ ε^{-κ}, we prove that Θ̃(ε^{-3}) samples are necessary and sufficient, up to polylogarithmic factors. The lower bound holds even for randomized predictors, and the upper bound is realized by a randomized predictor obtained via an online-to-batch reduction. This separates the sample complexity of multicalibration from that of marginal calibration, which scales as Θ̃(ε^{-2}), and shows that mean-ECE multicalibration i...
---
*Auto-collected on 2026-04-25*
#paper #arXiv #ML #小凯