## Paper Summary
**Field**: ML
**Authors**: Natalie Collina, Jiuyao Lu, Georgy Noarov
**Published**: 2026-04-23
**arXiv**: [2604.21935](https://arxiv.org/abs/2604.21935)
## Abstract (Translated)
We study the minimax sample complexity of multicalibration in the batch setting. A learner observes n i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most ε with respect to a given family of groups G. For every fixed κ > 0, in the regime |G| ≤ ε^{-κ}, we prove that Θ̃(ε^{-3}) samples are necessary and sufficient, up to polylogarithmic factors. The lower bound holds even for randomized predictors, and the upper bound is achieved by a randomized predictor obtained via an online-to-batch reduction. This separates the sample complexity of multicalibration from that of marginal calibration, which scales as Θ̃(ε^{-2}), and shows that mean-ECE multicalibration is as hard in the batch setting as in the online setting, whereas marginal calibration is strictly harder online. In contrast, we observe that for κ = 0 the sample complexity of multicalibration remains Θ̃(ε^{-2}), exhibiting a sharp threshold phenomenon. More generally, we establish matching upper and lower bounds (up to polylogarithmic factors) for weighted L_p multicalibration metrics for all 1 ≤ p ≤ 2, with optimal exponent 3/p. We also extend the lower-bound template to a regular class of elicitable properties and combine it with the online upper bounds of Hu et al. (2025) to obtain matching bounds for property calibration, including expectiles and bounded-density quantiles.
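The central quantity in the abstract, multicalibration error measured by ECE over a family of groups, can be sketched numerically as follows. This is a minimal illustration only: the uniform binning scheme, the worst-group aggregation, and the function name `multicalibration_ece` are assumptions for exposition, not the paper's exact definition.

```python
import numpy as np

def multicalibration_ece(preds, labels, groups, n_bins=10):
    """Worst-group binned ECE (illustrative sketch, not the paper's definition).

    preds  : array of predicted probabilities in [0, 1]
    labels : array of binary outcomes {0, 1}
    groups : list of boolean masks, one per group in the family G
    """
    # Assign each prediction to a uniform bin over [0, 1].
    bins = np.minimum((preds * n_bins).astype(int), n_bins - 1)
    worst = 0.0
    for mask in groups:
        if mask.sum() == 0:
            continue
        err = 0.0
        for b in range(n_bins):
            sel = mask & (bins == b)
            if sel.sum() == 0:
                continue
            # |mean prediction - mean outcome| in this bin,
            # weighted by the bin's mass within the group.
            gap = abs(preds[sel].mean() - labels[sel].mean())
            err += (sel.sum() / mask.sum()) * gap
        worst = max(worst, err)
    return worst
```

A perfectly calibrated predictor scores 0 on every group, while a predictor that outputs 0.5 on all-positive data incurs error 0.5; the paper's results bound how many samples are needed to drive this population quantity below ε simultaneously for all groups in G.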
## Original Abstract
We study the minimax sample complexity of multicalibration in the batch setting. A learner observes n i.i.d. samples from an unknown distribution and must output a (possibly randomized) predictor whose population multicalibration error, measured by Expected Calibration Error (ECE), is at most ε with respect to a given family of groups. For every fixed κ > 0, in the regime |G| ≤ ε^{-κ}, we prove that Θ̃(ε^{-3}) samples are necessary and sufficient, up to polylogarithmic factors. The lower bound holds even for randomized predictors, and the upper bound is realized by a randomized predictor obtained via an online-to-batch reduction. This separates the sample complexity of multicalibration from that of marginal calibration, which scales as Θ̃(ε^{-2}), and shows that mean-ECE multicalibration i...
---
*Auto-collected on 2026-04-25*
#paper #arXiv #ML #小凯