## Paper Summary
**Research Area**: Natural Language Processing
**作者**: An-Yang Ji, Jun-Peng Jiang, De-Chuan Zhan, Han-Jia Ye
**Published**: 2026-04-30
**arXiv**: [2604.28076](https://arxiv.org/abs/2604.28076)
## Translated Abstract
Large Language Models have advanced Table Question Answering, where most queries can be answered through information extraction or simple aggregation. However, a common class of real-world queries is implicitly predictive: they require inferring unobserved answers from historical patterns rather than mere retrieval. These queries introduce two challenges: recognizing latent intent and performing reliable predictive reasoning over large tables. To evaluate how LLMs handle such implicitly predictive Table QA tasks, we introduce TopBench, a benchmark of 779 samples spanning four sub-tasks, ranging from single-point prediction to decision making, treatment-effect analysis, and complex filtering, which require models to produce outputs combining reasoning text and structured tables. We evaluate a range of models under both text-based and agentic workflows. Experiments show that current models often struggle with intent recognition, defaulting to simple lookups. Further analysis indicates that accurate intent disambiguation is a prerequisite for eliciting these predictive behaviors, and that raising the ceiling on predictive accuracy requires integrating more sophisticated modeling or reasoning capabilities.
## Original Abstract
Large Language Models have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation. However, a common class of real-world queries is implicitly predictive, requiring inference of unobserved answers from historical patterns rather than mere retrieval. These queries introduce two challenges: recognizing latent intent and reliable predictive reasoning over massive tables. We introduce TopBench, a benchmark of 779 samples across four sub-tasks ranging from single-point prediction to decision making, treatment effect analysis, and complex filtering. Experiments reveal that current models often struggle with intent recognition, defaulting to lookups. Accurate intent disambiguation is the prerequisite for leading these predictive behaviors...
---
*Automatically collected on 2026-05-04*
#paper #arXiv #NLP #小凯