## Paper Summary
**Research area**: NLP
**Authors**: Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang
**Published**: 2026-03-26
**arXiv**: [2603.25737](https://arxiv.org/abs/2603.25737)
## Abstract (translated from Chinese)
The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across multiple documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteBack-RAG, a framework that uses labeled examples to identify where retrieval succeeds, isolate the relevant documents, and distill them into compact knowledge units that are indexed alongside the original corpus. Because the method modifies only the corpus, it can be applied once as an offline preprocessing step and combined with any RAG pipeline. In experiments across four RAG methods, six benchmarks, and two LLM backbones, WriteBack-RAG improves every evaluated setting, with an average gain of +2.14%. Cross-method transfer experiments further show that the distilled knowledge benefits pipelines other than the one that produced it, confirming that the improvement resides in the corpus itself.
## Original Abstract
The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteBack-RAG, a framework that uses labeled examples to identify where retrieval succeeds, isolate the relevant documents, and distill them into compact knowledge units that are indexed alongside the original corpus. Because the method modifies only the corpus, it can be applied once as an offline preprocessing step and combined with any RAG pipeline. Across four RAG methods, six benchmarks, and two LLM backbones, WriteBack-RAG improves every evaluated setting, with gains averaging +2.14%. Cross-method transfer experiments further show that the distilled knowledge benefits pipelines beyond the one that produced it, confirming that the improvement resides in the corpus itself.
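The write-back loop described in the abstract (find queries where retrieval succeeds, isolate the supporting documents, distill them into a compact unit, and index that unit alongside the corpus) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the keyword-overlap retriever, the `supports` check, and the string-concatenation `distill` are all toy stand-ins (the paper presumably uses a real retriever and an LLM for distillation).

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def supports(answer, docs):
    """Toy success check: does the labeled answer appear in the retrieved docs?"""
    return any(answer.lower() in doc.lower() for doc in docs)

def distill(query, docs):
    """Stand-in distiller (an LLM in practice): merge the relevant
    documents into one compact knowledge unit keyed to the query."""
    return f"[unit] {query} -> " + " | ".join(docs)

def write_back(corpus, labeled_examples):
    """Offline preprocessing: append distilled knowledge units to the corpus."""
    units = []
    for query, answer in labeled_examples:
        docs = retrieve(query, corpus)
        if supports(answer, docs):  # retrieval succeeded for this example
            relevant = [d for d in docs if answer.lower() in d.lower()]
            units.append(distill(query, relevant))
    # Units are indexed alongside the original documents, not in place of them.
    return corpus + units

corpus = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "Marie Curie also won the Nobel Prize in Chemistry in 1911.",
    "The Eiffel Tower is in Paris.",
]
examples = [("Which prizes did Marie Curie win", "Nobel Prize")]
augmented = write_back(corpus, examples)
```

Because `write_back` only adds documents, any downstream RAG pipeline can consume `augmented` unchanged, which is what makes the method an offline, pipeline-agnostic preprocessing step.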
---
*Automatically collected on 2026-03-28*
#paper #arXiv #NLP #小凯