<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation</title>
<link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;500;700&family=Roboto+Slab:wght@400;700&display=swap" rel="stylesheet">
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'Roboto', sans-serif;
background-color: #f0f4f8;
color: #333;
line-height: 1.6;
}
.poster-container {
width: 720px;
min-height: 960px;
margin: 0 auto;
background: linear-gradient(135deg, #e6f0ff 0%, #f5f9ff 100%);
padding: 40px 30px;
position: relative;
overflow: hidden;
}
.poster-container::before {
content: "";
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
background-image:
radial-gradient(circle at 10% 20%, rgba(100, 149, 237, 0.1) 0%, transparent 20%),
radial-gradient(circle at 90% 80%, rgba(65, 105, 225, 0.1) 0%, transparent 20%),
linear-gradient(45deg, rgba(100, 149, 237, 0.05) 0%, transparent 70%);
z-index: 0;
}
.grid-texture {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
background-image:
linear-gradient(rgba(255, 255, 255, 0.1) 1px, transparent 1px),
linear-gradient(90deg, rgba(255, 255, 255, 0.1) 1px, transparent 1px);
background-size: 20px 20px;
z-index: 0;
}
.content {
position: relative;
z-index: 1;
}
.header {
text-align: center;
margin-bottom: 30px;
padding-bottom: 20px;
border-bottom: 2px solid #4169e1;
}
.title {
font-family: 'Roboto Slab', serif;
font-size: 36px;
font-weight: 700;
color: #1a3a8f;
margin-bottom: 15px;
line-height: 1.2;
}
.authors {
font-size: 16px;
color: #4169e1;
margin-bottom: 10px;
}
.affiliations {
font-size: 14px;
color: #555;
margin-bottom: 10px;
}
.publication {
font-size: 14px;
color: #666;
font-style: italic;
}
.section {
background-color: rgba(255, 255, 255, 0.85);
border-radius: 12px;
padding: 20px;
margin-bottom: 25px;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.05);
backdrop-filter: blur(5px);
}
.section-title {
font-family: 'Roboto Slab', serif;
font-size: 24px;
font-weight: 700;
color: #1a3a8f;
margin-bottom: 15px;
display: flex;
align-items: center;
}
.section-title .material-icons {
margin-right: 10px;
color: #4169e1;
}
.section-content {
font-size: 16px;
}
.highlight {
background-color: rgba(65, 105, 225, 0.1);
padding: 2px 5px;
border-radius: 4px;
font-weight: 500;
}
.bullet-list {
padding-left: 25px;
margin-bottom: 15px;
}
.bullet-list li {
margin-bottom: 8px;
}
.two-column {
display: flex;
gap: 20px;
margin-bottom: 15px;
}
.column {
flex: 1;
}
.image-container {
text-align: center;
margin: 15px 0;
}
.image-container img {
max-width: 100%;
border-radius: 8px;
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
}
.image-caption {
font-size: 14px;
color: #666;
margin-top: 8px;
text-align: center;
}
.finding-card {
background-color: rgba(65, 105, 225, 0.05);
border-left: 4px solid #4169e1;
padding: 12px 15px;
margin-bottom: 12px;
border-radius: 0 8px 8px 0;
}
.code-link {
display: inline-flex;
align-items: center;
background-color: #4169e1;
color: white;
padding: 8px 15px;
border-radius: 20px;
text-decoration: none;
font-weight: 500;
margin-top: 10px;
}
.code-link .material-icons {
margin-right: 5px;
font-size: 18px;
}
.footer {
text-align: center;
margin-top: 30px;
color: #666;
font-size: 14px;
}
</style>
</head>
<body>
<div class="poster-container">
<div class="grid-texture"></div>
<div class="content">
<!-- Header Section -->
<div class="header">
<h1 class="title">Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation</h1>
<p class="authors">Mufei Li, Dongqi Fu, Limei Wang, Si Zhang, Hanqing Zeng, Kaan Sancak, Ruizhong Qiu, Haoyu Wang, Xiaoxin He, Xavier Bresson, Yinglong Xia, Chonglin Sun, Pan Li</p>
<p class="affiliations">Georgia Institute of Technology, Meta AI, University of Illinois Urbana-Champaign, National University of Singapore</p>
<p class="publication">arXiv:2510.07414 (October 2025)</p>
</div>
<!-- Introduction Section -->
<div class="section">
<h2 class="section-title">
<i class="material-icons">lightbulb</i>
Introduction
</h2>
<div class="section-content">
<div class="two-column">
<div class="column">
<ul class="bullet-list">
<li>Modern long-context LLMs perform well on synthetic <span class="highlight">"needle-in-a-haystack" (NIAH)</span> benchmarks</li>
<li>These tests overlook how noisy contexts arise from biased retrieval and agentic workflows</li>
<li>Realistic evaluation must capture how such noise arises in real-world pipelines</li>
</ul>
</div>
<div class="column">
<div class="image-container">
<img src="https://sfile.chatglm.cn/moeSlide/image/75/752c3cec.jpg" alt="Needle in a haystack visualization" width="300">
<p class="image-caption">Traditional needle-in-a-haystack evaluation</p>
</div>
</div>
</div>
</div>
</div>
<!-- Haystack Engineering Section -->
<div class="section">
<h2 class="section-title">
<i class="material-icons">architecture</i>
Haystack Engineering
</h2>
<div class="section-content">
<ul class="bullet-list">
<li>A new paradigm for constructing realistic, noisy long contexts that test models' long-context robustness</li>
<li>Captures key real-world factors:
<ul class="bullet-list">
<li>Distraction from heterogeneous biased retrievers</li>
<li>Cascading errors in agentic workflows</li>
</ul>
</li>
<li>Contrasts with "context engineering", which optimizes model inputs for the best performance</li>
</ul>
</div>
</div>
<!-- HaystackCraft Benchmark Section -->
<div class="section">
<h2 class="section-title">
<i class="material-icons">assessment</i>
HaystackCraft Benchmark
</h2>
<div class="section-content">
<ul class="bullet-list">
<li>Built on the full English Wikipedia hyperlink network</li>
<li>Features multi-hop questions</li>
<li>Extends traditional NIAH evaluations in two ways:
<ul class="bullet-list">
<li>Heterogeneous Retrieval-Dependent Haystacks</li>
<li>Dynamic, LLM-Dependent Agentic Context Engineering</li>
</ul>
</li>
</ul>
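<p>One way to picture a retrieval-dependent haystack is to keep the gold (needle) documents, fill the remaining token budget with the retriever's top-ranked non-gold documents as distractors, and then choose an ordering. This is a minimal sketch under stated assumptions, not HaystackCraft's code. <code>build_haystack</code> and its parameters are hypothetical, whitespace splitting stands in for a real tokenizer, and the ranked list may come from any retriever.</p>
<pre style="background:#f5f7fb; padding:10px; border-radius:8px; overflow-x:auto; font-size:12px; line-height:1.4; margin:10px 0;"><code class="language-python">
```python
import random


def build_haystack(ranked_ids, gold_ids, docs, budget, order="ranked", seed=0):
    """Hypothetical helper: assemble a haystack of document IDs.

    Gold (needle) documents are always kept. The rest of the token budget is
    filled with the retriever's highest-ranked non-gold documents, which act
    as distractors. Token counts use whitespace splitting as a stand-in for a
    real tokenizer.
    """
    def n_tokens(doc_id):
        return len(docs[doc_id].split())

    chosen = list(gold_ids)
    used = sum(n_tokens(d) for d in chosen)
    for doc_id in ranked_ids:
        if doc_id in chosen:
            continue  # gold docs are already included
        cost = n_tokens(doc_id)
        if used + cost > budget:
            break  # stop at the first document that no longer fits
        chosen.append(doc_id)
        used += cost

    if order == "ranked":
        # Keep retriever order; gold docs the retriever missed go last.
        rank = {d: i for i, d in enumerate(ranked_ids)}
        chosen.sort(key=lambda d: rank.get(d, len(ranked_ids)))
    elif order == "random":
        random.Random(seed).shuffle(chosen)
    return chosen
```
</code></pre>
<p>Swapping the retriever changes <code>ranked_ids</code>, and therefore which distractors fill the budget. The <code>order</code> argument isolates the effect of haystack ordering.</p>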
</div>
</div>
<!-- Heterogeneous Retrieval Strategies Section -->
<div class="section">
<h2 class="section-title">
<i class="material-icons">compare_arrows</i>
Heterogeneous Retrieval Strategies
</h2>
<div class="section-content">
<div class="two-column">
<div class="column">
<p>Evaluates how different retrieval strategies affect:</p>
<ul class="bullet-list">
<li>Distractor composition</li>
<li>Haystack ordering</li>
<li>LLM performance</li>
</ul>
<p>Strategies compared:</p>
<ul class="bullet-list">
<li>Sparse Retrieval (BM25)</li>
<li>Dense Retrieval (Qwen3-Embedding-0.6B)</li>
<li>Hybrid Retrieval (BM25 + Qwen3-Embedding-0.6B)</li>
<li>Graph-Based Reranking (Personalized PageRank, PPR)</li>
</ul>
</div>
<div class="column">
<div class="image-container">
<img src="https://sfile.chatglm.cn/moeSlide/image/9f/9f0f5ca8.jpg" alt="Comparison of retrieval strategies" width="300">
<p class="image-caption">Comparison of different retrieval methods</p>
</div>
</div>
</div>
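<p>Graph-based reranking can be sketched with Personalized PageRank (PPR) over the hyperlink graph. This minimal sketch assumes HaystackCraft's setup rather than reproducing it. The graph is an adjacency dict, the restart distribution is uniform over the retriever's top <code>top_k</code> candidates, and all candidates are reordered by PPR score. The function names, the damping factor <code>alpha=0.85</code>, and the seeding scheme are illustrative assumptions.</p>
<pre style="background:#f5f7fb; padding:10px; border-radius:8px; overflow-x:auto; font-size:12px; line-height:1.4; margin:10px 0;"><code class="language-python">
```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """Power iteration for PPR on a directed graph {node: [out-neighbors]}.

    With probability alpha the walk follows a random out-link; otherwise it
    restarts at a uniformly chosen seed node.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes.update(seeds)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            out = graph.get(n, [])
            if out:
                share = alpha * score[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in seeds:
                    nxt[s] += alpha * score[n] / len(seeds)
        score = nxt
    return score


def ppr_rerank(candidates, graph, top_k=2):
    """Rerank retrieved candidates by PPR seeded on the top_k of them."""
    seeds = set(candidates[:top_k])
    score = personalized_pagerank(graph, seeds)
    # Candidates absent from the graph get score 0; ties keep retriever order.
    return sorted(candidates, key=lambda d: score.get(d, 0.0), reverse=True)
```
</code></pre>
<p>For example, take the graph A→B, A→C, B→C, C→A and an unlinked candidate <code>"X"</code>. Reranking <code>["A", "X", "C"]</code> with <code>top_k=1</code> yields <code>["A", "C", "X"]</code>. Documents linked to the top hits move up, and isolated ones sink.</p>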
</div>
</div>
<!-- Agentic Context Engineering Section -->
<div class="section">
<h2 class="section-title">
<i class="material-icons">psychology</i>
Agentic Context Engineering
</h2>
<div class="section-content">
<div class="two-column">
<div class="column">
<p>Extends NIAH to dynamic, LLM-dependent settings</p>
<p>Simulates agentic operations where models:</p>
<ul class="bullet-list">
<li>Refine queries</li>
<li>Reflect on their past reasoning</li>
<li>Decide when to stop</li>
</ul>
<p>Two dynamic settings:</p>
<ul class="bullet-list">
<li>Enforced Multi-Round: the model must complete a fixed number of rounds</li>
<li>Variable-Round: the model decides when to stop</li>
</ul>
</div>
<div class="column">
<div class="image-container">
<img src="https://sfile.chatglm.cn/moeSlide/image/47/47288779.jpg" alt="Agentic workflow visualization" width="300">
<p class="image-caption">Agentic workflow with cascading errors</p>
</div>
</div>
</div>
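<p>Both dynamic settings fit one loop. In this minimal sketch, <code>retrieve</code> and <code>llm_step</code> are hypothetical callables standing in for a retriever and an LLM. The 4-tuple returned by <code>llm_step</code> is an assumed interface, not HaystackCraft's. With <code>variable_round=False</code> the loop always runs <code>max_rounds</code> rounds (enforced multi-round). With <code>variable_round=True</code>, the model's own stop signal can end it early (variable-round).</p>
<pre style="background:#f5f7fb; padding:10px; border-radius:8px; overflow-x:auto; font-size:12px; line-height:1.4; margin:10px 0;"><code class="language-python">
```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trace:
    queries: List[str] = field(default_factory=list)  # queries actually issued
    notes: List[str] = field(default_factory=list)    # model's own reflections
    answer: str = ""
    rounds: int = 0


def run_agent(
    question: str,
    retrieve: Callable[[str], List[str]],
    llm_step: Callable[[str, List[str], List[str]], Tuple[str, str, str, bool]],
    max_rounds: int = 3,
    variable_round: bool = False,
) -> Trace:
    """Hypothetical agentic loop.

    llm_step(question, docs, past_notes) is assumed to return
    (refined_query, reflection_note, answer, wants_to_stop).
    """
    trace = Trace()
    query = question
    for _ in range(max_rounds):
        trace.queries.append(query)
        docs = retrieve(query)  # this round's haystack depends on the model's own query
        query, note, answer, stop = llm_step(question, docs, trace.notes)
        trace.notes.append(note)  # earlier reasoning is fed back in later rounds
        trace.answer = answer
        trace.rounds += 1
        if variable_round and stop:
            break  # only the variable-round setting honors the stop signal
    return trace
```
</code></pre>
<p>Carrying <code>trace.notes</code> forward is what lets errors cascade. A wrong reflection or refined query in round one shapes the retrieved haystack and the prompt in every later round.</p>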
</div>
</div>
<!-- Key Findings Section -->
<div class="section">
<h2 class="section-title">
<i class="material-icons">insights</i>
Key Findings
</h2>
<div class="section-content">
<div class="finding-card">
<p>Stronger dense retrievers can introduce more challenging distractors than sparse ones</p>
</div>
<div class="finding-card">
<p>Graph-based reranking with PPR improves retrieval effectiveness while also mitigating more harmful distractors</p>
</div>
<div class="finding-card">
<p>Document ordering effects are model-dependent</p>
</div>
<div class="finding-card">
<p>Even advanced models (Gemini 2.5 Pro, GPT-5) suffer cascading failures from self-generated distractors</p>
</div>
<div class="finding-card">
<p>Models are more robust to noisy long contexts ("width") than to noisy reasoning iterations ("depth")</p>
</div>
<div class="finding-card">
<p>Most models struggle with appropriate early stopping in variable-round settings</p>
</div>
</div>
</div>
<!-- Conclusion Section -->
<div class="section">
<h2 class="section-title">
<i class="material-icons">flag</i>
Conclusion
</h2>
<div class="section-content">
<ul class="bullet-list">
<li>Robust agentic long-context reasoning remains an unsolved challenge</li>
<li>HaystackCraft serves as a valuable testbed for future progress</li>
</ul>
<a href="https://github.com/Graph-COM/HaystackCraft" class="code-link" target="_blank">
<i class="material-icons">code</i>
Code available at GitHub
</a>
</div>
</div>
<div class="footer">
© 2025 Haystack Engineering Research Team
</div>
</div>
</div>
</body>
</html>